  • Keep households selectively

    Dear Statalist,

    I would like some help with selectively keeping households in my dataset.

    a_hidp is the unique household number, pidp is the unique personal number, a_dvage is age and a_sex is sex.

    I want to keep only the households that contain teenagers aged 16~18.
    So, for example, I want to keep both rows 3 and 4 (a_hidp: 68006123), as there is a 17-year-old girl in this household, and rows 18 and 19 (a_hidp: 68014283), as there is a 16-year-old girl in that household.
    However, I want to drop row 1 (a_hidp: 68001363), rows 5~17, and so on.

    I cannot use the code "drop if a_dvage>=19 | a_dvage<=15" as it would delete the other individuals living in the same household. For example, it would drop rows 3 and 18 as well.
    Do you see what I mean? I want to keep those aged 16~18 plus the other family members in the same household.
    In other words, I want to keep families that include 16~18-year-old teenagers.

    Is there any way to do this quickly?
    [Attached image: 1.jpg (screenshot of the data)]

    Last edited by sladmin; 08 Apr 2019, 09:12. Reason: anonymize original poster

  • #2
    Please post a sample of your data using the -dataex- command. See 12.2 of the FAQ: https://www.statalist.org/forums/help#stata
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1


    • #3
      Dear Carole,

      Thank you for your reply.

      . dataex a_hidp pidp a_sex a_dvage in 1/20

      ----------------------- copy starting from the next line -----------------------
      [CODE]
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(a_hidp pidp) byte a_sex int a_dvage
      68014283 68014291 2 16
      68028563 68028575 2 17
      68125123 68125131 1 18
      68141443 68141455 2 17
      68155043 68155063 1 16
      68197883 68197891 1 18
      68293083 68293099 1 18
      68336603 68336619 2 17
      68347483 68347503 2 16
      68530403 68530411 2 16
      68872443 68872451 2 16
      68912563 68912575 1 17
      68933643 68933655 2 18
      69049923 69049935 1 18
      69062843 69062859 1 16
      69070323 69070327 2 18
      69070323 69070347 2 16
      69080523 69080535 2 17
      69426643 69426663 2 16
      69440923 69440931 2 18


      • #4
        For others who may be tempted, as I was, to respond in the absence of the usable sample data that Carole requested in post #2, let me point out that, as Carole likely saw, a careful look at the screenshot shows that the variable a_dvage is displayed in blue in the Stata Data Browser window. That is a giveaway that it is a numeric variable whose displayed values are value labels rather than the actual values of the variable (a_sex is shown in blue for the same reason). So there is no knowing, without further information, whether the value of a_dvage in observation 4 is actually 17, or is some other number with a value label of "17".

        A common source of this problem is when a string variable containing numbers is converted to a numeric variable using encode rather than destring.
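
        As a minimal sketch of that pitfall, using a made-up string variable age_str (hypothetical, not taken from the poster's data): encode numbers the distinct strings 1, 2, 3, ... in sorted order and attaches the original text as value labels, whereas destring recovers the numbers themselves.

        ```stata
        * Toy data: ages stored as strings (age_str is a hypothetical name)
        clear
        input str2 age_str
        "17"
        "16"
        "45"
        end

        * encode assigns integer codes in sorted order of the strings
        * and labels each code with the original text
        encode age_str, generate(age_enc)

        * destring converts the text to the numbers it spells out
        destring age_str, generate(age_num)

        * nolabel shows the stored values instead of the value labels
        list, nolabel
        ```

        With nolabel, age_enc should list as 2, 1, 3 while age_num lists as 17, 16, 45, so a condition such as inrange(age_enc, 16, 18) would match nothing.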

        A tip of the hat to Carole for referring Guest back to the Statalist FAQ that the respondent to an earlier post requested Guest to review. Some of us are (or at least, I am) too eager to show off in the absence of a thorough understanding of the problem.

        Guest, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

        Please be sure to use the dataex command to show your example data. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        When asking for help with code, always show example data. When showing example data, always use dataex.

        The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
        Last edited by sladmin; 08 Apr 2019, 09:13. Reason: anonymize original poster


        • #5
          Try the following (save data before dropping):

          Code:
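          * mark each person aged 16 to 18 (teen is missing for everyone else)
          gen teen=1 if inrange(a_dvage, 16, 18)
          * count the marked persons within each household
          bysort a_hidp: egen has_teen=total(teen)
          * drop every member of households with no one aged 16 to 18
          drop if has_teen==0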
          Note that you use <=15 in one part of your description and then 16 in another. Adjust the -inrange()- statement as needed.
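
          The same idea could also be sketched in a single step. This is only a variant under the same assumption that a_dvage holds actual ages; the variable name any_teen is made up for illustration.

          ```stata
          * inrange() evaluates to 1 for ages 16-18 and 0 otherwise (including missing),
          * so the household maximum is 1 when at least one member is in range
          bysort a_hidp: egen byte any_teen = max(inrange(a_dvage, 16, 18))

          * keep every member of the flagged households
          keep if any_teen == 1
          ```

          Because the flag is computed per household and copied to every member, the other family members are kept along with the teenager.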
          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1


          • #6
            Posts #3 and #4 crossed in cyberspace.

            The data in post #3 is not a complete copy of what dataex provided, and it contains different observations than the screenshot in post #1. For the screenshot, the data appears to have been sorted by a_hidp and pidp, as it should be to ensure that sample data includes full households. The output of help dataex shows that you can use an "in clause" to limit the number of observations shown to less than the default 100, and we don't need 100 observations of sample data.
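
            That could look roughly like the following, assuming the variable names from post #1. The range 1/20 is only an illustration and may cut the last household short, so adjust it to end on a household boundary.

            ```stata
            * sort so that members of the same household are adjacent
            sort a_hidp pidp

            * show the first 20 observations of the four relevant variables
            dataex a_hidp pidp a_sex a_dvage in 1/20
            ```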

            But it is crucial that you copy all the material that dataex provides, starting with the line
            Code:
            [CODE]
            through the line
            Code:
            [/CODE]
            Carole's code in post #5 will solve your problem if a_dvage is correctly expressed so that for example a displayed value of "17" represents the numeric value in the data. I am unconvinced that that is the case.
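
            One way to check this, sketched here with standard commands and run together from the do-file editor so that the local macro survives between lines:

            ```stata
            * the "value label" column is blank if no label is attached
            describe a_dvage

            * fetch the name of the attached value label, if any, and show its mapping
            local lbl : value label a_dvage
            if "`lbl'" != "" {
                label list `lbl'
            }

            * tabulate the stored numeric values rather than their labels
            tabulate a_dvage, nolabel
            ```

            If the tabulated values are small codes such as 1, 2, 3 rather than plausible ages, the variable would need to be rebuilt before inrange(a_dvage, 16, 18) can work.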
            Last edited by William Lisowski; 28 Jul 2018, 14:56.


            • #7
              Dear Carole and William,

              Thank you very much for your help and advice.
              I will try to be more careful and productive next time.
