Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with keep/drop

    Dear Statalisers

    My dataset is longitudinal following people from 2008 to 2019. I would like to specifically follow those who were aged between 25 and 34 in 2008 over time as they age, but I'm having trouble with the keep/drop commands.

    I have tried things like:

    drop if (hgage <35 & year==2008) | (hgage >20 & year==2008)

    I can't quite get it to work as whatever I try either includes people outside of this age range or I lose observations from 2009 onwards.

    Any help is appreciated

    best
    Brendan

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year long pid int hgage
    2008 100003 55
    2009 100003 56
    2010 100003 57
    2011 100003 58
    2012 100003 59
    2013 100003 60
    2014 100003 61
    2015 100003 62
    2016 100003 63
    2017 100003 64
    2018 100003 65
    2019 100003 66
    2008 100005 23
    2009 100005 24
    2010 100005 25
    2011 100005 26
    2012 100005 27
    2013 100005 28
    2014 100005 29
    2015 100005 30
    2016 100005 31
    2017 100005 32
    2018 100005 33
    2019 100005 34
    2008 100006 21
    2009 100006 22
    2010 100006 23
    2011 100006 24
    2012 100006 25
    2013 100006 26
    2014 100006 27
    2015 100006 28
    2016 100006 29
    2017 100006 30
    2018 100006 31
    2019 100006 32
    2008 100007 16
    2009 100007 17
    2010 100007 18
    2011 100007 19
    2012 100007 20
    2008 100010 43
    2009 100010 44
    2010 100010 45
    2011 100010 46
    2012 100010 47
    2013 100010 48
    2014 100010 49
    2015 100010 50
    2016 100010 51
    2017 100010 52
    2018 100010 53
    2019 100010 54
    2008 100011 38
    2009 100011 39
    2008 100014 54
    2009 100014 55
    2010 100014 56
    2011 100014 57
    2012 100014 58
    2013 100014 59
    2014 100014 60
    2015 100014 61
    2016 100014 62
    2018 100014 64
    2019 100014 65
    2008 100015 51
    2009 100015 52
    2010 100015 53
    2011 100015 54
    2012 100015 55
    2013 100015 56
    2014 100015 57
    2015 100015 58
    2016 100015 59
    2017 100015 60
    2018 100015 61
    2019 100015 62
    2008 100016 26
    2009 100016 27
    2010 100016 28
    2011 100016 29
    2012 100016 30
    2013 100016 31
    2014 100016 32
    2015 100016 33
    2016 100016 34
    2017 100016 35
    2018 100016 36
    2019 100016 37
    2008 100018 47
    2009 100018 48
    2010 100018 49
    2011 100018 50
    2012 100018 51
    2013 100018 52
    2014 100018 53
    2015 100018 54
    2016 100018 55
    2017 100018 56
    end
    label values hgage VHGAGE



  • #2
    Brendan:
    the best way (for me, at least) to handle this issue is to get a precise idea of how many observations satisfy the -if- condition:
    Code:
    . tab hgage if year==2008
    
          hgage |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             16 |          1       10.00       10.00
             21 |          1       10.00       20.00
             23 |          1       10.00       30.00
             26 |          1       10.00       40.00
             38 |          1       10.00       50.00
             43 |          1       10.00       60.00
             47 |          1       10.00       70.00
             51 |          1       10.00       80.00
             54 |          1       10.00       90.00
             55 |          1       10.00      100.00
    ------------+-----------------------------------
          Total |         10      100.00
    In 2008, only one observation in your data excerpt fits the -if- condition.
    Therefore, instead of -drop-ping variables, you may want to build an indicator to be used as an -if- condition in subsequent analyses:
    Code:
    . gen wanted=1 if hgage<35 & hgage>24 & year==2008
    
    . list if wanted==1
    
         +--------------------------------+
         | year      pid   hgage   wanted |
         |--------------------------------|
     79. | 2008   100016      26        1 |
         +--------------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Thanks Carlo for the solution. Unfortunately when I do this and then i do keep if wanted==1 then I only have observations for 2008. I want 25 to 34 year olds in 2008 and then to follow that cohort over ten years.

      Comment


      • #4
        The crunch here is that your syntax is about selecting individual observations, but you want to select people. That makes the problem a two-step problem. One way to see that you're missing part of the problem is that the syntax in #1 makes no use of the identifier pid.

        Here are two ways to do it. I have to save the example data so that I can show that they give the same result, but you only need one method.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int year long pid int hgage
        2008 100003 55
        2009 100003 56
        2010 100003 57
        2011 100003 58
        2012 100003 59
        2013 100003 60
        2014 100003 61
        2015 100003 62
        2016 100003 63
        2017 100003 64
        2018 100003 65
        2019 100003 66
        2008 100005 23
        2009 100005 24
        2010 100005 25
        2011 100005 26
        2012 100005 27
        2013 100005 28
        2014 100005 29
        2015 100005 30
        2016 100005 31
        2017 100005 32
        2018 100005 33
        2019 100005 34
        2008 100006 21
        2009 100006 22
        2010 100006 23
        2011 100006 24
        2012 100006 25
        2013 100006 26
        2014 100006 27
        2015 100006 28
        2016 100006 29
        2017 100006 30
        2018 100006 31
        2019 100006 32
        2008 100007 16
        2009 100007 17
        2010 100007 18
        2011 100007 19
        2012 100007 20
        2008 100010 43
        2009 100010 44
        2010 100010 45
        2011 100010 46
        2012 100010 47
        2013 100010 48
        2014 100010 49
        2015 100010 50
        2016 100010 51
        2017 100010 52
        2018 100010 53
        2019 100010 54
        2008 100011 38
        2009 100011 39
        2008 100014 54
        2009 100014 55
        2010 100014 56
        2011 100014 57
        2012 100014 58
        2013 100014 59
        2014 100014 60
        2015 100014 61
        2016 100014 62
        2018 100014 64
        2019 100014 65
        2008 100015 51
        2009 100015 52
        2010 100015 53
        2011 100015 54
        2012 100015 55
        2013 100015 56
        2014 100015 57
        2015 100015 58
        2016 100015 59
        2017 100015 60
        2018 100015 61
        2019 100015 62
        2008 100016 26
        2009 100016 27
        2010 100016 28
        2011 100016 29
        2012 100016 30
        2013 100016 31
        2014 100016 32
        2015 100016 33
        2016 100016 34
        2017 100016 35
        2018 100016 36
        2019 100016 37
        2008 100018 47
        2009 100018 48
        2010 100018 49
        2011 100018 50
        2012 100018 51
        2013 100018 52
        2014 100018 53
        2015 100018 54
        2016 100018 55
        2017 100018 56
        end
        label values hgage VHGAGE
        
        save safecopy
        
        gen tag = year == 2008 & inrange(hgage, 25, 34)
        
        bysort pid (tag) : keep if tag[_N]
        
        list, sepby(pid)
        
        use safecopy, clear
        
        egen tag = max(year == 2008 & inrange(hgage, 25, 34)), by(pid)
        
        keep if tag
        
        list, sepby(pid)
        Code:
        . * Example generated by -dataex-. For more info, type help dataex
        . clear
        
        . input int year long pid int hgage
        
                 year           pid     hgage
          1. 2008 100003 55
          2. 2009 100003 56
          3. 2010 100003 57
          4. 2011 100003 58
          5. 2012 100003 59
          6. 2013 100003 60
          7. 2014 100003 61
          8. 2015 100003 62
          9. 2016 100003 63
         10. 2017 100003 64
         11. 2018 100003 65
         12. 2019 100003 66
         13. 2008 100005 23
         14. 2009 100005 24
         15. 2010 100005 25
         16. 2011 100005 26
         17. 2012 100005 27
         18. 2013 100005 28
         19. 2014 100005 29
         20. 2015 100005 30
         21. 2016 100005 31
         22. 2017 100005 32
         23. 2018 100005 33
         24. 2019 100005 34
         25. 2008 100006 21
         26. 2009 100006 22
         27. 2010 100006 23
         28. 2011 100006 24
         29. 2012 100006 25
         30. 2013 100006 26
         31. 2014 100006 27
         32. 2015 100006 28
         33. 2016 100006 29
         34. 2017 100006 30
         35. 2018 100006 31
         36. 2019 100006 32
         37. 2008 100007 16
         38. 2009 100007 17
         39. 2010 100007 18
         40. 2011 100007 19
         41. 2012 100007 20
         42. 2008 100010 43
         43. 2009 100010 44
         44. 2010 100010 45
         45. 2011 100010 46
         46. 2012 100010 47
         47. 2013 100010 48
         48. 2014 100010 49
         49. 2015 100010 50
         50. 2016 100010 51
         51. 2017 100010 52
         52. 2018 100010 53
         53. 2019 100010 54
         54. 2008 100011 38
         55. 2009 100011 39
         56. 2008 100014 54
         57. 2009 100014 55
         58. 2010 100014 56
         59. 2011 100014 57
         60. 2012 100014 58
         61. 2013 100014 59
         62. 2014 100014 60
         63. 2015 100014 61
         64. 2016 100014 62
         65. 2018 100014 64
         66. 2019 100014 65
         67. 2008 100015 51
         68. 2009 100015 52
         69. 2010 100015 53
         70. 2011 100015 54
         71. 2012 100015 55
         72. 2013 100015 56
         73. 2014 100015 57
         74. 2015 100015 58
         75. 2016 100015 59
         76. 2017 100015 60
         77. 2018 100015 61
         78. 2019 100015 62
         79. 2008 100016 26
         80. 2009 100016 27
         81. 2010 100016 28
         82. 2011 100016 29
         83. 2012 100016 30
         84. 2013 100016 31
         85. 2014 100016 32
         86. 2015 100016 33
         87. 2016 100016 34
         88. 2017 100016 35
         89. 2018 100016 36
         90. 2019 100016 37
         91. 2008 100018 47
         92. 2009 100018 48
         93. 2010 100018 49
         94. 2011 100018 50
         95. 2012 100018 51
         96. 2013 100018 52
         97. 2014 100018 53
         98. 2015 100018 54
         99. 2016 100018 55
        100. 2017 100018 56
        101. end
        
        . label values hgage VHGAGE
        
        .
        . save safecopy
        .
        . gen tag = year == 2008 & inrange(hgage, 25, 34)
        
        .
        . bysort pid (tag) : keep if tag[_N]
        (88 observations deleted)
        
        .
        . list, sepby(pid)
        
             +-----------------------------+
             | year      pid   hgage   tag |
             |-----------------------------|
          1. | 2013   100016      31     0 |
          2. | 2009   100016      27     0 |
          3. | 2010   100016      28     0 |
          4. | 2015   100016      33     0 |
          5. | 2014   100016      32     0 |
          6. | 2019   100016      37     0 |
          7. | 2018   100016      36     0 |
          8. | 2012   100016      30     0 |
          9. | 2016   100016      34     0 |
         10. | 2017   100016      35     0 |
         11. | 2011   100016      29     0 |
         12. | 2008   100016      26     1 |
             +-----------------------------+
        
        .
        . use safecopy, clear
        
        .
        . egen tag = max(year == 2008 & inrange(hgage, 25, 34)), by(pid)
        
        .
        . keep if tag
        (88 observations deleted)
        
        .
        . list, sepby(pid)
        
             +-----------------------------+
             | year      pid   hgage   tag |
             |-----------------------------|
          1. | 2008   100016      26     1 |
          2. | 2009   100016      27     1 |
          3. | 2010   100016      28     1 |
          4. | 2011   100016      29     1 |
          5. | 2012   100016      30     1 |
          6. | 2013   100016      31     1 |
          7. | 2014   100016      32     1 |
          8. | 2015   100016      33     1 |
          9. | 2016   100016      34     1 |
         10. | 2017   100016      35     1 |
         11. | 2018   100016      36     1 |
         12. | 2019   100016      37     1 |
             +-----------------------------+
        For more on the technique see e.g. https://www.stata.com/support/faqs/d...ble-recording/

        https://journals.sagepub.com/doi/pdf...867X1101100210
        Last edited by Nick Cox; 14 May 2024, 03:55.

        Comment

        Working...
        X