Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Writing loop to identify individuals with consecutive positive values (n, n+1, n+2, ...)

    Hi,

    First time Statalist poster, long time follower. Thanks for this great resource.

    I am working with wage data that has 24 time periods. The variables twcwage1, twcwage2, twcwage3,.... twcwage24 represent the time periods and the values are wages that individuals earned. Below is a sample of my data. The variable id is a unique individual identifier.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id twcwage1 twcwage2 twcwage3 twcwage4 twcwage5 twcwage6 twcwage7 twcwage8 twcwage9 twcwage10 twcwage11 twcwage12 twcwage13 twcwage14 twcwage15 twcwage16 twcwage17 twcwage18 twcwage19 twcwage20 twcwage21 twcwage22 twcwage23 twcwage24)
     1 14424.97  5454.47  2172.38  7475.95  3623.67  8266.98 11483.35 13923.77        0  4101.16 10501.73  6994.87 14258.09  7892.02 14677.78 13268.03  2151.58  1632.03 10939.14  8044.03 14966.76 12301.58  6528.78  4526.62
     2  4853.76  11903.3 12568.52  8264.54  2947.44 12073.08   6502.1 14233.04  2129.83  8858.49 14747.98  7006.83  6516.13  9847.42 13853.07  8457.29  9351.71  2314.42  4455.06 12297.85  7553.74 12679.37 14903.33  2053.81
     3  14598.3  5555.92 12205.94     9665        0  6742.81  1823.49        0  8389.34        0  5454.41  8668.01  6547.18 14457.23  12463.6  6413.48  6634.47 13204.18 12011.79  3101.11  6495.12  4445.22  8546.96  4130.96
     4  1932.64  2813.21  3926.87  4650.35  7102.57  4024.89  3343.85  2998.88  3575.33  14957.5 11875.44 10075.93 13262.63  4582.33 14298.61  6373.88  6675.03  5901.78  3962.03 10664.47 12129.49 12443.99  4218.34  8501.03
     5 13524.21   2452.5        0  9328.19  6095.36  6538.84 14607.75 10607.04 10080.55 14981.65  1617.54   9529.7  8482.97 13861.68  8083.54  6739.79 12853.41   2370.7  9294.33  2769.99 13772.91  6044.32  7015.51 13075.92
     6  7441.03 13554.42  9841.29 13084.98  6017.36  7191.84  2678.67        0 11430.44 10894.58  6929.55 10585.47  5021.61 10027.56 11200.41  4117.87 11682.88  13141.5 11617.82  4261.87 10287.36 14587.11 12735.18  9065.99
     7 14334.46 14203.28 13721.12 11842.04 10169.21  5357.79        0 13079.71 13700.71 12950.95 11169.27  3369.16  2625.83 11944.33        0  3927.54  1946.54  9544.32  7571.48  5597.54  9915.25 10330.47    10627 12150.96
     8 10374.71  3817.62 11306.05  11455.5  3296.03  9739.45        0  4519.59  7800.11 10514.02        0 11850.34  2562.66  4347.68  5415.03  3034.74  6535.96   7395.3        0  2512.28  5778.15 13878.02 11935.85  8307.78
     9 12803.48 13301.84   2004.6  5114.32 13063.29  9699.24  4678.55 14494.23  5102.92  8471.06   7154.6 14570.28        0  7489.07  6247.36  2818.76  6716.95 14427.57  9216.64        0  1626.04 14307.02  5364.81 11395.73
    10 13638.55  13023.8  4481.94 13272.87        0  5672.35  6235.91  5701.39  6264.48 12928.25 10047.37 10933.26 10074.58  7893.65 12362.12  7390.65  8461.34 13726.15   7249.6  7075.94  9896.47 11279.99    10610  8096.06
    11 14657.92        0  4781.83 14775.16 12370.37        0  6022.28  2563.49  8435.25  9696.16  9035.82   3774.2  2200.07  2335.71  5071.95  4653.26  5219.47  10548.7  13572.9  2468.76 11361.58 14340.73  13982.3  5194.59
    12  2758.25        0  6773.04  3081.94        0  2746.19 13877.89  12829.2 12676.14   2211.1  2193.75  9299.16  4549.23  9607.71  1542.08 14655.06  2232.72 13275.68        0  7044.37 12671.71  4263.46  8925.02  8981.86
    13  7792.97   7528.4  6251.25  9333.24  9193.49        0  8193.82 13986.84  2028.61 12245.43 14939.55  9983.74  4280.02  3079.87 11707.39  2919.55        0        0  2319.09  6888.25        0  5582.77 13334.77 12475.53
    14  6796.29 10506.12 11018.06  8860.01  2954.22  6893.12 13298.65        0  7218.22  2293.77  2592.77  1593.46  6399.95 14825.68  3853.68  7819.25        0  5574.17        0  1624.19  6440.36  5657.44  7928.91 11223.04
    15   8689.6        0  8692.99 10680.21  2880.66 11664.42        0 11123.44  11722.5   3595.7 14557.11  2296.03  4935.62 10692.08        0 14927.27 11363.96  3271.04  5536.67 10847.73   7327.8  7010.88 12786.93 11230.23
    16  9303.43  9137.99 11439.58   3825.3  7624.42        0  11070.5 10673.43  9135.99  7739.39  2557.49 13181.07  2645.68  6786.95  4265.29  5635.35  4368.09  13144.7  3862.68  4859.06 14449.64 11696.09  5595.21  10853.6
    17  2155.19  5470.53  2517.65 11666.24  4765.71 12700.25  5033.77  6397.98 10113.89   8720.9  12151.2  8068.58  4966.89 14398.62 11645.72 12367.92  9802.44  5480.99  5311.36   7278.5 14804.43   8072.8  3511.41 12752.78
    18  3886.64        0   5645.1 10753.37  7095.96        0  4477.53 13155.43  9014.54  8202.76  2598.47 14735.47   6442.4 10309.98 11826.95  6354.62  3673.36  4460.37 14855.72  11778.3        0 10759.08  6896.12  8234.72
    19  12852.5 10789.16        0  2773.76  1870.67        0   1571.7  7922.01  4682.85 14588.06 12053.67  8589.41 10001.18  3071.72  7899.94 10771.72  1661.51  2132.91 14219.73  1832.53  1595.01  3912.33  4360.58  5302.18
    20        0   6876.8        0  8361.33  1892.24  5091.36  6995.07 12046.62  6783.31  2418.21 11333.68  2302.99 12451.49 13446.51  4863.94  6320.14  7015.25  7280.54  6557.88  7571.45  13381.5 14725.07  6633.58  1871.72
    end

    I am trying to identify individuals who had positive wages for at least 12 consecutive time periods. For example, an individual who had positive wages from twcwage1 to twcwage12 has positive wages for 12 consecutive time periods. Likewise, someone who with positive wages from twcwage3 to twcwage14 also has positive wages for 12 consecutive time periods. Below is the code that I manually wrote to estimate the different possible scenarios.

    Code:
    *
    generate time_period0 = 1 if twcwage1>0 & twcwage2>0 & twcwage3>0 & twcwage4>0 & twcwage5>0 & twcwage6>0 & twcwage7>0 & twcwage8>0 & twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0
     
    generate time_period1 =1 if twcwage2>0 & twcwage3>0 & twcwage4>0 & twcwage5>0 & twcwage6>0 & twcwage7>0 & twcwage8>0 & twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0
     
    generate time_period2 =1 if twcwage3>0 & twcwage4>0 & twcwage5>0 & twcwage6>0 & twcwage7>0 & twcwage8>0 & twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0
     
    generate time_period3 =1 if twcwage4>0 & twcwage5>0 & twcwage6>0 & twcwage7>0 & twcwage8>0 & twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0
     
     
    generate time_period4 =1 if twcwage5>0 & twcwage6>0 & twcwage7>0 & twcwage8>0 & twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0
     
    generate time_period5 =1 if twcwage6>0 & twcwage7>0 & twcwage8>0 & twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0 & twcwage17>0
     
    generate time_period6 =1 if twcwage7>0 & twcwage8>0 & twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0 & twcwage17>0 & twcwage18>0
     
    generate time_period7 =1 if twcwage8>0 & twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0 & twcwage17>0 & twcwage18>0 & twcwage19>0
     
     
    generate time_period8 =1 if twcwage9>0 & twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0 & twcwage17>0 & twcwage18>0 & twcwage19>0 & twcwage20>0
     
    generate time_period9 =1 if twcwage10>0 & twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0 & twcwage17>0 & twcwage18>0 & twcwage19>0 & twcwage20>0 & twcwage21>0
     
    generate time_period10 =1 if twcwage11>0 & twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0 & twcwage17>0 & twcwage18>0 & twcwage19>0 & twcwage20>0 & twcwage21>0 & twcwage22>0
     
    generate time_period11 =1 if twcwage12>0 & twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0 & twcwage17>0 & twcwage18>0 & twcwage19>0 & twcwage20>0 & twcwage21>0 & twcwage22>0 & twcwage23>0
     
     
    generate time_period12 =1 if twcwage13>0 & twcwage14>0 & twcwage15>0 & twcwage16>0 & twcwage17>0 & twcwage18>0 & twcwage19>0 & twcwage20>0 & twcwage21>0 & twcwage22>0 & twcwage23>0 & twcwage24>0
    I'm trying to learn how to write a -for loop to streamline my code and also to reduce human error on my part. I tried the code below but I had no luck.


    Code:
    foreach qtr in 0/12 {
        local m1 =`qtr'+1
            local m2 =`qtr'+2
            local m3 =`qtr'+3
            local m4 =`qtr'+4
            local m5 =`qtr'+5
            local m6 =`qtr'+6
            local m7 =`qtr'+7
            local m8 =`qtr'+8
            local m9 =`qtr'+9
            local m10 =`qtr'+10
            local m11 =`qtr'+11
            local m12 =`qtr'+12
            generate start`qtr'=1 if twcwage`m1'>0 & twcwage`m2'>0 & twcwage`m3'>0 & twcwage`m4'>0& twcwage`m5'>0 & twcwage`m6'>0 & twcwage`m7'>0 & twcwage`m8'>0 & twcwage`m9'>0 & twcwage`m10'>0 &twcwage`m11'>0 & twcwage`m12'>0
    }
    If anyone has a suggestion to correct my mistake, that would be great. If you could explain your thought process, I would really appreciate it.

  • #2
    I wouldn't approach the problem that way. If we reshape long we have a standard panel structure and then can just look for spells of positive values. Here I use tsspell from SSC.


    Code:
    . reshape long twcwage, i(id) j(time)
    (j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24)
    
    Data                               Wide   ->   Long
    -----------------------------------------------------------------------------
    Number of observations               20   ->   480         
    Number of variables                  25   ->   3           
    j variable (24 values)                    ->   time
    xij variables:
            twcwage1 twcwage2 ... twcwage24   ->   twcwage
    -----------------------------------------------------------------------------
    
    . su twcwage
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
         twcwage |        480    7600.333    4413.416          0   14981.65
    
    . tsset id time
    
    Panel variable: id (strongly balanced)
     Time variable: time, 1 to 24
             Delta: 1 unit
    
    . tsspell , pcond(twcwage)
    
    . tabstat _seq, by(id) stat(max)
    
    Summary for variables: _seq
    Group variable: id 
    
          id |       Max
    ---------+----------
           1 |        15
           2 |        24
           3 |        14
           4 |        24
           5 |        21
           6 |        16
           7 |         9
           8 |         7
           9 |        12
          10 |        19
          11 |        18
          12 |        13
          13 |        10
          14 |         8
          15 |         9
          16 |        18
          17 |        24
          18 |        14
          19 |        18
          20 |        21
    ---------+----------
       Total |        24
    --------------------
    Code:
    
    


    Here
    _seq is a counter within spells of positive wage, so its maximum is the length of the longest spell. To identify individuals with values at least 12, you could go


    Code:
    egen wanted = max(_seq), by(id)
    and then
    wanted >= 12 is your criterion.

    Hundreds of posts here spell out the advice that a problem will be easier after
    reshape long.

    Comment


    • #3
      Hi Nick. Thank you so much for your help. I love your idea of having the data in long format. I am using a computer that doesn't allow me to download tsspell from SSC. Do you have any suggestions for implementing your strategy without tsspell? Thanks.

      Comment


      • #4
        See https://www.stata-journal.com/articl...article=dm0029 for the principles. Here's an approach without tsspell


        Code:
        reshape long twcwage , i(id) j(time)
        
        tsset id time 
        
        gen work = twcwage > 0 & (L.twcwage <= 0 | L.time == .)  
        
        bysort id : gen _spell = sum(work)
        
        bysort id _spell (time) : gen _seq = 1 if _n == 1 
        
        by id _spell : replace _seq = _seq[_n-1] + 1 if twcwage > 0 & _n > 1 
        
        replace _spell = 0 if twcwage <= 0 
        
        replace _seq = 0 if twcwage <= 0 
        
        egen wanted = max(_seq), by(id)
        
        tabdisp id, c(wanted)

        Comment


        • #5
          Thank you so much. I really appreciate that you took the time to share your code. This works perfectly. Thanks for introducing me to L.variable option. I am still new to times series analysis and you simultaneously solved my problem and taught me something new. Thank you also for the Stata Journal resource. I will look at it. I would never have even used the term "spells" and so I appreciate you helping me use an appropriate term.

          Comment


          • #6
            Many people say “runs”. Also epochs, etc. I suppose my use of spells owes much to meteorology although I think it’s common to talk of spells of unemployment, etc,

            Comment

            Working...
            X