Identifying types of spells within a certain time period

Jack Prado

Join Date: Sep 2021
Posts: 15

08 Nov 2021, 09:46

Hi everyone,

I have wage data for 24 consecutive quarters. If someone has positive wages, then they are employed. If someone has a wage of zero, they are deemed unemployed.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id twcwage1 twcwage2 twcwage3 twcwage4 twcwage5 twcwage6 twcwage7 twcwage8 twcwage9 twcwage10 twcwage11 twcwage12 twcwage13 twcwage14 twcwage15 twcwage16 twcwage17 twcwage18 twcwage19 twcwage20 twcwage21 twcwage22 twcwage23 twcwage24)
 1 14424.97  5454.47  2172.38  7475.95  3623.67  8266.98 11483.35 13923.77        0  4101.16 10501.73  6994.87 14258.09  7892.02 14677.78 13268.03  2151.58  1632.03 10939.14  8044.03 14966.76 12301.58  6528.78  4526.62
 2  4853.76  11903.3 12568.52  8264.54  2947.44 12073.08   6502.1 14233.04  2129.83  8858.49 14747.98  7006.83  6516.13  9847.42         0          0     9351.71  2314.42  4455.06 12297.85  7553.74 12679.37         0           0
 3  14598.3  5555.92 12205.94     9665        0  6742.81  1823.49        0  8389.34        0  5454.41  8668.01  6547.18 14457.23  12463.6  6413.48  6634.47 13204.18 12011.79  3101.11  6495.12  4445.22  8546.96  4130.96
 4  1932.64  2813.21  3926.87  4650.35  7102.57  4024.89  3343.85  2998.88  3575.33  14957.5 11875.44 10075.93  3262.63  4582.33  5298.61  6373.88  7675.03  8901.78  9962.03 10664.47 12129.49 13443.99 14218.34 15501.03
 5 13524.21   2452.5        0  9328.19  6095.36  6538.84 14607.75 10607.04 10080.55 14981.65  1617.54   9529.7  8482.97 13861.68  8083.54  6739.79 12853.41   2370.7  9294.33  2769.99 13772.91  6044.32  7015.51 13075.92
 6  7441.03 13554.42  9841.29 13084.98  6017.36  7191.84  2678.67        0 11430.44 10894.58  6929.55 10585.47  5021.61 10027.56 11200.41  4117.87 11682.88  13141.5 11617.82  4261.87 10287.36 14587.11 12735.18  9065.99
 7 14334.46 14203.28 13721.12 11842.04 10169.21  5357.79        0 13079.71 13700.71 12950.95 11169.27  3369.16  2625.83 11944.33        0  3927.54  1946.54  9544.32  7571.48  5597.54  9915.25 10330.47    10627 12150.96
 8 10374.71  3817.62 11306.05  11455.5  3296.03  9739.45        0  4519.59  7800.11 10514.02        0 11850.34  2562.66  4347.68  5415.03  3034.74  6535.96   7395.3        0  2512.28  5778.15 13878.02 11935.85  8307.78
 9 12803.48 13301.84   2004.6  5114.32 13063.29  9699.24  4678.55 14494.23  5102.92  8471.06   7154.6 14570.28        0  7489.07  6247.36  2818.76  6716.95 14427.57  9216.64        0  1626.04 14307.02  5364.81 11395.73
10 13638.55  13023.8  4481.94 13272.87        0  5672.35  6235.91  5701.39  6264.48 12928.25 10047.37 10933.26 10074.58  7893.65 12362.12  7390.65  8461.34 13726.15   7249.6  7075.94  9896.47 11279.99    10610  8096.06
11 14657.92        0  4781.83 14775.16 12370.37        0  6022.28  2563.49  8435.25  9696.16  9035.82   3774.2  2200.07  2335.71  5071.95  4653.26  5219.47  10548.7  13572.9  2468.76 11361.58 14340.73  13982.3  5194.59
12  2758.25        0  6773.04  3081.94        0  2746.19 13877.89  12829.2 12676.14   2211.1  2193.75  9299.16  4549.23  9607.71  1542.08 14655.06  2232.72 13275.68        0  7044.37 12671.71  4263.46  8925.02  8981.86
13  7792.97   7528.4  6251.25  9333.24  9193.49        0  8193.82 13986.84  2028.61 12245.43 14939.55  9983.74  4280.02  3079.87 11707.39  2919.55        0        0  2319.09  6888.25        0  5582.77 13334.77 12475.53
14  6796.29 10506.12 11018.06  8860.01  2954.22  6893.12 13298.65        0  7218.22  2293.77  2592.77  1593.46  6399.95 14825.68  3853.68  7819.25        0  5574.17        0  1624.19  6440.36  5657.44  7928.91 11223.04
15   8689.6        0  8692.99 10680.21  2880.66 11664.42        0 11123.44  11722.5   3595.7 14557.11  2296.03  4935.62 10692.08        0 14927.27 11363.96  3271.04  5536.67 10847.73   7327.8  7010.88 12786.93 11230.23
16  9303.43  9137.99 11439.58   3825.3  7624.42     5000  11070.5 10673.43  9135.99  7739.39  2557.49 13181.07  2645.68  6786.95  4265.29  5635.35  6368.09   7044.7  8862.68  9859.06 10449.64 11696.09  12595.21  13853.6
17  2155.19  5470.53  2517.65 11666.24  4765.71 12700.25  5033.77  6397.98 10113.89   8720.9  12151.2  8068.58  4966.89 14398.62 11645.72 12367.92        0            0        0  5311.36   7278.5 14804.43           0           0
18  3886.64        0   5645.1 10753.37  7095.96        0  4477.53 13155.43  9014.54  8202.76  2598.47 14735.47   6442.4 10309.98 11826.95  6354.62  3673.36  4460.37 14855.72  11778.3        0 10759.08  6896.12  8234.72
19  12852.5 10789.16        0  2773.76  1870.67        0   1571.7  7922.01  4682.85 14588.06 12053.67  8589.41 10001.18  3071.72  7899.94 10771.72  1661.51  2132.91 14219.73  1832.53  1595.01  3912.33  4360.58  5302.18
20        0   6876.8        0  8361.33  1892.24  5091.36  6995.07 12046.62  6783.31  2418.21 11333.68  2302.99 12451.49 13446.51  4863.94  6320.14  7015.25  7280.54  6557.88  7571.45  13381.5 14725.07  6633.58  1871.72
end

First, I am trying to find the individuals who had positive wages for the first 12 consecutive quarters. Then I am trying to identify two groups:

Of these individuals who were employed for the first 12 consecutive quarters, I am trying to find out who experienced at least two consecutive quarters of unemployment at any time from Quarter 13 through Quarter 24.
Of these individuals who were employed for the first 12 consecutive quarters, I am trying to find out who had positive wage growth from Quarter 13 through Quarter 24. This means that an individual had positive wages in every quarter from 13 through 24 and that their wages grew from each quarter to the next.

I began by reshaping my data to long format and declaring it as time-series data. Using the community-contributed program tsspell, I am able to identify how long someone is employed in the first 12 consecutive quarters with the following code:

Code:

*Reshape to long
reshape long twcwage, i(id) j(time)

*Tell Stata that it's time series data
tsset id time

*Use the tsspell package to identify spells
tsspell , pcond(twcwage)

*Identify length of employment in the first 12 consecutive quarters
egen wanted = max(_seq) if time<=12, by(id) 
replace wanted=0 if wanted==.

*Find the longest length of employment in the first 12 consecutive quarters
egen wanted2 = max(wanted), by(id)

*Keep if length of employment in the first 12 consecutive quarters is 12.
keep if wanted2==12

Issue:
I was unable to identify individuals who were employed for the first 12 consecutive quarters and experienced at least two consecutive quarters of unemployment at any time from Quarter 13 through Quarter 24. I tried the following code, but I can't get Stata to identify at least two consecutive quarters of unemployment in that period.

Code:

*Drop _seq, _spell, and _end because tsspell is run again
drop _seq _spell _end
tsspell if time>12, pcond(twcwage)
egen wanted3 = min(_seq) if time>12, by(id)

I also tried to identify, among the individuals who were employed for the first 12 consecutive quarters, those who had positive wage growth from Quarter 13 through Quarter 24. I tried to divide wages quarter over quarter starting in Quarter 13, but I am not sure how to identify that wages grew in every quarter from Quarter 13 through Quarter 24. I used the following code.

Code:

*Divide wages quarter over quarter starting in Quarter 13
bysort id: gen increase_wage= (twcwage)/ (twcwage[_n-1]) if time>=13

I feel I am missing something. Thanks for your help.

Clyde Schechter

Join Date: Apr 2014
Posts: 30147

08 Nov 2021, 10:18

Code:

reshape long twcwage, i(id) j(quarter)

gen byte employed = twcwage > 0

by id (quarter), sort: egen byte employed_all_first_12_quarters = min(cond(quarter <= 12, employed, .))
by id (quarter): gen byte wages_increasing = twcwage < twcwage[_n+1] if inrange(quarter, 13, 23)
by id (quarter): egen positive_wage_growth_13_24 = min(cond(inrange(quarter, 13, 23), wages_increasing, .))
gen byte wanted2 = employed_all_first_12_quarters & positive_wage_growth_13_24

by id (quarter): gen spell_num = sum(employed != employed[_n-1])
by id spell_num (quarter), sort: gen spell_duration = _N
by id (spell_num quarter): egen byte two_consec_qrtrs_unemp_13_24 = ///
    max(cond(quarter > 12, spell_duration >= 2 & !employed, .))
gen byte wanted1 = employed_all_first_12_quarters & two_consec_qrtrs_unemp_13_24

Last edited by Clyde Schechter; 08 Nov 2021, 10:22. Reason: Change code to avoid reliance on subscripting within an -egen- function.
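To see what the spell-numbering lines do, here is a minimal sketch on a hypothetical six-quarter history for one person. The toy data are invented for illustration; the two commands are the ones from the code above. A new spell starts whenever -employed- differs from the previous quarter, so the running sum numbers the spells, and -_N- within each spell gives its length.

```stata
* Hypothetical history for one person: E E U U E U (1 = employed, 0 = not).
clear
input byte(id quarter employed)
1 1 1
1 2 1
1 3 0
1 4 0
1 5 1
1 6 0
end
* A new spell starts whenever employment status differs from the previous
* quarter (the first quarter always opens spell 1, because employed[0] is
* missing), so the running sum numbers the spells 1, 2, 3, ...
by id (quarter), sort: gen spell_num = sum(employed != employed[_n-1])
* Every quarter in a spell is tagged with that spell's total length.
by id spell_num (quarter), sort: gen spell_duration = _N
list, noobs sep(0)
```

With this sequence, spell_num runs 1, 1, 2, 2, 3, 4 and spell_duration runs 2, 2, 2, 2, 1, 1, so quarters 3 and 4 form a two-quarter unemployment spell while quarter 6 is a one-quarter spell.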

Fei Wang

Join Date: Oct 2021
Posts: 726

08 Nov 2021, 10:24

Another solution:

Code:

* Data: the same -dataex- example posted in the question (run that block first)

    egen emp1_12 = rowmin(twcwage1-twcwage12)
    replace emp1_12 = emp1_12 > 0        //emp1_12 = 1 for the employed in the first 12 quarters

    reshape long twcwage, i(id) j(qtr)

* Find those unemployed for at least two quarters in 13-24 (une13_24 = 1)
    bys id (qtr): gen une13_24 = 1 if emp1_12 == 1 & qtr > 13 & twcwage == 0 & twcwage[_n-1] == 0 & qtr-qtr[_n-1] == 1 
    bys id (une13_24): replace une13_24 = une13_24[_n-1] if une13_24[_n-1] == 1
    replace une13_24 = 0 if une13_24 == .
    
* Find those whose wages keep growing in 13-24 (wagegr13_24 = 1)
    bys id (qtr): gen wagegr13_24 = 1 if emp1_12 == 1 & qtr > 13 & twcwage > twcwage[_n-1] & twcwage[_n-1] > 0
    bys id: egen tempvar = total(wagegr13_24)
    replace wagegr13_24 = tempvar == 24-13
    drop tempvar
    
* Reshape wide
    reshape wide twcwage, i(id) j(qtr)
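One way to check either solution is to compare the flags with the example data by hand. Reading the posted data, ids 2, 4, 9, 16 and 17 have positive wages in all of quarters 1-12; of these, ids 2 and 17 have a run of at least two zero-wage quarters in 13-24, and only id 4 has wages that rise every quarter from 13 to 24. The sketch below turns that reading into -assert- statements. It assumes it is run right after the code above (data back in wide form, one row per id) and that id 16's sixth value was entered as 5000 rather than "5,000".

```stata
* Sketch: run after Fei Wang's code above (one row per id again).
* Expected ids come from reading the posted example data by hand.
list id emp1_12 une13_24 wagegr13_24 if emp1_12 == 1, noobs
assert emp1_12     == inlist(id, 2, 4, 9, 16, 17)
assert une13_24    == inlist(id, 2, 17)
assert wagegr13_24 == (id == 4)
```

The same expectations apply to Clyde Schechter's -employed_all_first_12_quarters-, -wanted1- and -wanted2-.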

Jack Prado

Join Date: Sep 2021

Posts: 15
#4

08 Nov 2021, 14:25

Professor Schechter - Thank you so much for your help. Your code worked wonderfully. It is written so elegantly that it took me a while to unpack your thought process. I was wondering how to interpret the following line you shared. The code works, and I am trying to put it in layman's terms, but the exclamation mark and the conditional are throwing me. I want to be able to understand your code just to learn.

Code:

by id (spell_num quarter): egen byte two_consec_qrtrs_unemp_13_24 = ///
    max(cond(quarter > 12, spell_duration >= 2 & !employed, .))

Professor Wang - Thank you for sharing your code. Your code is also very clean, and it was great to see the creativity. I learned so much just trying to understand it. I have never used a tempvar, and it's going to be in my toolkit. The code below, which creates a dummy variable, is new to me, and I am going to use it in the future. Thank you so much.

Code:

replace wagegr13_24 = tempvar == 24-13
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#5

08 Nov 2021, 14:54

Both your questions to me and to Fei Wang lead me to infer that you are not familiar with logical (Boolean) expressions in Stata.

Simple logical expressions can be either numerical constants, expressions involving variables, scalars, matrices, etc., or relational expressions such as -this == that- or -this < that- or -this >= that-, etc. In interpreting these simple logical expressions, a result of 0 is considered false, and any non-zero result (including missing value) is considered true.

Simple logical expressions can be combined to make more complicated ones with the &, |, and ! operators. & represents AND, | represents OR, and ! represents NOT. (Some people use ~ instead of ! for this purpose. They are interchangeable, but most people seem to prefer !.)
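A few one-line checks illustrate these rules. This is a minimal sketch meant to be run from a do-file; the // comments show what each line should display.

```stata
* Relational expressions evaluate to 1 (true) or 0 (false).
display 2 > 1                 // 1
display 2 < 1                 // 0
* Combining with & (AND), | (OR) and ! (NOT).
display (2 > 1) & (3 > 4)     // 0
display (2 > 1) | (3 > 4)     // 1
display !(3 > 4)              // 1
* Any non-zero value counts as true, so NOT 5 is 0.
display !5                    // 0
* Missing also counts as true, so cond() returns its second argument here.
display cond(., 1, 0)         // 1
```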

With that under our belt, let's unpack that code in #4, working from the inside out.

-spell_duration >= 2- is a logical expression that asks Stata to look at each observation and determine whether or not the value of the spell_duration variable is at least 2. If so, the result is numerically 1 (true), and if not, it is 0 (false). Stata then looks at !employed in each observation. Where employed is 0 (false), !employed is true (1), and vice versa. These are then combined with logical AND. So, the combined expression -spell_duration >= 2 & !employed- evaluates to true (1) in an observation if, in that observation, the value of variable spell_duration is at least 2 and employed is 0. In terms of your original problem statement, this means that in this observation, the person in question is in the midst of a spell of unemployment that extends for 2 or more consecutive quarters.
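Here is that inner expression on three hypothetical observations (toy data made up for illustration, not from the thread). Only the third row, an unemployed quarter inside a spell of length 2, should get 1.

```stata
* Hypothetical rows: employed with a 3-quarter spell, unemployed in a
* 1-quarter spell, unemployed in a 2-quarter spell.
clear
input byte(employed spell_duration)
1 3
0 1
0 2
end
* 1 only when the spell lasts at least 2 quarters AND the person is not employed.
gen byte long_unemp = spell_duration >= 2 & !employed
list, noobs
```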

Now, the cond() function is one of Stata's most useful. It takes three arguments, a logical expression and two numerical expressions, and returns a number. -cond(a, b, c)- tells Stata to first evaluate, in each observation, logical expression a. If a is true, it returns the value of numerical expression b; otherwise it returns the value of numerical expression c. In this particular context, that means that Stata first looks at each observation and decides whether or not the value of quarter is > 12 (which is the same thing as 13-24 in your data). If so, the result is the value of the logical expression -spell_duration >= 2 & !employed- (1 if true, 0 if false). If, however, this is an observation where quarter <= 12, then -cond()- returns the value of its third argument, which was here specified as a missing value. So, putting that all together, in any observation the value of -cond()- will be missing if quarter <= 12; 1 in quarters 13-24 if at that time the person is enmeshed in a spell of unemployment that lasts a total of 2 or more consecutive quarters; and 0 in quarters 13-24 otherwise. (Otherwise here boils down to: is currently employed, or is unemployed but the unemployment spell is shorter than 2 consecutive quarters.)
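A minimal sketch of the -cond()- layer, again on made-up rows. The variable -long_unemp- is a hypothetical stand-in for the result of -spell_duration >= 2 & !employed-.

```stata
clear
input byte(quarter long_unemp)
12 1
13 0
14 1
end
* Missing in quarters 1-12; the value of long_unemp in quarters 13-24.
gen byte c = cond(quarter > 12, long_unemp, .)
list, noobs
```

Note that quarter 12 yields missing even though long_unemp is 1 there, which is exactly what keeps the first 12 quarters out of the next step.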

The outermost layer of this onion is the -egen, max()- function. As you know, this evaluates the expression inside the -max()- parentheses for each observation in the by-group (here, each id) and returns the largest non-missing value encountered (or missing value if there are no non-missing values). So in this context it means we scan all of the observations for each person. We can ignore the first 12 quarters, because for those the value of -cond()- is always missing. Among quarters 13-24, there may be some in which the person is currently unemployed and the unemployment spell lasts 2 or more consecutive quarters (-cond()- returns 1 for those), and there may be some in which the person is currently employed, or just transiently unemployed for one quarter (for these, -cond()- returns 0). So -max()- will return 1 if there is any quarter between 13 and 24 during which the person is unemployed and the unemployment spell lasts 2 or more consecutive quarters, and 0 otherwise.
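And the outer -egen, max()- layer, on two hypothetical people whose -c- values mimic the -cond()- result (missing in quarter 12, 0 or 1 afterwards). The missing values are ignored, so person 1 should get 1 and person 2 should get 0.

```stata
clear
input byte(id quarter c)
1 12 .
1 13 0
1 14 1
2 12 .
2 13 0
2 14 0
end
* Largest non-missing value of c within each id.
by id, sort: egen byte flag = max(c)
list, noobs sepby(id)
```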
Jack Prado

Join Date: Sep 2021

Posts: 15
#6

08 Nov 2021, 16:26

Hi Professor Schechter. Thank you so much for walking me through your code. I get the intuition behind it, and I have a better understanding of how to use logical expressions. Thank you.
Fei Wang

Join Date: Oct 2021

Posts: 726
#7

08 Nov 2021, 19:50

Many thanks to Clyde Schechter for the detailed explanation.

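To Jack: the line of code you were asking about means that wagegr13_24 equals 1 if tempvar equals 11 (24-13) and 0 otherwise. Here 11 is the number of quarters from 14 through 24, so tempvar equals 11 only if the wage rose above a positive previous-quarter wage in every one of those quarters. The name "tempvar" is nothing special; it is just the name of a temporary variable (not Stata's -tempvar- command), and you can call it anything.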
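A minimal sketch of that line on made-up values of -tempvar- (hypothetical, for illustration only):

```stata
clear
input byte(id tempvar)
1 11
2 10
3  0
end
* The right-hand side is a relational expression: 1 if tempvar == 11, else 0.
gen byte wagegr13_24 = tempvar == 24 - 13
list, noobs
```

Only id 1 gets wagegr13_24 = 1. Because subtraction is evaluated before ==, the expression is read as tempvar == 11.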
Jack Prado

Join Date: Sep 2021

Posts: 15
#8

09 Nov 2021, 10:06

Thank you, Professor Wang. Your explanation helps a lot. I see the logic behind your code.