So my dataset is in long format, and a snapshot of it is shown below. (I'm having problems downloading dataex on the research servers.)
What I'm trying to do is count the individuals in each home state (hs) at each time point (t), split by whether or not they received benefits. Ideally I'd end up with two cumulative frequency tables: one for those who never received benefits and one for those who received benefits at some time point.
Benefits are only available from t=3 onwards, so I expected the counts in the second table to increase over time and, correspondingly, the counts in the table of those who did not receive benefits at any time point to fall from t=3 onwards.
What I've tried is generating a new variable, newvar, with two possible values: 1 if the individual received benefits at any time point and 2 if they did not. I then tabulated t against hs separately for each value of newvar (the command is shown below).
While this produced frequency tables, it did not give me cumulative frequencies. The problem is isolating distinct individuals: since an individual can receive benefits at more than one time point, simply adding up receipts would give an incorrect total. This is what I'm stuck on, and I would appreciate any insight. Thanks.
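Below is a minimal Stata sketch of one way to avoid the double counting. It assumes that benefits is a string variable that is "Y" when benefit was received and blank otherwise, and that there is exactly one row per id per t, as in the snapshot.

In long data each person contributes one row to each t, so counting rows within a single t already counts distinct individuals. The catch with a time-invariant variable like newvar is that it puts each person in the same group at every t. If everyone is observed in every period, the group totals therefore cannot rise or fall.

Here a running sum of receipts within id is turned into a 0/1 flag (got_by_t, a hypothetical name). The flag is 1 from a person's first receipt onwards, so a second or third receipt adds nothing. This reads "cumulative" as "has received benefits at or before t"; if another definition is intended, the gen line is the part to change.

```stata
* Hypothetical re-entry of the posted snapshot (dataex not available).
* Assumption: benefits is a string, "Y" if received and "" otherwise.
clear
input byte(id t hs) str1 benefits
1 1 3 ""
1 2 2 ""
1 3 1 "Y"
1 4 1 "Y"
1 5 1 ""
1 6 4 ""
1 7 5 ""
2 1 2 ""
2 2 1 ""
2 3 6 ""
2 4 5 ""
2 5 4 ""
2 6 2 ""
2 7 2 ""
3 1 2 ""
3 2 4 ""
3 3 5 ""
3 4 5 ""
3 5 5 "Y"
3 6 5 ""
3 7 5 ""
end

* Check there is one row per person per period, so any count
* within a single t counts each individual at most once
isid id t

* sum() under by: is a running sum within id (ordered by t).
* "> 0" turns it into a 0/1 flag: 1 from the first receipt onwards,
* and repeat receipts cannot push it above 1 (no double counting)
bysort id (t): gen byte got_by_t = sum(benefits == "Y") > 0

* Rows are time points, columns are home states
tab t hs if got_by_t == 1
tab t hs if got_by_t == 0
```

The same split can also be written in the style of the original command, as bysort got_by_t: tab t hs, col.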
Code:
```
id  t  hs  benefits
 1  1   3
 1  2   2
 1  3   1  Y
 1  4   1  Y
 1  5   1
 1  6   4
 1  7   5
 2  1   2
 2  2   1
 2  3   6
 2  4   5
 2  5   4
 2  6   2
 2  7   2
 3  1   2
 3  2   4
 3  3   5
 3  4   5
 3  5   5  Y
 3  6   5
 3  7   5
```
Each table would be laid out like this, with t in rows and one column per home state:
Code:
```
t  hs1  hs2  hs3  hs4  hs5  hs6
1   36   39   18   39   29   20
2    3   23   36   12   18   25
3   25   41   13   32   16   20
4   42   31   50    4   47   28
5   45   20    2   30   26   48
6    5   47   11   18   12   41
7   32   38   30   12   20    5
```
This is the command I ran, with newvar defined as described above:
Code:
```
bysort newvar: tab t hs, col
```
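To get the counts into the t by hs1 to hs6 layout shown earlier, one option is contract followed by reshape wide. This sketch again assumes the "Y"/blank string coding of benefits. It builds a 0/1 "received at or before t" flag (hypothetical name got_by_t) from a running sum of receipts within id.

The zero option of contract keeps period/state cells with nobody in them as 0 instead of dropping them. A home state that never appears in the data at all would still get no column.

```stata
* Hypothetical re-entry of the posted snapshot.
* Assumption: benefits is a string, "Y" if received and "" otherwise.
clear
input byte(id t hs) str1 benefits
1 1 3 ""
1 2 2 ""
1 3 1 "Y"
1 4 1 "Y"
1 5 1 ""
1 6 4 ""
1 7 5 ""
2 1 2 ""
2 2 1 ""
2 3 6 ""
2 4 5 ""
2 5 4 ""
2 6 2 ""
2 7 2 ""
3 1 2 ""
3 2 4 ""
3 3 5 ""
3 4 5 ""
3 5 5 "Y"
3 6 5 ""
3 7 5 ""
end

* Assumed definition: 1 from an individual's first receipt onwards, 0 before
bysort id (t): gen byte got_by_t = sum(benefits == "Y") > 0

* One row per (group, t, home state) holding the number of individuals;
* zero keeps empty combinations as 0 rather than dropping them
contract got_by_t t hs, freq(nobs) zero

* One row per (group, t) with one count column per home state
reshape wide nobs, i(got_by_t t) j(hs)
rename nobs* hs*

* Received benefits at or before t, then not (yet) received
sort got_by_t t
list t hs* if got_by_t == 1, noobs
list t hs* if got_by_t == 0, noobs
```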