Hello everyone,
Sorry for choosing the bulkiest title I could find.
My data looks a bit like this (I chose one individual, whom I call ID137 here, as an example):
Member: 1 = Membership; 0 = No membership
ID | Member | year | Desired tenure variable |
137 | 1 | 1986 | 1 |
137 | 1 | 1990 | 5 |
137 | 0 | 1994 | 0 |
137 | 1 | 1999 | 1 |
137 | 1 | 2002 | 4 |
137 | 1 | 2004 | 6 |
The idea is to build a tenure variable that counts consecutive years of membership in an organization. It should reset to 0 when a person cancels the membership and start counting again when that person rejoins. Some IDs were never members, and some IDs were always members.
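Reading the example table, the desired values appear to count calendar years since the start of the current membership spell, not the number of observations. Here s_t is my own label for the first survey year of the membership spell that contains year t:

```latex
\text{tenure}_{t} =
\begin{cases}
t - s_{t} + 1, & \text{if } \text{member}_{t} = 1,\\
0,             & \text{if } \text{member}_{t} = 0.
\end{cases}
```

For ID137 this gives 1990 - 1986 + 1 = 5, 2002 - 1999 + 1 = 4 and 2004 - 1999 + 1 = 6, which matches the table.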
My (unsuccessful) attempts to create the desired tenure variable looked roughly like this:
gen tenure_variable = .
(1) bysort pid: replace tenure_variable= sum(syear - L_year) if member == 1 & L_member == 1
(2) bysort pid: replace tenure_variable= 1 if member== 1 & L_member == .
(3) bysort pid: replace tenure_variable= 1 if member== 1 & L_member == 0
(4) bysort pid: replace tenure_variable= union if member== 0
(1) is supposed to give me the sum of all consecutive membership years. I have no idea why, but this worked reasonably well for individuals who were members from their first observation onwards. Problems arose when the membership status switched or when the person became a member later.
(2) and (3) set the tenure to 1 in the first year of a membership spell.
(4) sets it to zero everywhere else.
I created the lagged variables L_year and L_member beforehand. They hold the respective values from the same person's previous available observation. My data has gaps between survey years, so I had to create these variables myself instead of simply using L.year and L.member.
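The following is a minimal sketch of how such "previous available observation" lags can be built with explicit subscripting. It assumes the variable names pid, syear and member from the code above. The clear/input part only recreates a few example rows for illustration. Under by, the subscript [_n-1] refers to the previous row of the same person, so gaps between survey years do not produce missing values the way L. does on a gappy tsset panel.

```stata
* Hypothetical example data (only for illustration; replaces data in memory)
clear
input pid syear member
137 1986 1
137 1990 1
137 1994 0
end

* Sort by person and year, then take the value from the previous row
* of the same person; the first row of each person gets missing.
bysort pid (syear): generate L_year = syear[_n-1]
by pid: generate L_member = member[_n-1]
list, noobs
```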
I would be incredibly thankful if you could help me create the desired tenure variable. I suspect it could be done without lagged values, perhaps with some kind of loop? For some IDs I have more than ten years of data, so creating lags (L, L2, L3, ...) manually would be extremely tedious.
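Here is one possible approach that needs neither a loop nor multiple lags. It is a sketch, not a confirmed solution, and it assumes the variables are called pid, syear and member (0/1, no missing values). First flag the first year of each membership spell. Then carry that start year forward within the spell. Finally, compute tenure as the difference in calendar years. The clear/input part only recreates ID137 from the table above.

```stata
* Hypothetical example data: ID137 from the table (replaces data in memory)
clear
input pid syear member
137 1986 1
137 1990 1
137 1994 0
137 1999 1
137 2002 1
137 2004 1
end

* A spell starts in a member year that is the person's first observation
* or whose previous observation is a non-member year.
bysort pid (syear): generate spell_start = syear if member == 1 & (_n == 1 | member[_n-1] == 0)

* Carry the start year forward; replace runs row by row, so
* spell_start[_n-1] has already been filled in for the previous row.
by pid: replace spell_start = spell_start[_n-1] if member == 1 & missing(spell_start)

* Tenure in calendar years within the current spell; 0 in non-member years.
generate tenure = cond(member == 1, syear - spell_start + 1, 0)
list pid syear member tenure, noobs
```

By design, a spell is treated as continuing across skipped survey years (for example, from 1986 to 1990). That appears to match the desired values in the table. If member can be missing, those rows would need separate handling, because this sketch sets them to 0.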
Thank you!!!