Hello,
I am working with a large data set (5 million + rows), of individuals. I have a column for every month of the year for their spouse in that month, identified by a string identifier (Column: S_M1 is for Spouse in month 1). Most individuals have the same spouse from months 1 to 12. However, some change their spouse between months. I want to change the data to have just max 3 columns of first spouse, second spouse , etc, during the year
Current format:
Preferred format:
If I can get it to the preferred format above, I will reshape it and it and every spouse will have their own row with all the associated households variables. I wonder if its possible to skip step 2 above. The ideal end format is as follows:
Please let me know if anyone is able to solve this problem. Thank you!
I am working with a large data set (5 million + rows), of individuals. I have a column for every month of the year for their spouse in that month, identified by a string identifier (Column: S_M1 is for Spouse in month 1). Most individuals have the same spouse from months 1 to 12. However, some change their spouse between months. I want to change the data to have just max 3 columns of first spouse, second spouse , etc, during the year
Current format:
Individual | S_M1 | S_M2 | S_M3 | S_M4 | S_M5 | S_M6 | S_M7 | S_M8 | S_M9 | S_M10 | S_M11 | S_M12 | All other variables related to household |
XD309 | GG345 | GG345 | GG345 | GG345 | GG345 | GG345 | GG345 | GG345 | GG345 | GG345 | GG345 | GG345 | |
DG23$ | WD111 | WD111 | WD111 | WD111 | . | TT343 | TT343 | TT343 | TT343 | . | QW123 | QW123 | |
SA098 | . | . | . | . | . | . | . | . | . | . | . | . | |
GF203 | FD123 | FD123 | . | . | . | . | AA999 | AA999 | AA999 | AA999 | AA999 | AA999 |
Preferred format:
Individual | Spouse1 | Spouse2 | Spouse3 | All other variables related to household |
XD309 | GG345 | . | . | |
DG23$ | WD111 | TT343 | QW123 | |
SA098 | . | . | . | |
GF203 | FD123 | AA999 | . |
If I can get it to the preferred format above, I will reshape it and it and every spouse will have their own row with all the associated households variables. I wonder if its possible to skip step 2 above. The ideal end format is as follows:
Individual | All other variables related to household |
XD309 | |
GG345 | |
DG23$ | |
WD111 | |
TT343 | |
QW123 | |
SA098 | |
GF203 | |
FD123 | |
AA999 |
Please let me know if anyone is able to solve this problem. Thank you!
Comment