Hello -
I have a dataset with records of clients who have performed a specific action. The action can be performed multiple times, and each time it is performed it appears as a new record. I have a unique clientid variable and a variable, sequence, that gives the position/order of each record among the existing records for that clientid.
I have cleaned up my dataset to create new variables date1, date2, date3, and date4. The first record (i.e., the record with sequence = 1) has only a date1; the record with sequence = 2 has only a date2, and so on. I dropped all records with sequence > 4.
Now I want to combine observations (by clientid) so that my dataset becomes a dataset of clients rather than records. For each existing variable (var1, var2, var3, var4, ..., var18), I would like to keep the data from the most recent observation, that is, the one with the most recent createdate. (Keeping the most recent values is nice to have but not absolutely necessary. What I need is one record per client with all the variables in one row, most importantly date1, date2, date3, and date4.)
I am fairly new to Stata, and I think this involves some combination of bysort, egen, and _N.
Please help
----
A table for your understanding
clientid | var1 | sequence | date1 (mdy) | date2 | date3 | date4 | createdate |
111111 | aaa | 1 | 1/1/2021 | . | . | . | 1/4/2021 |
222222 | aaa | 1 | 1/2/2021 | . | . | . | 1/4/2021 |
222222 | aaa | 2 | . | 2/2/2021 | . | . | 2/4/2021 |
333333 | bbb | 1 | 1/3/2021 | . | . | . | 1/4/2021 |
333333 | bbb | 2 | . | 2/3/2021 | . | . | 2/4/2021 |
333333 | ccc | 3 | . | . | 3/3/2021 | . | 3/4/2021 |
444444 | ddd | 1 | 1/4/2021 | . | . | . | 1/4/2021 |
444444 | ddd | 2 | . | 2/4/2021 | . | . | 2/4/2021 |
444444 | eee | 3 | . | . | 3/4/2021 | . | 3/4/2021 |
444444 | fff | 4 | . | . | . | 4/4/2021 | 4/4/2021 |
OUTPUT
clientid | var1 | sequence | date1 (mdy) | date2 | date3 | date4 | createdate |
111111 | aaa | 1 | 1/1/2021 | . | . | . | 1/4/2021 |
222222 | aaa | 2 | 1/2/2021 | 2/2/2021 | . | . | 2/4/2021 |
333333 | ccc | 3 | 1/3/2021 | 2/3/2021 | 3/3/2021 | . | 3/4/2021 |
444444 | fff | 4 | 1/4/2021 | 2/4/2021 | 3/4/2021 | 4/4/2021 | 4/4/2021 |
Thanks for your help!
Bri
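One possible way to get from the record-level table to one row per client is sketched below. It is only a sketch under some assumptions: the toy data reproduce two of the clients from the table, the string variables s1-s4 and sc are hypothetical names used just to type the dates in, and date1-date4 and createdate are assumed to be numeric Stata dates once converted. The idea is to copy each client's non-missing dateN onto all of that client's records, then keep the record with the latest createdate, so var1 and the other variables come from that record.

```stata
* Toy data: two clients copied from the table; dates typed in as strings.
* s1-s4 and sc are hypothetical helper names, not variables from the post.
clear
input long clientid str3 var1 byte sequence str10 s1 str10 s2 str10 s3 str10 s4 str10 sc
222222 "aaa" 1 "1/2/2021" ""         ""         "" "1/4/2021"
222222 "aaa" 2 ""         "2/2/2021" ""         "" "2/4/2021"
333333 "bbb" 1 "1/3/2021" ""         ""         "" "1/4/2021"
333333 "bbb" 2 ""         "2/3/2021" ""         "" "2/4/2021"
333333 "ccc" 3 ""         ""         "3/3/2021" "" "3/4/2021"
end

* Convert the strings to numeric Stata dates.
* Skip this step if your date variables are already numeric.
foreach n in 1 2 3 4 {
    generate date`n' = date(s`n', "MDY")
}
generate createdate = date(sc, "MDY")
format date1 date2 date3 date4 createdate %td
drop s1 s2 s3 s4 sc

* Step 1: within each client, sort so the non-missing dateN comes first
* (numeric missing values sort last), then copy it to every record.
foreach v of varlist date1 date2 date3 date4 {
    bysort clientid (`v'): replace `v' = `v'[1]
}

* Step 2: keep the most recent record per client.
* sequence breaks ties when two records share a createdate.
bysort clientid (createdate sequence): keep if _n == _N

* Check: isid exits with an error if clientid is not unique.
isid clientid
list, noobs
```

Because Step 2 keeps a whole record, all of var1 through var18 come along from the most recent observation without being listed. One caveat: a missing createdate sorts last and would be treated as the most recent, so it may be worth checking for missing values first (for example with count if missing(createdate)).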