Generating dummy observations to balance a panel

Sohini Mazumder

Join Date: Jun 2021

Posts: 21
#1

28 Jun 2022, 13:34

I hope this request makes sense; it is only meant to aid my estimation. Below is a dataex listing of a dummy dataset resembling my original, followed by a description of my problem.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 ID float(phase HasMembership)
"A" 1 0
"A" 3 1
"B" 1 1
"B" 2 0
"B" 3 1
"C" 1 0
"C" 2 1
"C" 3 1
"D" 2 1
"D" 3 0
"E" 1 1
"E" 3 0
"F" 1 1
"F" 3 0
end

In my previous post, I asked for a way to track an individual's membership changes between phases. The advice given there was very good: I was able to generate variables describing whether an individual gained, lost, retained, or continued to lack a membership between any two consecutive phases.

The problem with my actual, full-fledged dataset is that some individuals are not observed in consecutive phases. For example, in the dataex above, individual A has observations only in phases 1 and 3; we know nothing about him in phase 2. As a result, the variables generated with the solution code from my previous post capture nothing for individual A. The mistake is mine: when I provided a representative dummy dataset, I made it balanced instead of unbalanced.

To work around this, is there any code or approach in Stata that would let me generate dummy observations for individuals who are not observed in every phase? The values of HasMembership in those dummy observations would, of course, be missing. This is only to work around the fact that the solutions don't handle non-consecutive phases: since individual A has no observation in phase 2, his p2_p3 variable is missing, yet I still want to capture that at some point between phase 1 and phase 3 he gained membership.

Otherwise, if there is any other viable solution, I would be grateful to know of it.
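One possible alternative, sketched here as an assumption rather than taken from the thread: instead of filling gaps, compare each observation with the same individual's most recent observed phase. The variable names prev_phase, gained, and lost are hypothetical, and the sketch assumes one observation per ID per phase and no missing values of HasMembership (a missing value would simply be coded as "no change").

```stata
* Sketch (assumed approach): compare each phase with the previous OBSERVED phase
* for the same ID, so a gap such as A's missing phase 2 does not block the comparison.
clear
input str1 ID float(phase HasMembership)
"A" 1 0
"A" 3 1
"B" 1 1
"B" 2 0
"B" 3 1
"C" 1 0
"C" 2 1
"C" 3 1
"D" 2 1
"D" 3 0
"E" 1 1
"E" 3 0
"F" 1 1
"F" 3 0
end

* Within each ID, sorted by phase, [_n-1] refers to the previous observed phase
bysort ID (phase): generate prev_phase = phase[_n-1]
* Hypothetical change indicators; missing for each ID's first observation (_n == 1)
by ID: generate byte gained = HasMembership == 1 & HasMembership[_n-1] == 0 if _n > 1
by ID: generate byte lost = HasMembership == 0 & HasMembership[_n-1] == 1 if _n > 1
list, sepby(ID) noobs
```

Here A's gain is recorded in phase 3 with prev_phase equal to 1, which says the change happened somewhere between phases 1 and 3 without claiming exactly when.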

EDIT: Thanks to Mr. Schechter for pointing out the mistake in the dataex; I have updated it.

Last edited by Sohini Mazumder; 28 Jun 2022, 13:50.
Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#2

28 Jun 2022, 13:41

-help tsfill- will show the way.

BUT, you have a problem with the data. ID "B" has two observations for phase 1, and, worse, they are contradictory. You won't be able to -xtset- your data until you reduce it to a single observation per ID per phase. Having "surplus" observations like that is disturbing enough; that they contradict each other on another key variable is even worse. Before you move forward, go back and review the data management that created this data set. It seems to be significantly flawed and may contain other errors as well. Get the data right before you move on to analysis.
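A minimal way to locate such surplus observations before -xtset-, sketched on a hypothetical four-observation fragment that mimics the problem described (two contradictory phase-1 rows for B); dup is a hypothetical variable name. -duplicates- only reports and flags the rows; it does not decide which one is correct, which still requires going back to the data management.

```stata
* Hypothetical fragment: B has two contradictory phase-1 observations
clear
input str1 ID float(phase HasMembership)
"A" 1 0
"B" 1 1
"B" 1 0
"B" 2 0
end

* Summarize how many ID-phase combinations occur more than once
duplicates report ID phase
* Flag each row: dup = number of OTHER rows sharing its ID and phase
duplicates tag ID phase, generate(dup)
list if dup > 0, noobs clean
* Once the data are fixed, -isid ID phase- should run without error
```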
Sohini Mazumder

Join Date: Jun 2021

Posts: 21
#3

28 Jun 2022, 13:48

Thank you, Mr. Schechter. The fault is actually mine: while editing a dummy dataset to provide as a dataex example, I made a hurried mistake. The original dataset does not have these errors and has only a single observation per ID per phase.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 ID float(phase HasMembership)
"A" 1 0
"A" 3 1
"B" 1 1
"B" 2 0
"B" 3 1
"C" 1 0
"C" 2 1
"C" 3 1
"D" 2 1
"D" 3 0
"E" 1 1
"E" 3 0
"F" 1 1
"F" 3 0
end

I have posted a corrected dataex with one observation per ID per phase.
Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#4

28 Jun 2022, 13:53

Thanks for the corrected example data.

Code:

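encode ID, gen(n_ID)    // numeric panel identifier for -tsset-
tsset n_ID phase        // n_ID is the panel variable, phase the time variable
tsfill                  // add missing phases within each ID's observed range
list, noobs clean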

Added: It dawns on me that it is unclear how you want to handle IDs like "D", where the "gap" is at the beginning. The code shown above deals with skips within the sequence, but not with situations where the first or final phase is not observed. If you want to generate extra observations for those as well, add the -full- option to the -tsfill- command and it will do that.
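For IDs like D, here is a sketch of the -full- variant on the corrected example data. One detail worth knowing: -tsfill- leaves the string variable ID blank in the observations it creates, so the sketch rebuilds ID from the value labels that -encode- attached to n_ID. That extra step is an assumption about what is wanted, not part of the advice above.

```stata
clear
input str1 ID float(phase HasMembership)
"A" 1 0
"A" 3 1
"B" 1 1
"B" 2 0
"B" 3 1
"C" 1 0
"C" 2 1
"C" 3 1
"D" 2 1
"D" 3 0
"E" 1 1
"E" 3 0
"F" 1 1
"F" 3 0
end

encode ID, gen(n_ID)
tsset n_ID phase
* -full- fills every panel over the overall phase range (1 to 3),
* including gaps at the start or end of a panel
tsfill, full
* tsfill leaves ID blank in the added rows; rebuild it from n_ID's value labels
drop ID
decode n_ID, generate(ID)
list ID phase HasMembership, sepby(ID) noobs
```

With six IDs and three phases, the result has 18 observations, four of them new (A, E, and F in phase 2; D in phase 1), each with HasMembership missing.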

Last edited by Clyde Schechter; 28 Jun 2022, 13:57.
Sohini Mazumder

Join Date: Jun 2021

Posts: 21
#5

28 Jun 2022, 13:59

Thanks a lot for the help and the suggestion.