Making a Balanced Panel Dataset in Stata 16 from (Almost) Scratch

Michael Lachanski

Join Date: Apr 2020

Posts: 7
#1

Making a Balanced Panel Dataset in Stata 16 from (Almost) Scratch

08 Aug 2020, 09:32

Hello, I'm a long-time lurker and first time poster.

I have a dataset in Stata with units (say, statefips codes) and dates at regular intervals (say, daily) and observations on variables X, Y, and Z.

I would like to extend my panel backwards in time T number of days, filling in the variables X, Y, and Z with 0 for all units. I have to do this many times, so manually inserting a first observation and then using carryforward could be prohibitively time consuming. Thank you.
Tags: None
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17701
#2

08 Aug 2020, 09:38

Michael:
welcome to this foum.
What you've in mind sounds, methodologically speaking, like a very questionable approach, as you would end up with a panel dataset that is miles far away from the original one.
In addition, Stata can handle both balanced and unbalanced panel dataset with no problem.

Kind regards,
Carlo
(Stata 19.0)
Comment
Michael Lachanski

Join Date: Apr 2020

Posts: 7
#3

08 Aug 2020, 09:45

Carlo,

I agree in principle re: "panel dataset that is miles...". I am extending my primary panel for the purpose of merging the primary balanced panel created by this process with another, second dataset. That other, second dataset has daily predictors and a time series structure relevant for forecasting things in the primary balanced panel dataset.

The point isn't to balance the primary panel for the purpose of directly estimating something on the panel. I only note that the primary panel dataset is balanced to further simplify the data munging problem I describe.
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17701

08 Aug 2020, 09:59

Michael:
thanks for clarifying.
You may want to consider something along the line of the following toy-example:

Code:

. set obs 10
number of observations (_N) was 0, now 10

. g X=1 in 1/7
(3 missing values generated)

. g Y=1 in 1/6
(4 missing values generated)

. g Z=1 in 1/4
(6 missing values generated)

. foreach var of varlist X-Z {
  2. replace `var'=0 if `var'==.
  3.  }
(3 real changes made)
(4 real changes made)
(6 real changes made)

. list

     +-----------+
     | X   Y   Z |
     |-----------|
  1. | 1   1   1 |
  2. | 1   1   1 |
  3. | 1   1   1 |
  4. | 1   1   1 |
  5. | 1   1   0 |
     |-----------|
  6. | 1   1   0 |
  7. | 1   0   0 |
  8. | 0   0   0 |
  9. | 0   0   0 |
 10. | 0   0   0 |
     +-----------+

.

Comment

Michael Lachanski

Join Date: Apr 2020

Posts: 7
#5

08 Aug 2020, 10:38

Thank you for this helpful example Carlo.
My challenge is to do something like what you have done for each set of units from the primary dataset.

So, in your example, I would have units A, B, C, and say, two prespecified dates D1 and D2. I want to be able to copy the list of units from the primary dataset, create an observation at regular intervals (daily for concreteness) between dates D1 and D2, and then set the values of those observations to 0.

Surprisingly, I've never seen something like this done on the forum but I think this must be a common enough task that I thought it would be worth asking.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#6

08 Aug 2020, 12:29

The way I usually approach this situation is to start by creating a framework data set that contains only the unit-identifier and the time variable, fully balanced. For example:

Code:

clear* local D1 td(1jan2019) local D2 td(31dec2019) set obs 3 gen unit = "A" in 1 replace unit = "B" in 2 replace unit = "C" in 3 expand `D2' - `D1' + 1 by unit, sort: gen date = `D1' + _n - 1 format date %td

Then I -merge- the data into that. The _merge variable created by that process gives me a handle on which observations were in the data and which ones remain open. You can replace the values of the variable by zero if you want to, but I suspect that for many purposes it won't even be necessary to do that as you can easily distinguish those observations by _merge == 2.

I find this approach much simpler than trying to stick extra observations into an existing data set.

Added: Assuming that in real life you have more than three units, and more than you would care to type a line of code each for, you can replace the -set obs 3- through -replace unit = "C" in 3- commands with:

Code:

use unit using my_data_set duplicates drop

Last edited by Clyde Schechter; 08 Aug 2020, 12:31.
1 like
Comment
Michael Lachanski

Join Date: Apr 2020

Posts: 7
#7

08 Aug 2020, 17:30

Clyde:

This was really exactly what I was looking for. For completeness, and future (self-)reference I would add that I would also want to append the original dataset and then use carryforward.

Code:

append using my_data_set g negdate = -date bysort unit (negdate): carryforward X Y Z, replace back save data_set

Thank you.
Comment

Announcement

Making a Balanced Panel Dataset in Stata 16 from (Almost) Scratch

Comment

Comment

Comment

Comment

Comment

Comment