Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Making a Balanced Panel Dataset in Stata 16 from (Almost) Scratch

    Hello, I'm a long-time lurker and first time poster.

    I have a dataset in Stata with units (say, statefips codes) and dates at regular intervals (say, daily) and observations on variables X, Y, and Z.

    I would like to extend my panel backwards in time T number of days, filling in the variables X, Y, and Z with 0 for all units. I have to do this many times, so manually inserting a first observation and then using carryforward could be prohibitively time consuming. Thank you.

  • #2
    Michael:
    welcome to this foum.
    What you've in mind sounds, methodologically speaking, like a very questionable approach, as you would end up with a panel dataset that is miles far away from the original one.
    In addition, Stata can handle both balanced and unbalanced panel dataset with no problem.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Carlo,

      I agree in principle re: "panel dataset that is miles...". I am extending my primary panel for the purpose of merging the primary balanced panel created by this process with another, second dataset. That other, second dataset has daily predictors and a time series structure relevant for forecasting things in the primary balanced panel dataset.

      The point isn't to balance the primary panel for the purpose of directly estimating something on the panel. I only note that the primary panel dataset is balanced to further simplify the data munging problem I describe.

      Comment


      • #4
        Michael:
        thanks for clarifying.
        You may want to consider something along the line of the following toy-example:
        Code:
        . set obs 10
        number of observations (_N) was 0, now 10
        
        . g X=1 in 1/7
        (3 missing values generated)
        
        . g Y=1 in 1/6
        (4 missing values generated)
        
        . g Z=1 in 1/4
        (6 missing values generated)
        
        . foreach var of varlist X-Z {
          2. replace `var'=0 if `var'==.
          3.  }
        (3 real changes made)
        (4 real changes made)
        (6 real changes made)
        
        . list
        
             +-----------+
             | X   Y   Z |
             |-----------|
          1. | 1   1   1 |
          2. | 1   1   1 |
          3. | 1   1   1 |
          4. | 1   1   1 |
          5. | 1   1   0 |
             |-----------|
          6. | 1   1   0 |
          7. | 1   0   0 |
          8. | 0   0   0 |
          9. | 0   0   0 |
         10. | 0   0   0 |
             +-----------+
        
        .
        See also -ipolate-, just in case.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Thank you for this helpful example Carlo.
          My challenge is to do something like what you have done for each set of units from the primary dataset.

          So, in your example, I would have units A, B, C, and say, two prespecified dates D1 and D2. I want to be able to copy the list of units from the primary dataset, create an observation at regular intervals (daily for concreteness) between dates D1 and D2, and then set the values of those observations to 0.

          Surprisingly, I've never seen something like this done on the forum but I think this must be a common enough task that I thought it would be worth asking.

          Comment


          • #6
            The way I usually approach this situation is to start by creating a framework data set that contains only the unit-identifier and the time variable, fully balanced. For example:

            Code:
            clear*
            
            local D1 td(1jan2019)
            local D2 td(31dec2019)
            
            set obs 3
            gen unit = "A" in 1
            replace unit = "B" in 2
            replace unit = "C" in 3
            
            expand `D2' - `D1' + 1
            by unit, sort: gen date = `D1' + _n - 1
            format date %td
            Then I -merge- the data into that. The _merge variable created by that process gives me a handle on which observations were in the data and which ones remain open. You can replace the values of the variable by zero if you want to, but I suspect that for many purposes it won't even be necessary to do that as you can easily distinguish those observations by _merge == 2.

            I find this approach much simpler than trying to stick extra observations into an existing data set.

            Added: Assuming that in real life you have more than three units, and more than you would care to type a line of code each for, you can replace the -set obs 3- through -replace unit = "C" in 3- commands with:

            Code:
            use unit using my_data_set
            duplicates drop
            Last edited by Clyde Schechter; 08 Aug 2020, 12:31.

            Comment


            • #7
              Clyde:

              This was really exactly what I was looking for. For completeness, and future (self-)reference I would add that I would also want to append the original dataset and then use carryforward.

              Code:
              append using my_data_set
              g negdate = -date
              bysort unit (negdate): carryforward X Y Z, replace back
              save data_set
              Thank you.

              Comment

              Working...
              X