  • Lag variable with gaps in timevar

    Hi,

    I am currently using an unbalanced panel with gaps in the time variable. It is a panel of parties in cabinets, where the time variable is the cabinet's start_date, recorded in %td format, so no two observations fall on consecutive dates; they are all separated by gaps. I would like to create a simple lag, but it keeps generating all missing values. Is there a way to lag with respect to the previous non-missing value available?

    Thanks,
    Federico

  • #2
    What defines a panel here? It's perfectly possible to create your own pseudo-time variable as here, so each time occurs just once for each panel,

    Code:
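    clear
    input id time
    1 2
    1 3
    1 5
    2 7
    2 11
    2 13
    end

    * number the observations 1, 2, ... within each panel, in time order
    bysort id (time) : gen counter = _n

    * declare the data as a panel, with counter as a gap-free time variable
    tsset id counter

    list, sepby(id)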
    -- but there is a price to be paid for this in ignoring gaps and even asynchrony of panels.



    • #3
      Thanks, that could be a way out. I was hoping to do it without losing the (proper) time dimension. Is there really no way to lag back to the last non-missing value, somehow ignoring the gaps?



      • #4
        You can always refer to the previous observation, that is, the last one actually present in the data, using subscripting: [_n-1]. If the variable itself has missing values, see also http://www.stata.com/support/faqs/da...issing-values/







        • #5
          With xtset, the lag operator won't lag across panels. If you go the [_n-1] etc route, you can lag across panels, so you'll probably want to add an if panel==panel[_n-1] condition.



          • #6
            Code:
            bysort id (counter) : gen change = whatever - whatever[_n-1]
            is a common pattern to insist on all comparisons being within the same panel.
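
The two ways of keeping comparisons within a panel mentioned in #5 and #6 can be checked against each other. The following is a minimal sketch, not code from the thread: the id and time values are taken from #2, while the variable x, its values, and the names lag_if and lag_by are made up for illustration.

```stata
* Hypothetical example: x and its values are illustrative only
clear
input id time x
1 2  10
1 3  20
1 5  30
2 7  40
2 11 50
2 13 60
end

* route 1: plain subscripting, guarded so it never reaches into another panel
sort id time
gen lag_if = x[_n-1] if id == id[_n-1]

* route 2: by-group subscripting, where _n restarts at 1 in each panel
bysort id (time) : gen lag_by = x[_n-1]

* the two agree everywhere, including the missing value
* at the first observation of each panel
assert lag_if == lag_by
list, sepby(id)
```

Under by:, [_n-1] is evaluated within each group, so the first observation of every panel gets a missing value without any explicit condition.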



            • #7
              Hi Nick, what is the difference between creating a lagged variable manually, as in gen bmilag1 = bmi[_n-1], and using the lag operator, as in L1.bmi?



              • #8
                Originally posted by Nick Cox
                Code:
                bysort id (counter) : gen change = whatever - whatever[_n-1]
                is a common pattern to insist on all comparisons being within the same panel.
                Hi Nick Cox, what is the difference between creating a lagged variable manually, as in gen bmilag1 = bmi[_n-1], and using the lag operator, as in L1.bmi?



                • #9
                  It's the difference between the previous value and the value one time unit before, which is no difference if there are no gaps.
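
A small sketch may make that distinction concrete. It reuses panel 1 from #2 (times 2, 3, 5); the variable x and its values are hypothetical, as are the names prev and lag1.

```stata
* Hypothetical example: x and its values are illustrative only
clear
input id time x
1 2 10
1 3 20
1 5 30
end

tsset id time

* previous observation within the panel, whatever its time
bysort id (time) : gen prev = x[_n-1]

* value exactly one time unit earlier; missing if that time is absent
gen lag1 = L1.x

list, sepby(id)
```

At time 3 both variables hold 10, because time 2 is present. At time 5, prev holds 20 (the value at time 3), while lag1 is missing because there is no observation at time 4.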



                  • #10
                    Originally posted by Nick Cox
                    It's the difference between the previous value and the value one time unit before, which is no difference if there are no gaps.
                    Thank you Nick Cox



                    • #11
                      Nick Cox

                      I have a similar problem: an unbalanced panel dataset with T = 120 and n = 3.

                      There is only one variable, say X, for which I need lagged values. It has the same value for all n at each t.

                      Therefore, it is possible to calculate lags within each n.

                      However, since n = 2 and n = 3 have gaps, I lose about half of the observations.

                      Is it possible to calculate the lagged value of X for n = 2 and n = 3, even if there are gaps?

                      I guess I could do it manually in Excel, but that would take a lot of time.

                      Is there any way to do this in Stata?



                      • #12
                        Sorry, but I can't follow #11 at all. Please give a data example.



                        • #13
                          Year  Order   Age  AEA
                          2010  First    69   10
                          2010  Second   58   10
                          2010  Third    71   10
                          2011  First    59   15
                          2011  Second   65   15
                          2012  First    62   20
                          2012  Second   62   20
                          2013  First    61   25
                          2013  Second   60   25
                          2013  Third    60   25
                          2014  First    59   30
                          2015  First    58   35
                          2016  First    57   40
                          2016  Second   57   40
                          2017  First    56   45
                          2018  First    55   50
                          2018  Second   55   50
                          2019  First    54   55
                          2019  Second   53   55
                          2019  Third    53   55
                          These are random values. The dataset refers to the age of the winners of a prize, where each year there can be 1 to 3 winners.

                          Therefore time variable (t) = Year. Cross-section variable (n) = Order.

                          Say I want to estimate the following panel data model using the data above:

                          Age_{n,t} = a + b * AEA_{n,t-1}

                          t = year
                          n = order

                          As it stands, using the lag operator loses a lot of observations. For example, for Order = Second in 2018 there was no Second observation in the previous year (2017), so the lagged value is missing: AEA_{2,2017} = .

                          I am asking whether there is a way for Stata to register AEA_{2,2017} = 45.





                          • #14
                            Thanks for the data example. Treating this even as panel data seems a real stretch, but if there was no awardee in a given year, then there is nothing to impute, interpolate, infer, or estimate.



                            • #15
                              Thank you for the reply.

                              There is always at least one awardee in every year.

                              I treat this as an unbalanced panel dataset that can be modelled either as a random-effects model or as a pooled OLS regression (I will use Breusch-Pagan and Hausman tests for the final selection).

                              However, whichever model I choose, the problem remains: I always lose observations if I use the lagged value of AEA.

                              What I want is for the lagged value of AEA to be recorded for the second and third awardees as well, not just the first (in every year).

                              Is it possible to do that in Stata? Or do I have to compute the values manually in Excel?
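
One possible way to do this in Stata, offered as a sketch rather than as an answer given in the thread: because AEA is described as identical for all awardees within a year, last year's value can be looked up by year instead of by panel. The example uses the 2016 to 2018 rows from #13; the lowercase variable names, the name aea_lag, and the tempfile name lagged are illustrative assumptions.

```stata
* Assumption: aea takes a single value per year, as in the example in #13.
* Variable names (year, order, age, aea, aea_lag) are illustrative.
clear
input int year str6 order byte age byte aea
2016 First  57 40
2016 Second 57 40
2017 First  56 45
2018 First  55 50
2018 Second 55 50
end

* build a year-level lookup table: one row per year, shifted forward one year
preserve
contract year aea
drop _freq
replace year = year + 1
rename aea aea_lag
tempfile lagged
save `lagged'
restore

* attach the previous year's AEA to every awardee of the current year
merge m:1 year using `lagged', keep(master match) nogenerate
sort year order
list, sepby(year)
```

With these rows, both 2018 awardees receive aea_lag = 45 even though there was no Second awardee in 2017, and the 2016 rows are missing only because 2015 is not part of this excerpt. If AEA varied within a year, the lookup table would contain more than one row per year and merge m:1 would stop with an error, so the single-value assumption is checked rather than silently violated.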

