  • Create a lag variable in an unbalanced dataset

    Hello everybody,

    I have an unbalanced dataset, and I want to lag my independent variable "No_Deals" (number of deals). For this, I used the following code:

    Code:
    bysort gvkey (fyear): generate N_Deals_lag2=No_Deals[_n+2]
    I want the independent variable to be lagged by two years, but in my dataset the values are shifted by two rows rather than two years. Because some years are missing, this can produce a "lag" that jumps, for example, from 2010 to 2005.
    Is it possible to create the lag only when a value from exactly two years earlier exists?
    In the output below, the values are unfortunately not always correct (see, for example, the year 2003, which should take the value from 2005).

    I hope the question is clear.
    Thank you!
    Peter


    Code:
    gvkey    fyear   No_Deals   N_Deals_lag2
    001078   2002    1          2
    001078   2003    1          1
    001078   2005    2          2
    001078   2006    1          1
    001078   2007    2          2
    001078   2008    1          2
    001078   2010    2          1
    001078   2011    2          3
    001078   2014    1          3
    001078   2015    3          7
    001078   2016    3          14
    001078   2017    7          1
    001078   2018    14         .
    001078   2019    1          .

  • #2
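    Code:
    xtset gvkey fyear          // declare panel id (must be numeric) and time variable
    gen want = F2.No_Deals     // value two years ahead (a lead); missing if that year is absent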



    • #3
      #1 seems best interpreted as meaning lead (future values), not lag (past values).

      #2 is helpful. If the OP really means lag, the operator needed is L, not F.



      • #4
        #2 Thank you!

        #3 Thank you very much!
        Nick, I am not sure what you mean. Would you be so kind as to explain it again, please?
        Background: the assumption was that CVC activities should be shifted in time, because they are meant to capture the effect on innovation performance (measured as patent output).

        Since it is unrealistic that activities immediately lead to patent output, this should be taken into account by the lag.
        Last edited by sladmin; 17 May 2022, 10:29. Reason: anonymize original poster



        • #5
          My point is about the language used to describe time-series operators. See

          Code:
          help varlist
          for Stata's jargon, which is standard beyond Stata too, at least in my experience.

          For time series operators, lag means past values, lead means future values. If you want to regard a future value as having a negative lag, that is fine by me.

          As a point of English usage, an expression such as "lead to" means perhaps imply or produce or yield, each of which is rather different.
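          A minimal sketch of the lag/lead distinction, using only the one firm listed in #1 and assuming gvkey is stored as a string (as its leading zeros suggest). The variable names No_Deals_L2 and No_Deals_F2 are hypothetical. Once xtset declares fyear as the time variable, L2. and F2. look up the value from exactly two years earlier or later and return missing when that year is absent, unlike the row-based No_Deals[_n+2] in #1. For example, 2003 gets the 2005 value (2) as its F2, while 2006 has no L2 because 2004 is absent.

```stata
* Hypothetical reproduction of the listing in #1 (one firm only)
clear
input str6 gvkey fyear No_Deals
"001078" 2002 1
"001078" 2003 1
"001078" 2005 2
"001078" 2006 1
"001078" 2007 2
"001078" 2008 1
"001078" 2010 2
"001078" 2011 2
"001078" 2014 1
"001078" 2015 3
"001078" 2016 3
"001078" 2017 7
"001078" 2018 14
"001078" 2019 1
end

* xtset needs a numeric panel identifier; gvkey is assumed to be a string here
destring gvkey, replace

* Declare the panel so time-series operators use fyear, not row order
xtset gvkey fyear

* L2. = value two years earlier (a lag); missing if year t-2 is not in the data
generate No_Deals_L2 = L2.No_Deals

* F2. = value two years later (a lead); missing if year t+2 is not in the data
generate No_Deals_F2 = F2.No_Deals

list fyear No_Deals No_Deals_L2 No_Deals_F2, noobs sep(0)
```

          If the goal is for past CVC activity to explain current patent output, the lag (L2.) is the relevant one.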
