  • Lagged variable - time series operator vs subscript

    Hello everyone,

    I'm trying to estimate the lagged effect of my independent variable on my dependent variable. My dataset is a cross-country panel dataset with different firms and years.

    Please note that the independent variable is a country-level variable which varies over time (but doesn't vary with the firm_id).

    I generated the lag of my independent variable in two ways.

    Method 1: using subscripts

    bysort ncountry: gen ind_sup = ind[_n-1]

    Method 2: using time series operator

    xtset firm_id year

    gen ind_time = L1.ind

    Following is a summary of the dataset.

    [CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str62 country double firm_id float year double ind float(ind_sup ind_time)
    "Afghanistan" 1101290 2005 .32105717500000003 . .
    "Afghanistan" 1101290 2006 .15308987500000001 .3210572 .3210572
    "Afghanistan" 1101290 2007 .025807775 .1530899 .1530899
    "Afghanistan" 1101290 2008 .151470875 .025807776 .025807776
    "Afghanistan" 1101290 2009 .051024324999999995 .15147087 .15147087
    "Afghanistan" 1101290 2010 .02869605 .05102433 .05102433
    "Afghanistan" 1101290 2011 .0571796 .02869605 .02869605
    "Afghanistan" 1101290 2012 .25967055 .0571796 .0571796
    "Afghanistan" 1101290 2013 .5672399 .25967056 .25967056
    "Afghanistan" 1101290 2014 .54212275 .5672399 .5672399
    "Afghanistan" 1101290 2015 .377781225 .5421227 .5421227
    "Afghanistan" 1101290 2016 .33739715000000003 .3777812 .3777812
    "Afghanistan" 1101290 2017 .369859225 .3373972 .3373972
    "Afghanistan" 1101290 2018 .174369825 .3698592 .3698592
    "Afghanistan" 1101290 2019 .25233217500000005 .17436983 .17436983
    "Afghanistan" 1101290 2020 .17920345 .25233218 .25233218
    "Afghanistan" 1134770 2005 .32105717500000003 .17920345 .
    "Afghanistan" 1134770 2006 .15308987500000001 .3210572 .3210572
    "Afghanistan" 1134770 2007 .025807775 .1530899 .1530899
    "Afghanistan" 1134770 2008 .151470875 .025807776 .025807776
    "Afghanistan" 1134770 2009 .051024324999999995 .15147087 .15147087
    "Afghanistan" 1134770 2010 .02869605 .05102433 .05102433
    "Afghanistan" 1134770 2011 .0571796 .02869605 .02869605
    "Afghanistan" 1134770 2012 .25967055 .0571796 .0571796
    "Afghanistan" 1134770 2013 .5672399 .25967056 .25967056
    "Afghanistan" 1134770 2014 .54212275 .5672399 .5672399
    "Afghanistan" 1134770 2015 .377781225 .5421227 .5421227
    "Afghanistan" 1134770 2016 .33739715000000003 .3777812 .3777812
    "Afghanistan" 1134770 2017 .369859225 .3373972 .3373972
    "Afghanistan" 1134770 2018 .174369825 .3698592 .3698592
    "Afghanistan" 1134770 2019 .25233217500000005 .17436983 .17436983
    "Afghanistan" 1134770 2020 .17920345 .25233218 .25233218
    "Afghanistan" 1183990 2007 .025807775 .17920345 .
    "Afghanistan" 1183990 2008 .151470875 .025807776 .025807776
    "Afghanistan" 1183990 2009 .051024324999999995 .15147087 .15147087
    end
    [/CODE]

    As the data above show, in row 33 (firm 1183990, year 2007) the lag generated with subscripts is 0.17920345, which is the value of the independent variable in 2020 from the row above, while the lag generated with the time-series operator is missing. The correct value is the 2006 value (0.15308987), which appears in the data for the other firms. Thus, neither of the generated lag variables gives the correct value when the years in adjacent rows are not consecutive, as at the start of a new firm's series.

    Can someone please let me know the correct code to generate the correct lagged values of the independent variable?

    Thank you.

  • #2
    Your panel is identified by firm, so country is irrelevant here. With

    Code:
    xtset firm_id year
    the first lag of variable V corresponding to firm X in 2007 is the value of V for firm X in 2006. Thus

    Code:
    g lagV= L.V
    is equivalent to

    Code:
    bys firm (year): g lagV= V[_n-1]
    if and only if the panel is balanced (no holes). To ensure this, you first need

    Code:
    xtset firm_id year
    tsfill
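
    As a rough illustration of the point above, the sketch below builds a made-up one-firm panel with 2007 missing and checks that the two expressions agree once tsfill has filled the hole. The values and the names lag_op and lag_sub are invented for the example. Note that the 2008 lag is missing under both expressions, because the 2007 row that tsfill adds has V missing.

    Code:
    clear
    input firm_id year V
    1 2005 10
    1 2006 20
    1 2008 40
    end
    xtset firm_id year
    tsfill                                         // adds a 2007 row with V missing
    gen lag_op = L.V                               // time-series operator
    bysort firm_id (year): gen lag_sub = V[_n-1]   // subscript
    assert lag_op == lag_sub                       // Stata treats . == . as true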
    Last edited by Andrew Musau; 06 Jul 2022, 05:19.


    • #3
      Originally posted by Andrew Musau
      Your panel is identified by firm, so country is irrelevant here. With

      Code:
      xtset firm_id year
      the first lag of variable V corresponding to firm X in 2007 is the value of V for firm X in 2006. Thus

      Code:
      g lagV= L.V
      is equivalent to

      Code:
      bys firm (year): g lagV= V[_n-1]
      if and only if the panel is balanced (no holes). To ensure this, you first need

      Code:
      xtset firm_id year
      tsfill
      Andrew, thanks for your reply.

      Variable 'V' is a country-level variable, therefore the first lag of variable V corresponding to firm X in 2007 can be the value of V not only for firm X in 2006, but also the value for any firm in 2006 located in country 'Y'.

      It isn't possible through either of the above codes.

      Thanks.



      • #4
        Ama, you may try the code below.

        Code:
        bys country (year): gen ind_lag = ind[_n-1] if year-year[_n-1]==1
        bys country year (ind_lag): replace ind_lag = ind_lag[1]
        sort country firm_id year
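
        To see what these lines do, here is a small sketch on invented data (one hypothetical country "A", where firm 2 only enters in 2007). The first line fills the lag only for the first row of each country-year whose preceding row is exactly one year earlier; the second line sorts missing values last within each country-year and copies the first (non-missing) value to every firm in that country-year.

        Code:
        clear
        input str1 country firm_id year ind
        "A" 1 2005 30
        "A" 1 2006 20
        "A" 1 2007 10
        "A" 2 2007 10
        end
        bys country (year): gen ind_lag = ind[_n-1] if year-year[_n-1]==1
        bys country year (ind_lag): replace ind_lag = ind_lag[1]
        sort country firm_id year
        list, sepby(firm_id)
        assert ind_lag == 20 if year == 2007    // both firms get the 2006 country value
        assert missing(ind_lag) if year == 2005 // no 2004 data for the country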


        • #5
          Originally posted by Fei Wang
          Ama, you may try the code below.

          Code:
          bys country (year): gen ind_lag = ind[_n-1] if year-year[_n-1]==1
          bys country year (ind_lag): replace ind_lag = ind_lag[1]
          sort country firm_id year
          This worked perfectly. Thanks a lot Fei Wang


          • #6
            Hello,

            I am using Stata 14. The data type of my research is panel data (unbalanced); the time period is 22 years; I have annual data for 5084 firms. My model includes 10 explanatory variables (X1, X2, …, X10) where nine of these 10 explanatory variables are lagged one year, while only one explanatory variable is in the current time/present period (i.e., not lagged. It is in time t).

            Could you please tell me which command I should use to specify the lagged explanatory variables in my regression (all the explanatory variables are lagged except one)?

            Thank you in advance.


            • #7
              See

              Code:
              help tsvarlist
              The lag operator is "L." and can be used once your data is tsset: see

              Code:
              help tsset

              So in the Grunfeld dataset, if I wanted to take the first lag of capital stock and time, but not market value:

              Code:
              webuse grunfeld, clear
              tsset company year
              xtreg invest L.(kstock time) mvalue, fe
              Res.:

              Code:
              . xtreg invest L.(kstock time) mvalue, fe
              
              Fixed-effects (within) regression               Number of obs     =        190
              Group variable: company                         Number of groups  =         10
              
              R-sq:                                           Obs per group:
                   within  = 0.7169                                         min =         19
                   between = 0.8140                                         avg =       19.0
                   overall = 0.7882                                         max =         19
              
                                                              F(3,177)          =     149.38
              corr(u_i, Xb)  = -0.3436                        Prob > F          =     0.0000
              
              ------------------------------------------------------------------------------
                    invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                    kstock |
                       L1. |   .3825641   .0299213    12.79   0.000     .3235156    .4416125
                           |
                      time |
                       L1. |  -2.248589   1.049511    -2.14   0.034    -4.319754    -.177424
                           |
                    mvalue |   .1246594   .0136119     9.16   0.000     .0977969     .151522
                     _cons |  -63.16327   15.98739    -3.95   0.000    -94.71371   -31.61284
              -------------+----------------------------------------------------------------
                   sigma_u |  94.772595
                   sigma_e |   57.99268
                       rho |   .7275697   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              F test that all u_i=0: F(9, 177) = 41.06                     Prob > F = 0.0000
              
              .


              • #8
                Dear Professor Andrew,
                Thank you for your reply.

                To check my understanding, you mean that I need to type in Stata the following:

                xtreg y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, fe

                Given y is the dependent variable and my model includes 10 explanatory variables where 9 of these 10 explanatory variables are lagged one year, while only one explanatory variable is in the current time/present period (i.e., not lagged. It is in time t). As the command includes the lag operator "L." for the nine explanatory variables (x1, x2, x3, x4, x5, x6, x7, x8, x9), Stata will regress y on the lagged values of these nine explanatory variables, but on x10 in the current time.
                Is my understanding correct?

                My second question is: what is the command to get the second lag of a variable i.e., to lag a variable two periods?

                My third question is: is there any difference between the different commands of the lagged variable? If so, which is the best command of the lagged variable?

                Your help and cooperation are highly appreciated.


                • #9
                  Originally posted by Zainab Mariam
                  To check my understanding, you mean that I need to type in Stata the following:

                  xtreg y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, fe

                  Given y is the dependent variable and my model includes 10 explanatory variables where 9 of these 10 explanatory variables are lagged one year, while only one explanatory variable is in the current time/present period (i.e., not lagged. It is in time t). As the command includes the lag operator "L." for the nine explanatory variables (x1, x2, x3, x4, x5, x6, x7, x8, x9), Stata will regress y on the lagged values of these nine explanatory variables, but on x10 in the current time.
                  Is my understanding correct?
                  The assumption is that all variables are in levels, and that you want to regress y on the first lag of each of x1-x9 and the level of x10. In such a case, your understanding is correct.


                  My second question is: what is the command to get the second lag of a variable i.e., to lag a variable two periods?
                  Code:
                  L2.var
                  is the 2nd lag of the variable "var". "L3.var" is the 3rd lag of "var", and so on.
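
                  As a small sketch using the Grunfeld data from earlier in the thread, a second lag can be stored as a new variable or used directly in the regression. The name kstock_l2 is only illustrative, and L(1/2).kstock is shorthand for including both L1.kstock and L2.kstock (see help tsvarlist).

                  Code:
                  webuse grunfeld, clear
                  tsset company year
                  gen kstock_l2 = L2.kstock              // second lag stored as a variable
                  xtreg invest L(1/2).kstock mvalue, fe  // first and second lags of kstock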


                  My third question is: is there any difference between the different commands of the lagged variable? If so, which is the best command of the lagged variable?
                  The only requirement with lagging is that there is a time dimension. So the data should be time-series or a panel. The estimator will depend on other considerations, not the lagging per se. Usually, lagging variables is justified by theory or common sense. A mayor of a city may want to reduce crime and goes about doing so by expanding the city's police force. However, the effect of expanding the police force on crime may not be immediate. In this case, there is a lagged effect and a model that relates these two variables may include a lagged variable for this reason.
                  Last edited by Andrew Musau; 17 Jul 2022, 10:25.


                  • #10
                    Dear Professor Andrew,

                    Thank you for your reply.

                    What I meant by my third question is that I read that there are different commands of the lagged variable. For instance,

                    gen w1 = L1.var
                    gen w2 = var[_n-1]


                    Thus, I asked whether there are any differences between the different commands of the lagged variable, and if so, which command of the lagged variable is the best.

                    I do appreciate your cooperation.


                    • #11
                      Yes, there is a difference between these. Or, more precisely, there can be a difference, depending on your data.

                      If your time series has no time gaps, then these will produce the same results. But if there are time gaps, then L1.var is always the 1 year (or whatever time unit) lag of var, or missing value if the preceding year's data does not exist in the data set, whereas var[_n-1] will be the value of var in the most recent preceding year found in the data, which could be many years earlier if several years of data are missing.

                      Since you might have gaps in your data that you are not aware of, it is safer to use the L1.var notation.
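
                      A minimal sketch of the difference, on invented data with 2007 missing (the names id and x are made up; w1 and w2 follow the post above):

                      Code:
                      clear
                      input id year x
                      1 2005 10
                      1 2006 20
                      1 2008 40
                      end
                      xtset id year
                      gen w1 = L1.x                         // missing in 2008: there is no 2007 row
                      bysort id (year): gen w2 = x[_n-1]    // 20 in 2008: silently reaches back to 2006
                      list id year x w1 w2
                      assert missing(w1) & w2 == 20 if year == 2008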

                      As an aside, things like L1.var or var[_n-1] are called expressions, not commands.


                      • #12
                        Dear Professor Clyde,

                        Thank you for your swift reply and for correcting my terms.

                        I do appreciate your cooperation.


                        • #13
                          Originally posted by Fei Wang
                          Ama, you may try the code below.

                          Code:
                          bys country (year): gen ind_lag = ind[_n-1] if year-year[_n-1]==1
                          bys country year (ind_lag): replace ind_lag = ind_lag[1]
                          sort country firm_id year
                          Hi Fei Wang

                          Can you please share with me how to adjust the above code to generate two year lag?

                          Many thanks.
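
                          For reference, one possible route to a two-year lag is not an adjustment of the quoted code but a different technique: reduce the data to one row per country-year, let the time-series operators do the lagging, and merge the result back onto the firm-level data. This is only a sketch. It assumes ind takes a single value within each country-year, and the names cid, ind_lag1 and ind_lag2 are made up. Run it from a do-file in one go, since it relies on a temporary file.

                          Code:
                          preserve
                          keep country year ind
                          duplicates drop
                          isid country year                  // errors out if ind varies within a country-year
                          egen cid = group(country)          // numeric panel id for xtset
                          xtset cid year
                          gen ind_lag1 = L1.ind
                          gen ind_lag2 = L2.ind              // missing if year-2 is absent for the country
                          keep country year ind_lag1 ind_lag2
                          tempfile lags
                          save `lags'
                          restore
                          merge m:1 country year using `lags', nogenerate
                          sort country firm_id year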
