Hello everyone,
I'm trying to estimate the lagged effect of my independent variable on my dependent variable. My dataset is a cross-country panel dataset with different firms and years.
Please note that the independent variable is a country-level variable which varies over time (but doesn't vary with the firm_id).
I generated the lag of my independent variable in two ways.
Method 1: using subscripts
bysort ncountry: gen ind_sup = ind[_n-1]
Method 2: using time series operator
xtset firm_id year
gen ind_time = L1.ind
Following is a summary of the dataset.
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str62 country double firm_id float year double ind float(ind_sup ind_time)
"Afghanistan" 1101290 2005 .32105717500000003 . .
"Afghanistan" 1101290 2006 .15308987500000001 .3210572 .3210572
"Afghanistan" 1101290 2007 .025807775 .1530899 .1530899
"Afghanistan" 1101290 2008 .151470875 .025807776 .025807776
"Afghanistan" 1101290 2009 .051024324999999995 .15147087 .15147087
"Afghanistan" 1101290 2010 .02869605 .05102433 .05102433
"Afghanistan" 1101290 2011 .0571796 .02869605 .02869605
"Afghanistan" 1101290 2012 .25967055 .0571796 .0571796
"Afghanistan" 1101290 2013 .5672399 .25967056 .25967056
"Afghanistan" 1101290 2014 .54212275 .5672399 .5672399
"Afghanistan" 1101290 2015 .377781225 .5421227 .5421227
"Afghanistan" 1101290 2016 .33739715000000003 .3777812 .3777812
"Afghanistan" 1101290 2017 .369859225 .3373972 .3373972
"Afghanistan" 1101290 2018 .174369825 .3698592 .3698592
"Afghanistan" 1101290 2019 .25233217500000005 .17436983 .17436983
"Afghanistan" 1101290 2020 .17920345 .25233218 .25233218
"Afghanistan" 1134770 2005 .32105717500000003 .17920345 .
"Afghanistan" 1134770 2006 .15308987500000001 .3210572 .3210572
"Afghanistan" 1134770 2007 .025807775 .1530899 .1530899
"Afghanistan" 1134770 2008 .151470875 .025807776 .025807776
"Afghanistan" 1134770 2009 .051024324999999995 .15147087 .15147087
"Afghanistan" 1134770 2010 .02869605 .05102433 .05102433
"Afghanistan" 1134770 2011 .0571796 .02869605 .02869605
"Afghanistan" 1134770 2012 .25967055 .0571796 .0571796
"Afghanistan" 1134770 2013 .5672399 .25967056 .25967056
"Afghanistan" 1134770 2014 .54212275 .5672399 .5672399
"Afghanistan" 1134770 2015 .377781225 .5421227 .5421227
"Afghanistan" 1134770 2016 .33739715000000003 .3777812 .3777812
"Afghanistan" 1134770 2017 .369859225 .3373972 .3373972
"Afghanistan" 1134770 2018 .174369825 .3698592 .3698592
"Afghanistan" 1134770 2019 .25233217500000005 .17436983 .17436983
"Afghanistan" 1134770 2020 .17920345 .25233218 .25233218
"Afghanistan" 1183990 2007 .025807775 .17920345 .
"Afghanistan" 1183990 2008 .151470875 .025807776 .025807776
"Afghanistan" 1183990 2009 .051024324999999995 .15147087 .15147087
end
[/CODE]
As the data above show, in row 33 (firm 1183990, year 2007; the bolded row), the lag generated with the subscript method is 0.17920345, which is the 2020 value of the independent variable from the row above, while the lag generated with the time-series operator is missing. However, the correct value is the 2006 value, shown in red (0.15308). Thus, neither of the generated lag variables gives the correct value when the years are not in consecutive order.
Can someone please let me know the correct code to generate the correct lagged values of the independent variable?
Thank you.
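One possible approach, offered only as a sketch: because ind is described above as a country-level variable, its lag can be built from a country-year lookup and merged back to the firm rows, so it does not depend on which years a given firm is observed. The code assumes ind takes a single value within each country-year (checked by the assert) and uses the string variable country from the extract rather than ncountry. The name ind_lag and the tempfile name are illustrative, not part of the original data.

[CODE]
* Check the assumption: ind must be constant within each country-year
bysort country year (ind): assert ind[1] == ind[_N]

* Build a country-year lookup and shift it forward one year, so the
* value recorded for year t-1 is attached to year t
preserve
keep country year ind
duplicates drop country year, force
rename ind ind_lag
replace year = year + 1
tempfile lagged
save `lagged'
restore

* Attach the country-level lag to every firm-year observation
merge m:1 country year using `lagged', keep(master match) nogenerate
[/CODE]

With this sketch, firm 1183990 in 2007 would pick up the 2006 country value, provided at least one firm in the same country is observed in 2006. Where no firm is observed in year t-1, ind_lag stays missing.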