  • Missing values when generating lagged variable in panel data

    Context:
    I'm using Stata 16.1 on Windows 10. I have unbalanced panel data with 52 variables and 953 observations. The variables id, fyear, and cf are the unique identifier, fiscal year, and cash flow, respectively. I want to generate a new variable equal to cashflow_t - cashflow_{t-1} so that I can measure growth in cash flow. Below is a sample of the data set.


    dataex id fyear cf

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id double fyear float cf
    16902008 2008     4650
    16902009 2009     8593
    16902010 2010    14790
    16902011 2011    27329
    16902012 2012    44416
    16902013 2013    42638
    16902014 2014    46476
    16902015 2015    62609
    16902016 2016    53591
    16902017 2017    55006
    16902018 2018    68193
    16902019 2019    65996
    21842008 2008     2142
    21842009 2009     2429
    21842010 2010     2614
    21842011 2011     2581
    21842012 2012     1913
    21842013 2013     1388
    21842014 2014     1970
    21842015 2015     1659
    21842016 2016     1778
    21842017 2017     1818
    21842018 2018     2312
    21842019 2019     2413
    28072008 2008  177.429
    28072009 2009  208.303
    28072010 2010  232.863
    28072011 2011  248.535
    28072012 2012  257.496
    28072013 2013  297.895
    28072014 2014  377.964
    28072015 2015  437.092
    28072016 2016   416.65
    28072017 2017  589.813
    28072018 2018  503.929
    28072019 2019  568.439
    28182008 2008   49.041
    28182009 2009   63.347
    28182010 2010   75.627
    28182011 2011   82.863
    28182012 2012   80.457
    28182013 2013   72.955
    28182014 2014    79.14
    28182015 2015   86.613
    28182016 2016   63.063
    28182017 2017   23.186
    28182018 2018   42.029
    28182019 2019   45.346
    38132008 2008     4906
    38132009 2009     5312
    38132010 2010     5761
    38132011 2011     5926
    38132012 2012     5742
    38132013 2013     5063
    38132014 2014     5687
    38132015 2015     5737
    38132016 2016     5975
    38132017 2017     6034
    38132018 2018     5838
    38132019 2019     6324
    39642008 2008  314.278
    39642009 2009  402.892
    39642010 2010  508.284
    39642011 2011  741.716
    39642012 2012  653.063
    39642013 2013  634.313
    39642014 2014  637.208
    39642015 2015  566.322
    39642016 2016  481.486
    39642017 2017  510.601
    39642018 2018  454.273
    39642019 2019  367.031
    40162008 2008  761.765
    40162009 2009 1041.784
    40162010 2010  1160.05
    40162011 2011     1310
    40162012 2012 1404.644
    40162013 2013 1468.671
    40162014 2014 1509.577
    40162015 2015  1609.95
    40162016 2016 1727.554
    40162017 2017 2068.529
    40162018 2018 2178.296
    40162019 2019 2348.933
    40722008 2008  118.784
    40722009 2009  122.162
    40722010 2010  222.505
    40722011 2011      278
    40722012 2012    316.7
    40722013 2013    408.4
    40722014 2014    386.1
    40722015 2015    377.9
    40722016 2016    653.2
    40722017 2017    876.5
    40722018 2018    522.6
    40722019 2019    197.5
    46112008 2008     2789
    46112009 2009     2491
    46112010 2010     2589
    46112011 2011     2752
    end
    Problem:
    - I use the following code and get a missing value for every observation instead of the intended new variable.

    Code:
    by id (fyear), sort: gen cfg = cf - cf[_n-1]
    (953 missing values generated)

    Please let me know if my request is unclear, and I can provide additional information. Thanks.

  • #2
    You are getting missing values because, in the data you show, each id has only one observation, so within each id there is no previous observation for cf[_n-1] to refer to.

    To generate the variable you want, you need at least two consecutive years with nonmissing cf for each id.
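In the sample shown, every id ends in the same four digits as fyear (for example, 16902008 with fyear 2008), which would explain why each id occurs only once. A minimal sketch of one way to check this and work around it, assuming the leading digits of id identify the firm; `firm`, `cfg_by_id`, and the six-row extract are illustrative and not from the original post:

```stata
* Assumption: the last four digits of id are the fiscal year, so the
* leading digits identify the firm. firm is a hypothetical variable name.
clear
input long id double fyear float cf
16902008 2008  4650
16902009 2009  8593
16902010 2010 14790
21842008 2008  2142
21842009 2009  2429
21842010 2010  2614
end

* isid exits with an error unless id uniquely identifies the observations;
* here it passes, so every by-group on id holds a single row
isid id
by id (fyear), sort: gen cfg_by_id = cf - cf[_n-1]
count if missing(cfg_by_id)

* Recover a firm identifier by dropping the four year digits
gen long firm = floor(id/10000)

* Within each firm, only the first year has no predecessor
by firm (fyear), sort: gen cfg = cf - cf[_n-1]
list firm fyear cf cfg, sepby(firm)
```

If the real id is built differently, the `floor(id/10000)` step would need to change; `duplicates report id` is another quick way to see how many times each id occurs.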

    • #3
      Joro, Thanks for your reply.

      Since my panel data contains firm-level data for the 2008 to 2019 period, I used the following code to generate the annual growth rate in cash flow (cf). I'm not sure if this is the most efficient way, but it seems to work.

      Code:
      . sort id ( fyear)
      
      . gen cf_L1= cf[_n-1]
      (145 missing values generated)
      
      . gen cf_growth = cf - cf_L1
      (157 missing values generated)
      
      . gen cf_growthrate = cf_growth/ cf_L1
      (157 missing values generated)
      
      . browse cf_growthrate
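One point about the listing above may be worth checking: without a by prefix, `cf[_n-1]` refers to the previous row of the whole dataset, so the first year of each firm picks up the last year of the firm sorted before it. A small sketch of the difference, using four rows from the sample and assuming a hypothetical firm identifier `firm` (the names `cf_L1_plain` and `cf_L1` are also illustrative):

```stata
* Two firms from the sample; firm is a hypothetical identifier
clear
input long firm double fyear float cf
1690 2018 68193
1690 2019 65996
2184 2008  2142
2184 2009  2429
end

sort firm fyear

* Plain subscripting: the 2008 row of firm 2184 takes firm 1690's 2019 value
gen cf_L1_plain = cf[_n-1]

* By-group subscripting: _n restarts at 1 within each firm,
* so the first year of each firm is missing
by firm (fyear): gen cf_L1 = cf[_n-1]

list, sepby(firm)
```

Comparing the two columns on the first row of each firm shows whether a growth figure has been computed across firms.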

      • #4
        What you have done seems correct; I do not see any errors.

        But it is easier if you -tsset- your data and then use lag operators.
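A minimal sketch of that suggestion, assuming a panel identifier that takes the same value in every year of a firm; `firm` and the six-row extract are illustrative. For panel data, `xtset` declares both the panel and time variables (`tsset firm fyear` does the same):

```stata
* firm is a hypothetical panel identifier; values of cf come from the sample
clear
input long firm double fyear float cf
1690 2008  4650
1690 2009  8593
1690 2010 14790
2184 2008  2142
2184 2009  2429
2184 2010  2614
end

* Declare the panel structure: firm is the panel variable, fyear the time variable
xtset firm fyear

* L.cf is the previous year's cf within the same firm; D.cf is cf - L.cf
gen cf_growth = D.cf
gen cf_growthrate = D.cf / L.cf

list, sepby(firm)
```

Unlike `cf[_n-1]`, the lag operator looks for the observation one year earlier in the same panel, so it returns missing both at the start of each firm and wherever a year is absent, which matters in an unbalanced panel.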

        • #5
          Many thanks, Joro, for your help.
