  • New panel ID variable based on certain conditions.

    Hello Stata community,

    I have a panel dataset covering 21 years and many firms, and a set of regression equations to run. I would like to ensure that firms entering the regressions have at least 4 consecutive years in which data for all variables are present. Currently, firms are identified by the variable "DSCD" and my time variable is "year". If a firm has 15 years of data but there is a year in the middle with missing values (e.g. data are available from 2000 to 2007 and from 2010 to 2018), then I would like to encode the observations from 2010 to 2018 as a different firm for the purposes of the regression, in a new variable akin to DSCD. I can then use the new identifier along with "year" to -tsset- before I run xtreg and reg. I've created a variable "balancedind" which equals 1 when all variables necessary for the regressions are present in that firm-year.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long DSCD int year float balancedind
    1 1999 .
    1 2000 .
    1 2001 .
    1 2002 .
    1 2003 .
    1 2004 .
    1 2005 .
    1 2006 .
    1 2007 .
    1 2008 .
    1 2009 .
    1 2010 .
    1 2011 .
    1 2012 .
    1 2013 .
    1 2014 .
    1 2015 1
    1 2016 1
    1 2017 1
    1 2018 1
    1 2019 1
    1 2020 .
    1 2021 .
    2 1999 1
    2 2000 .
    2 2001 1
    2 2002 1
    2 2003 1
    2 2004 1
    2 2005 1
    2 2006 1
    2 2007 1
    2 2008 1
    2 2009 1
    2 2010 1
    2 2011 1
    2 2012 1
    2 2013 1
    2 2014 1
    2 2015 1
    2 2016 1
    2 2017 1
    2 2018 1
    2 2019 1
    2 2020 1
    2 2021 .
    3 1999 1
    3 2000 .
    3 2001 1
    3 2002 1
    3 2003 1
    3 2004 1
    3 2005 1
    3 2006 1
    3 2007 1
    3 2008 1
    3 2009 .
    3 2010 .
    3 2011 .
    3 2012 .
    3 2013 .
    3 2014 .
    3 2015 .
    3 2016 .
    3 2017 .
    3 2018 .
    3 2019 .
    3 2020 .
    3 2021 .
    4 1999 .
    4 2000 .
    4 2001 .
    4 2002 .
    4 2003 .
    4 2004 .
    4 2005 .
    4 2006 .
    4 2007 .
    4 2008 .
    4 2009 .
    4 2010 .
    4 2011 .
    4 2012 .
    4 2013 .
    4 2014 .
    4 2015 1
    4 2016 1
    4 2017 1
    4 2018 1
    4 2019 1
    4 2020 1
    4 2021 .
    5 1999 .
    5 2000 .
    5 2001 .
    5 2002 .
    5 2003 .
    5 2004 .
    5 2005 .
    5 2006 .
    end
    label values DSCD DSCD1
    label def DSCD1 1 "130042", modify
    label def DSCD1 2 "130062", modify
    label def DSCD1 3 "130079", modify
    label def DSCD1 4 "130086", modify
    label def DSCD1 5 "130088", modify
    Any guidance would be greatly appreciated.

  • #2
    Perhaps this example code will start you in a useful direction.
    Code:
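    * Within each firm, sort complete years first (missing balancedind sorts last);
    * a new spell starts whenever year is not one more than the preceding year
    by DSCD (balancedind year), sort: gen spell = sum(year!=1+year[_n-1])
    * Count the firm-years with nonmissing balancedind in each spell
    by DSCD spell (year), sort: egen count = total(balancedind!=.)
    * Build a string ID and append the spell number for spells with at least 4 complete years
    decode DSCD, generate(DSCDstr)
    replace DSCDstr = DSCDstr+"_"+strofreal(spell,"%02.0f") if count>=4
    sort DSCD year
    * Convert back to a labeled numeric identifier (count is kept for later use)
    encode DSCDstr, generate(DSCD_S) label(DSCD_Sl)
    drop DSCDstr spell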
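
    A variant that may be easier to audit, offered as a sketch rather than a replacement: it walks each firm in plain year order and numbers runs of consecutive complete firm-years, so incomplete years never receive a run number. The toy data and the names run_start, run, run_len and DSCD_alt are made up for illustration; only DSCD, year and balancedind come from the question above.

    ```stata
    * Toy data (hypothetical): firm 1 is complete 1999-2002, firm 2 has a gap in 2000
    clear
    input long DSCD int year float balancedind
    1 1999 1
    1 2000 1
    1 2001 1
    1 2002 1
    1 2003 .
    1 2004 .
    2 1999 1
    2 2000 .
    2 2001 1
    2 2002 1
    2 2003 1
    2 2004 1
    end

    * Flag the first complete firm-year of each run: the previous observation is
    * absent, incomplete, or not the immediately preceding calendar year
    bysort DSCD (year): gen byte run_start = balancedind == 1 & (_n == 1 | balancedind[_n-1] != 1 | year != year[_n-1] + 1)

    * Number the runs within each firm; incomplete firm-years get a missing run
    by DSCD: gen run = sum(run_start) if balancedind == 1

    * Length of each run in years
    bysort DSCD run: gen run_len = _N if !missing(run)

    * New identifier, defined only for runs of at least 4 consecutive complete years
    egen long DSCD_alt = group(DSCD run) if run_len >= 4 & !missing(run_len)

    sort DSCD year
    list, sepby(DSCD)
    ```

    In this toy data, firm 1's 1999 to 2002 run and firm 2's 2001 to 2004 run each receive an identifier, while firm 2's isolated 1999 observation and all incomplete years are left with a missing DSCD_alt. Unlike DSCD_S, this identifier carries no value label linking it back to the original firm code.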


    • #3
      Hi William,

      Thanks for your response. Your code has definitely helped get me going.

      Best wishes,
      Ammar.