Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Regressions by industry-year and keeping coefficient estimates for each industry-year regression with the same variable name

    Hi


    I am trying to run a regression for each industry-year group within a data set, and then generate variables of the estimated coefficients. My two current code attempts are below:


    Code:
    forval y=2002/2021 {
           forval  i= 1/97 {
            di "Year = `y' and sic2 = `i'"
            count if sic2 == `i' & Year == `y' & !missing(yy, x1, x2)
            if r(N) > 10 {
            regress yy x1 x2 if  sic2== `i' & Year==`y'
            foreach v of varlist x1 x2 {
            gen b_`v' = _b[`v']
       }
            }
    }
    
    }

    On the second industry-year combo, this produces an error that the new variable b_x1 is already defined. What I am trying to do is to essentially get this for each industry-year combo. I guess it may need a nested loop to capture the fact it is for each industry-year group .. ?


    Other code I had tried was using -runby-:

    Code:
    capture program drop myregress
    program define myregress
        regress y x1 x2
        foreach v of varlist x1 x2 {
            gen b_`v' = _b[`v']
       }
       gen n_obs = e(N)
       exit
    end
    
    runby myregress, by(sic2 Year)

    The issue in this case was that I only want to run the regression for industry-years where I have at least 10 observations (of y x1 x2) in the regression. I couldn't seem to get a way to only run for industry-years with at least 10 observations in the regression. Instead, it ran for all and removed some observations from my data set. It feels like there should be a simple fix for this, but I am yet to find one which does the trick and feel like I've been going round in circles..



    Data sample below (although very small relative to the full data set):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int Year byte sic2 float(id y x1 x2)
    2001 62 1           .             .          .
    2002 62 1           .             .          .
    2003 62 1           .             .          .
    2004 62 1           .             .          .
    2005 62 1           .             .          .
    2006 62 1           .             .          .
    2007 62 1           .             .          .
    2008 62 1           .             .          .
    2009 62 1           .             .          .
    2010 62 1           .             .          .
    2011 62 1           .             .          .
    2012 62 1 -.015869012  .00001050928 -.09859806
    2013 62 1  -.13666852  9.380775e-06          .
    2014 62 1  -.06423958 .000013209865          .
    2015 62 1  -.20667164  .00001164741  .09992313
    2016 62 1  .005849376 3.9231227e-06 -.01027858
    2017 62 1           .  4.980378e-06          .
    2018 62 1           .             .          .
    2019 62 1           .             .          .
    2020 62 1           .             .          .
    2021 62 1           .             .          .
    2001 35 2           .             .          .
    2002 35 2           .             .          .
    2003 35 2           .             .          .
    2004 35 2           .             .          .
    2005 35 2           .             .          .
    2006 35 2           .             .          .
    2007 35 2           .             .          .
    2008 35 2           .             .          .
    2009 35 2           .             .          .
    2010 35 2  -.08069245  9.844865e-07 -.08591416
    2011 35 2   -.0487175  1.043447e-06  -.1507656
    2012 35 2           . 1.2754988e-06          .
    2013 35 2           .             .          .
    2014 35 2           .             .          .
    2015 35 2           .             .          .
    2016 35 2           .             .          .
    2017 35 2           .             .          .
    2018 35 2           .             .          .
    2019 35 2           .             .          .
    2020 35 2           .             .          .
    2021 35 2           .             .          .
    2001 36 3           .             .          .
    2002 36 3           .             .          .
    2003 36 3           .             .          .
    2004 36 3           .             .          .
    2005 36 3           .             .          .
    2006 36 3           .             .          .
    2007 36 3           .             .          .
    2008 36 3           .             .          .
    2009 36 3           .             .          .
    2010 36 3           .             .          .
    2011 36 3           .             .          .
    2012 36 3           .  .00013303179          .
    2013 36 3           .             .          .
    2014 36 3           .             .          .
    2015 36 3           .             .          .
    2016 36 3           .             .          .
    2017 36 3           .             .          .
    2018 36 3           .             .          .
    2019 36 3           .             .          .
    2020 36 3           .             .          .
    2021 36 3           .             .          .
    2001 73 4           .             .          .
    2002 73 4           .             .          .
    2003 73 4           .             .          .
    2004 73 4           .             .          .
    2005 73 4           .             .          .
    2006 73 4           .             .          .
    2007 73 4           .             .          .
    2008 73 4           .             .          .
    2009 73 4           .             .          .
    2010 73 4           .             .          .
    2011 73 4 -.034475815  8.390318e-06  .08454923
    2012 73 4           .  7.091846e-06          .
    2013 73 4           .             .          .
    2014 73 4           .             .          .
    2015 73 4           .             .          .
    2016 73 4           .             .          .
    2017 73 4           .             .          .
    2018 73 4           .             .          .
    2019 73 4           .             .          .
    2020 73 4           .             .          .
    2021 73 4           .             .          .
    2001 14 5           .             .          .
    2002 14 5           .             .          .
    2003 14 5           .             .          .
    2004 14 5           .             .          .
    2005 14 5           .             .          .
    2006 14 5           .             .          .
    2007 14 5           .             .          .
    2008 14 5           .             .          .
    2009 14 5           .             .          .
    2010 14 5           .             .          .
    2011 14 5           .             .          .
    2012 14 5  .009635898 1.6602168e-06   .1071836
    2013 14 5  .011130786 1.5824262e-06  .09883676
    2014 14 5  -.01621402  1.521586e-06 .017382598
    2015 14 5  .002984429  1.556011e-06 -.00879613
    2016 14 5  -.06915921  1.202309e-06  .06596348
    end


    Any advice is much appreciated.

    Paul

  • #2
    Code:
    capture program drop myregress
    program define myregress
        count if !missing(y, x1, x2)
        if r(N) >= 10 {
            regress y x1 x2
            foreach v of varlist x1 x2 {
                gen b_`v' = _b[`v']
           }
           gen n_obs = e(N)
       }
       exit
    end
    
    runby myregress, by(sic2 Year)
    should do what you want. The regressions will only be done where there are 10 observations available (which, by the way, never happens in your example data). And no observations for sic2-year combinations that lack 10 available observations will be deleted: they will just show missing values for the b_* variables and n_obs.

    Comment


    • #3
      Works perfectly Clyde - many thanks. (and yes, the data example probably wasn't the best in hindsight - but I can confirm it does work on my full sample)

      Comment

      Working...
      X