  • Calculate residuals by industry and year

    Dear Statalist users,

    Assume that we have firm-level data and want to calculate the residuals from a regression run separately for each industry and year. For example, assume that we have two years of data, 2000 and 2001, and two industries, 1 and 2 (the variable group below).

    Suppose the data look like this.

    Code:
    input firm_id year group y x
    1    2000    1    .    0.587681712
    2    2000    1    0.220527045    0.020397402
    3    2000    1    0.598266595    0.276305834
    4    2000    1    0.848473642    0.503412603
    5    2000    1    0.566881707    0.143577444
    6    2000    1    0.69872904    .
    1    2001    1    0.581496724    0.952147679
    2    2001    1    0.447513514    0.547753335
    3    2001    1    0.492024424    0.380500378
    4    2001    1    0.913852189    0.396933955
    5    2001    1    0.181215711    0.220948854
    6    2001    1    0.393435702    0.974829582
    7    2000    2    0.035029052    0.080399976
    8    2000    2    0.552878997    0.163354383
    9    2000    2    0.55373046    0.543578162
    10    2000    2    0.272902519    0.870706828
    11    2000    2    0.700316363    0.262667598
    12    2000    2    0.485204026    0.970839238
    6    2001    2    0.238687785    0.488399578
    7    2001    2    0.844819818    0.849078286
    8    2001    2    0.139093221    0.73683734
    9    2001    2    0.397981489    0.503380686
    10    2001    2    0.127906763    0.954909727
    11    2001    2    0.118464559    0.656839917
    12    2001    2    0.608098688    0.304986828
    end
    A simple regression of y on x should be run for each year and group, and as long as that group-year pair has at least 5 observations, a column should be generated with the residuals. For example, for group 1 and year 2000, the residuals should not be calculated, since only 4 observations are complete (one row has a missing y and another a missing x).
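    As a quick check of which group-year pairs meet the threshold, the complete cases can be tabulated. This is only a sketch and assumes the example data above are already in memory:

```stata
* Count rows with both y and x non-missing in each group-year pair
tabulate group year if !missing(x, y)
```

    Any cell of that table with a count below 5 is a pair for which the residuals should stay missing.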

    I reckon that a double loop might be needed here, but I am a bit stuck.

    How would one solve this problem?

  • #2
    There are several ways. The following is relatively slow to execute, though that won't be a real problem unless your data set is huge, but displays the logic most clearly, and uses only official Stata commands:
    Code:
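    * Final residuals; stays missing for group-year pairs with too few observations
    gen residuals = .
    levelsof year, local(years)
    foreach y of local years {
        * The local macro y holds the year; a bare y below is the outcome variable
        * Only loop over the groups that actually occur in this year
        levelsof group if year == `y', local(groups)
        foreach g of local groups {
            * Count complete cases (neither x nor y missing) in this group-year pair
            quietly count if group == `g' & year == `y' & !missing(x, y)
            if r(N) >= 5 {
                regress y x if group == `g' & year == `y'
                * Residuals from this regression, copied into the final variable
                predict resid, resid
                replace residuals = resid if group == `g' & year == `y'
                drop resid
            }
        }
    }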
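
    As one of the other ways, the two nested loops can be collapsed into a single loop over group-year cells. The following is a sketch rather than a tested solution. It assumes the example data from this thread are in memory, and the names cell, n_ok, res_tmp and residuals2 are made up here for illustration:

```stata
* Sketch only: assumes the example data from this thread are in memory.
* cell, n_ok, res_tmp and residuals2 are illustrative names.
egen cell = group(group year)                 // one integer id per group-year pair
egen n_ok = total(!missing(x, y)), by(cell)   // complete cases in each pair
gen double residuals2 = .
summarize cell, meanonly
forvalues c = 1/`r(max)' {
    * n_ok is constant within a cell, so its maximum is the cell count
    quietly summarize n_ok if cell == `c', meanonly
    if r(max) >= 5 {
        quietly regress y x if cell == `c'
        * Keep residuals only for the observations used in this regression
        quietly predict double res_tmp if e(sample), residuals
        quietly replace residuals2 = res_tmp if e(sample)
        drop res_tmp
    }
}
```

    Restricting predict to e(sample) means rows with a missing x or y, and rows in pairs that were skipped, keep a missing residual.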

    • #3
      Thank you Clyde!
