Predicted values after reg by group

paulvonhippel

Join Date: Apr 2014

Posts: 502
#1

14 May 2023, 21:19

In each of 8,000 schools I'd like to regress y on x1 and x2 and then generate the predicted value of y. The slopes and intercept may be different in each school. This is a little tricky, because as far as I can tell the predict command isn't byable. So I can't do this:
by school: reg y x1 x2
by school: predict y_predicted
This works on small datasets:
reg y i.school_id##(c.x1 c.x2)
predict y_predicted
but gets very slow if the number of schools is large. Any other ideas? I'm wondering if one of the fixed-effects commands would do the trick. Many thanks!
Paul

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17724

15 May 2023, 00:57

Paul:
what springs to my mind is:

Code:

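use "https://www.stata-press.com/data/r17/nlswork.dta"
generate double predict = .
* idcode values need not be consecutive, so loop over the values that occur
levelsof idcode, local(ids)
foreach i of local ids {
    * Skip any id whose regression cannot be estimated (e.g., too few observations)
    capture regress ln_wage age if idcode == `i'
    if _rc continue
    quietly predict fitted, xb
    quietly replace predict = fitted if idcode == `i'
    drop fitted
}

list idcode ln_wage predict if idcode<=2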

       +-------------------------------+
       | idcode    ln_wage     predict |
       |-------------------------------|
    1. |      1   1.451214   1.4478402 |
    2. |      1    1.02862   1.5189514 |
    3. |      1   1.589977   1.5900626 |
    4. |      1   1.780273   1.6611738 |
    5. |      1   1.777012   1.8033962 |
       |-------------------------------|
    6. |      1   1.778681   1.9456186 |
    7. |      1   2.493976   2.0167298 |
    8. |      1   2.551715   2.1589522 |
    9. |      1   2.420261   2.3722858 |
   10. |      1   2.614172   2.5145082 |
       |-------------------------------|
   11. |      1   2.536374   2.6567307 |
   12. |      1   2.462927   2.7989531 |
   13. |      2   1.360348   1.4594426 |
   14. |      2   1.206198   1.4868761 |
   15. |      2   1.549883   1.5143095 |
       |-------------------------------|
   16. |      2   1.832581   1.5691764 |
   17. |      2   1.726721   1.6240433 |
   18. |      2    1.68991   1.6514767 |
   19. |      2   1.726964   1.7063437 |
   20. |      2   1.808289   1.7612106 |
       |-------------------------------|
   21. |      2   1.863417    1.788644 |
   22. |      2   1.789367   1.8435109 |
   23. |      2    1.84653   1.8983777 |
   24. |      2   1.856449   1.9532446 |
       +-------------------------------+


Kind regards,
Carlo
(Stata 19.0)

paulvonhippel

Join Date: Apr 2014

Posts: 502
#3

15 May 2023, 08:03

Thanks Carlo Lazzaro -- that'll do the trick!

Bonus question: each time the if clause is used, Stata has to scan the whole dataset to find the relevant subset of the data, which increases runtime. Is there a way to reduce this?

Last edited by paulvonhippel; 15 May 2023, 08:09.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#4

15 May 2023, 08:07

Paul:
not that I know, unfortunately.

Kind regards,
Carlo
(Stata 19.0)
Rich Goldstein

Join Date: Mar 2014

Posts: 4485
#5

15 May 2023, 08:11

"if" qualifiers are slow, and the code above uses two per iteration. You can cut that in half by using, e.g., -keep- to retain only the relevant portion of your data for each group and then appending the pieces at the end.
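
A different way to avoid the "if" scan (not the -keep-/-append- approach described above) is to sort the data by group and address each group with an "in" range, which should let Stata touch only that block of rows. The following is a minimal sketch on the nlswork data used earlier in the thread. The names n_in_group, yhat_in, first, and last are made up for illustration, and whether this is actually faster on a given dataset would need to be timed.

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
sort idcode year
by idcode: generate long n_in_group = _N   // rows in each person's block
generate double yhat_in = .

local first = 1
while `first' <= _N {
    * Last row of the block that starts at row `first'
    local last = `first' + n_in_group[`first'] - 1
    * Fit on this block only; skip blocks where regress cannot be estimated
    capture regress ln_wage age in `first'/`last'
    if _rc == 0 {
        quietly predict fitted in `first'/`last', xb
        quietly replace yhat_in = fitted in `first'/`last'
        drop fitted
    }
    local first = `last' + 1
}
```

Blocks whose regression fails keep a missing yhat_in, so they are easy to find afterwards with -count if missing(yhat_in)-.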

You may also be able to use the user-written -runby- command, available from SSC, but I don't know whether it will be faster. Maybe Clyde Schechter (one of the authors) has a comment.
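
For reference, a -runby- version might look roughly like the sketch below. It assumes -runby- has been installed from SSC and that its syntax is as I understand it from the help file (a program name plus a by() option), so please check -help runby- before relying on it. The program name fit_one_group and the variable yhat_runby are made up for illustration.

```stata
* ssc install runby   // one-time install; runby is community-contributed

use "https://www.stata-press.com/data/r17/nlswork.dta", clear

capture program drop fit_one_group
program define fit_one_group
    * When runby calls this program, only one by-group is in memory
    capture regress ln_wage age
    if _rc == 0 {
        predict double yhat_runby, xb
    }
    else {
        * Keep the group in the results even when the regression fails
        generate double yhat_runby = .
    }
end

runby fit_one_group, by(idcode)
```

The capture and the else branch are there because, as far as I understand it, -runby- does not keep a by-group whose program exits with an error; creating yhat_runby in both branches keeps every group and gives all groups the same variables.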

Ken Chui

Join Date: Aug 2014
Posts: 1058

15 May 2023, 08:20

The statsby command may also work. One option is to estimate the coefficients first, merge them back into the main data, and compute the predicted values in one go:

Code:

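use "https://www.stata-press.com/data/r17/nlswork.dta", clear
* Keep only persons with at least 3 observations
bysort idcode: gen case = _N
drop if case < 3

* Estimate one regression per idcode and save the coefficients
preserve
statsby _b, clear by(idcode): regress ln_wage age
* Timestamp taken after statsby finishes (only useful for timing)
scalar t2 = c(current_time)
save tempcoef, replace
restore

* Attach each person's coefficients (_b_age, _b_cons) to the main data
merge m:1 idcode using tempcoef

* Compute the fitted values for all persons at once
gen predict2 = _b_cons + _b_age * age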
