Finding Average OLS Coefficients and Standard Errors of multiple Datasets

Sean Tibay

Join Date: Apr 2021

Posts: 4
#1

12 Apr 2021, 16:56

Hello!

I'm currently working with a dataset of over 330,000 observations. I'd like to draw samples of 5,000 observations without replacement and run a regression on each subsample (66 datasets of 5,000 observations each).
Is there a command that can run the 66 regressions and store the coefficients and standard errors so that I can average them at the end?
I've seen that parmest or parmby can help with that, but I don't know the syntax.

The regression that I'm working with is:

reg lnwage educ

Thanks,
Sean
Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

12 Apr 2021, 17:12

Hello and welcome.

Check out -statsby- at https://www.stata.com/manuals/dstatsby.pdf. It may be what you need.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#3

12 Apr 2021, 17:50

Yes, -statsby- recommended by Ken Chui will work. You might find it easier to use -rangestat-, and given the size of your data set you will also find it faster. For either approach you will first need to create a variable that identifies the 66 samples. Let's call it sample_num. Then all you need to do is:

Code:

rangestat (reg) lnwage educ, by(sample_num) interval(lnwage . .)

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.
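
One possible way to build sample_num, sketched under the assumption that the samples should be random, non-overlapping blocks of 5,000 observations. The seed value and the helper variable u are illustrative and not from the thread.

Code:

* fix the seed so the assignment is reproducible (the value is arbitrary)
set seed 12345
* attach a random number to each observation and shuffle the data
generate double u = runiform()
sort u
* each consecutive block of 5,000 shuffled observations forms one sample
generate int sample_num = ceil(_n / 5000)
* discard the leftover partial block beyond the 66 full samples
drop if sample_num > 66
drop u

Because every observation lands in exactly one block, the samples are drawn without replacement.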
Sean Tibay

Join Date: Apr 2021

Posts: 4
#4

12 Apr 2021, 18:04

Hello,
I'm finding that -statsby- is working quite well for me.

Code:

statsby _b[educ] _se[educ] , by(groupnum3) saving(group3OLS): regress lnwage educ

Is there any way I can include the t-statistic as well as the F-statistic?
I've tried several commands, but can't seem to find the right syntax.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#5

12 Apr 2021, 18:28

The t-statistic for educ, in this regression that contains no other predictors, is (up to sign) just the square root of the regression's F statistic.
Sean Tibay

Join Date: Apr 2021

Posts: 4
#6

12 Apr 2021, 18:32

Yes. But I'm wondering what the syntax is to include the F statistic or the t statistic within the -statsby- command so that I can compile the averages from all 66 datasets.
Ken Chui

Join Date: Aug 2014

Posts: 1058
#7

12 Apr 2021, 18:43

Items stored in -return- and -ereturn- can be added as is (like e(F)). The t-statistics are in a matrix; a more straightforward method is to export the standard errors as well and then loop over the predictors, dividing each coefficient by its standard error to get the t-stat. Example:

Code:

use "https://www.stata-press.com/data/r16/auto2", clear
statsby F_stat=e(F) _b _se, by(foreign) nodots: reg price weight length mpg
foreach x in weight length mpg {
    gen t_`x' = _b_`x' / _se_`x'
}
Sean Tibay

Join Date: Apr 2021

Posts: 4
#8

12 Apr 2021, 18:46

Is there a way to return R^2 as well, through -ereturn-?
Ken Chui

Join Date: Aug 2014

Posts: 1058
#9

12 Apr 2021, 18:50

Try running a sample regression model on any data, and then use -ereturn list- to see what is available for you to pick.
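
For instance, a minimal sketch using the auto2 data from the earlier example. e(r2) and e(r2_a) are among the scalars that -regress- stores; the name r2 on the left-hand side is only an illustrative choice.

Code:

use "https://www.stata-press.com/data/r16/auto2", clear
* fit one model and inspect everything -regress- leaves behind in e()
regress price weight length mpg
ereturn list
* any scalar in that list can be collected by -statsby-, e.g. R-squared
statsby F_stat=e(F) r2=e(r2) _b _se, by(foreign) nodots: regress price weight length mpg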
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#10

12 Apr 2021, 19:15

Also, you can calculate the t-statistic as the coefficient divided by the standard error, and then get the F as the square of that.
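
Putting the pieces together for the original problem, as a hedged sketch: it assumes the grouping variable groupnum3 and the model from #4, and the names b_educ, se_educ, F_stat, r2, and t_educ are illustrative.

Code:

* collect the coefficient, standard error, F, and R-squared for each sample
statsby b_educ=_b[educ] se_educ=_se[educ] F_stat=e(F) r2=e(r2), by(groupnum3) clear: regress lnwage educ
* t statistic per sample; with a single predictor its square equals F
generate t_educ = b_educ / se_educ
* mean (and spread) of each statistic across the samples
summarize b_educ se_educ t_educ F_stat r2

The clear option lets -statsby- replace the data in memory with one row of results per sample, so -summarize- then reports the averages across the 66 regressions.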