
  • Finding Average OLS Coefficients and Standard Errors of multiple Datasets

    Hello!

    I'm currently working with a dataset of over 330,000 observations. I'd like to draw samples of 5,000 observations without replacement and run a regression on each subsample (66 datasets of 5,000 observations each).
    Is there a command that can run the 66 regressions and store the coefficients and standard errors so that I can compute their averages at the end?
    I've seen that parmest or parmby can help with that, but I don't know the syntax.

    The regression that I'm working with is:

    reg lnwage educ

    Thanks,
    Sean

  • #2
    Hello and welcome.

    Check out -statsby- at https://www.stata.com/manuals/dstatsby.pdf. It may be what you need.


    • #3
      Yes, -statsby- recommended by Ken Chui will work. You might find it easier to use -rangestat-, and given the size of your data set you will also find it faster. For either approach you will first need to create a variable that identifies the 66 samples. Let's call it sample_num. Then all you need to do is:

      Code:
      rangestat (reg) lnwage educ, by(sample_num) interval(lnwage . .)
      -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.
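
      A minimal sketch of how sample_num could be built, assuming the samples are meant to be disjoint random blocks of 5,000 rows. The data here are simulated stand-ins (the 330,500 row count, the seed, and the helper variable u are hypothetical); with the real data, only the shuffle and the block assignment would be needed:

      ```stata
      * Simulated stand-in for the real data (all values are hypothetical)
      clear
      set seed 12345
      set obs 330500
      generate educ   = 8 + floor(10*runiform())
      generate lnwage = 1 + 0.08*educ + rnormal(0, 0.5)

      * Shuffle the rows, then cut them into consecutive blocks of 5,000
      generate double u = runiform()
      sort u
      generate sample_num = ceil(_n/5000)

      * Keep the 66 complete samples; the leftover rows form a partial block
      drop if sample_num > 66
      ```

      Because each row lands in exactly one block, the 66 samples are drawn without replacement and do not overlap.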


      • #4
        Originally posted by Clyde Schechter
        Yes, -statsby- recommended by Ken Chui will work. You might find it easier to use -rangestat-, and given the size of your data set you will also find it faster. For either approach you will first need to create a variable that identifies the 66 samples. Let's call it sample_num. Then all you need to do is:

        Code:
        rangestat (reg) lnwage educ, by(sample_num) interval(lnwage . .)
        -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.
        Hello,
        I'm finding that -statsby- is working quite well for me.

        Code:

        statsby _b[educ] _se[educ] , by(groupnum3) saving(group3OLS): regress lnwage educ
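
        Once the per-group results are collected, the averages can be read off with -summarize-. A minimal sketch on the auto data that ships with Stata, assuming named expressions (b_weight and se_weight are illustrative names) and the clear option instead of saving(). With unnamed expressions, as in the command above, the saved variables should get generic names such as _stat_1 and _stat_2, so naming them is usually easier to work with:

        ```stata
        sysuse auto, clear

        * One regression per group; the results replace the data in memory
        statsby b_weight=_b[weight] se_weight=_se[weight], by(foreign) clear nodots: regress price weight

        * The Mean column gives the average coefficient and average standard error
        summarize b_weight se_weight
        ```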

        Is there any way I can include the t statistic as well as the F statistic?
        I've tried several commands, but can't seem to find the right syntax.


        • #5
          The t statistic for educ, in this regression that contains no other predictors, is just the square root of the regression's F statistic (in absolute value; its sign is that of the coefficient).


          • #6
            Originally posted by Clyde Schechter
            The t statistic for educ, in this regression that contains no other predictors, is just the square root of the regression's F statistic.
            Yes. But I'm wondering what the syntax is to include the F statistic or the t statistic within the -statsby- command so that I can compile the averages from all 66 datasets.


            • #7
              Stored results from -return list- and -ereturn list- can be added as they are (for example, e(F)). The t statistics are stored in a matrix, so a more straightforward method is to export the standard errors as well and then loop over the predictors, dividing each coefficient by its standard error to get the t statistic. Example:

              Code:
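              * Load the example auto data
              use "https://www.stata-press.com/data/r16/auto2", clear

              * One regression per value of foreign; keep F plus all coefficients and standard errors
              statsby F_stat=e(F) _b _se , by(foreign) nodots: reg price weight length mpg
              * t statistic = coefficient / standard error, one new variable per predictor
              foreach x in weight length mpg {
                  gen t_`x' = _b_`x' / _se_`x'
              }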


              • #8
                Originally posted by Ken Chui
                Stored results from -return list- and -ereturn list- can be added as they are (for example, e(F)). The t statistics are stored in a matrix, so a more straightforward method is to export the standard errors as well and then loop over the predictors, dividing each coefficient by its standard error to get the t statistic. Example:

                Code:
                use "https://www.stata-press.com/data/r16/auto2", clear
                
                statsby F_stat=e(F) _b _se , by(foreign) nodots: reg price weight length mpg
                foreach x in weight length mpg {
                    gen t_`x' = _b_`x' / _se_`x'
                }
                Is there a way to return R^2 as well, through -ereturn-?


                • #9
                  Try running a sample regression on any data, and then use -ereturn list- to see which stored results are available to pick from.
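
                  A minimal sketch of that workflow, assuming the auto data that ships with Stata. After -regress-, R-squared is stored as e(r2), so it can be passed to -statsby- in the same way as e(F). The names r2, F, b, and se are illustrative:

                  ```stata
                  sysuse auto, clear

                  * Fit one model and inspect everything -regress- leaves behind
                  regress price weight
                  ereturn list
                  display e(r2)

                  * Collect R-squared next to the other statistics, one row per group
                  statsby r2=e(r2) F=e(F) b=_b[weight] se=_se[weight], by(foreign) clear nodots: regress price weight
                  list
                  ```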


                  • #10
                    Also, you can calculate the t-statistic as the coefficient divided by the standard error, and then get the F as the square of that.
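
                    A quick check of that relationship, sketched on the auto data that ships with Stata and assuming a regression with a single predictor, as in the original question. The scalar name t is illustrative:

                    ```stata
                    sysuse auto, clear
                    quietly regress price weight

                    * t statistic = coefficient divided by its standard error
                    scalar t = _b[weight]/_se[weight]

                    * With one predictor, the squared t statistic matches the model F
                    display "t = " t "   t^2 = " t^2 "   F = " e(F)
                    assert reldif(t^2, e(F)) < 1e-8
                    ```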
