
  • Creating a loop to repeat an equation and save the coefficients

    Hello everyone! I would greatly appreciate your insights on the following. I want to write a loop over the countries in my data. Specifically, I want to run the following regression for each country:

    Code:
    reg y x1 x2 x3 x4 av_y av_x1 av_x2 av_x3 av_x4 if country == "Australia"

    My original dataset is a 30-year panel, so all the averages in the code are cross-sectional averages. I am trying to break down the steps of the Common Correlated Effects Mean Group (CCEMG) estimator and examine the estimates in detail.

    I would like to repeat this regression for all 36 countries. I would also like to save, for each country's equation, the coefficients on the explanatory variables (not those on the cross-sectional averages) so that I can plot them later.

    Could you kindly guide me on how to approach this loop? Thank you in advance for your help!
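For reference, the explicit loop asked about here can be sketched with -levelsof-, -foreach- and -postfile-. This is a minimal sketch under stated assumptions: the toy data block is simulated only so the code runs, the variable names (y, x1-x4, av_*, country) are taken from the post, and the output file name coefs.dta is a hypothetical choice.

```stata
* Simulated toy panel (3 countries x 32 years), for illustration only;
* replace this block with your own data.
clear
set seed 2023
set obs 96
generate str32 country = word("Australia Austria Belgium", ceil(_n/32))
generate year = 1991 + mod(_n - 1, 32)
forvalues j = 1/4 {
    generate x`j' = rnormal()
}
generate y = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + 0.1*x4 + rnormal()
* Cross-sectional (by-year) averages, as in the CCE setup
foreach v of varlist y x1-x4 {
    bysort year: egen av_`v' = mean(`v')
}

* Loop over countries and post the x1-x4 coefficients to a new file
* (coefs.dta is a hypothetical file name)
tempname memhold
postfile `memhold' str32 country b_x1 b_x2 b_x3 b_x4 using coefs, replace
levelsof country, local(countries)
foreach c of local countries {
    quietly regress y x1 x2 x3 x4 av_y av_x1 av_x2 av_x3 av_x4 if country == "`c'"
    post `memhold' ("`c'") (_b[x1]) (_b[x2]) (_b[x3]) (_b[x4])
}
postclose `memhold'

* One row per country, ready for plotting
use coefs, clear
list
```

Each country's coefficients end up in one row of coefs.dta. If -regress- drops a variable for collinearity, its _b[] value is reported as 0, so zeros in this file are worth checking.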



  • #2
    No loop needed.
    Code:
    rangestat (reg) y x1 x2 x3 x4 av_y av_x1 av_x2 av_x3 av_x4, by(country) interval(x1 . .)
    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

    If you are unable to install user-written commands, you can do something similar with the official Stata command -statsby-, although it is a bit more complicated to use and, if your data set is very large, will be appreciably slower.
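After the -rangestat- call, the results are stored in new variables that repeat on every observation of a country. A minimal sketch of reducing them to one row per country for plotting, assuming -rangestat- is installed (ssc install rangestat) and follows its documented naming for (reg) results (reg_nobs, b_varname, se_varname, b_cons); run -describe- on your own results to confirm the names. The toy data are simulated only so the snippet runs.

```stata
* Simulated toy panel (3 countries x 32 years), for illustration only
clear
set seed 2023
set obs 96
generate str32 country = word("Australia Austria Belgium", ceil(_n/32))
generate year = 1991 + mod(_n - 1, 32)
forvalues j = 1/4 {
    generate x`j' = rnormal()
}
generate y = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + 0.1*x4 + rnormal()
foreach v of varlist y x1-x4 {
    bysort year: egen av_`v' = mean(`v')
}

* Country-by-country regressions, as in the reply above
rangestat (reg) y x1 x2 x3 x4 av_y av_x1 av_x2 av_x3 av_x4, by(country) interval(x1 . .)

* Results repeat within each country: keep one row per country
* and only the coefficients on the non-average regressors
bysort country: keep if _n == 1
keep country reg_nobs b_x1 b_x2 b_x3 b_x4
list, noobs
```

A country whose row shows missing b_* values is one for which -rangestat- produced no estimates, which is the situation described in #3.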



    • #3
      Hi Clyde, the code works beautifully, thank you. However, it seems that none of the estimates are being saved for a specific country "b". I executed the regression exclusively for country "b" and it worked perfectly fine. I double-checked the code and followed your instructions precisely. There are no missing values for that country either. Do you have any insight into why this might be happening?



      • #4
        If there are truly no missing values for any variables in the observations for country "b", then the only thing I can think of is that the number of observations for that country is too small to do a regression with that many variables. If some observations have missing values, bear in mind that a missing value on any variable mentioned in the regression results in omission of that observation from the regression's estimation sample. It doesn't take many missing values scattered through the data for this omission to leave a sample too small for a regression with lots of variables in it.

        But if the above is what is going on, you should not be able to run the regression separately on just a data set with country "b".

        If that doesn't answer your question, you will need to post example data that exhibits this problem in order for me to troubleshoot it. (Be sure to use -dataex- to do this.)
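One quick way to test the sample-size explanation is to count, for each country, the observations with no missing value on any variable in the model and compare that with the number of parameters (9 regressors plus the constant in the model from #1). A minimal sketch, assuming the variable names from #1; the toy data are simulated, and the missing values injected for "Belgium" are artificial so the check has something to flag.

```stata
* Simulated toy panel (3 countries x 32 years), for illustration only
clear
set seed 2023
set obs 96
generate str32 country = word("Australia Austria Belgium", ceil(_n/32))
generate year = 1991 + mod(_n - 1, 32)
forvalues j = 1/4 {
    generate x`j' = rnormal()
}
generate y = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + 0.1*x4 + rnormal()
foreach v of varlist y x1-x4 {
    bysort year: egen av_`v' = mean(`v')
}
* Artificial gaps: leave Belgium with only 6 complete years
replace x3 = . if country == "Belgium" & year > 1996

* Number of parameters = regressors + constant
local rhs x1 x2 x3 x4 av_y av_x1 av_x2 av_x3 av_x4
local k : word count `rhs'
local k = `k' + 1

* Complete observations (no missing value on y or any regressor) per country
egen byte nmiss = rowmiss(y `rhs')
bysort country: egen n_complete = total(nmiss == 0)
egen byte first = tag(country)
display "Parameters per country regression: `k'"

* Countries with no more complete observations than parameters (no residual df)
list country n_complete if first & n_complete <= `k', noobs
```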



        • #5
          Hi Clyde, I've already handled the missing values in my data, so that's not the issue. The problem seems to be the large number of variables. I have 11 explanatory variables, and to replicate the CCEMG method I need to include the averages of these 11 variables as well as the average of y, which gives 23 explanatory variables in total. I also have 32 years of data and 36 unique IDs.

          This combination of many variables, years, and IDs seems to cause a problem when I use OLS to break down the CCEMG steps. Whenever I remove a year or some variables, more IDs end up with missing values in the -rangestat- estimates. However, when I run the regression for each ID individually, it works fine.

          I'm wondering if there's a more efficient approach that avoids running the regression for each ID separately and filling in the values by hand. Can you suggest a shorter way to tackle this? (I didn't provide a -dataex- example because the dataset is quite large, and the issue doesn't occur with smaller examples.)
          Last edited by Lily Ksh; 04 Jul 2023, 20:17.



          • #6
            I am not sure if this helps, but "statsby" may also be able to do this task:

            Code:
            statsby, by(country) noisily: reg y x1 x2 x3 x4 av_y av_x1 av_x2 av_x3 av_x4
            It will save all the coefficients, but you can easily delete the unwanted ones.
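-statsby- can also be told to collect only the coefficients you want, so nothing needs deleting afterwards. A minimal sketch, assuming the variable names from #1 and using simulated toy data. The final loop adds the mean-group step: each coefficient is averaged across countries, with the standard error taken as the cross-country standard deviation divided by sqrt(N), assumed here to be the usual nonparametric MG standard error you want.

```stata
* Simulated toy panel (3 countries x 32 years), for illustration only
clear
set seed 2023
set obs 96
generate str32 country = word("Australia Austria Belgium", ceil(_n/32))
generate year = 1991 + mod(_n - 1, 32)
forvalues j = 1/4 {
    generate x`j' = rnormal()
}
generate y = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + 0.1*x4 + rnormal()
foreach v of varlist y x1-x4 {
    bysort year: egen av_`v' = mean(`v')
}

* Keep only the x1-x4 coefficients and the sample size, one row per country
statsby b_x1=(_b[x1]) b_x2=(_b[x2]) b_x3=(_b[x3]) b_x4=(_b[x4]) nobs=(e(N)), ///
    by(country) clear nodots: ///
    regress y x1 x2 x3 x4 av_y av_x1 av_x2 av_x3 av_x4
list, noobs

* Mean-group step: average each coefficient across countries
foreach v of varlist b_x1-b_x4 {
    quietly summarize `v'
    display "`v': MG = " %9.4f r(mean) "   s.e. = " %9.4f r(sd)/sqrt(r(N))
}
```

Because the expressions are named in the exp_list, the resulting dataset contains only country, b_x1 to b_x4 and nobs, which can be plotted directly.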

            Regarding your observation in #5, I don't see how the number of variables could be an issue: you have 32 observations for each country (years, assuming the panel is balanced) but only 23 variables (plus 1 for the intercept). Whether or not it is a good modeling idea, that should leave enough degrees of freedom. (Unless some of the variables are categorical and need to be expanded into dummies, which I couldn't tell.)
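With the numbers given in #5 (11 explanatory variables, their 11 cross-sectional averages plus the average of y, and 32 years per country, assuming a balanced panel), the count works out as follows:

```latex
\begin{aligned}
K &= 11 + 11 + 1 = 23 && \text{regressors} \\
p &= K + 1 = 24 && \text{parameters, including the intercept} \\
\mathrm{df}_{\mathrm{resid}} &= T - p = 32 - 24 = 8 && \text{residual degrees of freedom per country}
\end{aligned}
```

So each country regression is estimable, although 8 residual degrees of freedom is not many.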

            And regarding the -dataex- concern, I'd suggest you provide some sample data if the trouble persists. It's not that hard for us to expand 30 observations of example data into thousands, given the right summary statistics.



            • #7
              Thank you, Ken. That worked perfectly. I will keep that in mind next time.
