  • #16
    Hi Nick, thanks for providing some more detail. I guess you can do that, but it seems to me like you might be approaching this problem backwards. Taking small samples of the data and fitting a model to each seems like a way to (possibly poorly) approximate the model you would get by fitting to the entire dataset. If you instead fit a single model to the full dataset, you could then produce post-estimation predictions for any number of people and any combination of covariates after the fact, and do your post-processing with those results (e.g., take the average opinion and look at its distribution).


    • #17
      Thank you Leonardo, could you maybe explain that in simpler terms? I get that you would first run the model on every single opinion for every ID-year combination. But I'm not sure how to make a group of people after the fact. If possible, could you maybe give a small example?
      Last edited by Nick Bertel; 31 May 2023, 16:11.


      • #18
        Let's use the NLSW88 dataset as an example.

        First, we'll start by trying to apply the idea you described in #15. We'll take small random samples of the dataset and fit a regression model to each one, trying to predict hourly wage based solely on college graduate status. This is an arbitrarily simple model for illustrative purposes.

        The code below draws 100 random 5% samples from the observations that have both wage and college graduate status available, and fits the regression to each sample.

        Code:
        set seed 18
        webuse nlsw88, clear
        
        keep if !missing(collgrad, wage)
        
        cap program drop samplereg
        program samplereg
          syntax , pct(int)
         
          preserve
          sample `pct'
          reg wage i.collgrad
          restore
        end
        
        * Run several regressions on subsamples, then average the coefficients.
        preserve
        simulate _b, reps(100) nodots : samplereg, pct(5)
        list in 1/5
        mean _sim_2 _b_cons
        restore
        
        * Regression on the whole dataset
        reg wage i.collgrad, nohead
        Relevant results:

        Code:
        * Run several regressions on subsamples, then average the coefficients.
        . mean _sim_2 _b_cons
        
        Mean estimation                            Number of obs = 100
        
        --------------------------------------------------------------
                     |       Mean   Std. err.     [95% conf. interval]
        -------------+------------------------------------------------
              _sim_2 |   3.419975   .1180181      3.185801    3.654148
             _b_cons |   6.975366    .058489      6.859311    7.091421
        --------------------------------------------------------------
        
        * Regression on the whole dataset
        . reg wage i.collgrad, nohead
        -------------------------------------------------------------------------------
                 wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        --------------+----------------------------------------------------------------
             collgrad |
        College grad  |   3.615502   .2753268    13.13   0.000      3.07558    4.155424
                _cons |   6.910561   .1339984    51.57   0.000     6.647788    7.173335
        -------------------------------------------------------------------------------
        Notice that the first part of the output shows averages of the regression coefficients from those 100 models. The mean coefficient for college graduates (-_sim_2-) and the mean constant (-_b_cons-, the predicted wage for non-graduates) approximate the corresponding coefficients from the regression on the whole dataset. As you increase the number of replications and the size of each sample, the average coefficients get closer to the value obtained by fitting one model to the entire dataset.
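
        To see that convergence for yourself, a rough sketch along these lines could be run. It assumes the -samplereg- program from the code above is already defined. The 500 replications and 20% sample size are illustrative choices of mine, not values from this thread. Note that -simulate- replaces the data in memory with the simulation results.

        ```stata
        * Assumes -samplereg- from the code above is already defined
        set seed 18
        webuse nlsw88, clear
        keep if !missing(collgrad, wage)

        * More replications and larger subsamples (illustrative values)
        * -simulate- leaves one observation per replication in memory
        simulate _b, reps(500) nodots : samplereg, pct(20)

        * Averages to compare with the whole-dataset coefficients
        mean _sim_2 _b_cons
        ```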

        ------------------

        Switching gears now. What if we just start with a regression model fit to the entire dataset?
        I'll fit a model to predict wage, this time based on age, marital status, and college graduate status.

        Code:
        . webuse nlsw88, clear
        . reg wage age i.collgrad i.married, nohead
        -------------------------------------------------------------------------------
                 wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        --------------+----------------------------------------------------------------
                  age |  -.0656183   .0382239    -1.72   0.086    -.1405762    .0093397
                      |
             collgrad |
        College grad  |   3.615111   .2750165    13.15   0.000     3.075797    4.154424
                      |
              married |
             Married  |  -.5125942   .2439234    -2.10   0.036    -.9909336   -.0342548
                _cons |   9.808917   1.513613     6.48   0.000     6.840687    12.77715
        -------------------------------------------------------------------------------
        
        . est store MyModel
        We can use this model to predict the wage for people in the dataset (in-sample prediction).
        Here I use -predict- to compute the model-based mean (the linear prediction) for every individual.

        Code:
        . predict pred_wage, xb
        
        . list age collgrad married wage pred_wage in 1/5, nolabel
        
             +-------------------------------------------------+
             | age   collgrad   married       wage   pred_wage |
             |-------------------------------------------------|
          1. |  37          0         0   11.73913   7.3810416 |
          2. |  37          0         0   6.400963   7.3810416 |
          3. |  42          0         0   5.016723   7.0529503 |
          4. |  43          1         1   9.033813   10.089849 |
          5. |  42          0         1   8.083731   6.5403561 |
             +-------------------------------------------------+
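
        As a check on what -predict, xb- is doing, the prediction for observation 1 (age 37, not a college graduate, not married) can be rebuilt by hand from the coefficients. This is only a sketch, and it assumes the regression above is still the active estimation result.

        ```stata
        * Assumes the regression above is still the active estimation result
        * Linear prediction = constant + sum of (coefficient * covariate value)
        display _b[_cons] + _b[age]*37 + _b[1.collgrad]*0 + _b[1.married]*0
        ```

        The displayed value should agree with pred_wage for observations 1 and 2 in the listing (about 7.381).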
        But you're not limited to making predictions on observed individuals. You can use out-of-sample or hypothetical data with the existing coefficients to make new predictions. For example:

        Code:
        mkf New
        cwf New
        input int age byte(collgrad married)
        39 0 1
        40 1 1
        44 1 0
        end
        
        est restore MyModel
        predict pred_wage, xb
        list
        Results

        Code:
        . list
        
             +--------------------------------------+
             | age   collgrad   married   pred_wage |
             |--------------------------------------|
          1. |  39          0         1   6.7372109 |
          2. |  40          1         1   10.286703 |
          3. |  44          1         0   10.536824 |
             +--------------------------------------+
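
        If you would rather not build a new frame, -margins- with -at()- is another way to get predictions for hypothetical covariate values, and it reports standard errors as well. The sketch below assumes MyModel was stored as above and that the NLSW88 data are still in the frame named "default". The line continuations (///) only work in a do-file.

        ```stata
        * Assumes MyModel was stored as above and the data are in frame "default"
        cwf default
        est restore MyModel

        * One at() per hypothetical person; every covariate is fixed
        margins, at(age=39 collgrad=0 married=1) ///
                 at(age=40 collgrad=1 married=1) ///
                 at(age=44 collgrad=1 married=0)
        ```

        Because all covariates are fixed in each -at()-, the three margins should reproduce the three values of pred_wage listed above.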
        So I think in your case, you could work out whatever groups of ID-Year you like, predict the response, take the average, and store the result. Do this several times to get the distribution of results you were looking for.
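
        Roughly, those steps could be sketched like this, using the NLSW88 model from this post in place of your ID-Year data. The frame name GroupMeans, the variables mean_pred and u, the group size of 50, and the 200 repetitions are all placeholders I made up for illustration. This needs Stata 16 or later for frames.

        ```stata
        * Hypothetical sketch: average prediction over repeated random groups
        set seed 18
        webuse nlsw88, clear
        quietly regress wage age i.collgrad i.married
        predict pred_wage, xb

        * Frame that collects one average per random group
        capture frame drop GroupMeans
        frame create GroupMeans mean_pred

        generate double u = .
        forvalues r = 1/200 {
            * Shuffle the data, then treat the first 50 people as the group
            quietly replace u = runiform()
            sort u
            quietly summarize pred_wage in 1/50, meanonly
            frame post GroupMeans (r(mean))
        }
        drop u

        * Distribution of the 200 group averages
        frame GroupMeans: summarize mean_pred
        ```

        In your data you would replace the random shuffle with whatever rule defines a group of ID-Year observations.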


        • #19
          Thank you for your great effort! Will definitely take this into consideration!
