
  • prediction after a count random coefficient model

    Dear Statalist, I am estimating a multilevel model on a regional panel dataset (region-year observations). Since my dependent variable is a count, I am using Poisson/negative binomial models with a random intercept and a random coefficient. The idea is to study whether the effect of the main independent variable differs across regions.

    My first problem is with the estimation. If I use a negative binomial model, the parameter is significant but the random coefficient is not (judging visually, since the random coefficient and its standard error are tiny; see the estimation from "menbreg" below). However, if I refit the same model as a Poisson model with GLLAMM, the coefficient is not significant but the random part is (which I interpret to mean that on average there is no effect of the independent variable, but some regions could have a positive and significant effect, since the random coefficient seems to be significant). Because of that, I would like to graph the coefficient for each region (274 in total), which I believe means combining the coefficient itself with the prediction of each region's random effect, just to visually check my hypothesis.

    Is it possible to do this with both the GLLAMM and the menbreg commands, so I can compare the two graphs? Can you help me with this?

    Another, less important, issue is that the same model gives slightly different results (the main independent variable's coefficient and standard error) with the GLLAMM Poisson command than with "mepoisson". See the results below for "menbreg", GLLAMM, and "mepoisson" to check. Maybe I am doing something differently, but I do not see what. Is this normal?

    I also include an example of the data below, in case it helps.

    Thanks for your help in advance.

    Code:
    menbreg y1 L5.c.x1 i.year if year<=2022 || id: L5.c.x1 , vce(robust) eform
    Code:
    Mixed-effects nbinomial regression              Number of obs     =      3,288
    Overdispersion:            mean
    Group variable:              id                 Number of groups  =        274
    
                                                    Obs per group:
                                                                  min =         12
                                                                  avg =       12.0
                                                                  max =         12
    
    Integration method: mvaghermite                 Integration pts.  =          7
    
                                                    Wald chi2(12)     =      64.75
    Log pseudolikelihood = -10922.383               Prob > chi2       =     0.0000
                                       (Std. Err. adjusted for 274 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
              y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |
             L5. |   1.028907   .0147667     1.99   0.047     1.000368     1.05826
                 |
            year |
           2006  |          1  (empty)
           2007  |          1  (empty)
           2008  |          1  (empty)
           2009  |          1  (empty)
           2010  |          1  (empty)
           2011  |   .9119078   .0344727    -2.44   0.015     .8467849     .982039
           2012  |   .9894641   .0338078    -0.31   0.757      .925372    1.057995
           2013  |   1.025426   .0316322     0.81   0.416     .9652655    1.089337
           2014  |   1.006971    .030441     0.23   0.818     .9490412    1.068438
           2015  |   .9904435   .0273633    -0.35   0.728     .9382385    1.045553
           2016  |    .967458   .0277543    -1.15   0.249     .9145616    1.023414
           2017  |   .9405029   .0270033    -2.14   0.033      .889039    .9949458
           2018  |   .9948799   .0277803    -0.18   0.854     .9418946    1.050846
           2019  |   1.010921   .0257702     0.43   0.670     .9616529    1.062712
           2020  |   1.030888   .0231985     1.35   0.176     .9864079    1.077374
           2021  |   1.025108   .0183955     1.38   0.167     .9896797    1.061804
           2022  |          1  (omitted)
                 |
           _cons |   14.59819   1.620518    24.15   0.000      11.7438    18.14635
    -------------+----------------------------------------------------------------
        /lnalpha |  -3.470395   .1139376                     -3.693708   -3.247081
    -------------+----------------------------------------------------------------
    id           |
       var(L5.x1)|   9.39e-37   4.39e-35                      1.63e-76    5396.854
       var(_cons)|   3.238494   .3348332                      2.644454    3.965978
    ------------------------------------------------------------------------------
    Code:
    gllamm y1 L5x1 year6-year16, i(id) family(poisson) link(log) nrf(2) eq(inter slope) eform adapt nip(15 15) cluster(id)
    Code:
    number of level 1 units = 3288
    number of level 2 units = 274
     
    Condition Number = 12.315206
     
    gllamm model 
     
    log likelihood = -11301.293
     
    Robust standard errors for clustered data: cluster(id)
    ----------------------------------------------------------------------------------------
                        y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
                      L5x1 |   1.022388   .0252052     0.90   0.369     .9741609    1.073002
                     year6 |   .8245291   .0430724    -3.69   0.000     .7442867    .9134225
                     year7 |   .9245136   .0419246    -1.73   0.083     .8458888    1.010447
                     year8 |    .966416   .0449236    -0.73   0.462     .8822594      1.0586
                     year9 |   .9527646    .039601    -1.16   0.244     .8782254     1.03363
                    year10 |   .9471477   .0341022    -1.51   0.132     .8826125    1.016402
                    year11 |   .9249721   .0342073    -2.11   0.035     .8602993    .9945068
                    year12 |   .8964757   .0316301    -3.10   0.002     .8365768    .9606633
                    year13 |   .9199161   .0324448    -2.37   0.018     .8584735    .9857562
                    year14 |   .9663046   .0378807    -0.87   0.382     .8948403    1.043476
                    year15 |   1.013463   .0269093     0.50   0.614     .9620709    1.067601
                    year16 |   1.002321   .0176545     0.13   0.895     .9683096    1.037528
                     _cons |   15.23028    1.74463    23.77   0.000     12.16753    19.06396
    ----------------------------------------------------------------------------------------
    Note: Estimates are transformed only in the first equation.
     
     
    Variances and covariances of random effects
    ------------------------------------------------------------------------------
    
     
    ***level 2 (id)
     
        var(1): 3.1587211 (.33311119)
        cov(2,1): -.08200452 (.03828723) cor(2,1): -.24904404
     
        var(2): .03432512 (.00608759)
    ------------------------------------------------------------------------------
    Code:
    mepoisson y1 L5.c.x1 i.year if year<=2022 || id: L5.c.x1 , vce(robust) eform
    Code:
    Mixed-effects Poisson regression                Number of obs     =      3,288
    Group variable:              id                 Number of groups  =        274
    
                                                    Obs per group:
                                                                  min =         12
                                                                  avg =       12.0
                                                                  max =         12
    
    Integration method: mvaghermite                 Integration pts.  =          7
    
                                                    Wald chi2(12)     =      91.69
    Log pseudolikelihood = -11301.512               Prob > chi2       =     0.0000
                                       (Std. Err. adjusted for 274 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
              y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |
             L5. |   .9986708   .0221158    -0.06   0.952     .9562519    1.042971
                 |
            year |
           2006  |          1  (empty)
           2007  |          1  (empty)
           2008  |          1  (empty)
           2009  |          1  (empty)
           2010  |          1  (empty)
           2011  |   .8269095   .0433115    -3.63   0.000     .7462325    .9163087
           2012  |   .9266356   .0422601    -1.67   0.095     .8474013    1.013279
           2013  |   .9683979   .0452809    -0.69   0.492     .8835942    1.061341
           2014  |   .9543322   .0398591    -1.12   0.263      .879322    1.035741
           2015  |     .94837   .0343184    -1.46   0.143      .883437    1.018076
           2016  |   .9263578   .0344053    -2.06   0.039     .8613206    .9963059
           2017  |   .8976381   .0317488    -3.05   0.002     .8375194    .9620722
           2018  |   .9211994   .0325613    -2.32   0.020      .859541    .9872809
           2019  |   .9674635   .0380058    -0.84   0.400     .8957689    1.044896
           2020  |   1.014135   .0269662     0.53   0.598     .9626357    1.068389
           2021  |   1.002216   .0176661     0.13   0.900     .9681825    1.037446
           2022  |          1  (omitted)
                 |
           _cons |   15.23259   1.725934    24.04   0.000     12.19911     19.0204
    -------------+----------------------------------------------------------------
    id           |
       var(L5.x1)|   .0322669   .0055729                      .0230008     .045266
       var(_cons)|   3.226455   .3347703                      2.632732    3.954071
    ------------------------------------------------------------------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id int year float(y1 x1)
    1 2006  5  -1.7775646
    1 2007  5  -1.2202544
    1 2008  8   -.7455087
    1 2009  9   -.8487142
    1 2010  9    -.807432
    1 2011  7   -.0849928
    1 2012  9    .1420595
    1 2013 11    -.642303
    1 2014 12   -.1881984
    1 2015  7  -.06435169
    1 2016  8   .05949503
    1 2017  4   .03885391
    1 2018  9    .7819343
    1 2019 10    .4103941
    1 2020 13   1.6695024
    1 2021  9   1.6695024
    1 2022  6    1.607579
    2 2006 20  -1.9668324
    2 2007 29   -.6613229
    2 2008 36   -.5426402
    2 2009 35     .849184
    2 2010 41   -.2297495
    2 2011 52  -.05712012
    2 2012 39   .09393056
    2 2013 58  -.58579755
    2 2014 55   -.2297495
    2 2015 57   -.5642189
    2 2016 56   -.1758028
    2 2017 49  -.29448548
    2 2018 59   -.9094776
    2 2019 71    .3852426
    2 2020 51    .9678667
    2 2021 64    1.658384
    2 2022 57   2.2625868
    3 2006 22   -1.461945
    3 2007 26   -.7937856
    3 2008 25   -.5662836
    3 2009 35   -.3920703
    3 2010 38   -.5990766
    3 2011 44   -.6298202
    3 2012 50   -.4720034
    3 2013 63   -.6195723
    3 2014 54   -.6072749
    3 2015 59   -.6031758
    3 2016 56   -.1502212
    3 2017 56   .11417311
    3 2018 54    .6593583
    3 2019 72    1.040578
    3 2020 76   1.1901965
    3 2021 51     1.69644
    3 2022 53    2.194485
    4 2006  4  -1.5086967
    4 2007 10   -.6170927
    4 2008  9    -.636063
    4 2009 16   -.7878254
    4 2010 18  -.10489471
    4 2011 29   -.7688551
    4 2012 30 -.010043235
    4 2013 17   -.5791521
    4 2014 20   -.4843006
    4 2015 11  .008927062
    4 2016 16   -.3515086
    4 2017 18    .2365706
    4 2018 19  -.44636005
    4 2019 21    .4452439
    4 2020 21     2.11463
    4 2021 21    1.678313
    4 2022 17   1.8111053
    5 2006 31   -1.749826
    5 2007 24   -1.010769
    5 2008 32   -.5142152
    5 2009 40  -.12159124
    5 2010 39   -.6527883
    5 2011 42   -.6758839
    5 2012 66  -.21397334
    5 2013 90    -.768266
    5 2014 85   -.2255211
    5 2015 50  -.23706888
    5 2016 58   .59437007
    5 2017 63  -.39873755
    5 2018 83   .05162521
    5 2019 80     .744491
    5 2020 90     1.67986
    5 2021 82    1.772242
    5 2022 64    1.726051
    end

  • #2
    I don't have all the answers to your questions, but here are a few things to get you started.
    • With the data you provided, I cannot get the menbreg model to run. Obviously you did, so that is a non-issue.
    • The Poisson model is nested in the negative binomial model (they differ by a single parameter, which is /lnalpha in the menbreg output), so you can run a model comparison test (lrtest in Stata) to see which fits the data better.
    • In the gllamm syntax, you are calling for 15 integration points. mepoisson and menbreg both default to 7. You can change that behavior with the intpoints(15) option.
    • Note that in gllamm a covariance is estimated between the random slope and intercept. You do not estimate that in either of the me models. Use the option cov(unstructured) to get this parameter. Making the me models as similar as possible to the gllamm model gets you closer to an apples-to-apples comparison; see the sketch after this list.
    • Unless you have a model that cannot be estimated with Stata's me or gsem suite of commands, I would generally not use gllamm, mainly because it is slow.
    • Across models, you have very little to almost no variance in the L5.x1 slope. This suggests to me that the groups do not differ much in the association between L5.x1 and y1. Graphically this would show up as parallel lines for the groups, differing only in how far away they are from the population line (the random intercept). In other words, you can produce a graph, but it is not going to show much of interest.
    • See this Statalist thread and this post for how to plot the random slopes.
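    Something like the following sketch might get you that apples-to-apples comparison (untested, using the variable names from your post; lrtest needs force here because the models use robust pseudolikelihoods):
    Code:
    * Refit the me models to match the gllamm specification:
    * 15 integration points and an unstructured random-effects covariance.
    mepoisson y1 L5.c.x1 i.year if year<=2022 || id: L5.c.x1 , ///
        cov(unstructured) intpoints(15) vce(robust) eform
    estimates store mp
    
    menbreg y1 L5.c.x1 i.year if year<=2022 || id: L5.c.x1 , ///
        cov(unstructured) intpoints(15) vce(robust) eform
    estimates store mnb
    
    * The Poisson model is nested in the negative binomial (/lnalpha is
    * the extra parameter), so the two fits can be compared directly.
    lrtest mp mnb, force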



    • #3
      Dear Erik Ruzek, thanks for your help with this. Both pages are really helpful. You are right that my "menbreg" command was missing intpoints(15) and cov(uns). After adding them, I see that the random coefficient, as well as the fixed part of the main coefficient (x1), are statistically significant (see the LR test below comparing the random-coefficient model with the random-intercept-only model). However, after trying to replicate what I see in the linked page, I came up with a plot in which the random effect for x1 seems non-significant for almost all regions. I am not sure whether something is wrong in my commands, but it is strange to see that the random coefficient is highly significant in the model while most of the regions' predicted random effects in the plot are not.

      Is it possible that I am doing or interpreting something wrong? See the results and commands below:

      Code:
      menbreg y1 L5.c.x1 L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 i.year if year<=2022 || id: , vce(robust) eform intpoints(15)
      estimates store ri
      Code:
      Mixed-effects nbinomial regression              Number of obs     =      3,006
      Overdispersion:            mean
      Group variable:              id                 Number of groups  =        254
      
                                                      Obs per group:
                                                                    min =          4
                                                                    avg =       11.8
                                                                    max =         12
      
      Integration method: mvaghermite                 Integration pts.  =         15
      
                                                      Wald chi2(17)     =      68.64
      Log pseudolikelihood =  -10315.79               Prob > chi2       =     0.0000
                                         (Std. Err. adjusted for 254 clusters in id)
      ------------------------------------------------------------------------------
                   |               Robust
                y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                x1 |
               L5. |    1.03384   .0162593     2.12   0.034     1.002458    1.066204
                   |
                x2 |
               L5. |   .9973701   .0132023    -0.20   0.842     .9718267    1.023585
                   |
                x3 |
               L5. |   .9867636   .0180156    -0.73   0.465      .952078    1.022713
                   |
                x4 |
               L5. |   .9821639   .0121073    -1.46   0.144     .9587184    1.006183
                   |
                x5 |
               L5. |   1.004023   .0162252     0.25   0.804     .9727207    1.036333
                   |
                x6 |
               L5. |   1.021519   .0109755     1.98   0.048     1.000232    1.043259
                   |
              year |
             2006  |          1  (empty)
             2007  |          1  (empty)
             2008  |          1  (empty)
             2009  |          1  (empty)
             2010  |          1  (empty)
             2011  |   .8601985   .0513747    -2.52   0.012      .765176    .9670213
             2012  |   .9411211   .0522225    -1.09   0.274     .8441363    1.049249
             2013  |   .9891747   .0471648    -0.23   0.819     .9009215    1.086073
             2014  |   .9545947   .0472438    -0.94   0.348     .8663478    1.051831
             2015  |   .9574161   .0414006    -1.01   0.314     .8796159    1.042098
             2016  |   .9510924    .035121    -1.36   0.174     .8846885     1.02248
             2017  |   .9272056    .033535    -2.09   0.037     .8637537    .9953188
             2018  |   .9791434    .033789    -0.61   0.541     .9151082     1.04766
             2019  |   .9960691    .030873    -0.13   0.899     .9373604    1.058455
             2020  |   1.014371   .0267424     0.54   0.588     .9632877    1.068163
             2021  |   1.016465   .0203154     0.82   0.414     .9774176    1.057073
             2022  |          1  (omitted)
                   |
             _cons |   17.72197   1.768058    28.82   0.000      14.5744    21.54931
      -------------+----------------------------------------------------------------
          /lnalpha |  -3.431682   .1148393                     -3.656762   -3.206601
      -------------+----------------------------------------------------------------
      id           |
         var(_cons)|   2.408516   .2039353                      2.040216    2.843302
      ------------------------------------------------------------------------------
      Note: Estimates are transformed only in the first equation.
      Code:
      menbreg y1 L5.c.x1 L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 i.year if year<=2022 || id: L5.c.x1 , cov(unstructure) vce(robust) eform intpoints(15)
      estimates store rc
      Code:
      Mixed-effects nbinomial regression              Number of obs     =      3,006
      Overdispersion:            mean
      Group variable:              id                 Number of groups  =        254
      
                                                      Obs per group:
                                                                    min =          4
                                                                    avg =       11.8
                                                                    max =         12
      
      Integration method: mvaghermite                 Integration pts.  =         15
      
                                                      Wald chi2(17)     =      67.13
      Log pseudolikelihood =  -10258.05               Prob > chi2       =     0.0000
                                             (Std. Err. adjusted for 254 clusters in id)
      ----------------------------------------------------------------------------------
                       |               Robust
                    y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
                    x1 |
                   L5. |   1.044958   .0216087     2.13   0.033     1.003453     1.08818
                       |
                    x2 |
                   L5. |   1.000673   .0130966     0.05   0.959      .975331    1.026674
                       |
                    x3 |
                   L5. |   .9796073   .0149627    -1.35   0.177     .9507155    1.009377
                       |
                    x4 |
                   L5. |   .9918395   .0124067    -0.66   0.512     .9678185    1.016457
                       |
                    x5 |
                   L5. |   1.003104   .0163533     0.19   0.849     .9715582    1.035673
                       |
                    x6 |
                   L5. |   1.016806   .0103273     1.64   0.101     .9967654     1.03725
                       |
                  year |
                 2006  |          1  (empty)
                 2007  |          1  (empty)
                 2008  |          1  (empty)
                 2009  |          1  (empty)
                 2010  |          1  (empty)
                 2011  |   .8308796   .0507725    -3.03   0.002     .7370956    .9365962
                 2012  |   .9199021   .0503215    -1.53   0.127      .826377    1.024012
                 2013  |   .9671356   .0469571    -0.69   0.491     .8793447    1.063691
                 2014  |   .9498539   .0480901    -1.02   0.310     .8601246    1.048944
                 2015  |   .9521785   .0425776    -1.10   0.273     .8722802    1.039395
                 2016  |    .945086   .0369091    -1.45   0.148     .8754448    1.020267
                 2017  |   .9209528   .0349554    -2.17   0.030     .8549278    .9920769
                 2018  |   .9709001   .0346993    -0.83   0.409     .9052181    1.041348
                 2019  |   .9962854   .0321001    -0.12   0.908     .9353156    1.061229
                 2020  |   1.015637   .0270911     0.58   0.561     .9639033    1.070147
                 2021  |   1.015463   .0202364     0.77   0.441     .9765646     1.05591
                 2022  |          1  (omitted)
                       |
                 _cons |   17.93158    1.77702    29.13   0.000     14.76606    21.77573
      -----------------+----------------------------------------------------------------
              /lnalpha |  -3.738889   .1233591                     -3.980668   -3.497109
      -----------------+----------------------------------------------------------------
      id               |
             var(L5.x1)|   .0274824   .0060578                      .0178414    .0423332
             var(_cons)|   2.360432   .2013258                      1.997059    2.789923
      -----------------+----------------------------------------------------------------
      id               |
       cov(L5.x1,_cons)|  -.0466977   .0299401    -1.56   0.119    -.1053793    .0119839
      ----------------------------------------------------------------------------------
      Note: Estimates are transformed only in the first equation.
      
      . estimates store rc
      
      . lrtest ri rc, force
      
      Likelihood-ratio test                                 LR chi2(2)  =    115.48
      (Assumption: ri nested in rc)                         Prob > chi2 =    0.0000
      Code:
      predict xb, xb
      * u1 = random slope on L5.x1, u2 = random intercept (in that order)
      predict u*, reffects reses(u_se*)
      des u*
      * The random slope multiplies the lagged covariate, so use L5.x1 (not x1)
      gen predicted_value = xb + u2 + u1*L5.x1
      gen predicted_value_se = sqrt(u_se2^2 + (u_se1*L5.x1)^2)
      gen lower_bound = u1 - 1.96 * u_se1
      gen upper_bound = u1 + 1.96 * u_se1
      egen id_index = group(id)
      sort u1
      gen id_sorted = _n
      twoway (scatter u1 id_sorted, yline(0) xlabel(1(10)254)) ///
             (rcap lower_bound upper_bound id_sorted, lcolor(red) lwidth(vvthin))
      [Attachment: Graph1.gph (the plot produced by the code above)]



      • #4
        Generally speaking, with these types of models, you do not look at the random-effects part of the model output to determine statistical significance. Instead, as you have done, you look at model comparison tests. Indeed, the model with the random slope and the random slope-intercept covariance provides a better fit to your data than the random-intercept-only model. As for the differences in slopes across clusters, it is entirely possible for between-cluster differences in the slope of a particular variable to be small. With large enough sample sizes, even a small parameter estimate can achieve statistical significance. This is why many folks do not recommend using statistical significance as a criterion for evaluating a model or a parameter estimate.

        A logical next step in models such as these, when you find evidence supporting variation in the slope of a particular variable, is to consider whether you have cluster-level information (variables) that might explain that variation. To explore this, you would include an interaction between the L5.x1 predictor and a cluster-level variable, as in the sketch below. However, if that is not the point of your analysis, then perhaps you stop here.
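        A hedged sketch of that interaction, where z stands in for a hypothetical time-invariant region-level variable (not in your data):
        Code:
        * Cross-level interaction: does the region-level variable z explain
        * between-region variation in the L5.x1 slope?
        menbreg y1 L5.c.x1##c.z L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 i.year ///
            if year<=2022 || id: L5.c.x1 , cov(unstructured) vce(robust) eform intpoints(15)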



        • #5
          Dear Erik Ruzek, thanks for your answer. I see your point (the significance of the individual random slope predictions is not very relevant if the test says the random-coefficient model is preferred). Still, it is hard for me to understand why, if the random slope is highly significant in the model, most of the predicted random slopes in the plot in #3 are not. So I am not sure whether I calculated the confidence intervals for the plot incorrectly.

          As you say, this is just a first step in my analysis. My plan is to derive a kind of efficiency measure from the coefficients (fixed + random) for the x1 variable, to be used in a second regression. The hypothesis is that the effect of x1 is different for each region (which I get). However, that regression gives a time-invariant prediction for this efficiency measure, which only lets me make between-region comparisons in the second part of my analysis. So my aim now is to get a yearly prediction for this measure of the efficiency of x1.

          Since the number of years in my panel is small, I discarded using year as a level (random effect). Even though the panel runs from 2006 to 2022, given the nature of the dependent variable I have to lag 5 periods (which shortens the panel). So my next idea is to interact x1 with year in the fixed part of the model (L5.c.x1##i.year), keeping x1 as a random coefficient as before, and then use the coefficients from the fixed and random parts to predict the efficiency measure in each year. For 2015, for example, it would be:

          (beta_x1 + beta_x1#2015.year) + random coefficient for x1,

          where the first coefficient is the one on x1 without the interaction, the second is the interaction coefficient for the given year (here 2015), and the third is the region's random coefficient on x1.
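          After fitting that model, I believe the fixed part of each year's effect could be read off with lincom (a sketch, assuming Stata names the interaction term year#cL5.x1):
          Code:
          * Fixed part of the 2015 effect of L5.x1: main effect plus the
          * 2015 interaction, reported as an incidence-rate ratio.
          lincom L5.x1 + 2015.year#cL5.x1, eform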

          I believe this prediction will show me how regions differ in profiting from x1, but now yearly (not time-invariant as before). My main issue is what to do if some of the interactions are not significant: should I calculate the measure anyway?

          I mean, what would be the implications of a prediction for a year whose interaction is not significant?

          If you have any other idea for making this efficiency measure time-varying, please let me know.

          Thanks!



          • #6
            Doris,

            This is not something that I have seen before, so I don't have a ton of ideas about how to help you move forward. One thought is that you could allow random intercepts for year (you have about 10 years, which is probably enough) and then allow the x1 variable's slope to also vary across years. In econometrics, models such as these are sometimes referred to as two-way error-components models. Years are crossed with regions, so the syntax would be as follows:
            Code:
            menbreg y1 L5.c.x1 L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 if year<=2022 || _all: R.year || id: L5.c.x1 , cov(unstructured) vce(robust) eform intpoints(15)
            The problem is that you cannot specify a random slope at the year level with these crossed models in the me suite of commands, so you cannot allow the x1 variable's slope to vary across years. You would need to go over to gsem for this. I believe that the following syntax would work, but I am not 100% sure:
            Code:
            gsem (y1 <- L5.c.x1 L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 M1[year] M2[id] c.L5.x1#M3[year] c.L5.x1#M4[id]) if year<=2022, nbreg vce(robust)
            Using similar code as before, you could then look at how the L5.c.x1 random slope varies across years. The random effects are additive, so you can look at the combined effect of x1 for each cluster and year.
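            A sketch of that step, assuming the gsem model above converges (I believe predict with the latent option returns empirical Bayes means of the latent variables, but check the gsem postestimation documentation):
            Code:
            * Empirical Bayes means of the latent variables, in the order
            * they appear in the model: M1[year] M2[id] M3[year] M4[id].
            predict eb*, latent
            * Combined slope on L5.x1 for region i in year t.
            generate slope_it = _b[y1:L5.x1] + eb3 + eb4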

            This may not be what you want, but it is probably how I would move forward with the analysis. In general with these models, it is better to keep the latent variables intact rather than predicting them and running subsequent regressions. Mainly this is due to the rather large uncertainty in any given cluster's value (see your graph in post #3). Keeping them latent preserves the uncertainty, reducing both sampling and measurement error.



            • #7
              Is the primary goal prediction, or are you trying to infer causality? Frankly, multilevel models aren't the best for either of these. The random effects are assumed to be independent of the explanatory variables in the way menbreg is implemented. For causality, you should use the fixed effects Poisson estimator, which is fully robust to distributional misspecification and serial correlation.

              FE Poisson won't be great for prediction because you have to plug in values for the unobserved effects. Of course, multilevel models essentially ask for that, too. You can use the version of Poisson regression that includes region-specific averages of the covariates (the correlated random effects, or Mundlak, approach). This is similar to fixed effects but won't rely on noisy estimates of the unobserved effects.
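              A sketch of those two estimators under the variable names used earlier in the thread (x1bar is a hypothetical name for the region-specific average; the other covariates would be treated the same way):
              Code:
              * Fixed effects Poisson, fully robust with vce(robust).
              xtset id year
              xtpoisson y1 L5.x1 i.year if year<=2022, fe vce(robust)
              
              * Correlated-random-effects version: add region-specific
              * averages of the time-varying covariates.
              egen x1bar = mean(L5.x1), by(id)
              poisson y1 L5.x1 x1bar i.year if year<=2022, vce(cluster id)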

              Multi-level models actually impose lots of assumptions: heterogeneity independent of x(i,t) and no serial correlation are two. Plus, the distributional assumptions have to be correct. And, computationally, they're not straightforward.



              • #8
                Dear Erik Ruzek, thanks again for your help, really! About your suggestion of using year as a random effect: I am not sure, since there are only 11 years (too few groups). I believe I can achieve the same thing (with fewer computational issues and without the random-effects assumptions) by interacting "x1" with year in the fixed part of the model, even though I am not sure what to do if an interaction is not significant.

                Dear Professor Jeff Wooldridge, thanks for your help! The idea is to use these first-step coefficients for x1 (fixed and random) to build a measure of the efficiency each region has with respect to x1 and y1, and to use that prediction in a second regression. This is why I chose the multilevel model: I can model the regional (id) differences with respect to x1 through the random coefficient on x1. In my model I am using standardized variables (x1, ..., x6), but standardized with each region's own average and standard deviation, as sketched below. So I believe I am capturing the within effect, since I demean by each region's average, and dividing by the region's standard deviation lets me compare parameters easily. That way, I think the correlation with the random effect is removed (because of the demeaning step). Please correct me if I am wrong.
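                Concretely, the standardization I describe looks like this (a sketch; x1_raw is a hypothetical name for the unstandardized variable):
                Code:
                * Standardize x1 within each region, using the region's own
                * mean and standard deviation.
                egen x1_mean = mean(x1_raw), by(id)
                egen x1_sd   = sd(x1_raw), by(id)
                generate x1  = (x1_raw - x1_mean) / x1_sd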

                However, my main problem now is that the multilevel model I showed in #3 gives me a time-invariant measure of this efficiency (unique within a region, but different between regions), and I am trying to come up with a way to make this efficiency measure vary within regions as well. This is why I think of interacting "x1" with year and using each interaction in the prediction so that it becomes time-varying. But I am not sure what to do if the interaction for a given year is not significant, or what the implications are of building the efficiency measure for a year whose interaction is not significant (see the regression and the predictions for each year below).

                Which other ways could I try to obtain this regional efficiency measure, varying by both region and year?

                Code:
                Mixed-effects nbinomial regression              Number of obs     =      3,006
                Overdispersion:            mean
                Group variable:              id                 Number of groups  =        254
                
                                                                Obs per group:
                                                                              min =          4
                                                                              avg =       11.8
                                                                              max =         12
                
                Integration method: mvaghermite                 Integration pts.  =         18
                
                                                                Wald chi2(28)     =     110.46
                Log pseudolikelihood = -10246.172               Prob > chi2       =     0.0000
                                                       (Std. Err. adjusted for 254 clusters in id)
                ----------------------------------------------------------------------------------
                                 |               Robust
                              y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -----------------+----------------------------------------------------------------
                              x1 |
                             L5. |   .9471795   .0587929    -0.87   0.382     .8386811    1.069714
                                 |
                            year |
                              1  |          1  (empty)
                           2006  |          1  (empty)
                           2007  |          1  (empty)
                           2008  |          1  (empty)
                           2009  |          1  (empty)
                           2010  |          1  (empty)
                           2011  |   .7253915    .055117    -4.23   0.000     .6250231    .8418773
                           2012  |    .890322   .0550077    -1.88   0.060     .7887809    1.004935
                           2013  |    .944976   .0487513    -1.10   0.273     .8540973    1.045525
                           2014  |   .9463274   .0510981    -1.02   0.307     .8512943    1.051969
                           2015  |   .9443329   .0437772    -1.24   0.217     .8623137    1.034153
                           2016  |   .9410256   .0390412    -1.47   0.143     .8675348    1.020742
                           2017  |   .9174695   .0379242    -2.08   0.037     .8460707    .9948935
                           2018  |   .9706285   .0427292    -0.68   0.498      .890392    1.058095
                           2019  |   1.014845   .0374759     0.40   0.690     .9439885     1.09102
                           2020  |   .9973637   .0312183    -0.08   0.933     .9380159    1.060466
                           2021  |   .9961875   .0234948    -0.16   0.871     .9511866    1.043317
                           2022  |          1  (omitted)
                                 |
                     year#cL5.x1 |
                           2006  |          1  (empty)
                           2007  |          1  (empty)
                           2008  |          1  (empty)
                           2009  |          1  (empty)
                           2010  |          1  (empty)
                           2011  |   1.015037   .0743515     0.20   0.839     .8792884    1.171743
                           2012  |   1.087937   .0785062     1.17   0.243      .944453    1.253219
                           2013  |   1.099021   .0860091     1.21   0.228     .9427381    1.281211
                           2014  |   1.137622    .079224     1.85   0.064     .9924766    1.303994
                           2015  |   1.142391   .0765517     1.99   0.047     1.001788    1.302728
                           2016  |   1.165437   .0766592     2.33   0.020      1.02447    1.325801
                           2017  |    1.16328   .0821751     2.14   0.032     1.012872    1.336023
                           2018  |   1.157751   .1035184     1.64   0.101     .9716427    1.379507
                           2019  |    1.21866   .0868489     2.77   0.006     1.059793    1.401342
                           2020  |   1.104429   .0984589     1.11   0.265     .9273716     1.31529
                           2021  |   1.030108   .0645353     0.47   0.636     .9110789    1.164689
                           2022  |          1  (omitted)
                                 |
                              x2 |
                             L5. |   1.001798   .0131664     0.14   0.891     .9763214    1.027938
                                 |
                              x3 |
                             L5. |   .9759461   .0145318    -1.64   0.102     .9478759    1.004848
                                 |
                              x4 |
                             L5. |   .9962871   .0123361    -0.30   0.764     .9723999    1.020761
                                 |
                              x5 |
                             L5. |   1.005562    .016348     0.34   0.733     .9740252    1.038119
                                 |
                              x6 |
                             L5. |   1.015903   .0103353     1.55   0.121     .9958464    1.036363
                                 |
                           _cons |   18.25808   1.836593    28.88   0.000     14.99105    22.23709
                -----------------+----------------------------------------------------------------
                        /lnalpha |  -3.761077   .1256267                       -4.0073   -3.514853
                -----------------+----------------------------------------------------------------
                id               |
                       var(L5.x1)|   .0269591   .0060309                      .0173895     .041795
                       var(_cons)|   2.367242   .2014625                      2.003557    2.796943
                -----------------+----------------------------------------------------------------
                id               |
                 cov(L5.x1,_cons)|  -.0442271   .0297488    -1.49   0.137    -.1025337    .0140796
                ----------------------------------------------------------------------------------
                Code:
                * Two random effects were estimated, so two variables are needed:
                * u1 = random slope on L5.x1, u2 = random intercept.
                predict u1 u2, reffects
                gen baseline_ppp = _b[L5.x1] + u1
                gen ppp_year = .
                * For 2022 (the base year) the interaction coefficient is zero.
                foreach yr in 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 {
                    replace ppp_year = baseline_ppp + _b[`yr'.year#cL5.x1] if year == `yr'
                }

