
  • prediction after a count random coefficient model

    Dear Statalist, I am estimating a multilevel model on a regional panel dataset (region-year observations). Since my dependent variable is a count, I am using Poisson/negative binomial models with a random intercept and a random coefficient. The idea is to study whether the effect of the main independent variable differs across regions.

    My first problem is with the estimation. If I use a negative binomial model, the parameter is significant but the random coefficient is not (judging visually, since the random coefficient and its standard error are tiny; see the estimation from "menbreg" below). However, if I refit the same model as a Poisson model with GLLAMM, the coefficient is not significant but the random part is (which I interpret to mean that on average there is no effect of the independent variable, but some regions could have a positive and significant effect, since the random coefficient seems to be significant). Because of that, I would like to graph the coefficient for each region (274 in total), which I believe means combining the coefficient itself with the prediction of each region's random effect, just to visually check my hypothesis.

    Is it possible to do this with both the GLLAMM and the menbreg commands, so I can compare the two graphs? Can you help me with this?

    Another, less important, issue is that the same model gives slightly different results (the main independent variable's coefficient and standard error) with the GLLAMM Poisson command than with "mepoisson". See the results below for "menbreg", GLLAMM, and "mepoisson" to check. Maybe I am doing something differently, but I do not see what. Is this normal?

    I also include an example of the data below, in case it helps.

    Thanks for your help in advance.

    Code:
    menbreg y1 L5.c.x1 i.year if year<=2022 || id: L5.c.x1 , vce(robust) eform
    Code:
    Mixed-effects nbinomial regression              Number of obs     =      3,288
    Overdispersion:            mean
    Group variable:              id                 Number of groups  =        274
    
                                                    Obs per group:
                                                                  min =         12
                                                                  avg =       12.0
                                                                  max =         12
    
    Integration method: mvaghermite                 Integration pts.  =          7
    
                                                    Wald chi2(12)     =      64.75
    Log pseudolikelihood = -10922.383               Prob > chi2       =     0.0000
                                       (Std. Err. adjusted for 274 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
              y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |
             L5. |   1.028907   .0147667     1.99   0.047     1.000368     1.05826
                 |
            year |
           2006  |          1  (empty)
           2007  |          1  (empty)
           2008  |          1  (empty)
           2009  |          1  (empty)
           2010  |          1  (empty)
           2011  |   .9119078   .0344727    -2.44   0.015     .8467849     .982039
           2012  |   .9894641   .0338078    -0.31   0.757      .925372    1.057995
           2013  |   1.025426   .0316322     0.81   0.416     .9652655    1.089337
           2014  |   1.006971    .030441     0.23   0.818     .9490412    1.068438
           2015  |   .9904435   .0273633    -0.35   0.728     .9382385    1.045553
           2016  |    .967458   .0277543    -1.15   0.249     .9145616    1.023414
           2017  |   .9405029   .0270033    -2.14   0.033      .889039    .9949458
           2018  |   .9948799   .0277803    -0.18   0.854     .9418946    1.050846
           2019  |   1.010921   .0257702     0.43   0.670     .9616529    1.062712
           2020  |   1.030888   .0231985     1.35   0.176     .9864079    1.077374
           2021  |   1.025108   .0183955     1.38   0.167     .9896797    1.061804
           2022  |          1  (omitted)
                 |
           _cons |   14.59819   1.620518    24.15   0.000      11.7438    18.14635
    -------------+----------------------------------------------------------------
        /lnalpha |  -3.470395   .1139376                     -3.693708   -3.247081
    -------------+----------------------------------------------------------------
    id           |
       var(L5.x1)|   9.39e-37   4.39e-35                      1.63e-76    5396.854
       var(_cons)|   3.238494   .3348332                      2.644454    3.965978
    ------------------------------------------------------------------------------
    Code:
    gllamm y1 L5x1 year6-year16, i(id) family(poisson) link(log) nrf(2) eq(inter slope) eform adapt nip(15 15) cluster(id)
    Code:
    number of level 1 units = 3288
    number of level 2 units = 274
     
    Condition Number = 12.315206
     
    gllamm model 
     
    log likelihood = -11301.293
     
    Robust standard errors for clustered data: cluster(id)
    ----------------------------------------------------------------------------------------
                        y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
                      L5x1 |   1.022388   .0252052     0.90   0.369     .9741609    1.073002
                     year6 |   .8245291   .0430724    -3.69   0.000     .7442867    .9134225
                     year7 |   .9245136   .0419246    -1.73   0.083     .8458888    1.010447
                     year8 |    .966416   .0449236    -0.73   0.462     .8822594      1.0586
                     year9 |   .9527646    .039601    -1.16   0.244     .8782254     1.03363
                    year10 |   .9471477   .0341022    -1.51   0.132     .8826125    1.016402
                    year11 |   .9249721   .0342073    -2.11   0.035     .8602993    .9945068
                    year12 |   .8964757   .0316301    -3.10   0.002     .8365768    .9606633
                    year13 |   .9199161   .0324448    -2.37   0.018     .8584735    .9857562
                    year14 |   .9663046   .0378807    -0.87   0.382     .8948403    1.043476
                    year15 |   1.013463   .0269093     0.50   0.614     .9620709    1.067601
                    year16 |   1.002321   .0176545     0.13   0.895     .9683096    1.037528
                     _cons |   15.23028    1.74463    23.77   0.000     12.16753    19.06396
    ----------------------------------------------------------------------------------------
    Note: Estimates are transformed only in the first equation.
     
     
    Variances and covariances of random effects
    ------------------------------------------------------------------------------
    
     
    ***level 2 (id)
     
        var(1): 3.1587211 (.33311119)
        cov(2,1): -.08200452 (.03828723) cor(2,1): -.24904404
     
        var(2): .03432512 (.00608759)
    ------------------------------------------------------------------------------
    Code:
    mepoisson y1 L5.c.x1 i.year if year<=2022 || id: L5.c.x1 , vce(robust) eform
    Code:
    Mixed-effects Poisson regression                Number of obs     =      3,288
    Group variable:              id                 Number of groups  =        274
    
                                                    Obs per group:
                                                                  min =         12
                                                                  avg =       12.0
                                                                  max =         12
    
    Integration method: mvaghermite                 Integration pts.  =          7
    
                                                    Wald chi2(12)     =      91.69
    Log pseudolikelihood = -11301.512               Prob > chi2       =     0.0000
                                       (Std. Err. adjusted for 274 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
              y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |
             L5. |   .9986708   .0221158    -0.06   0.952     .9562519    1.042971
                 |
            year |
           2006  |          1  (empty)
           2007  |          1  (empty)
           2008  |          1  (empty)
           2009  |          1  (empty)
           2010  |          1  (empty)
           2011  |   .8269095   .0433115    -3.63   0.000     .7462325    .9163087
           2012  |   .9266356   .0422601    -1.67   0.095     .8474013    1.013279
           2013  |   .9683979   .0452809    -0.69   0.492     .8835942    1.061341
           2014  |   .9543322   .0398591    -1.12   0.263      .879322    1.035741
           2015  |     .94837   .0343184    -1.46   0.143      .883437    1.018076
           2016  |   .9263578   .0344053    -2.06   0.039     .8613206    .9963059
           2017  |   .8976381   .0317488    -3.05   0.002     .8375194    .9620722
           2018  |   .9211994   .0325613    -2.32   0.020      .859541    .9872809
           2019  |   .9674635   .0380058    -0.84   0.400     .8957689    1.044896
           2020  |   1.014135   .0269662     0.53   0.598     .9626357    1.068389
           2021  |   1.002216   .0176661     0.13   0.900     .9681825    1.037446
           2022  |          1  (omitted)
                 |
           _cons |   15.23259   1.725934    24.04   0.000     12.19911     19.0204
    -------------+----------------------------------------------------------------
    id           |
       var(L5.x1)|   .0322669   .0055729                      .0230008     .045266
       var(_cons)|   3.226455   .3347703                      2.632732    3.954071
    ------------------------------------------------------------------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id int year float(y1 x1)
    1 2006  5  -1.7775646
    1 2007  5  -1.2202544
    1 2008  8   -.7455087
    1 2009  9   -.8487142
    1 2010  9    -.807432
    1 2011  7   -.0849928
    1 2012  9    .1420595
    1 2013 11    -.642303
    1 2014 12   -.1881984
    1 2015  7  -.06435169
    1 2016  8   .05949503
    1 2017  4   .03885391
    1 2018  9    .7819343
    1 2019 10    .4103941
    1 2020 13   1.6695024
    1 2021  9   1.6695024
    1 2022  6    1.607579
    2 2006 20  -1.9668324
    2 2007 29   -.6613229
    2 2008 36   -.5426402
    2 2009 35     .849184
    2 2010 41   -.2297495
    2 2011 52  -.05712012
    2 2012 39   .09393056
    2 2013 58  -.58579755
    2 2014 55   -.2297495
    2 2015 57   -.5642189
    2 2016 56   -.1758028
    2 2017 49  -.29448548
    2 2018 59   -.9094776
    2 2019 71    .3852426
    2 2020 51    .9678667
    2 2021 64    1.658384
    2 2022 57   2.2625868
    3 2006 22   -1.461945
    3 2007 26   -.7937856
    3 2008 25   -.5662836
    3 2009 35   -.3920703
    3 2010 38   -.5990766
    3 2011 44   -.6298202
    3 2012 50   -.4720034
    3 2013 63   -.6195723
    3 2014 54   -.6072749
    3 2015 59   -.6031758
    3 2016 56   -.1502212
    3 2017 56   .11417311
    3 2018 54    .6593583
    3 2019 72    1.040578
    3 2020 76   1.1901965
    3 2021 51     1.69644
    3 2022 53    2.194485
    4 2006  4  -1.5086967
    4 2007 10   -.6170927
    4 2008  9    -.636063
    4 2009 16   -.7878254
    4 2010 18  -.10489471
    4 2011 29   -.7688551
    4 2012 30 -.010043235
    4 2013 17   -.5791521
    4 2014 20   -.4843006
    4 2015 11  .008927062
    4 2016 16   -.3515086
    4 2017 18    .2365706
    4 2018 19  -.44636005
    4 2019 21    .4452439
    4 2020 21     2.11463
    4 2021 21    1.678313
    4 2022 17   1.8111053
    5 2006 31   -1.749826
    5 2007 24   -1.010769
    5 2008 32   -.5142152
    5 2009 40  -.12159124
    5 2010 39   -.6527883
    5 2011 42   -.6758839
    5 2012 66  -.21397334
    5 2013 90    -.768266
    5 2014 85   -.2255211
    5 2015 50  -.23706888
    5 2016 58   .59437007
    5 2017 63  -.39873755
    5 2018 83   .05162521
    5 2019 80     .744491
    5 2020 90     1.67986
    5 2021 82    1.772242
    5 2022 64    1.726051
    end

  • #2
    I don't have all the answers to your questions, but here are a few things to get you started.
    • With the data you provided, I cannot get the menbreg model to run. Obviously you did, so that is a non-issue.
    • The Poisson model is nested in the negative binomial model (they differ by a single parameter, which is /lnalpha in the menbreg output), so you can run a model comparison test (lrtest in Stata) to see which fits the data better.
    • In the gllamm syntax, you are calling for 15 integration points. mepoisson and menbreg both default to 7. You can change that behavior with the intpoints(15) option.
    • Note that in gllamm a covariance is estimated between the random slope and intercept. You do not estimate that in either of the me models. Use the option cov(unstructured) to get this parameter. Making the me models as similar as possible to the gllamm model gets you closer to an apples-to-apples comparison; see the sketch after this list.
    • Unless you have a model that cannot be estimated with Stata's me or gsem suite of commands, I would generally not use gllamm, mainly because it is slow.
    • Across models, you have very little to almost no variance in the L5.x1 slope. This suggests to me that the groups do not differ much in the association between L5.x1 and y1. Graphically this would show up as parallel lines for the groups, differing only in how far away they are from the population line (the random intercept). In other words, you can produce a graph, but it is not going to show much of interest.
    • See this Statalist thread and this post for how to plot the random slopes.
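    Something like the following sketch might get you that apples-to-apples comparison (untested, using the variable names from your post; lrtest needs force here because the models use robust pseudolikelihoods):
    Code:
    * Refit the me models to match the gllamm specification:
    * 15 integration points and an unstructured random-effects covariance.
    mepoisson y1 L5.c.x1 i.year if year<=2022 || id: L5.c.x1 , ///
        cov(unstructured) intpoints(15) vce(robust) eform
    estimates store mp
    
    menbreg y1 L5.c.x1 i.year if year<=2022 || id: L5.c.x1 , ///
        cov(unstructured) intpoints(15) vce(robust) eform
    estimates store mnb
    
    * The Poisson model is nested in the negative binomial (/lnalpha is
    * the extra parameter), so the two fits can be compared directly.
    lrtest mp mnb, force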



    • #3
      Dear Erik Ruzek, thanks for your help with this. Both pages are really helpful. You are right that my "menbreg" command was missing intpoints(15) and cov(uns). After adding them, I see that the random coefficient, as well as the fixed part of the main coefficient (x1), are statistically significant (see the LR test below comparing the random-coefficient model with the random-intercept-only model). However, after trying to replicate what I see in the linked page, I came up with a plot in which the random effect for x1 seems non-significant for almost all regions. I am not sure whether something is wrong in my commands, but it is strange to see that the random coefficient is highly significant in the model while most of the regions' predicted random effects in the plot are not.

      Is it possible that I am doing or interpreting something wrong? See the results and commands below:

      Code:
      menbreg y1 L5.c.x1 L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 i.year if year<=2022 || id: , vce(robust) eform intpoints(15)
      estimates store ri
      Code:
      Mixed-effects nbinomial regression              Number of obs     =      3,006
      Overdispersion:            mean
      Group variable:              id                 Number of groups  =        254
      
                                                      Obs per group:
                                                                    min =          4
                                                                    avg =       11.8
                                                                    max =         12
      
      Integration method: mvaghermite                 Integration pts.  =         15
      
                                                      Wald chi2(17)     =      68.64
      Log pseudolikelihood =  -10315.79               Prob > chi2       =     0.0000
                                         (Std. Err. adjusted for 254 clusters in id)
      ------------------------------------------------------------------------------
                   |               Robust
                y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                x1 |
               L5. |    1.03384   .0162593     2.12   0.034     1.002458    1.066204
                   |
                x2 |
               L5. |   .9973701   .0132023    -0.20   0.842     .9718267    1.023585
                   |
                x3 |
               L5. |   .9867636   .0180156    -0.73   0.465      .952078    1.022713
                   |
                x4 |
               L5. |   .9821639   .0121073    -1.46   0.144     .9587184    1.006183
                   |
                x5 |
               L5. |   1.004023   .0162252     0.25   0.804     .9727207    1.036333
                   |
                x6 |
               L5. |   1.021519   .0109755     1.98   0.048     1.000232    1.043259
                   |
              year |
             2006  |          1  (empty)
             2007  |          1  (empty)
             2008  |          1  (empty)
             2009  |          1  (empty)
             2010  |          1  (empty)
             2011  |   .8601985   .0513747    -2.52   0.012      .765176    .9670213
             2012  |   .9411211   .0522225    -1.09   0.274     .8441363    1.049249
             2013  |   .9891747   .0471648    -0.23   0.819     .9009215    1.086073
             2014  |   .9545947   .0472438    -0.94   0.348     .8663478    1.051831
             2015  |   .9574161   .0414006    -1.01   0.314     .8796159    1.042098
             2016  |   .9510924    .035121    -1.36   0.174     .8846885     1.02248
             2017  |   .9272056    .033535    -2.09   0.037     .8637537    .9953188
             2018  |   .9791434    .033789    -0.61   0.541     .9151082     1.04766
             2019  |   .9960691    .030873    -0.13   0.899     .9373604    1.058455
             2020  |   1.014371   .0267424     0.54   0.588     .9632877    1.068163
             2021  |   1.016465   .0203154     0.82   0.414     .9774176    1.057073
             2022  |          1  (omitted)
                   |
             _cons |   17.72197   1.768058    28.82   0.000      14.5744    21.54931
      -------------+----------------------------------------------------------------
          /lnalpha |  -3.431682   .1148393                     -3.656762   -3.206601
      -------------+----------------------------------------------------------------
      id           |
         var(_cons)|   2.408516   .2039353                      2.040216    2.843302
      ------------------------------------------------------------------------------
      Note: Estimates are transformed only in the first equation.
      Code:
      menbreg y1 L5.c.x1 L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 i.year if year<=2022 || id: L5.c.x1 , cov(unstructure) vce(robust) eform intpoints(15)
      estimates store rc
      Code:
      Mixed-effects nbinomial regression              Number of obs     =      3,006
      Overdispersion:            mean
      Group variable:              id                 Number of groups  =        254
      
                                                      Obs per group:
                                                                    min =          4
                                                                    avg =       11.8
                                                                    max =         12
      
      Integration method: mvaghermite                 Integration pts.  =         15
      
                                                      Wald chi2(17)     =      67.13
      Log pseudolikelihood =  -10258.05               Prob > chi2       =     0.0000
                                             (Std. Err. adjusted for 254 clusters in id)
      ----------------------------------------------------------------------------------
                       |               Robust
                    y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
                    x1 |
                   L5. |   1.044958   .0216087     2.13   0.033     1.003453     1.08818
                       |
                    x2 |
                   L5. |   1.000673   .0130966     0.05   0.959      .975331    1.026674
                       |
                    x3 |
                   L5. |   .9796073   .0149627    -1.35   0.177     .9507155    1.009377
                       |
                    x4 |
                   L5. |   .9918395   .0124067    -0.66   0.512     .9678185    1.016457
                       |
                    x5 |
                   L5. |   1.003104   .0163533     0.19   0.849     .9715582    1.035673
                       |
                    x6 |
                   L5. |   1.016806   .0103273     1.64   0.101     .9967654     1.03725
                       |
                  year |
                 2006  |          1  (empty)
                 2007  |          1  (empty)
                 2008  |          1  (empty)
                 2009  |          1  (empty)
                 2010  |          1  (empty)
                 2011  |   .8308796   .0507725    -3.03   0.002     .7370956    .9365962
                 2012  |   .9199021   .0503215    -1.53   0.127      .826377    1.024012
                 2013  |   .9671356   .0469571    -0.69   0.491     .8793447    1.063691
                 2014  |   .9498539   .0480901    -1.02   0.310     .8601246    1.048944
                 2015  |   .9521785   .0425776    -1.10   0.273     .8722802    1.039395
                 2016  |    .945086   .0369091    -1.45   0.148     .8754448    1.020267
                 2017  |   .9209528   .0349554    -2.17   0.030     .8549278    .9920769
                 2018  |   .9709001   .0346993    -0.83   0.409     .9052181    1.041348
                 2019  |   .9962854   .0321001    -0.12   0.908     .9353156    1.061229
                 2020  |   1.015637   .0270911     0.58   0.561     .9639033    1.070147
                 2021  |   1.015463   .0202364     0.77   0.441     .9765646     1.05591
                 2022  |          1  (omitted)
                       |
                 _cons |   17.93158    1.77702    29.13   0.000     14.76606    21.77573
      -----------------+----------------------------------------------------------------
              /lnalpha |  -3.738889   .1233591                     -3.980668   -3.497109
      -----------------+----------------------------------------------------------------
      id               |
             var(L5.x1)|   .0274824   .0060578                      .0178414    .0423332
             var(_cons)|   2.360432   .2013258                      1.997059    2.789923
      -----------------+----------------------------------------------------------------
      id               |
       cov(L5.x1,_cons)|  -.0466977   .0299401    -1.56   0.119    -.1053793    .0119839
      ----------------------------------------------------------------------------------
      Note: Estimates are transformed only in the first equation.
      
      . estimates store rc
      
      . lrtest ri rc, force
      
      Likelihood-ratio test                                 LR chi2(2)  =    115.48
      (Assumption: ri nested in rc)                         Prob > chi2 =    0.0000
      Code:
      predict xb, xb
      * u1 = random slope on L5.x1, u2 = random intercept (in that order)
      predict u*, reffects reses(u_se*)
      des u*
      * The random slope multiplies the lagged covariate, so use L5.x1 (not x1)
      gen predicted_value = xb + u2 + u1*L5.x1
      gen predicted_value_se = sqrt(u_se2^2 + (u_se1*L5.x1)^2)
      gen lower_bound = u1 - 1.96 * u_se1
      gen upper_bound = u1 + 1.96 * u_se1
      egen id_index = group(id)
      sort u1
      gen id_sorted = _n
      twoway (scatter u1 id_sorted, yline(0) xlabel(1(10)254)) ///
             (rcap lower_bound upper_bound id_sorted, lcolor(red) lwidth(vvthin))
      [Attachment: Graph1.gph (the plot produced by the code above)]



      • #4
        Generally speaking, with these types of models, you do not look at the random-effects part of the model output to determine statistical significance. Instead, as you have done, you look at model comparison tests. Indeed, the model with the random slope and the random slope-intercept covariance provides a better fit to your data than the random-intercept-only model. As for the differences in slopes across clusters, it is entirely possible for between-cluster differences in the slope of a particular variable to be small. With large enough sample sizes, even a small parameter estimate can achieve statistical significance. This is why many folks do not recommend using statistical significance as a criterion for evaluating a model or a parameter estimate.

        A logical next step in models such as these, when you find evidence supporting variation in the slope of a particular variable, is to consider whether you have cluster-level information (variables) that might explain that variation. To explore this, you would include an interaction between the L5.x1 predictor and a cluster-level variable, as in the sketch below. However, if that is not the point of your analysis, then perhaps you stop here.
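        A hedged sketch of that interaction, where z stands in for a hypothetical time-invariant region-level variable (not in your data):
        Code:
        * Cross-level interaction: does the region-level variable z explain
        * between-region variation in the L5.x1 slope?
        menbreg y1 L5.c.x1##c.z L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 i.year ///
            if year<=2022 || id: L5.c.x1 , cov(unstructured) vce(robust) eform intpoints(15)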



        • #5
          Dear Erik Ruzek, thanks for your answer. I see your point (the significance of the individual random slope predictions is not very relevant if the test says the random-coefficient model is preferred). Still, it is hard for me to understand why, if the random slope is highly significant in the model, most of the predicted random slopes in the plot in #3 are not. So I am not sure whether I calculated the confidence intervals for the plot incorrectly.

          As you say, this is just a first step in my analysis. My plan is to derive a kind of efficiency measure from the coefficients (fixed + random) for the x1 variable, to be used in a second regression. The hypothesis is that the effect of x1 is different for each region (which I get). However, that regression gives a time-invariant prediction for this efficiency measure, which only lets me make between-region comparisons in the second part of my analysis. So my aim now is to get a yearly prediction for this measure of the efficiency of x1.

          Since the number of years in my panel is small, I discarded using year as a level (random effect). Even though the panel runs from 2006 to 2022, given the nature of the dependent variable I have to lag 5 periods (which shortens the panel). So my next idea is to interact x1 with year in the fixed part of the model (L5.c.x1##i.year), keeping x1 as a random coefficient as before, and then use the coefficients from the fixed and random parts to predict the efficiency measure in each year. For 2015, for example, it would be:

          (beta_x1 + beta_x1#2015.year) + random coefficient for x1,

          where the first coefficient is the one on x1 without the interaction, the second is the interaction coefficient for the given year (here 2015), and the third is the region's random coefficient on x1.
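          After fitting that model, I believe the fixed part of each year's effect could be read off with lincom (a sketch, assuming Stata names the interaction term year#cL5.x1):
          Code:
          * Fixed part of the 2015 effect of L5.x1: main effect plus the
          * 2015 interaction, reported as an incidence-rate ratio.
          lincom L5.x1 + 2015.year#cL5.x1, eform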

          I believe this prediction will show me how regions differ in profiting from x1, but now yearly (not time-invariant as before). My main issue is what to do if some of the interactions are not significant: should I calculate the measure anyway?

          I mean, what would be the implications of a prediction for a year whose interaction is not significant?

          If you have any other idea for making this efficiency measure time-varying, please let me know.

          Thanks!



          • #6
            Doris,

            This is not something that I have seen before, so I don't have a ton of ideas about how to help you move forward. One thought is that you could allow random intercepts for year (you have about 10 years, which is probably enough) and then allow the x1 variable's slope to also vary across years. In econometrics, models such as these are sometimes referred to as two-way error-components models. Years are crossed with regions, so the syntax would be as follows:
            Code:
            menbreg y1 L5.c.x1 L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 if year<=2022 || _all: R.year || id: L5.c.x1 , cov(unstructured) vce(robust) eform intpoints(15)
            The problem is that you cannot specify a random slope at the year level with these crossed models in the me suite of commands, so you cannot allow the x1 variable's slope to vary across years. You would need to go over to gsem for this. I believe that the following syntax would work, but I am not 100% sure:
            Code:
            gsem (y1 <- L5.c.x1 L5.c.x2 L5.c.x3 L5.c.x4 L5.c.x5 L5.c.x6 M1[year] M2[id] c.L5.x1#M3[year] c.L5.x1#M4[id]) if year<=2022, nbreg vce(robust)
            Using similar code as before, you could then look at how the L5.c.x1 random slope varies across years. The random effects are additive, so you can look at the combined effect of x1 for each cluster and year.
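            A sketch of that step, assuming the gsem model above converges (I believe predict with the latent option returns empirical Bayes means of the latent variables, but check the gsem postestimation documentation):
            Code:
            * Empirical Bayes means of the latent variables, in the order
            * they appear in the model: M1[year] M2[id] M3[year] M4[id].
            predict eb*, latent
            * Combined slope on L5.x1 for region i in year t.
            generate slope_it = _b[y1:L5.x1] + eb3 + eb4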

            This may not be what you want, but it is probably how I would move forward with the analysis. In general with these models, it is better to keep the latent variables intact rather than predicting them and running subsequent regressions. Mainly this is due to the rather large uncertainty in any given cluster's value (see your graph in post #3). Keeping them latent preserves the uncertainty, reducing both sampling and measurement error.



            • #7
              Is the primary goal prediction, or are you trying to infer causality? Frankly, multilevel models aren't the best for either of these. The random effects are assumed to be independent of the explanatory variables in the way menbreg is implemented. For causality, you should use the fixed effects Poisson estimator, which is fully robust to distributional misspecification and serial correlation.

              FE Poisson won't be great for prediction because you have to plug in values for the unobserved effects. Of course, multilevel models essentially ask for that, too. You can use the version of Poisson regression that includes region-specific averages of the covariates (the correlated random effects, or Mundlak, approach). This is similar to fixed effects but won't rely on noisy estimates of the unobserved effects.
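              A sketch of those two estimators under the variable names used earlier in the thread (x1bar is a hypothetical name for the region-specific average; the other covariates would be treated the same way):
              Code:
              * Fixed effects Poisson, fully robust with vce(robust).
              xtset id year
              xtpoisson y1 L5.x1 i.year if year<=2022, fe vce(robust)
              
              * Correlated-random-effects version: add region-specific
              * averages of the time-varying covariates.
              egen x1bar = mean(L5.x1), by(id)
              poisson y1 L5.x1 x1bar i.year if year<=2022, vce(cluster id)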

              Multi-level models actually impose lots of assumptions: heterogeneity independent of x(i,t) and no serial correlation are two. Plus, the distributional assumptions have to be correct. And, computationally, they're not straightforward.



              • #8
                Dear Erik Ruzek, thanks again for your help, really! About your suggestion of using year as a random effect: I am not sure, since there are only 11 years (too few groups). I believe I can achieve the same thing (with fewer computational issues and without the random-effects assumptions) by interacting "x1" with year in the fixed part of the model, even though I am not sure what to do if an interaction is not significant.

                Dear Professor Jeff Wooldridge, thanks for your help! The idea is to use these first-step coefficients for x1 (fixed and random) to build a measure of the efficiency each region has with respect to x1 and y1, and to use that prediction in a second regression. This is why I chose the multilevel model: I can model the regional (id) differences with respect to x1 through the random coefficient on x1. In my model I am using standardized variables (x1, ..., x6), but standardized with each region's own average and standard deviation, as sketched below. So I believe I am capturing the within effect, since I demean by each region's average, and dividing by the region's standard deviation lets me compare parameters easily. That way, I think the correlation with the random effect is removed (because of the demeaning step). Please correct me if I am wrong.
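                Concretely, the standardization I describe looks like this (a sketch; x1_raw is a hypothetical name for the unstandardized variable):
                Code:
                * Standardize x1 within each region, using the region's own
                * mean and standard deviation.
                egen x1_mean = mean(x1_raw), by(id)
                egen x1_sd   = sd(x1_raw), by(id)
                generate x1  = (x1_raw - x1_mean) / x1_sd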

                However, my main problem now is that the multilevel model I showed in #3 gives me a time-invariant measure of this efficiency (unique within a region, but different between regions), and I am trying to come up with a way to make this efficiency measure vary within regions as well. This is why I think of interacting "x1" with year and using each interaction in the prediction so that it becomes time-varying. But I am not sure what to do if the interaction for a given year is not significant, or what the implications are of building the efficiency measure for a year whose interaction is not significant (see the regression and the predictions for each year below).

                Which other ways could I try to obtain this regional efficiency measure, varying by both region and year?

                Code:
                Mixed-effects nbinomial regression              Number of obs     =      3,006
                Overdispersion:            mean
                Group variable:              id                 Number of groups  =        254
                
                                                                Obs per group:
                                                                              min =          4
                                                                              avg =       11.8
                                                                              max =         12
                
                Integration method: mvaghermite                 Integration pts.  =         18
                
                                                                Wald chi2(28)     =     110.46
                Log pseudolikelihood = -10246.172               Prob > chi2       =     0.0000
                                                       (Std. Err. adjusted for 254 clusters in id)
                ----------------------------------------------------------------------------------
                                 |               Robust
                              y1 |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -----------------+----------------------------------------------------------------
                              x1 |
                             L5. |   .9471795   .0587929    -0.87   0.382     .8386811    1.069714
                                 |
                            year |
                              1  |          1  (empty)
                           2006  |          1  (empty)
                           2007  |          1  (empty)
                           2008  |          1  (empty)
                           2009  |          1  (empty)
                           2010  |          1  (empty)
                           2011  |   .7253915    .055117    -4.23   0.000     .6250231    .8418773
                           2012  |    .890322   .0550077    -1.88   0.060     .7887809    1.004935
                           2013  |    .944976   .0487513    -1.10   0.273     .8540973    1.045525
                           2014  |   .9463274   .0510981    -1.02   0.307     .8512943    1.051969
                           2015  |   .9443329   .0437772    -1.24   0.217     .8623137    1.034153
                           2016  |   .9410256   .0390412    -1.47   0.143     .8675348    1.020742
                           2017  |   .9174695   .0379242    -2.08   0.037     .8460707    .9948935
                           2018  |   .9706285   .0427292    -0.68   0.498      .890392    1.058095
                           2019  |   1.014845   .0374759     0.40   0.690     .9439885     1.09102
                           2020  |   .9973637   .0312183    -0.08   0.933     .9380159    1.060466
                           2021  |   .9961875   .0234948    -0.16   0.871     .9511866    1.043317
                           2022  |          1  (omitted)
                                 |
                     year#cL5.x1 |
                           2006  |          1  (empty)
                           2007  |          1  (empty)
                           2008  |          1  (empty)
                           2009  |          1  (empty)
                           2010  |          1  (empty)
                           2011  |   1.015037   .0743515     0.20   0.839     .8792884    1.171743
                           2012  |   1.087937   .0785062     1.17   0.243      .944453    1.253219
                           2013  |   1.099021   .0860091     1.21   0.228     .9427381    1.281211
                           2014  |   1.137622    .079224     1.85   0.064     .9924766    1.303994
                           2015  |   1.142391   .0765517     1.99   0.047     1.001788    1.302728
                           2016  |   1.165437   .0766592     2.33   0.020      1.02447    1.325801
                           2017  |    1.16328   .0821751     2.14   0.032     1.012872    1.336023
                           2018  |   1.157751   .1035184     1.64   0.101     .9716427    1.379507
                           2019  |    1.21866   .0868489     2.77   0.006     1.059793    1.401342
                           2020  |   1.104429   .0984589     1.11   0.265     .9273716     1.31529
                           2021  |   1.030108   .0645353     0.47   0.636     .9110789    1.164689
                           2022  |          1  (omitted)
                                 |
                              x2 |
                             L5. |   1.001798   .0131664     0.14   0.891     .9763214    1.027938
                                 |
                              x3 |
                             L5. |   .9759461   .0145318    -1.64   0.102     .9478759    1.004848
                                 |
                              x4 |
                             L5. |   .9962871   .0123361    -0.30   0.764     .9723999    1.020761
                                 |
                              x5 |
                             L5. |   1.005562    .016348     0.34   0.733     .9740252    1.038119
                                 |
                              x6 |
                             L5. |   1.015903   .0103353     1.55   0.121     .9958464    1.036363
                                 |
                           _cons |   18.25808   1.836593    28.88   0.000     14.99105    22.23709
                -----------------+----------------------------------------------------------------
                        /lnalpha |  -3.761077   .1256267                       -4.0073   -3.514853
                -----------------+----------------------------------------------------------------
                id               |
                       var(L5.x1)|   .0269591   .0060309                      .0173895     .041795
                       var(_cons)|   2.367242   .2014625                      2.003557    2.796943
                -----------------+----------------------------------------------------------------
                id               |
                 cov(L5.x1,_cons)|  -.0442271   .0297488    -1.49   0.137    -.1025337    .0140796
                ----------------------------------------------------------------------------------
                Code:
                * Two random effects were estimated, so two variables are needed:
                * u1 = random slope on L5.x1, u2 = random intercept.
                predict u1 u2, reffects
                gen baseline_ppp = _b[L5.x1] + u1
                gen ppp_year = .
                * For 2022 (the base year) the interaction coefficient is zero.
                foreach yr in 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 {
                    replace ppp_year = baseline_ppp + _b[`yr'.year#cL5.x1] if year == `yr'
                }

