How to choose between different regression models

Conor Cotton

Join Date: Jul 2020
Posts: 8


20 Aug 2020, 06:23

Hello,

One of the dependent variables in my analysis, 'distvol', measures the ideological distance of voters' party-preference shifts during an election campaign. It can take any integer value from 0 to 7.
For example, a respondent with a 'distvol' score of 7 indicates that they switched their vote preference from the farthest left party to the farthest right party, while a score of 1 indicates they switched their preference between two ideologically close parties.
Most respondents, however, did not switch their preference, meaning that they got a 'distvol' value of 0. The distribution of 'distvol' therefore has a lot of zeros in it (see below).
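To make the measure concrete, here is a hypothetical sketch of how a distance like 'distvol' could be computed. It assumes each party has a rank on a single left-right scale from 0 (farthest left) to 7 (farthest right); the ranks and the function name are illustrative and are not the actual coding used in the survey.

```python
# Hypothetical sketch: assumes each party is ranked 0 (farthest left) to
# 7 (farthest right). Ranks and function name are illustrative only.

def distvol(rank_before: int, rank_after: int) -> int:
    """Ideological distance of a preference switch (0 = no switch)."""
    if not (0 <= rank_before <= 7 and 0 <= rank_after <= 7):
        raise ValueError("party ranks must lie between 0 and 7")
    return abs(rank_after - rank_before)


print(distvol(0, 7))  # farthest left -> farthest right
print(distvol(3, 4))  # switch between two adjacent parties
print(distvol(5, 5))  # no switch
```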
For some context, my research is looking at the impact of voters' online media consumption on changes in their party preference ("electoral volatility") during an election campaign. I hypothesise that those who consume more online media will switch to ideologically closer parties than those who consume less online media.

I have two related questions:

First, which type of regression model do you think would work best here? Below are the options I'm considering:
- I'm fairly sure both Poisson and negative binomial regression would work with this distribution, even though the data are not true count data; the only other similar study I've seen using this data used a negative binomial.
- Generalised linear model with family(bin 10) and link(logit) - This gives results very similar to the Poisson model, but unlike Poisson or negative binomial it is bounded at the high end.
- Ordinal logit - Some have suggested that the distribution would better suit an ologit, since it's not technically count data.
- (Incidentally, all of the different types of models do give very similar results, shown below).

Second, could you give me any advice on how to compare the performance of the different models, so that I can justify which one to choose?

Very grateful for any advice, thanks in advance.

Code:

 tab distvol

    distvol |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,304       89.38       89.38
          1 |         33        2.26       91.64
          2 |         57        3.91       95.54
          3 |         29        1.99       97.53
          4 |         14        0.96       98.49
          5 |         14        0.96       99.45
          6 |          5        0.34       99.79
          7 |          3        0.21      100.00
------------+-----------------------------------
      Total |      1,459      100.00
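A quick way to quantify what this table shows is to compute the mean, the variance, and the share of zeros a Poisson distribution with the same mean would imply. The sketch below (Python, using only the counts in the table) covers all 1,459 respondents, whereas the models below use the 1,344 with complete covariates, so treat it as descriptive only.

```python
import math

# Frequencies copied from the `tab distvol` output (value -> count).
freq = {0: 1304, 1: 33, 2: 57, 3: 29, 4: 14, 5: 14, 6: 5, 7: 3}

n = sum(freq.values())
mean = sum(k * f for k, f in freq.items()) / n
# Sample variance with divisor n - 1.
var = sum(f * (k - mean) ** 2 for k, f in freq.items()) / (n - 1)

observed_zero_share = freq[0] / n
# Share of zeros implied by a Poisson distribution with the same mean.
poisson_zero_share = math.exp(-mean)

print(f"N = {n}, mean = {mean:.3f}, variance = {var:.3f}")
print(f"variance / mean = {var / mean:.2f}")
print(f"zeros: observed {observed_zero_share:.3f}, "
      f"Poisson-implied {poisson_zero_share:.3f}")
```

With these counts the variance is a little over three times the mean, and a Poisson with the same mean would predict roughly 75% zeros against the 89% observed. Overdispersion and excess zeros of this kind are the usual argument for a negative binomial over a plain Poisson.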

NEGATIVE BINOMIAL

Code:

Fitting Poisson model:

Iteration 0:   log pseudolikelihood = -829.75972  
Iteration 1:   log pseudolikelihood = -829.64878  
Iteration 2:   log pseudolikelihood = -829.64845  
Iteration 3:   log pseudolikelihood = -829.64845  

Fitting constant-only model:

Iteration 0:   log pseudolikelihood = -743.19607  (not concave)
Iteration 1:   log pseudolikelihood = -606.19316  
Iteration 2:   log pseudolikelihood = -599.21043  
Iteration 3:   log pseudolikelihood = -599.19066  
Iteration 4:   log pseudolikelihood = -599.19066  

Fitting full model:

Iteration 0:   log pseudolikelihood = -587.16259  
Iteration 1:   log pseudolikelihood = -584.36333  
Iteration 2:   log pseudolikelihood = -584.11853  
Iteration 3:   log pseudolikelihood = -584.11808  
Iteration 4:   log pseudolikelihood = -584.11808  

Negative binomial regression                    Number of obs     =      1,344
                                                Wald chi2(13)     =      48.32
Dispersion           = mean                     Prob > chi2       =     0.0000
Log pseudolikelihood = -584.11808               Pseudo R2         =     0.0252

-----------------------------------------------------------------------------------------------
                              |               Robust
                      distvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
                    onlinemed |
               high exposure  |   .1395064   .3363322     0.41   0.678    -.5196926    .7987054
                              |
                     highknow |
              high knowledge  |   .2282583   .3450022     0.66   0.508    -.4479336    .9044502
                              |
           onlinemed#highknow |
high exposure#high knowledge  |  -1.243398   .5446583    -2.28   0.022    -2.310909   -.1758877
                              |
                  socialmedia |  -.0270818   .0775339    -0.35   0.727    -.1790455    .1248819
                        age_i |  -.0141096    .008021    -1.76   0.079    -.0298306    .0016113
                       female |  -.6162776    .223444    -2.76   0.006     -1.05422   -.1783354
                   highincome |  -.0716774   .2397122    -0.30   0.765    -.5415048    .3981499
            partyclose_binary |  -1.497653   .4460845    -3.36   0.001    -2.371963   -.6233437
                  leftright_i |  -.0361103   .0507479    -0.71   0.477    -.1355744    .0633539
                        farlr |  -.5992872   .3480854    -1.72   0.085    -1.281522    .0829477
               political_mood |    .008102   .0098578     0.82   0.411     -.011219     .027423
               networkhet12_i |   .1115985   .0556922     2.00   0.045     .0024438    .2207531
                     nptvnews |  -.0467189   .0517432    -0.90   0.367    -.1481338     .054696
                        _cons |   -.484885   .5823316    -0.83   0.405    -1.626234     .656464
------------------------------+----------------------------------------------------------------
                     /lnalpha |   2.487341   .1500471                      2.193254    2.781428
------------------------------+----------------------------------------------------------------
                        alpha |   12.02925   1.804953                      8.964335    16.14205
-----------------------------------------------------------------------------------------------
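The output above reports alpha as exp(/lnalpha) but, with robust standard errors, shows no likelihood-ratio test of alpha = 0. The sketch below recomputes alpha, evaluates the variance implied by "Dispersion = mean", Var(y) = mu(1 + alpha*mu), and forms the LR statistic by hand from the Poisson and negative binomial log pseudolikelihoods in the iteration log. It assumes both fits use the same 1,344 observations and treats the boundary test as a 50:50 mixture of chi2(0) and chi2(1); it is a rough check, not a replacement for Stata's own test. The value mu = 0.28 is only an illustrative figure close to the sample mean.

```python
import math

# Values copied from the nbreg output above.
lnalpha = 2.487341
ll_poisson = -829.64845  # final iteration of "Fitting Poisson model"
ll_nbreg = -584.11808    # final iteration of "Fitting full model"

alpha = math.exp(lnalpha)


def nb2_variance(mu: float, alpha: float) -> float:
    """Variance under dispersion(mean): mu * (1 + alpha * mu)."""
    return mu * (1.0 + alpha * mu)


# LR statistic for H0: alpha = 0 (Poisson is nested in NB2). alpha = 0 lies
# on the boundary, so p = 0.5 * P(chi2(1) > LR) = 0.5 * erfc(sqrt(LR / 2)).
lr = 2.0 * (ll_nbreg - ll_poisson)
p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))

mu = 0.28  # illustrative mean, roughly the sample mean of distvol
print(f"alpha = {alpha:.4f}")
print(f"Poisson variance at mu = {mu}: {mu:.3f}; "
      f"NB2 variance: {nb2_variance(mu, alpha):.3f}")
print(f"LR = {lr:.2f}, p = {p_value:.3g}")
```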

POISSON

Code:

Iteration 0:   log pseudolikelihood = -829.75972  
Iteration 1:   log pseudolikelihood = -829.64878  
Iteration 2:   log pseudolikelihood = -829.64845  
Iteration 3:   log pseudolikelihood = -829.64845  

Poisson regression                              Number of obs     =      1,344
                                                Wald chi2(13)     =      42.61
Log pseudolikelihood = -829.64845               Prob > chi2       =     0.0001

-----------------------------------------------------------------------------------------------
                              |               Robust
                      distvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
                    onlinemed |
               high exposure  |   .1520361   .3637021     0.42   0.676     -.560807    .8648792
                              |
                     highknow |
              high knowledge  |   .1991359   .3109782     0.64   0.522    -.4103702    .8086421
                              |
           onlinemed#highknow |
high exposure#high knowledge  |  -1.217585   .5514483    -2.21   0.027    -2.298404   -.1367663
                              |
                  socialmedia |  -.0105604   .0821744    -0.13   0.898    -.1716192    .1504983
                        age_i |  -.0106576   .0081525    -1.31   0.191    -.0266363     .005321
                       female |  -.4278774    .258046    -1.66   0.097    -.9336382    .0778834
                   highincome |   .0076763   .2400212     0.03   0.974    -.4627566    .4781092
            partyclose_binary |    -1.3393   .5511951    -2.43   0.015    -2.419622   -.2589773
                  leftright_i |  -.0679288   .0469811    -1.45   0.148      -.16001    .0241524
                        farlr |  -.5089138   .3878363    -1.31   0.189    -1.269059    .2512312
               political_mood |   .0072836   .0089259     0.82   0.414    -.0102109     .024778
               networkhet12_i |    .080292   .0549722     1.46   0.144    -.0274516    .1880356
                     nptvnews |  -.0447907    .041423    -1.08   0.280    -.1259783     .036397
                        _cons |  -.4876739   .6399943    -0.76   0.446     -1.74204     .766692
-----------------------------------------------------------------------------------------------
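Because the Poisson model uses a log link, exponentiated coefficients are incidence-rate ratios, and the interaction means the effect of high online-media exposure differs by knowledge group. A minimal sketch of the two implied ratios, assuming the rows labelled "high exposure" and "high knowledge" are the non-base levels of each factor:

```python
import math

# Poisson coefficients copied from the output above.
b_exposure = 0.1520361     # high exposure (main effect)
b_interaction = -1.217585  # high exposure # high knowledge

# Rate ratio for high vs. low exposure within each knowledge group,
# holding the other covariates fixed.
irr_low_knowledge = math.exp(b_exposure)
irr_high_knowledge = math.exp(b_exposure + b_interaction)

print(f"IRR of high exposure, low knowledge:  {irr_low_knowledge:.3f}")
print(f"IRR of high exposure, high knowledge: {irr_high_knowledge:.3f}")
```

Stata's `lincom` can produce the same linear combination together with a confidence interval.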

GLM

Code:

Iteration 0:   log pseudolikelihood = -928.17693  
Iteration 1:   log pseudolikelihood = -861.78275  
Iteration 2:   log pseudolikelihood = -860.77761  
Iteration 3:   log pseudolikelihood = -860.76821  
Iteration 4:   log pseudolikelihood = -860.76821  

Generalized linear models                         Number of obs   =      1,344
Optimization     : ML                             Residual df     =      1,330
                                                  Scale parameter =          1
Deviance         =   1437.49301                   (1/df) Deviance =   1.080822
Pearson          =  3385.188563                   (1/df) Pearson  =   2.545255

Variance function: V(u) = u*(1-u/10)              [Binomial]
Link function    : g(u) = ln(u/(10-u))            [Logit]

                                                  AIC             =   1.301738
Log pseudolikelihood = -860.7682062               BIC             =  -8143.036

-----------------------------------------------------------------------------------------------
                              |               Robust
                      distvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
                    onlinemed |
               high exposure  |    .157355   .3766299     0.42   0.676     -.580826    .8955359
                              |
                     highknow |
              high knowledge  |   .2063968    .321822     0.64   0.521    -.4243627    .8371564
                              |
           onlinemed#highknow |
high exposure#high knowledge  |  -1.247359   .5654925    -2.21   0.027    -2.355704   -.1390137
                              |
                  socialmedia |  -.0108449   .0848111    -0.13   0.898    -.1770717    .1553818
                        age_i |  -.0109973   .0084091    -1.31   0.191    -.0274789    .0054843
                       female |  -.4410633   .2655512    -1.66   0.097     -.961534    .0794074
                   highincome |   .0072811   .2479066     0.03   0.977     -.478607    .4931691
            partyclose_binary |  -1.363248   .5564878    -2.45   0.014    -2.453944   -.2725517
                  leftright_i |  -.0700498   .0484818    -1.44   0.148    -.1650723    .0249728
                        farlr |  -.5202852   .3958464    -1.31   0.189     -1.29613    .2555595
               political_mood |   .0075074   .0092249     0.81   0.416     -.010573    .0255878
               networkhet12_i |   .0829316   .0567113     1.46   0.144    -.0282206    .1940837
                     nptvnews |  -.0463583   .0428227    -1.08   0.279    -.1302892    .0375726
                        _cons |  -2.734244   .6614316    -4.13   0.000    -4.030626   -1.437862
-----------------------------------------------------------------------------------------------
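The GLM slopes are almost identical to the Poisson ones while its constant is about 2.25 lower. With a binomial denominator of 10 the link is ln(u/(10 - u)), which is approximately ln(u) - ln(10) whenever u is small relative to 10, and the variance u(1 - u/10) is then close to u. The model therefore behaves much like a Poisson with the constant shifted by roughly ln(10). The sketch below checks that the two constants imply nearly the same mean; the function names are illustrative.

```python
import math

N_TRIALS = 10  # binomial denominator used in family(bin 10)


def inv_log(eta: float) -> float:
    """Mean implied by a log link (Poisson / negative binomial)."""
    return math.exp(eta)


def inv_binomial_logit(eta: float, n: int = N_TRIALS) -> float:
    """Mean implied by g(u) = ln(u / (n - u)), i.e. u = n * invlogit(eta)."""
    return n / (1.0 + math.exp(-eta))


# Constants copied from the Poisson and GLM output above.
cons_poisson = -0.4876739
cons_glm = -2.734244

print(f"Poisson mean at eta = _cons: {inv_log(cons_poisson):.4f}")
print(f"GLM mean at eta = _cons:     {inv_binomial_logit(cons_glm):.4f}")
print(f"difference in constants: {cons_poisson - cons_glm:.4f}; "
      f"ln(10) = {math.log(10):.4f}")
```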

OLOGIT

Code:

Iteration 0:   log pseudolikelihood = -564.33681  
Iteration 1:   log pseudolikelihood = -546.23464  
Iteration 2:   log pseudolikelihood = -544.42317  
Iteration 3:   log pseudolikelihood = -544.39738  
Iteration 4:   log pseudolikelihood = -544.39733  
Iteration 5:   log pseudolikelihood = -544.39733  

Ordered logistic regression                     Number of obs     =      1,344
                                                Wald chi2(13)     =      36.22
                                                Prob > chi2       =     0.0005
Log pseudolikelihood = -544.39733               Pseudo R2         =     0.0353

-----------------------------------------------------------------------------------------------
                              |               Robust
                      distvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
                    onlinemed |
               high exposure  |   .0758672   .3551337     0.21   0.831     -.620182    .7719163
                              |
                     highknow |
              high knowledge  |   .2898168   .3289096     0.88   0.378    -.3548342    .9344678
                              |
           onlinemed#highknow |
high exposure#high knowledge  |  -1.119558    .560211    -2.00   0.046    -2.217552    -.021565
                              |
                  socialmedia |  -.0388585   .0819758    -0.47   0.635    -.1995281    .1218111
                        age_i |  -.0161103   .0093542    -1.72   0.085    -.0344442    .0022236
                       female |  -.4300516   .2629844    -1.64   0.102    -.9454915    .0853883
                   highincome |  -.0384396   .2422341    -0.16   0.874    -.5132098    .4363305
            partyclose_binary |  -1.346552   .6240141    -2.16   0.031    -2.569597   -.1235066
                  leftright_i |  -.0849347    .053184    -1.60   0.110    -.1891734    .0193041
                        farlr |  -.5247534   .3937558    -1.33   0.183    -1.296501    .2469938
               political_mood |   .0027973   .0085472     0.33   0.743    -.0139549    .0195495
               networkhet12_i |   .0695643   .0470031     1.48   0.139    -.0225601    .1616888
                     nptvnews |  -.0395345    .046652    -0.85   0.397    -.1309707    .0519016
------------------------------+----------------------------------------------------------------
                        /cut1 |   .6980686   .7310941                     -.7348496    2.130987
                        /cut2 |   .9451905   .7192096                     -.4644345    2.354816
                        /cut3 |   1.757095   .7208377                      .3442787    3.169911
                        /cut4 |   2.558683   .7413626                      1.105639    4.011727
                        /cut5 |   3.153227   .7592492                      1.665126    4.641328
                        /cut6 |   4.411359   .8810093                      2.684612    6.138105
                        /cut7 |   6.057117   1.033805                      4.030896    8.083338
-----------------------------------------------------------------------------------------------
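In the ordered logit, the seven cut-points turn a single linear predictor xb (which has no constant) into probabilities for all eight values of distvol, with P(distvol <= k) = invlogit(cut(k+1) - xb). A minimal sketch, where the value of xb is an arbitrary illustration rather than one computed from the data:

```python
import math

# Cut-points copied from the ologit output above (/cut1 ... /cut7).
cuts = [0.6980686, 0.9451905, 1.757095, 2.558683, 3.153227, 4.411359, 6.057117]


def invlogit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(xb: float, cuts: list) -> list:
    """P(distvol = k | xb) for k = 0..len(cuts) under an ordered logit."""
    cum = [invlogit(c - xb) for c in cuts] + [1.0]  # P(y <= k)
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# xb = -1.0 is an arbitrary illustrative value, not computed from the data.
for k, p in enumerate(category_probs(-1.0, cuts)):
    print(f"P(distvol = {k}) = {p:.4f}")
```

In Stata, `predict` after `ologit` returns these probabilities for each observation.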


Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#2

20 Aug 2020, 06:49

You can use information criteria to help choose between estimators. After fitting each model, type

Code:

estat ic

This reports the AIC and BIC statistics. Smaller values are better, and you can use them to choose between estimators.
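For reference, the statistics are defined as AIC = -2 lnL + 2k and BIC = -2 lnL + k ln N. The sketch below applies these definitions to the final log pseudolikelihoods in the output above. The parameter counts k are my own reading of each table (13 slopes plus _cons, plus /lnalpha for nbreg; 13 slopes plus 7 cut-points for ologit), so check them against what `estat ic` reports. It also shows that the AIC and BIC printed in the glm header are different quantities: the header AIC is divided by N, and its BIC is deviance-based.

```python
import math

N = 1344  # Number of obs, identical across the four models

# (final log pseudolikelihood, number of estimated parameters)
models = {
    "nbreg": (-584.11808, 15),
    "poisson": (-829.64845, 14),
    "glm": (-860.7682062, 14),
    "ologit": (-544.39733, 20),
}


def aic(ll: float, k: int) -> float:
    return -2.0 * ll + 2.0 * k


def bic(ll: float, k: int, n: int) -> float:
    return -2.0 * ll + k * math.log(n)


for name, (ll, k) in sorted(models.items(), key=lambda item: aic(*item[1])):
    print(f"{name:8s} k={k:2d}  AIC={aic(ll, k):9.2f}  BIC={bic(ll, k, N):9.2f}")

# The glm header uses AIC / N and a deviance-based BIC:
# deviance - residual df * ln N.
print(f"glm header AIC check: {aic(*models['glm']) / N:.6f}")
print(f"glm header BIC check: {1437.49301 - 1330 * math.log(N):.3f}")
```

With these inputs ologit has the smallest AIC and BIC, followed by the negative binomial, with Poisson and the binomial GLM well behind. The comparison assumes all four models are fit to the same 1,344 observations of the same untransformed outcome.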
Conor Cotton

Join Date: Jul 2020

Posts: 8
#3

20 Aug 2020, 07:18

Hi Chris,

Thanks for your reply. I've tried that, and it gives me wildly different AIC/BIC figures for the different models (ranging from 1000 to 1600), so I'm not sure whether they are valid for comparing different model types.
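One way to check whether such numbers are at least on a common scale: Poisson, negative binomial, the binomial GLM and ologit each assign a probability to every observed integer value of distvol, so each log-likelihood is a sum of log probabilities of the same outcomes. The sketch below illustrates this with constant-only versions of two model types fit to the tab counts (all 1,459 respondents). A constant-only ordered logit with 7 cut-points reproduces the 8 observed proportions exactly, so it is written here as a simple categorical model.

```python
import math

# Frequencies from the `tab distvol` output.
freq = {0: 1304, 1: 33, 2: 57, 3: 29, 4: 14, 5: 14, 6: 5, 7: 3}
n = sum(freq.values())
mean = sum(k * f for k, f in freq.items()) / n


def poisson_logpmf(k: int, mu: float) -> float:
    return k * math.log(mu) - mu - math.lgamma(k + 1)


# Constant-only Poisson: every respondent gets Poisson(mean).
ll_poisson = sum(f * poisson_logpmf(k, mean) for k, f in freq.items())

# Constant-only categorical model: value k gets probability f_k / n.
ll_categorical = sum(f * math.log(f / n) for f in freq.values())

print(f"constant-only Poisson:     log L = {ll_poisson:.2f}")
print(f"constant-only categorical: log L = {ll_categorical:.2f}")
```

Because the categorical model reproduces the observed shares, no covariate-free model of these counts can exceed its log-likelihood; the gap to the Poisson value reflects a difference in fit rather than a change of scale.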
Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#4

20 Aug 2020, 08:35

Well, I think you can use them to justify your choice. At least, I have seen users report AIC and BIC to choose between Poisson and negative binomial estimators. Extending this to other estimators may be more of a stretch.

Ultimately, your choice should rest on which model is most appropriate for your data. From a practical perspective, assuming this is for a research paper, one strategy might be to present the results for the negative binomial estimator, since this is what a previous study did. You can then discuss how the results are robust to different estimators, which use different modeling assumptions.
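A robustness check of this kind can be as simple as tabulating the key term across estimators. The coefficients are not on identical scales (log rates for nbreg and Poisson, log-odds for the GLM and ologit), so the sketch compares only sign and significance, using the robust standard errors reported above.

```python
# Interaction term (high exposure # high knowledge) from each output above:
# (coefficient, robust std. err.).
interaction = {
    "nbreg": (-1.243398, 0.5446583),
    "poisson": (-1.217585, 0.5514483),
    "glm": (-1.247359, 0.5654925),
    "ologit": (-1.119558, 0.560211),
}

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal

for model, (b, se) in interaction.items():
    z = b / se
    verdict = "significant" if abs(z) > Z_CRIT else "not significant"
    print(f"{model:8s} b = {b:7.3f}  z = {z:6.2f}  ({verdict} at 5%)")

same_sign = len({b < 0 for b, _ in interaction.values()}) == 1
print("same sign in every model:", same_sign)
```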
