  • How to choose between different regression models

    Hello,
    • One of the dependent variables for one of my statistical tests, 'distvol' is a variable measuring the ideological distance of voter party preference shifts during an election campaign. 'Distvol' can take any integer value from 0 to 7.
    • For example, a respondent with a 'distvol' score of 7 indicates that they switched their vote preference from the farthest left party to the farthest right party, while a score of 1 indicates they switched their preference between two ideologically close parties.
    • Most respondents, however, did not switch their preference, meaning that they got a 'distvol' value of 0. The distribution of 'distvol' therefore has a lot of zeros in it (see below).
    • For some context, my research is looking at the impact of voters' online media consumption on changes in their party preference ("electoral volatility") during an election campaign. I hypothesise that those who consume more online media will switch to ideologically closer parties than those who consume less online media.
    I have two related questions:
    • Which type of regression model do you think would work best here? Below are the options I'm considering:
      • Poisson or negative binomial regression - I'm fairly sure both would work with this distribution even though the data are not true count data. The only other similar study I've seen using this data used a negative binomial.
      • Generalised linear model with family(bin 10) and link(logit) - This is very similar to negative binomial, but it is bounded at the high end (unlike negative binomial).
      • Ordinal logit - Some have suggested that an ologit would suit the distribution better, since it's not technically count data.
      • (Incidentally, all of the different types of models give very similar results, shown below.)
    • Second, would you be able to give me any advice on how to compare the performance of the different models, so that I can justify which one to choose?
    Very grateful for any advice, thanks in advance.

    Code:
     tab distvol
    
        distvol |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      1,304       89.38       89.38
              1 |         33        2.26       91.64
              2 |         57        3.91       95.54
              3 |         29        1.99       97.53
              4 |         14        0.96       98.49
              5 |         14        0.96       99.45
              6 |          5        0.34       99.79
              7 |          3        0.21      100.00
    ------------+-----------------------------------
          Total |      1,459      100.00
    NEGATIVE BINOMIAL

    Code:
    Fitting Poisson model:
    
    Iteration 0:   log pseudolikelihood = -829.75972  
    Iteration 1:   log pseudolikelihood = -829.64878  
    Iteration 2:   log pseudolikelihood = -829.64845  
    Iteration 3:   log pseudolikelihood = -829.64845  
    
    Fitting constant-only model:
    
    Iteration 0:   log pseudolikelihood = -743.19607  (not concave)
    Iteration 1:   log pseudolikelihood = -606.19316  
    Iteration 2:   log pseudolikelihood = -599.21043  
    Iteration 3:   log pseudolikelihood = -599.19066  
    Iteration 4:   log pseudolikelihood = -599.19066  
    
    Fitting full model:
    
    Iteration 0:   log pseudolikelihood = -587.16259  
    Iteration 1:   log pseudolikelihood = -584.36333  
    Iteration 2:   log pseudolikelihood = -584.11853  
    Iteration 3:   log pseudolikelihood = -584.11808  
    Iteration 4:   log pseudolikelihood = -584.11808  
    
    Negative binomial regression                    Number of obs     =      1,344
                                                    Wald chi2(13)     =      48.32
    Dispersion           = mean                     Prob > chi2       =     0.0000
    Log pseudolikelihood = -584.11808               Pseudo R2         =     0.0252
    
    -----------------------------------------------------------------------------------------------
                                  |               Robust
                          distvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------------+----------------------------------------------------------------
                        onlinemed |
                   high exposure  |   .1395064   .3363322     0.41   0.678    -.5196926    .7987054
                                  |
                         highknow |
                  high knowledge  |   .2282583   .3450022     0.66   0.508    -.4479336    .9044502
                                  |
               onlinemed#highknow |
    high exposure#high knowledge  |  -1.243398   .5446583    -2.28   0.022    -2.310909   -.1758877
                                  |
                      socialmedia |  -.0270818   .0775339    -0.35   0.727    -.1790455    .1248819
                            age_i |  -.0141096    .008021    -1.76   0.079    -.0298306    .0016113
                           female |  -.6162776    .223444    -2.76   0.006     -1.05422   -.1783354
                       highincome |  -.0716774   .2397122    -0.30   0.765    -.5415048    .3981499
                partyclose_binary |  -1.497653   .4460845    -3.36   0.001    -2.371963   -.6233437
                      leftright_i |  -.0361103   .0507479    -0.71   0.477    -.1355744    .0633539
                            farlr |  -.5992872   .3480854    -1.72   0.085    -1.281522    .0829477
                   political_mood |    .008102   .0098578     0.82   0.411     -.011219     .027423
                   networkhet12_i |   .1115985   .0556922     2.00   0.045     .0024438    .2207531
                         nptvnews |  -.0467189   .0517432    -0.90   0.367    -.1481338     .054696
                            _cons |   -.484885   .5823316    -0.83   0.405    -1.626234     .656464
    ------------------------------+----------------------------------------------------------------
                         /lnalpha |   2.487341   .1500471                      2.193254    2.781428
    ------------------------------+----------------------------------------------------------------
                            alpha |   12.02925   1.804953                      8.964335    16.14205
    -----------------------------------------------------------------------------------------------
    POISSON

    Code:
    Iteration 0:   log pseudolikelihood = -829.75972  
    Iteration 1:   log pseudolikelihood = -829.64878  
    Iteration 2:   log pseudolikelihood = -829.64845  
    Iteration 3:   log pseudolikelihood = -829.64845  
    
    Poisson regression                              Number of obs     =      1,344
                                                    Wald chi2(13)     =      42.61
    Log pseudolikelihood = -829.64845               Prob > chi2       =     0.0001
    
    -----------------------------------------------------------------------------------------------
                                  |               Robust
                          distvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------------+----------------------------------------------------------------
                        onlinemed |
                   high exposure  |   .1520361   .3637021     0.42   0.676     -.560807    .8648792
                                  |
                         highknow |
                  high knowledge  |   .1991359   .3109782     0.64   0.522    -.4103702    .8086421
                                  |
               onlinemed#highknow |
    high exposure#high knowledge  |  -1.217585   .5514483    -2.21   0.027    -2.298404   -.1367663
                                  |
                      socialmedia |  -.0105604   .0821744    -0.13   0.898    -.1716192    .1504983
                            age_i |  -.0106576   .0081525    -1.31   0.191    -.0266363     .005321
                           female |  -.4278774    .258046    -1.66   0.097    -.9336382    .0778834
                       highincome |   .0076763   .2400212     0.03   0.974    -.4627566    .4781092
                partyclose_binary |    -1.3393   .5511951    -2.43   0.015    -2.419622   -.2589773
                      leftright_i |  -.0679288   .0469811    -1.45   0.148      -.16001    .0241524
                            farlr |  -.5089138   .3878363    -1.31   0.189    -1.269059    .2512312
                   political_mood |   .0072836   .0089259     0.82   0.414    -.0102109     .024778
                   networkhet12_i |    .080292   .0549722     1.46   0.144    -.0274516    .1880356
                         nptvnews |  -.0447907    .041423    -1.08   0.280    -.1259783     .036397
                            _cons |  -.4876739   .6399943    -0.76   0.446     -1.74204     .766692
    -----------------------------------------------------------------------------------------------
    GLM

    Code:
    Iteration 0:   log pseudolikelihood = -928.17693  
    Iteration 1:   log pseudolikelihood = -861.78275  
    Iteration 2:   log pseudolikelihood = -860.77761  
    Iteration 3:   log pseudolikelihood = -860.76821  
    Iteration 4:   log pseudolikelihood = -860.76821  
    
    Generalized linear models                         Number of obs   =      1,344
    Optimization     : ML                             Residual df     =      1,330
                                                      Scale parameter =          1
    Deviance         =   1437.49301                   (1/df) Deviance =   1.080822
    Pearson          =  3385.188563                   (1/df) Pearson  =   2.545255
    
    Variance function: V(u) = u*(1-u/10)              [Binomial]
    Link function    : g(u) = ln(u/(10-u))            [Logit]
    
                                                      AIC             =   1.301738
    Log pseudolikelihood = -860.7682062               BIC             =  -8143.036
    
    -----------------------------------------------------------------------------------------------
                                  |               Robust
                          distvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------------+----------------------------------------------------------------
                        onlinemed |
                   high exposure  |    .157355   .3766299     0.42   0.676     -.580826    .8955359
                                  |
                         highknow |
                  high knowledge  |   .2063968    .321822     0.64   0.521    -.4243627    .8371564
                                  |
               onlinemed#highknow |
    high exposure#high knowledge  |  -1.247359   .5654925    -2.21   0.027    -2.355704   -.1390137
                                  |
                      socialmedia |  -.0108449   .0848111    -0.13   0.898    -.1770717    .1553818
                            age_i |  -.0109973   .0084091    -1.31   0.191    -.0274789    .0054843
                           female |  -.4410633   .2655512    -1.66   0.097     -.961534    .0794074
                       highincome |   .0072811   .2479066     0.03   0.977     -.478607    .4931691
                partyclose_binary |  -1.363248   .5564878    -2.45   0.014    -2.453944   -.2725517
                      leftright_i |  -.0700498   .0484818    -1.44   0.148    -.1650723    .0249728
                            farlr |  -.5202852   .3958464    -1.31   0.189     -1.29613    .2555595
                   political_mood |   .0075074   .0092249     0.81   0.416     -.010573    .0255878
                   networkhet12_i |   .0829316   .0567113     1.46   0.144    -.0282206    .1940837
                         nptvnews |  -.0463583   .0428227    -1.08   0.279    -.1302892    .0375726
                            _cons |  -2.734244   .6614316    -4.13   0.000    -4.030626   -1.437862
    -----------------------------------------------------------------------------------------------
    OLOGIT

    Code:
    Iteration 0:   log pseudolikelihood = -564.33681  
    Iteration 1:   log pseudolikelihood = -546.23464  
    Iteration 2:   log pseudolikelihood = -544.42317  
    Iteration 3:   log pseudolikelihood = -544.39738  
    Iteration 4:   log pseudolikelihood = -544.39733  
    Iteration 5:   log pseudolikelihood = -544.39733  
    
    Ordered logistic regression                     Number of obs     =      1,344
                                                    Wald chi2(13)     =      36.22
                                                    Prob > chi2       =     0.0005
    Log pseudolikelihood = -544.39733               Pseudo R2         =     0.0353
    
    -----------------------------------------------------------------------------------------------
                                  |               Robust
                          distvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------------+----------------------------------------------------------------
                        onlinemed |
                   high exposure  |   .0758672   .3551337     0.21   0.831     -.620182    .7719163
                                  |
                         highknow |
                  high knowledge  |   .2898168   .3289096     0.88   0.378    -.3548342    .9344678
                                  |
               onlinemed#highknow |
    high exposure#high knowledge  |  -1.119558    .560211    -2.00   0.046    -2.217552    -.021565
                                  |
                      socialmedia |  -.0388585   .0819758    -0.47   0.635    -.1995281    .1218111
                            age_i |  -.0161103   .0093542    -1.72   0.085    -.0344442    .0022236
                           female |  -.4300516   .2629844    -1.64   0.102    -.9454915    .0853883
                       highincome |  -.0384396   .2422341    -0.16   0.874    -.5132098    .4363305
                partyclose_binary |  -1.346552   .6240141    -2.16   0.031    -2.569597   -.1235066
                      leftright_i |  -.0849347    .053184    -1.60   0.110    -.1891734    .0193041
                            farlr |  -.5247534   .3937558    -1.33   0.183    -1.296501    .2469938
                   political_mood |   .0027973   .0085472     0.33   0.743    -.0139549    .0195495
                   networkhet12_i |   .0695643   .0470031     1.48   0.139    -.0225601    .1616888
                         nptvnews |  -.0395345    .046652    -0.85   0.397    -.1309707    .0519016
    ------------------------------+----------------------------------------------------------------
                            /cut1 |   .6980686   .7310941                     -.7348496    2.130987
                            /cut2 |   .9451905   .7192096                     -.4644345    2.354816
                            /cut3 |   1.757095   .7208377                      .3442787    3.169911
                            /cut4 |   2.558683   .7413626                      1.105639    4.011727
                            /cut5 |   3.153227   .7592492                      1.665126    4.641328
                            /cut6 |   4.411359   .8810093                      2.684612    6.138105
                            /cut7 |   6.057117   1.033805                      4.030896    8.083338
    -----------------------------------------------------------------------------------------------

  • #2
    You can use information criteria to help choose between estimators. After fitting each model, run
    Code:
    estat ic
    which reports the AIC and BIC statistics for that model. Smaller is better, so you can use these figures to choose between estimators.
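
    If it helps, here is a minimal sketch of collecting the criteria for all four models in one table. The covariate list is read off the output tables in post #1, so the exact commands (including vce(robust) and the stored names pois, nb, glmbin and olog) are my assumption, not necessarily what was run. It uses /// line continuation, so run it from a do-file.
    Code:
    * Sketch only: covariates are taken from the output in post #1
    local rhs i.onlinemed##i.highknow socialmedia age_i female highincome ///
        partyclose_binary leftright_i farlr political_mood networkhet12_i nptvnews
    
    * Fit each candidate model and store its results under a name
    poisson distvol `rhs', vce(robust)
    estimates store pois
    
    nbreg distvol `rhs', vce(robust)
    estimates store nb
    
    glm distvol `rhs', family(binomial 10) link(logit) vce(robust)
    estimates store glmbin
    
    ologit distvol `rhs', vce(robust)
    estimates store olog
    
    * One table with N, log likelihood, df, AIC and BIC for every stored fit
    estimates stats pois nb glmbin olog
    The comparison only makes sense when every model is fitted to the same estimation sample, which appears to be the case here (all four outputs report 1,344 observations).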

    • #3
      Hi Chris,

      Thanks for your reply. I've tried that, and it gives me wildly different AIC/BIC figures for the different models (ranging from 1000 to 1600). Are these figures valid for comparing across different model types?

      • #4
        Well, I think you can use it to justify your choice. At least, I have seen users report AIC and BIC to choose between Poisson and negative binomial estimators. Extending the comparison to other estimators may be more of a stretch.
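
        As a rough check on where the figures come from, AIC can be recomputed by hand as -2*ll + 2*k from the log pseudolikelihoods in post #1. The parameter counts k below are my own reading of the output tables (14 coefficients including the constant, plus lnalpha for the negative binomial; 13 coefficients plus 7 cutpoints for the ologit), so treat them as assumptions.
        Code:
        * AIC = -2*ll + 2*k, with ll copied from the output in post #1
        display %9.3f -2*(-584.11808) + 2*15    // negative binomial: 1198.236
        display %9.3f -2*(-829.64845) + 2*14    // Poisson:           1687.297
        display %9.3f -2*(-860.7682062) + 2*14  // binomial GLM:      1749.536
        display %9.3f -2*(-544.39733) + 2*20    // ordered logit:     1128.795
        
        * The AIC in the glm header is this value divided by N
        display %9.6f (-2*(-860.7682062) + 2*14)/1344   // 1.301738
        Note that the AIC and BIC printed in the glm header are on a different scale from the estat ic figures (the AIC is divided by the number of observations, and the BIC is based on the deviance), so they should not be compared directly with the other models' values.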

        Ultimately, your choice should be based on the most appropriate model. From a practical perspective, assuming this is for a research paper, one strategy might be to provide the results for the negative binomial estimator, since this is what a previous study did. You can then discuss how the results are robust to different estimators, which use different modeling assumptions.
