How to select best fit model using Stata's countfit?

Monzur Alam

Join Date: Dec 2014
Posts: 55

28 Apr 2015, 20:01

Dear All,

I am using the countfit command in Stata 13 to determine which count regression model best fits a model of the number of current domestic migrants. The independent variables are workers in household, nonworkers in household, age of household head, land owned, number of cows, and business ownership. Below are the code and results (I have not included results for zip and prm because nbreg and zinb are the best fits). However, the results seem a little contradictory to me (or I might not be understanding them well).

- From the graph, it appears that the NBRM is doing better than the ZINB.
- Comparing the mean observed and predicted counts, the largest difference is at count 3 for the NBRM (-0.013) and at count 1 for the ZINB (0.031).
- Comparing the actual and predicted probabilities, the sum of the Pearson column gives a sense of how close the predicted proportions are to the actual proportions. Here the NBRM appears to perform better (6.316 versus 14.580).
- In the Tests and Fit Statistics, the ZINB is preferred over the NBRM in two of the three comparisons (AIC and Vuong), while BIC prefers the NBRM.

I was wondering whether anyone could suggest which model seems like a better fit from the results, and why?

Thanks in advance.

Code:

countfit totcurr_dommig hage num_workers num_nonworkers own_land_hec num_cows_owned business_owned, inflate(hage community2-community7) nbreg zinb

 
--------------------------------------------------------
                      Variable |   NBRM        ZINB    
-------------------------------+------------------------
totcurr_dommig                 |
                    (max) hage |     1.028       1.024
                               |     13.18       10.13
             (max) num_workers |     1.108       1.110
                               |      3.51        3.70
          (max) num_nonworkers |     1.084       1.096
                               |      4.41        5.10
            (max) own_land_hec |     0.999       0.999
                               |     -1.51       -1.62
          (max) num_cows_owned |     1.002       1.002
                               |      1.29        1.29
          (max) business_owned |     1.076       1.040
                               |      1.03        0.56
                      Constant |     0.244       0.341
                               |     -9.80       -6.94
-------------------------------+------------------------
lnalpha                        |
                      Constant |     0.567       0.355
                               |     -6.00       -6.47
-------------------------------+------------------------
inflate                        |
                    (max) hage |                 0.969
                               |                 -1.79
              (max) community2 |                 0.586
                               |                 -1.32
              (max) community3 |                 0.392
                               |                 -2.20
              (max) community4 |                 0.132
                               |                 -2.12
              (max) community5 |                 0.000
                               |                 -0.00
              (max) community6 |                 0.000
                               |                 -0.00
              (max) community7 |                 0.257
                               |                 -1.93
                      Constant |                 2.116
                               |                  0.89
-------------------------------+------------------------
Statistics                     |                      
                         alpha |     0.567            
                             N |      1425        1425
                            ll | -2280.856   -2255.383
                           bic |  4619.807    4626.958
                           aic |  4577.711    4542.767
--------------------------------------------------------
                                             legend: b/t
 
 
 
 
Comparison of Mean Observed and Predicted Count
 
            Maximum       At      Mean
Model     Difference    Value    |Diff|
---------------------------------------------
NBRM       -0.013         3      0.003
ZINB        0.031         1      0.007
 
 
 
 
NBRM: Predicted and actual probabilities
 
Count   Actual    Predicted    |Diff|   Pearson
------------------------------------------------
0        0.372       0.372      0.000     0.001
1        0.269       0.266      0.002     0.033
2        0.159       0.157      0.002     0.037
3        0.075       0.088      0.013     2.859
4        0.055       0.049      0.005     0.862
5        0.029       0.028      0.002     0.169
6        0.020       0.016      0.004     1.376
7        0.010       0.009      0.001     0.079
8        0.004       0.005      0.001     0.355
9        0.002       0.003      0.001     0.544
------------------------------------------------
Sum      0.995       0.995      0.032     6.316
 
ZINB: Predicted and actual probabilities
 
Count   Actual    Predicted    |Diff|   Pearson
------------------------------------------------
0        0.372       0.386      0.015     0.780
1        0.269       0.237      0.031     5.956
2        0.159       0.160      0.001     0.012
3        0.075       0.096      0.021     6.307
4        0.055       0.054      0.001     0.017
5        0.029       0.030      0.000     0.003
6        0.020       0.016      0.003     1.022
7        0.010       0.009      0.001     0.140
8        0.004       0.005      0.001     0.138
9        0.002       0.003      0.001     0.203
------------------------------------------------
Sum      0.995       0.996      0.074    14.580
 
Tests and Fit Statistics
 
-------------------------------------------------------------------------
NBRM           BIC=  4619.807  AIC=  4577.711  Prefer  Over  Evidence
-------------------------------------------------------------------------
  vs ZINB      BIC=  4626.958  dif=    -7.151  NBRM    ZINB  Strong
               AIC=  4542.767  dif=    34.945  ZINB    NBRM
               Vuong=   3.581  prob=    0.000  ZINB    NBRM  p=0.000
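
The split between AIC and BIC in the table above can be reproduced from the reported log-likelihoods. The sketch below assumes the parameter counts read off the coefficient table (8 for the NBRM, 16 for the ZINB) and the standard definitions AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N); the scalar names are made up for illustration.

```stata
* Values copied from the Statistics block of the countfit output
scalar n_obs = 1425
scalar ll_nb = -2280.856
scalar ll_zi = -2255.383

* Assumed parameter counts: NBRM has 6 slopes + constant + lnalpha = 8;
* ZINB adds 7 inflate slopes + inflate constant = 16
scalar k_nb = 8
scalar k_zi = 16

scalar aic_nb = -2*ll_nb + 2*k_nb
scalar aic_zi = -2*ll_zi + 2*k_zi
scalar bic_nb = -2*ll_nb + k_nb*ln(n_obs)
scalar bic_zi = -2*ll_zi + k_zi*ln(n_obs)

display "AIC: NBRM = " %9.3f aic_nb "   ZINB = " %9.3f aic_zi
display "BIC: NBRM = " %9.3f bic_nb "   ZINB = " %9.3f bic_zi
```

The results agree with the reported AIC and BIC up to rounding of the log-likelihoods. With N = 1425, each extra parameter costs about 7.26 in BIC but only 2 in AIC, so the 8 additional inflate parameters are enough to reverse the ranking under BIC even though the ZINB has the higher log-likelihood.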

Last edited by Monzur Alam; 28 Apr 2015, 20:04.

Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#2

29 Apr 2015, 00:25

Dear Monzur,

The answer to your question very much depends on why you estimated your model and what you want to do with its results.

If you want to be able to estimate the probabilities of events, you need to correctly specify the conditional distribution of your data. In this case, I would say that the right way to go is to perform statistical tests to see if the chosen distribution is adequate. There are some tests for this, and I believe the well-known book on count data by Cameron and Trivedi describes them (in particular, I have in mind goodness-of-fit tests that check whether the differences between the fitted and observed probabilities are statistically significant).
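
To make the comparison of fitted and observed probabilities concrete, here is a rough sketch of how one cell of the Pearson column in the countfit output can be recomputed. It assumes each contribution is N*(actual - predicted)^2/predicted, which appears consistent with the reported figures; the scalar names are made up for illustration, and the inputs are the rounded proportions for count 3.

```stata
* Rounded proportions for count = 3, copied from the countfit tables
scalar n_obs   = 1425
scalar actual  = 0.075
scalar pred_nb = 0.088
scalar pred_zi = 0.096

* Assumed formula for one Pearson contribution
scalar pearson_nb = n_obs*(actual - pred_nb)^2/pred_nb
scalar pearson_zi = n_obs*(actual - pred_zi)^2/pred_zi

display "Pearson contribution at count 3: NBRM = " %6.3f pearson_nb ///
    "   ZINB = " %6.3f pearson_zi
```

The results differ a little from the reported 2.859 and 6.307 because the inputs here are rounded to three decimals. Note also that summing these contributions gives a descriptive measure only; because the predicted probabilities come from estimated parameters, the sum should not simply be referred to a chi-squared table, which is why a formal goodness-of-fit test is needed.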

However, in many (most?) cases people intend to use the results of their count data model in much the same way they would use the results of OLS. That is, people essentially want to see how a set of regressors affects the mean of the variate of interest. In this case we do not really care about the distribution of the data and only care about the specification of the conditional mean. If this is your situation, the criteria that you are using are not helpful because they compare fitted probabilities when you only care about the fit of the conditional mean. I would then simply use tests such as the RESET or the HPC, which can be performed with the -hpc- command (available from SSC).
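
For readers who want to try a RESET-type check of the conditional mean, a minimal sketch follows. It assumes an exponential conditional mean estimated by Poisson pseudo-maximum likelihood with robust standard errors, using the variable names from the original post; the names xb_hat and xb_hat2 are made up for illustration. This is one common way of doing it, not necessarily the exact procedure intended above, and it does not use -hpc-.

```stata
* Step 1: fit the conditional mean with robust standard errors
poisson totcurr_dommig hage num_workers num_nonworkers own_land_hec ///
    num_cows_owned business_owned, vce(robust)

* Step 2: save the fitted linear index and its square
predict double xb_hat, xb
generate double xb_hat2 = xb_hat^2

* Step 3: refit with the squared index added and test its significance
poisson totcurr_dommig hage num_workers num_nonworkers own_land_hec ///
    num_cows_owned business_owned xb_hat2, vce(robust)
test xb_hat2
```

A significant coefficient on xb_hat2 suggests that the conditional mean is misspecified; an insignificant one gives no evidence against it. The same steps can be repeated with a different set of regressors or a different functional form to compare candidate specifications of the mean.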

All the best,

Joao