
  • How to select best fit model using Stata's countfit?

    Dear All,

    I am using the countfit command in Stata 13 to determine which count regression model best fits the number of current domestic migrants, with the following independent variables: workers in household, nonworkers in household, age of household head, land owned, number of cows, and business ownership. Below are the code and results (I have not included results for the ZIP and PRM because the NBRM and ZINB are the best fits). However, the results seem somewhat contradictory to me (or I may not be understanding them well).

    - From the graph, it appears that the NBRM is doing better than the ZINB
    - In the comparison of mean observed and predicted counts, the largest deviation for the NBRM is at count 3, and for the ZINB at count 1.
    - In the comparison of actual and predicted probabilities, the sum of the Pearson column gives a sense of how close the predicted proportions are to the actual proportions. Here the NBRM appears to perform better (6.316 versus 14.580 for the ZINB).
    - In the Tests and Fit Statistics, the ZINB is preferred over the NBRM on two of the three criteria (AIC and Vuong), while BIC prefers the NBRM.

    I was wondering whether anyone could suggest which model seems like a better fit from the results, and why?

    Thanks in advance.

    Code:
    countfit totcurr_dommig hage num_workers num_nonworkers own_land_hec num_cows_owned business_owned, inflate(hage community2-community7) nbreg zinb
    
     
    [Figure: countfitgraph.jpg]
    --------------------------------------------------------
                          Variable |     NBRM         ZINB
    -------------------------------+------------------------
    totcurr_dommig                 |
                        (max) hage |     1.028        1.024
                                   |     13.18        10.13
                 (max) num_workers |     1.108        1.110
                                   |      3.51         3.70
              (max) num_nonworkers |     1.084        1.096
                                   |      4.41         5.10
                (max) own_land_hec |     0.999        0.999
                                   |     -1.51        -1.62
              (max) num_cows_owned |     1.002        1.002
                                   |      1.29         1.29
              (max) business_owned |     1.076        1.040
                                   |      1.03         0.56
                          Constant |     0.244        0.341
                                   |     -9.80        -6.94
    -------------------------------+------------------------
    lnalpha                        |
                          Constant |     0.567        0.355
                                   |     -6.00        -6.47
    -------------------------------+------------------------
    inflate                        |
                        (max) hage |                  0.969
                                   |                  -1.79
                  (max) community2 |                  0.586
                                   |                  -1.32
                  (max) community3 |                  0.392
                                   |                  -2.20
                  (max) community4 |                  0.132
                                   |                  -2.12
                  (max) community5 |                  0.000
                                   |                  -0.00
                  (max) community6 |                  0.000
                                   |                  -0.00
                  (max) community7 |                  0.257
                                   |                  -1.93
                          Constant |                  2.116
                                   |                   0.89
    -------------------------------+------------------------
    Statistics                     |
                             alpha |     0.567
                                 N |      1425         1425
                                ll | -2280.856    -2255.383
                               bic |  4619.807     4626.958
                               aic |  4577.711     4542.767
    --------------------------------------------------------
                                                 legend: b/t

    Comparison of Mean Observed and Predicted Count

               Maximum       At      Mean
    Model     Difference    Value   |Diff|
    ---------------------------------------------
    NBRM        -0.013        3      0.003
    ZINB         0.031        1      0.007

    NBRM: Predicted and actual probabilities

    Count    Actual   Predicted    |Diff|   Pearson
    ------------------------------------------------
    0         0.372       0.372     0.000     0.001
    1         0.269       0.266     0.002     0.033
    2         0.159       0.157     0.002     0.037
    3         0.075       0.088     0.013     2.859
    4         0.055       0.049     0.005     0.862
    5         0.029       0.028     0.002     0.169
    6         0.020       0.016     0.004     1.376
    7         0.010       0.009     0.001     0.079
    8         0.004       0.005     0.001     0.355
    9         0.002       0.003     0.001     0.544
    ------------------------------------------------
    Sum       0.995       0.995     0.032     6.316

    ZINB: Predicted and actual probabilities

    Count    Actual   Predicted    |Diff|   Pearson
    ------------------------------------------------
    0         0.372       0.386     0.015     0.780
    1         0.269       0.237     0.031     5.956
    2         0.159       0.160     0.001     0.012
    3         0.075       0.096     0.021     6.307
    4         0.055       0.054     0.001     0.017
    5         0.029       0.030     0.000     0.003
    6         0.020       0.016     0.003     1.022
    7         0.010       0.009     0.001     0.140
    8         0.004       0.005     0.001     0.138
    9         0.002       0.003     0.001     0.203
    ------------------------------------------------
    Sum       0.995       0.996     0.074    14.580

    Tests and Fit Statistics

    -------------------------------------------------------------------------
    NBRM         BIC=  4619.807   AIC=  4577.711   Prefer   Over   Evidence
    -------------------------------------------------------------------------
      vs ZINB    BIC=  4626.958   dif=    -7.151   NBRM     ZINB   Strong
                 AIC=  4542.767   dif=    34.945   ZINB     NBRM
                 Vuong=   3.581   prob=    0.000   ZINB     NBRM   p=0.000
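
    A note on why AIC and BIC disagree in the output above: the ZINB has 8 more parameters (the inflate equation) and raises the log likelihood by about 25.5, so twice the gain (about 50.9) exceeds the AIC penalty of 2 x 8 = 16 but falls short of the BIC penalty of 8 x ln(1425), roughly 58.1. The figures can be cross-checked by fitting the two models directly. What follows is only a minimal sketch: it assumes the same dataset is in memory and is run from a do-file (because of the /// line continuations), and nbrm and zinb_m are hypothetical names for the stored estimates.

    ```stata
    * Sketch: refit both models and list log likelihood, AIC and BIC side by side.
    * Assumes the dataset from this post is in memory.
    * nbrm and zinb_m are hypothetical names chosen for illustration.
    nbreg totcurr_dommig hage num_workers num_nonworkers own_land_hec ///
        num_cows_owned business_owned
    estimates store nbrm

    zinb totcurr_dommig hage num_workers num_nonworkers own_land_hec ///
        num_cows_owned business_owned, inflate(hage community2-community7)
    estimates store zinb_m

    * One row per model: N, ll(null), ll(model), df, AIC, BIC
    estimates stats nbrm zinb_m
    ```

    If the df column shows 8 for the NBRM and 16 for the ZINB, the AIC and BIC values should agree with the countfit table up to rounding.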
    Last edited by Monzur Alam; 28 Apr 2015, 20:04.

  • #2
    Dear Monzur,

    The answer to your question very much depends on why you estimated your model and what you want to do with its results.

    If you want to be able to estimate the probabilities of events you need to correctly specify the conditional distribution of your data. In this case I would say that the right way to go is to perform statistical tests to see if the chosen distribution is adequate. There are some tests for this and I believe the well-known book on count data by Cameron and Trivedi describes such tests (in particular, I have in mind goodness-of-fit tests that check whether the differences between the fitted and observed probabilities are statistically significant).

    However, in many (most?) cases people intend to use the results of their count data model in much the same way they would use the results of OLS. That is, people essentially want to see how a set of regressors affects the mean of the variate of interest. In this case we do not really care about the distribution of the data, only about the specification of the conditional mean. If this is your case, the criteria that you are using are not helpful, because they compare fitted probabilities when you only care about the fit of the conditional mean. In that situation, I would simply use tests such as the RESET or the HPC test (the latter can be performed with the -hpc- command, available from SSC).
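
    A RESET-type check of the conditional mean could be sketched along the following lines. This is only an illustration under stated assumptions: it uses the variable names from the original post, applies the check to the negative binomial fit, and is meant to be run from a do-file; xb_nb and xb_nb2 are hypothetical names introduced here.

    ```stata
    * Sketch of a RESET-type check for the conditional mean of the NB model.
    * Assumes the dataset from post #1 is in memory.
    * xb_nb and xb_nb2 are hypothetical variable names.
    nbreg totcurr_dommig hage num_workers num_nonworkers own_land_hec ///
        num_cows_owned business_owned, vce(robust)

    * Linear index x'b from the fitted model, and its square
    predict double xb_nb, xb
    generate double xb_nb2 = xb_nb^2

    * Refit with the squared index added; under a correctly specified
    * conditional mean its coefficient should be zero
    nbreg totcurr_dommig hage num_workers num_nonworkers own_land_hec ///
        num_cows_owned business_owned xb_nb2, vce(robust)
    test xb_nb2
    ```

    A small p-value from the final test would suggest that the functional form of the conditional mean is misspecified; a large one gives no evidence against it.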

    All the best,

    Joao
