
  • Underdispersion Test and STATA command for underdispersed count data

    Dear all,

    I am analyzing longitudinal count data and have some questions about which estimation method I should use. Could you please answer the following questions?
    I am thinking of using xtnbreg rather than xtpoisson, because the means of the variables are not equal to their standard deviations. Please see the summary statistics of my sample below.

        Variable |    Obs       Mean  Std. Dev.        Min        Max
    -------------+---------------------------------------------------
       nfges_all |   8740   14.81739   8.152234          1         33
     ln_firmsize |   8740    .884222   .3701103   .6931472   3.091043
    strg_special |   8529   .9874546   .1113081          0          1
      innovative |   8715    .172117   .3775038          0          1
       other_pay |   8740   .8061785   .3953133          0          1
    -------------+---------------------------------------------------
         avg_age |   8740   42.90567   11.33147       18.5       75.5
    ln_avg_sta~p |   8740   .5545851   .4788572          0   2.397895
      ln_avg_ind |   8740   1.564368   1.086563          0   3.871201
      ln_avg_mgr |   8740   2.158557   .8213584          0   3.821369
          family |   8740   .7159039   .4510086          0          1
    -------------+---------------------------------------------------
       ln_age_sd |   8740   1.246822   .8209818          0   3.321882
        gender_d |   8740   .3393736   .2279552          0         .5
           eth_d |   8740   .0479357   .1697156          0          1
       ln_ind_sd |   8740   1.149404   .9272638          0   3.044523
    ln_startup~d |   8740   .3580668   .3778226          0   1.871802
    -------------+---------------------------------------------------
       ln_mgr_sd |   8740   1.398575   .8319291          0   3.686488
     f_ln_age_sd |   8740   .8865466   .9115653          0   3.321882
      f_gender_d |   8740   .3088132   .2400879          0         .5
         f_eth_d |   8740   .0276602   .1307402          0          1
     f_ln_ind_sd |   8740   .7876592   .9510918          0   3.044523
    -------------+---------------------------------------------------
    f_ln_start~d |   8740   .2247479   .3394624          0   1.871802
     f_ln_mgr_sd |   8740   1.023029   .9577016          0   3.068021
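
    A quick arithmetic check may help here. The usual Poisson benchmark is that the variance (not the standard deviation) equals the mean. The sketch below, in Python, squares the reported standard deviation of nfges_all and compares it with the reported mean. It describes only the raw, unconditional outcome, so dispersion conditional on the covariates and the panel structure could look different.

```python
# Summary statistics for nfges_all, copied from the table above.
mean = 14.81739
std_dev = 8.152234

# Poisson benchmark: variance == mean, so compare the variance, not the SD.
variance = std_dev ** 2
dispersion_ratio = variance / mean  # > 1 suggests overdispersion, < 1 underdispersion

print(f"variance        = {variance:.4f}")
print(f"variance / mean = {dispersion_ratio:.4f}")
```

    On these numbers the variance is several times the mean, which points toward overdispersion of the raw outcome rather than underdispersion.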

    However, my first question is whether it is appropriate to use the xtnbreg command in this case, because the variables seem underdispersed rather than overdispersed, and I could not find any specific recommendation for underdispersed data. For cross-sectional data, we can use the 'estat gof' command to decide which command to use, but I couldn't find an equivalent command for panel data.

    Anyway, following recommendations from some websites, I ran an 'overdispersion' test to see whether the data are really overdispersed or underdispersed. (1) One website recommended using the s and r reported at the bottom of the xtnbreg output to calculate δ = s/(r - 1) (please look at the bottom of the table below), but it did not say what the criterion for overdispersion or underdispersion is. More specifically, if δ is greater than 1, does that mean overdispersion? And if δ is smaller than 1, does that mean underdispersion?
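
    As a worked illustration of the δ calculation, the Python sketch below plugs in the r and s estimates from the xtnbreg output. It rests on an assumption about the parameterization, based on how Stata's documentation describes the random-effects model: 1/(1 + δ_i) follows a Beta(r, s) distribution and the within-group dispersion (variance divided by mean) is 1 + δ_i. If that reading is right, s/(r - 1) is the expected value of δ_i, and the natural reference point for δ is 0 rather than 1.

```python
# Estimates of r and s copied from the xtnbreg output below.
r = 3.824455
s = 2.101529

# Assumption: 1/(1 + delta_i) ~ Beta(r, s), so E[delta_i] = s / (r - 1) for r > 1.
delta = s / (r - 1)

# Under the same assumption, within-group variance / mean = 1 + delta_i.
expected_dispersion = 1 + delta

print(f"delta               = {delta:.4f}")
print(f"expected dispersion = {expected_dispersion:.4f}")
```

    Under this assumed parameterization δ cannot be negative, so the model can only describe dispersion of 1 or more. A δ below 1 would then still indicate overdispersion, not underdispersion.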

    (2) The other thing I looked at is the statistic at the bottom of the xtnbreg output: Likelihood-ratio test vs. pooled: chibar2(01) = 5971.35 (please look at the bottom of the table below). Since chibar2 is much larger than 1, can I say the data are overdispersed and use the xtnbreg command?
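
    A chibar2(01) statistic is normally judged by its p-value, not by comparing it with 1. The Python sketch below computes that p-value, assuming the usual reference distribution for a boundary test: a 50:50 mixture of chi2(0) and chi2(1). The function name chibar2_01_pvalue and the borderline value 2.71 are hypothetical, chosen only for illustration. Note also that the output labels this test "vs. pooled", so it appears to compare the panel estimator with the pooled one instead of testing over- or underdispersion directly.

```python
import math


def chibar2_01_pvalue(stat: float) -> float:
    """P-value for a 50:50 mixture of chi2(0) and chi2(1)."""
    if stat <= 0:
        return 1.0
    # Upper tail of chi2(1): P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(stat / 2))


p_reported = chibar2_01_pvalue(5971.35)  # statistic from the xtnbreg output
p_borderline = chibar2_01_pvalue(2.71)   # hypothetical value, for comparison only

print(f"p-value at 5971.35 = {p_reported:.4g}")
print(f"p-value at 2.71    = {p_borderline:.4f}")
```

    A statistic of 2.71 already gives a p-value near 0.05 under this mixture, so 5971.35 is far into the rejection region, consistent with the reported Prob>=chibar2 = 0.000.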

    The table below shows the regression results from the xtnbreg command:

    Random-effects negative binomial regression     Number of obs      =      8504
    Group variable: sampid                          Number of groups   =       290
    Random effects u_i ~ Beta                       Obs per group: min =         4
                                                                   avg =      29.3
                                                                   max =        65

                                                    Wald chi2(37)      =    339.60
    Log likelihood = -25720.958                     Prob > chi2        =    0.0000

    ---------------------------------------------------------------------------------
          nfges_all |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
        ln_firmsize |  -.1013186   .0855316    -1.18   0.236    -.2689574    .0663203
       strg_special |  -.0774372   .0733181    -1.06   0.291     -.221138    .0662636
         innovative |   .0600717   .0350446     1.71   0.087    -.0086144    .1287578
          other_pay |  -.4664182    .107286    -4.35   0.000     -.676695   -.2561414
            avg_age |    .023599   .0059207     3.99   0.000     .0119947    .0352032
     ln_avg_startup |    .322678   .1162967     2.77   0.006     .0947406    .5506154
         ln_avg_ind |  -.3446654   .0655012    -5.26   0.000    -.4730454   -.2162854
         ln_avg_mgr |   .1900138   .0746375     2.55   0.011     .0437269    .3363007
             family |  -.2661666    .242957    -1.10   0.273    -.7423535    .2100203
          ln_age_sd |  -.6348662   .0905796    -7.01   0.000     -.812399   -.4573335
           gender_d |  -1.862834   .3287666    -5.67   0.000    -2.507205   -1.218464
              eth_d |   1.907085    .541249     3.52   0.000     .8462568    2.967914
          ln_ind_sd |   .2728686   .0959465     2.84   0.004     .0848169    .4609204
      ln_startup_sd |  -.9528149   .2106715    -4.52   0.000    -1.365723   -.5399064
          ln_mgr_sd |   .2921513   .1035819     2.82   0.005     .0891346     .495168
        f_ln_age_sd |   .5377865   .1053053     5.11   0.000     .3313919    .7441811
         f_gender_d |   1.653626    .425103     3.89   0.000     .8204391    2.486812
            f_eth_d |  -1.202517   .5960664    -2.02   0.044    -2.370785   -.0342479
        f_ln_ind_sd |  -.0612558   .0939045    -0.65   0.514    -.2453052    .1227937
     f_ln_startup_s |   .1235049   .2006125     0.62   0.538    -.2696883    .5166981
        f_ln_mgr_sd |  -.6731969   .1165071    -5.78   0.000    -.9015465   -.4448472
              _cons |   3.042684   .2589701    11.75   0.000     2.535112    3.550257
    ----------------+----------------------------------------------------------------
              /ln_r |   1.341416   .1095901                      1.126623    1.556208
              /ln_s |   .7426653   .1170089                      .5133321    .9719986
    ----------------+----------------------------------------------------------------
                  r |   3.824455   .4191222                      3.085221    4.740812
                  s |   2.101529   .2458976                      1.670849    2.643222
    ---------------------------------------------------------------------------------
    Likelihood-ratio test vs. pooled: chibar2(01) = 5971.35     Prob>=chibar2 = 0.000


    I hope to receive some responses and to find answers to these questions.
    Thank you very much!

    EJ

  • #2
    EJ:
    I would take a look at Joe Hilbe's textbook, with lots of examples using Stata: http://www.stata.com/bookstore/modeling-count-data/.
    As a small aside, you should also check whether your data are truly or apparently overdispersed.
    Finally, to improve the formatting of your post, please post what you typed and what Stata (not STATA, please) gave you back using code delimiters (which are covered in the FAQ). Thanks.
    Last edited by Carlo Lazzaro; 16 Aug 2015, 03:21.
    Kind regards,
    Carlo
    (Stata 19.0)
