
  • estat ic after fmm with vce(cluster clustname) gives wrong results

    estat ic after fmm with vce(cluster clustname) gives wrong results for AIC and BIC, or at least results I can't make sense of. I don't think it's using the correct value for the number of model parameters (k) that enters the formulae for AIC and BIC; fmm without vce(cluster clustname) gives results I can understand. In the example code and results below, I expect k=7 in the fmm 2 example and k=11 in the fmm 3 example, yet estat ic reports and uses k=4 in both cases. This does not make sense to me, but maybe there's something I don't understand. I know there's a question about what N to use in such situations, but I didn't think there was any question about what k should be. All insight would be greatly appreciated.

    Code for a working example is below. The log of results from that example follows the code.

    Code:
    sysuse auto
    keep if rep78!=.
    
    fmm 2, vce(cluster rep78): regress price mpg
    estat ic
    fmm 3, vce(cluster rep78): regress price mpg
    estat ic
    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . keep if rep78!=.
    (5 observations deleted)
    
    . fmm 2, vce(cluster rep78): regress price mpg
    
    <snip>
    Finite mixture model                            Number of obs     =         69
    Log pseudolikelihood = -606.22679
    
                                      (Std. Err. adjusted for 5 clusters in rep78)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.Class      |  (base outcome)
    -------------+----------------------------------------------------------------
    2.Class      |
           _cons |   .8585405   .3083115     2.78   0.005      .254261     1.46282
    ------------------------------------------------------------------------------
    
    Class          : 1
    Response       : price
    Model          : regress
    
                                      (Std. Err. adjusted for 5 clusters in rep78)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
             mpg |   -311.329   43.72171    -7.12   0.000     -397.022    -225.636
           _cons |   15687.98   1236.496    12.69   0.000      13264.5    18111.47
    -------------+----------------------------------------------------------------
     var(e.price)|    6207549    1423945                       3959712     9731430
    ------------------------------------------------------------------------------
    
    Class          : 2
    Response       : price
    Model          : regress
    
                                      (Std. Err. adjusted for 5 clusters in rep78)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
             mpg |  -75.80155   28.06208    -2.70   0.007    -130.8022   -20.80088
           _cons |   6385.969   527.5238    12.11   0.000     5352.041    7419.896
    -------------+----------------------------------------------------------------
     var(e.price)|   479575.3   131351.7                      280362.3    820339.9
    ------------------------------------------------------------------------------
    
    . estat ic
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
    -------------+---------------------------------------------------------------
               . |         69         .  -606.2268       4    1220.454    1229.39
    -----------------------------------------------------------------------------
                   Note: N=Obs used in calculating BIC; see [R] BIC note.
    
    . fmm 3, vce(cluster rep78): regress price mpg
    
    <snip>
    Finite mixture model                            Number of obs     =         69
    Log pseudolikelihood = -602.64093
    
                                      (Std. Err. adjusted for 5 clusters in rep78)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.Class      |  (base outcome)
    -------------+----------------------------------------------------------------
    2.Class      |
           _cons |  -.6605901   .6277077    -1.05   0.293    -1.890875    .5696943
    -------------+----------------------------------------------------------------
    3.Class      |
           _cons |   .7093667   .2373561     2.99   0.003     .2441572    1.174576
    ------------------------------------------------------------------------------
    
    Class          : 1
    Response       : price
    Model          : regress
    
                                      (Std. Err. adjusted for 5 clusters in rep78)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
             mpg |  -405.1241   44.44051    -9.12   0.000    -492.2259   -318.0223
           _cons |   17374.85   1203.793    14.43   0.000     15015.46    19734.24
    -------------+----------------------------------------------------------------
     var(e.price)|    6044366    1732961                       3445922    1.06e+07
    ------------------------------------------------------------------------------
    
    Class          : 2
    Response       : price
    Model          : regress
    
                                      (Std. Err. adjusted for 5 clusters in rep78)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
             mpg |  -23.73611   1.001633   -23.70   0.000    -25.69927   -21.77294
           _cons |   6437.194    114.627    56.16   0.000     6212.529    6661.859
    -------------+----------------------------------------------------------------
     var(e.price)|   53423.12   13336.43                      32751.95    87140.74
    ------------------------------------------------------------------------------
    
    Class          : 3
    Response       : price
    Model          : regress
    
                                      (Std. Err. adjusted for 5 clusters in rep78)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    price        |
             mpg |  -47.62128   28.06867    -1.70   0.090    -102.6349     7.39229
           _cons |   5535.232   737.4024     7.51   0.000      4089.95    6980.514
    -------------+----------------------------------------------------------------
     var(e.price)|   275264.8    69759.3                      167507.4    452342.5
    ------------------------------------------------------------------------------
    
    . estat ic
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
    -------------+---------------------------------------------------------------
               . |         69         .  -602.6409       4    1213.282   1222.218
    -----------------------------------------------------------------------------
                   Note: N=Obs used in calculating BIC; see [R] BIC note.

  • #2
    The df value reported by estat ic is the rank of e(V).

    So in your example, the model fitted with a VCE that is cluster-robust on 5 clusters has an e(V) of rank 4.



    • #3
      Thanks, Jeff, for your response. I don't think that's correct for a couple of reasons.

      1. In everything I've ever read on the use of AIC and BIC in pseudolikelihood situations, k is always, unambiguously, the number of estimated parameters in the model. See, e.g., Jones (2011), which describes the issue of what N should be in the clustered-data context; there is no comparable issue about k.
      2. Even if we agreed that the rank of 4 was useful, it's not clear what that means in the context of a finite mixture model. A 2-class model has rank 4; a 3-class model, with many more parameters, also has rank 4; and so on. That doesn't make sense in the context of model selection: there have to be more parameters (fewer degrees of freedom) with more classes.

      Jones, R. H. (2011), Bayesian information criterion for longitudinal and clustered data. Statist. Med., 30: 3050–3056. doi:10.1002/sim.4323
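
      For what it's worth, the hand count and the resulting AIC/BIC can be checked outside Stata. Below is a minimal Python sketch (Python only to make the arithmetic easy to verify; the function names are mine, not Stata's) that counts the free parameters of each model in the original post and recomputes AIC and BIC from the reported log pseudolikelihoods (-606.2268 for fmm 2, -602.6409 for fmm 3, N = 69). It gives k = 7 and k = 11, not the df = 4 that estat ic reports.

```python
import math

def fmm_param_count(n_classes, n_coefs_per_class):
    """Free parameters of a finite mixture of Gaussian regressions:
    each class has its regression coefficients plus an error variance,
    and the class-membership logit has n_classes - 1 intercepts."""
    return n_classes * (n_coefs_per_class + 1) + (n_classes - 1)

def aic(ll, k):
    return -2 * ll + 2 * k

def bic(ll, k, n):
    return -2 * ll + k * math.log(n)

# fmm 2: 2 classes x (mpg, _cons, var(e.price)) + 1 logit intercept = 7
k2 = fmm_param_count(2, 2)
# fmm 3: 3 classes x 3 parameters + 2 logit intercepts = 11
k3 = fmm_param_count(3, 2)

print(k2, round(aic(-606.2268, k2), 2), round(bic(-606.2268, k2, 69), 2))
print(k3, round(aic(-602.6409, k3), 2), round(bic(-602.6409, k3, 69), 2))
```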



      • #4
        Hi, in Stata 15.1 (18 April 2018 revision), the results reported above are the same. Why has there not been a follow-up on this issue, or further explanation from Jeff?
        http://publicationslist.org/eric.melse



        • #5
          Eric,

          I can't answer your question, but I can tell you what I've been doing -- writing a few lines of code to compute AIC and BIC with the number of estimated parameters as k in
          AIC = -2 ln(L) + 2k
          BIC = -2 ln(L) + k ln(N)

          I'm using N as the full sample size although in the context of clustering, this is debatable.
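
          How much that choice of N matters is easy to quantify. Here is a small Python sketch (not part of the thread's Stata code) using the fmm 2 values from the original post (ll = -606.2268, k = 7): BIC with N = 69 observations versus N = 5 clusters differs by k*ln(69/5), roughly 18.4 points, which can easily change a model ranking.

```python
import math

ll, k = -606.2268, 7  # fmm 2 values from the original post

def bic(ll, k, n):
    return -2 * ll + k * math.log(n)

bic_obs      = bic(ll, k, 69)  # N = number of observations
bic_clusters = bic(ll, k, 5)   # N = number of clusters
print(round(bic_obs, 2), round(bic_clusters, 2),
      round(bic_obs - bic_clusters, 2))
```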

          In my example below, my coding gives the same results as -estat ic- when there is no cluster correction to SEs. When there is, the results are different -- and I like mine better!

          Code:
          clear all
          sysuse auto
          
          drop if rep78==.
          
          fmm 2: regress price mpg i.foreign
          estat ic
          
          mat V = e(V)
          scalar k = e(k) - diag0cnt(V)
          scalar AIC = -2*e(ll) + 2*k
          scalar BIC = -2*e(ll) + k*ln(e(N))
          scalar di k
          scalar di AIC
          scalar di BIC
          
          fmm 2, vce(cluster rep78): regress price mpg i.foreign
          estat ic
          
          mat V = e(V)
          scalar k = e(k) - diag0cnt(V)
          scalar AIC = -2*e(ll) + 2*k
          scalar BIC = -2*e(ll) + k*ln(e(N))
          scalar di k
          scalar di AIC
          scalar di BIC
          
          fmm 3, vce(cluster rep78): regress price mpg i.foreign
          estat ic
          
          mat V = e(V)
          scalar k = e(k) - diag0cnt(V)
          scalar AIC = -2*e(ll) + 2*k
          scalar BIC = -2*e(ll) + k*ln(e(N))
          scalar di k
          scalar di AIC
          scalar di BIC



          • #6
            Many thanks, Partha, this is really helpful.
            http://publicationslist.org/eric.melse
