  • Moulton Problem Example

    I am trying to illustrate the famous Moulton variance inflation problem that arises from ignoring clustering. I created a toy data set with 4 states (clusters) of 20 observations each, where each state has 2 types of observation, each carrying a frequency weight of 10:

    Code:
      +-------------------------------+
      |    y   x       e   state    n |
      |-------------------------------|
      | 5.35   0    -.65       0   10 |
      | 7.35   0    1.35       0   10 |
      |-------------------------------|
      |    3   1      -2       1   10 |
      |    5   1       0       1   10 |
      |-------------------------------|
      |    4   2       0       2   10 |
      |    6   2       2       2   10 |
      |-------------------------------|
      | 1.65   3   -1.35       3   10 |
      | 3.65   3     .65       3   10 |
      +-------------------------------+
    I calculated the within cluster error correlation using:

    Code:
    xtreg e [fw=n], i(state) fe
    From the formula on pp. 5-6 in http://cameron.econ.ucdavis.edu/rese...5_February.pdf, this implies a variance inflation factor of 1+1*0.415518*(20-1)=8.894842, so the default standard errors should be too small by a factor of (8.894842)^.5=2.9824222.

    However, that does not seem to be the case when I compare the output:

    Code:
    reg y x [fw=n]
    reg y x [fw=n], vce(cluster state)
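    A minimal sketch for putting the two standard errors side by side, assuming the toy data built in the full code below is already in memory. The scalar names se_default and se_cluster are made up for illustration; _se[x] is the coefficient standard error left behind by regress.

```stata
* Default (iid) standard error on x
quietly regress y x [fw=n]
scalar se_default = _se[x]

* Cluster-robust standard error on x
quietly regress y x [fw=n], vce(cluster state)
scalar se_cluster = _se[x]

* Ratio to compare with the square root of the Moulton factor
display "cluster / default SE ratio = " se_cluster/se_default
```

    If the Moulton approximation held here, this ratio would be close to the square root of the inflation factor computed above.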
    For the curious, my code is:

    Code:
    set more off
    /* fake data: 4 states x 2 rows, each row standing in for 10 observations via fw=n */
    clear
    set obs 8
    gen state = mod(_n,4)
    sort state 
    gen x = state
    gen y = 6 - 1*x
    /* error = state-level shock + within-state shock + shift for states 0 and 3 */
    gen e = cond(mod(state,2)==1,-1,1) + cond(mod(_n,2),-1,1) + cond(state==0,-0.65,cond(state==3,0.65,0))
    replace y = y+e
    gen n=10
    
    corr x e [fw=n]
    xtreg e [fw=n], i(state) fe
    
    list y x e state n, noobs sepby(state)
    reg y x [fw=n]
    reg y x [fw=n], vce(cluster state)
    What am I doing wrong here?

  • #2
    The Moulton variance inflation factor is τ ≃ 1 + ρ_{x}*ρ_{e}*(N̄_{g} − 1)
    The pieces are
    • ρ_{x} is the within-cluster correlation of x. If there's no variation within each cluster, as in our example, this will be one.
    • ρ_{e} is the within-cluster error correlation. Here it is 0.415518.
    • N̄_{g} is the average cluster size. Here this is 20.
    This relationship is exact when ρ_{x} = 1 and all the clusters are the same size.
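    As a sketch, the formula can be evaluated directly with the three numbers listed above. The scalar names are illustrative only, and nothing here needs the data in memory.

```stata
* Inputs to tau = 1 + rho_x*rho_e*(Nbar_g - 1), taken from the list above
* rho_x: x is constant within each state
scalar rho_x = 1
* rho_e: rho reported by xtreg, fe
scalar rho_e = 0.415518
* nbar: observations per state (2 rows with a frequency weight of 10)
scalar nbar = 20

* Variance inflation factor and the implied standard error ratio
scalar tau = 1 + rho_x*rho_e*(nbar - 1)
display "variance inflation factor = " tau
display "implied SE ratio          = " sqrt(tau)
```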

    I also tried estimating the within-cluster error correlation ρ_{e} with

    Code:
    expand n
    loneway e state
    This gets me 0.39792, which is pretty darn close to xtreg,fe. However, the implied standard error inflation still does not match the actual one very well.
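    A rough sketch of the same plug-in calculation with the loneway estimate, assuming the data have already been expanded as above. It relies on loneway leaving the intraclass correlation in r(rho); rho_lw and tau_lw are illustrative names.

```stata
* Re-run loneway quietly and grab its intraclass correlation
quietly loneway e state
scalar rho_lw = r(rho)

* Moulton factor with rho_x = 1 and 20 observations per state
scalar tau_lw = 1 + 1*rho_lw*(20 - 1)
display "variance inflation factor = " tau_lw
display "implied SE ratio          = " sqrt(tau_lw)
```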


    • #3
      Hi Dimitriy V. Masterov, did you ever get a response on this? I'm interested as well.


      • #4
        Ariel Karlinsky, I never made much headway on this after posting. I wonder whether the small-sample correction that Stata uses might be getting in the way, but I haven't verified that.
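        One way to check this, sketched under the assumption that regress with vce(cluster) scales the cluster-robust variance by G/(G-1) * (N-1)/(N-K), as described in the Methods and formulas for regress. It assumes the unexpanded toy data from #1 is in memory; the scalar names are illustrative.

```stata
* Default standard error, for reference
quietly regress y x [fw=n]
scalar se_default = _se[x]

* Cluster-robust fit: number of clusters, sample size, number of coefficients
quietly regress y x [fw=n], vce(cluster state)
scalar G     = e(N_clust)
scalar N_tot = e(N)
* K = 2: the slope on x plus the constant
scalar K     = 2

* Assumed finite-sample factor applied to the cluster-robust variance
scalar q = (G/(G - 1)) * ((N_tot - 1)/(N_tot - K))
scalar se_raw = _se[x]/sqrt(q)

display "finite-sample factor on the variance = " q
display "cluster SE with the factor removed   = " se_raw
display "ratio to the default SE              = " se_raw/se_default
```

        Because q is greater than one, stripping it out can only make the cluster-robust standard error smaller, so the last line shows in which direction the correction moves the comparison.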
