  • Regressions with 'long'/panel data: misleading test statistics?

    Greetings,

    I'm running Stata 15.1 on macOS and am currently working with Pew panel data. I believe my question is very basic. I'd like to measure the relationship between a continuous independent variable and an ordinal dependent variable (note: there are other variables whose relationships I'm interested in, but I will use the current case as an example). The independent variable (x) was measured in the April 2020 wave of the survey, and the dependent variable was measured in the October 2020 wave. Because my dataset also includes variables measured in other waves, I opted to reshape the data to 'long' format. However, I noticed that model test statistics are larger in regressions on the data in 'long' format than in 'wide' format:

    Long Format
    Code:
    . ologit AF_GOOD4  mhindex_meanZ, or
    
    Iteration 0:   log likelihood = -27851.383  
    Iteration 1:   log likelihood =  -27270.49  
    Iteration 2:   log likelihood = -27268.927  
    Iteration 3:   log likelihood = -27268.927  
    
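    Ordered logistic regression                     Number of obs     =     23,538
                                                    LR chi2(1)        =    1164.91
                                                    Prob > chi2       =     0.0000
    Log likelihood = -27268.927                     Pseudo R2         =     0.0209

    -------------------------------------------------------------------------------
         AF_GOOD4 | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    --------------+----------------------------------------------------------------
    mhindex_meanZ |   1.547024    .019962    33.82   0.000      1.50839    1.586648
    --------------+----------------------------------------------------------------
            /cut1 |  -.0222594   .0132795                     -.0482869     .003768
            /cut2 |   .9936055    .014874                      .9644529    1.022758
            /cut3 |   2.788631   .0274565                      2.734817    2.842444
    -------------------------------------------------------------------------------
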
    Wide Format

    Code:
    . ologit AF_GOOD4  mhindex6466_meanZ, or
    
    Iteration 0:   log likelihood = -9283.7942     
    Iteration 1:   log likelihood = -9090.1634     
    Iteration 2:   log likelihood = -9089.6425     
    Iteration 3:   log likelihood = -9089.6424     
    
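    Ordered logistic regression                     Number of obs     =      7,846
                                                    LR chi2(1)        =     388.30
                                                    Prob > chi2       =     0.0000
    Log likelihood = -9089.6424                     Pseudo R2         =     0.0209

    -----------------------------------------------------------------------------------
             AF_GOOD4 | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    ------------------+----------------------------------------------------------------
    mhindex6466_meanZ |   1.547043   .0345765    19.52   0.000     1.480737    1.616317
    ------------------+----------------------------------------------------------------
                /cut1 |  -.0222594   .0230008                     -.0673403    .0228214
                /cut2 |   .9936055   .0257626                      .9431117    1.044099
                /cut3 |   2.788631    .047556                      2.695422    2.881839
    -----------------------------------------------------------------------------------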


    Of course, this is not surprising given that 'long' format includes multiple rows (one per wave) for each respondent. But my question is whether the inflated test statistics can be trusted. As more control variables are added, it's possible that variables that remain significant in 'long' format are no longer significant in 'wide' format. I'm thus not sure how to approach this issue. Am I better off sticking to wide format? Is there a way to obtain 'adjusted' test statistics in long format? Or perhaps I'm perceiving a problem that really isn't a problem?

    Any input you can provide will be much appreciated. Thank you!

  • #2
    In your wide layout you have 7,846 observations, while in your long layout you have 23,538 observations, precisely three times as many.

    I have a feeling that however you structured your long layout, you ended up with copies of AF_GOOD4 and mhindex_meanZ in more observations than they should have been in. Perhaps your original ologit command should be something similar to
    Code:
    ologit AF_GOOD4  mhindex_meanZ if year==2020, or
    but without a better idea of your data, it's difficult to say.
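
    As a quick arithmetic check, using only figures copied from the two outputs in post #1, the long-format results behave as if every wide-format observation had been counted three times. This is a sketch, not a diagnosis of your actual data:
    Code:
    * Ratios of long-format to wide-format results from post #1
    * Number of observations
    display 23538/7846
    * LR chi2(1)
    display 1164.91/388.30
    * Log likelihood
    display (-27268.927)/(-9089.6424)
    * Standard error of the odds ratio, wide over long, next to sqrt(3)
    display .0345765/.019962
    display sqrt(3)
    The first three ratios are 3 (to rounding) and the standard-error ratio is about 1.732, while the pseudo R2 and the cutpoints are the same in both outputs. That is the pattern you would expect if the same rows were repeated three times, rather than if there were three times as much information.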


    • #3
      William,

      Thanks for the reply.

      Some potentially relevant information I neglected to include: each variable title had a suffix or stub indicating the survey wave (e.g. 64, 66, 76) in which it was measured. The items constituting the index (mhindex_meanZ) that I'm using as my IV were measured in March (wave 64) and again in April (wave 66). Given the short intervening period, and the fact that not all panelists participated in both waves, I opted to take the average of the March and April measurements (i.e. I created an average index that includes panelists that either provided data in both or only one of the waves). I'm wondering whether this is the issue. The dependent variable was measured only in October (wave 76).

      Here is sample data in wide form:


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double caseid float(mhindex64 mhindex66) double AF_GOOD476
      100260 1.25    1 1
      100637  2.5    2 1
      101472 1.75 1.75 2
      101493    2  1.5 1
      103094    3 3.25 .
      103538  1.5  1.5 .
      103611    2    2 .
      104210  1.5 1.75 4
      104368  1.5  1.5 3
      104491 2.25    2 1
      104689    2  2.5 1
      104727 1.75    2 1
      104937    1    1 .
      106590  2.5 2.25 1
      106960    . 1.75 .
      107329 2.25 2.25 2
      108035    1    1 1
      108348    .    3 1
      108435 1.75    2 .
      109143    1  1.5 2
      110550 1.25    1 1
      111665 3.25    3 2
      112238    2 2.25 2
      112490 3.25 3.25 3
      112613    1 1.25 1
      112984 2.75    3 2
      113248    .    1 3
      113412 2.25 2.25 1
      114058 3.25  3.5 2
      114671    1 2.25 1
      115295 1.25    1 1
      115546 1.75 1.25 2
      115706    .    . .
      115807    1    1 .
      116151    1    1 1
      116264  1.5 2.75 1
      116832 2.75    . .
      116998    2  2.5 3
      118110    1    1 1
      118414  1.5    1 1
      118847  1.5    2 .
      118888  1.5  1.5 2
      119121  2.5 3.25 2
      119392 1.25    1 .
      119548    3 1.75 1
      120343 1.75    1 1
      120873    1    1 3
      121158    2 2.75 1
      121582    2    2 1
      122503    2  2.5 2
      124561    3 1.75 2
      125280 3.75 3.75 .
      126131  1.5 1.25 3
      126211    2  2.5 1
      126570 1.25    1 1
      127160    1    1 1
      127250    .    . .
      127284 2.25    2 .
      127498 3.25 1.75 1
      128285 2.25  1.5 3
      128558    1    1 1
      129622    .    1 .
      131786    1 2.75 1
      132246  1.5  1.5 1
      132264  1.5  1.5 1
      132478 2.75 2.25 3
      132973  1.5  1.5 3
      133435 1.25    . .
      133550 2.75    2 2
      133700 1.75  2.5 2
      134129 2.25  1.5 .
      135293 2.25    2 .
      135751 1.25 1.25 2
      135822    1    1 1
      136046  2.5 2.75 2
      136999 1.25    1 1
      137139    1    1 2
      138105    2 2.75 2
      139905  2.5  1.5 2
      140204    3 1.25 .
      141319 1.25 1.25 4
      141471    1    1 .
      142398    2 1.25 .
      143122  2.5 2.25 .
      143915  1.5 1.75 .
      144036    1    1 1
      144120    .    . .
      144429 1.25    1 1
      145887    .    1 .
      146434 2.75 2.75 1
      147121  3.5    . .
      147316    2  2.5 .
      149866 2.25  2.5 .
      150084 3.25 1.75 4
      150280  1.5  1.5 .
      152324 1.75  1.5 .
      152957    . 2.25 .
      153863    .  1.5 3
      154750    1  1.5 1
      155164    .    . .
      end
      If the above has too many missings to work with, here is also sample data consisting of panelists with complete responses (i.e. they provided measures in March AND April):

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double caseid float(mhindex64 mhindex66) double AF_GOOD476
      100260 1.25    1 1
      100637  2.5    2 1
      101472 1.75 1.75 2
      101493    2  1.5 1
      104210  1.5 1.75 4
      104368  1.5  1.5 3
      104491 2.25    2 1
      104689    2  2.5 1
      104727 1.75    2 1
      106590  2.5 2.25 1
      107329 2.25 2.25 2
      108035    1    1 1
      109143    1  1.5 2
      110550 1.25    1 1
      111665 3.25    3 2
      112238    2 2.25 2
      112490 3.25 3.25 3
      112613    1 1.25 1
      112984 2.75    3 2
      113412 2.25 2.25 1
      114058 3.25  3.5 2
      114671    1 2.25 1
      115295 1.25    1 1
      115546 1.75 1.25 2
      116151    1    1 1
      116264  1.5 2.75 1
      116998    2  2.5 3
      118110    1    1 1
      118414  1.5    1 1
      118888  1.5  1.5 2
      119121  2.5 3.25 2
      119548    3 1.75 1
      120343 1.75    1 1
      120873    1    1 3
      121158    2 2.75 1
      121582    2    2 1
      122503    2  2.5 2
      124561    3 1.75 2
      126131  1.5 1.25 3
      126211    2  2.5 1
      126570 1.25    1 1
      127160    1    1 1
      127498 3.25 1.75 1
      128285 2.25  1.5 3
      128558    1    1 1
      131786    1 2.75 1
      132246  1.5  1.5 1
      132264  1.5  1.5 1
      132478 2.75 2.25 3
      132973  1.5  1.5 3
      133550 2.75    2 2
      133700 1.75  2.5 2
      135751 1.25 1.25 2
      135822    1    1 1
      136046  2.5 2.75 2
      136999 1.25    1 1
      137139    1    1 2
      138105    2 2.75 2
      139905  2.5  1.5 2
      141319 1.25 1.25 4
      144036    1    1 1
      144429 1.25    1 1
      146434 2.75 2.75 1
      150084 3.25 1.75 4
      154750    1  1.5 1
      155464 1.75    1 1
      156615 1.75 1.75 3
      157570  2.5 1.75 1
      157730  1.5 1.25 1
      159997 1.25 1.75 1
      162341 2.25    1 3
      162517 1.25    1 3
      164283 2.25 2.25 3
      164425    2 2.25 1
      165452  1.5 1.25 1
      166301 1.75 2.25 4
      168336 2.25 2.75 2
      169249  3.5    2 1
      169864  1.5  1.5 1
      170940 1.25 1.25 1
      171264    3  2.5 2
      174032    4  2.5 1
      175363 1.75 1.75 1
      176214 2.25    2 2
      176308    2    2 3
      176940 1.25 1.25 1
      177572 1.75 1.75 2
      178470 2.75  2.5 1
      179691 1.75  2.5 1
      179979    3    3 1
      183145 1.25 1.75 1
      185394 1.75 2.25 4
      187312 3.25 2.75 1
      188245    1  1.5 1
      189589  2.5    2 3
      190780  1.5 1.25 3
      190961 3.25  3.5 1
      191063 1.75  1.5 1
      191336 3.25 2.75 3
      192294 2.25    2 2
      end
      Note: caseid= panel ID (so you can try reshaping the data yourself).

      My 'reshape' syntax was as follows:

      Code:
      reshape long mhindex, i(caseid) j(wave)
      Note: I did not reshape AF_GOOD4 because it was measured only once.

      Thanks again for your help!


      • #4
        If we run the code you provided on the first 10 observations of your example data
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double caseid float(mhindex64 mhindex66) double AF_GOOD476
        100260 1.25    1 1
        100637  2.5    2 1
        101472 1.75 1.75 2
        101493    2  1.5 1
        103094    3 3.25 .
        103538  1.5  1.5 .
        103611    2    2 .
        104210  1.5 1.75 4
        104368  1.5  1.5 3
        104491 2.25    2 1
        end
        reshape long mhindex, i(caseid) j(wave)
        we produce the 20 observations of reshaped data
        Code:
        . list, abbreviate(12) sepby(caseid)
        
             +--------------------------------------+
             | caseid   wave   mhindex   AF_GOOD476 |
             |--------------------------------------|
          1. | 100260     64      1.25            1 |
          2. | 100260     66         1            1 |
             |--------------------------------------|
          3. | 100637     64       2.5            1 |
          4. | 100637     66         2            1 |
             |--------------------------------------|
          5. | 101472     64      1.75            2 |
          6. | 101472     66      1.75            2 |
             |--------------------------------------|
          7. | 101493     64         2            1 |
          8. | 101493     66       1.5            1 |
             |--------------------------------------|
          9. | 103094     64         3            . |
         10. | 103094     66      3.25            . |
             |--------------------------------------|
         11. | 103538     64       1.5            . |
         12. | 103538     66       1.5            . |
             |--------------------------------------|
         13. | 103611     64         2            . |
         14. | 103611     66         2            . |
             |--------------------------------------|
         15. | 104210     64       1.5            4 |
         16. | 104210     66      1.75            4 |
             |--------------------------------------|
         17. | 104368     64       1.5            3 |
         18. | 104368     66       1.5            3 |
             |--------------------------------------|
         19. | 104491     64      2.25            1 |
         20. | 104491     66         2            1 |
             +--------------------------------------+
        You tell us that your independent variable is the average of mhindex across waves 64 and 66, but you don't tell us how you created that variable: mhindex_meanZ in the long dataset and mhindex6466_meanZ in the wide dataset.

        I believe you made the following mistake.
        Code:
        generate mhindex6466 = (mhindex64+mhindex66)/2
        reshape long mhindex, i(caseid) j(wave)
        rename mhindex mhindex6466_meanZ
        ologit AF_GOOD4  mhindex6466_meanZ, or
        But here is the data the ologit command is run on.
        Code:
        . list, abbreviate(18) sepby(caseid)
        
             +------------------------------------------------+
             | caseid   wave   AF_GOOD476   mhindex6466_meanZ |
             |------------------------------------------------|
          1. | 100260     64            1                1.25 |
          2. | 100260     66            1                   1 |
          3. | 100260   6466            1               1.125 |
             |------------------------------------------------|
          4. | 100637     64            1                 2.5 |
          5. | 100637     66            1                   2 |
          6. | 100637   6466            1                2.25 |
             |------------------------------------------------|
          7. | 101472     64            2                1.75 |
          8. | 101472     66            2                1.75 |
          9. | 101472   6466            2                1.75 |
             |------------------------------------------------|
         10. | 101493     64            1                   2 |
         11. | 101493     66            1                 1.5 |
         12. | 101493   6466            1                1.75 |
             |------------------------------------------------|
         13. | 103094     64            .                   3 |
         14. | 103094     66            .                3.25 |
         15. | 103094   6466            .               3.125 |
             |------------------------------------------------|
         16. | 103538     64            .                 1.5 |
         17. | 103538     66            .                 1.5 |
         18. | 103538   6466            .                 1.5 |
             |------------------------------------------------|
         19. | 103611     64            .                   2 |
         20. | 103611     66            .                   2 |
         21. | 103611   6466            .                   2 |
             |------------------------------------------------|
         22. | 104210     64            4                 1.5 |
         23. | 104210     66            4                1.75 |
         24. | 104210   6466            4               1.625 |
             |------------------------------------------------|
         25. | 104368     64            3                 1.5 |
         26. | 104368     66            3                 1.5 |
         27. | 104368   6466            3                 1.5 |
             |------------------------------------------------|
         28. | 104491     64            1                2.25 |
         29. | 104491     66            1                   2 |
         30. | 104491   6466            1               2.125 |
             +------------------------------------------------+
        Do you see? You have three times as many observations in the long dataset as in the wide dataset, which is exactly the problem with your results in post #1 that I pointed out in post #2.

        You should have run the command
        Code:
        ologit AF_GOOD4  mhindex6466_meanZ if wave==6466, or
        to limit your ologit to just those observations holding the average value of mhindex (the rows with wave equal to 6466), as I suggested in post #2.
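
        If it helps, here is one way to confirm that the restricted model uses each panelist only once. This is a sketch that assumes the reshaped data listed above is in memory. It relies on e(sample), which marks the observations used by the most recent estimation command.
        Code:
        ologit AF_GOOD4  mhindex6466_meanZ if wave==6466, or
        * Each caseid should appear exactly once among the rows used in estimation
        duplicates report caseid if e(sample)
        * The count of rows used should match Number of obs in the ologit header
        count if e(sample)
        If duplicates report shows only rows with 1 copy, no respondent is being counted more than once.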


        • #5
          On further reflection, more changes to your code would be better.
          Code:
          generate mhmeanindex66 = (mhindex64+mhindex66)/2
          reshape long mhindex mhmeanindex, i(caseid) j(wave)
          list, abbreviate(18) sepby(caseid)
          ologit AF_GOOD4 mhmeanindex, or
          Now this is the data the ologit command will be run on.
          Code:
          . list, abbreviate(18) sepby(caseid)
          
               +----------------------------------------------------+
               | caseid   wave   mhindex   AF_GOOD476   mhmeanindex |
               |----------------------------------------------------|
            1. | 100260     64      1.25            1             . |
            2. | 100260     66         1            1         1.125 |
               |----------------------------------------------------|
            3. | 100637     64       2.5            1             . |
            4. | 100637     66         2            1          2.25 |
               |----------------------------------------------------|
            5. | 101472     64      1.75            2             . |
            6. | 101472     66      1.75            2          1.75 |
               |----------------------------------------------------|
            7. | 101493     64         2            1             . |
            8. | 101493     66       1.5            1          1.75 |
               |----------------------------------------------------|
            9. | 103094     64         3            .             . |
           10. | 103094     66      3.25            .         3.125 |
               |----------------------------------------------------|
           11. | 103538     64       1.5            .             . |
           12. | 103538     66       1.5            .           1.5 |
               |----------------------------------------------------|
           13. | 103611     64         2            .             . |
           14. | 103611     66         2            .             2 |
               |----------------------------------------------------|
           15. | 104210     64       1.5            4             . |
           16. | 104210     66      1.75            4         1.625 |
               |----------------------------------------------------|
           17. | 104368     64       1.5            3             . |
           18. | 104368     66       1.5            3           1.5 |
               |----------------------------------------------------|
           19. | 104491     64      2.25            1             . |
           20. | 104491     66         2            1         2.125 |
               +----------------------------------------------------+
          As you can see, at most one observation for each value of caseid now has a nonmissing mhmeanindex, rather than three, so ologit uses each respondent at most once.
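
          One caveat, offered as a sketch and not as a correction to your own code, which we have not seen: in post #3 you say the average should include panelists who answered only one of the two waves, but (mhindex64+mhindex66)/2 is missing whenever either wave is missing. Assuming that is your intent, the rowmean() function of egen averages over whichever of the two values are nonmissing. The variable name mhmean_any is made up for illustration, and the code assumes the wide data from post #3 is in memory.
          Code:
          * rowmean() ignores missing values; the result is missing only if both waves are missing
          egen mhmean_any = rowmean(mhindex64 mhindex66)
          * How many panelists have no index value in either wave
          count if missing(mhmean_any)
          * Wide data has one row per caseid, so no if qualifier is needed
          ologit AF_GOOD4 mhmean_any, or
          Staying in the wide layout for this particular model avoids the duplicated rows altogether, since each panelist occupies exactly one observation.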
