  • Coefficients from -areg- and difference in predicted scores using -margins- do not match!

    Hello,

    I'm having trouble troubleshooting why a coefficient I see in the -areg- output does not match the difference in the predicted values in the -margins- output.

    Specifically, I used -areg- to fit an interrupted time-series model of the impact of an event on test performance by participants' ethnicity. Here post1 indicates the first year after the event, and it enters the model as "i.post1##ib3.ethnicity", with the third ethnicity group, Hispanic, as the reference group. For reference, time = 0 marks when the event occurred, so post1 = (time == 1).

    Code:
    areg testscore time female i.post1##ib3.ethnicity i.post2##ib3.ethnicity disabled servicestatus i.post1##ib1.level i.post2##ib1.level p_a p_b p_h p_w p_ai p_nh p_esl if include_fulldata==1 [pweight=ipsw*sweight], vce(cluster participantid) absorb(schoolid)
    The results show that the coefficient for "1.post1" is -12.83, which I read as the event lowering test scores by 12.83 points on average for Hispanic participants (i.e., the reference group).

    When I run -margins- to get the exact predicted test scores for Hispanic participants:

    Code:
    margins post1, over(ethnicity) at((means) time==1 post2=0)
    In the -margins- output (format: ethnicity#post1), Stata returns 331.81 points at post1 assuming the event of interest did not happen (i.e., hispanic#0) and 322.64 points given that it did happen (i.e., hispanic#1). Based on the -areg- results, I would expect the difference between the two predicted scores to be -12.83; however, it is -9.17.


    I cannot figure out why the two values do not match, and I'd really appreciate any insight. Unfortunately, I can't share the data, or parts of it, due to confidentiality, but I am ready to run any diagnostics or alternative versions of the -margins- command that you recommend.



    Last edited by Min Oh; 25 Aug 2023, 11:02.

  • #2
    The discrepancy arises because post1, in addition to being interacted with ethnicity, is also interacted with level. When the same variable appears in two interactions, its coefficients take on different meanings than when there is only one interaction. In particular, the "main effect" of post1 now represents the effect of post1 conditional on both ethnicity and level being at their base values. You can see this directly with this example:

    Code:
    sysuse auto, clear
    
    keep if rep78 >= 3
    
    summ mpg headroom trunk
    regress price i.foreign##i.rep78 mpg headroom trunk
    
    margins foreign, over(rep78) at((means) mpg = 15 headroom = 3 trunk = 10) post
    lincom _b[3bn.rep78#1.foreign] - _b[3bn.rep78#0bn.foreign] // MATCHES COEFFICIENT OF foreign IN REGRESSION
    
    regress price i.foreign##i.rep78 mpg headroom i.foreign##c.trunk
    estimates store regression
    margins foreign, over(rep78) at((means) mpg = 15 headroom = 3 trunk = 10) post
    lincom _b[3bn.rep78#1.foreign] - _b[3bn.rep78#0bn.foreign] // DOES NOT MATCH COEFFICIENT OF foreign IN REGRESSION
    
    estimates restore regression
    margins foreign, over(rep78) at((means) mpg = 15 headroom = 3 trunk = 0) post // DOES MATCH
    lincom _b[3bn.rep78#1.foreign] - _b[3bn.rep78#0bn.foreign]
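
    The algebra behind the three comparisons above can be sketched as follows. This is only an illustration: the symbols (F for the foreign indicator, \beta_F, \theta_r, \delta, \delta_l, p_l) are notation assumed here, not names from the Stata output, and the last line additionally assumes that the level interaction is the only source of the gap in the original question.

    ```latex
    % Terms of the second auto regression that involve F (= 1.foreign);
    % everything not interacted with F cancels in the contrast.
    \[
      \widehat{\text{price}} = \dots + \beta_F F
        + \sum_{r \neq 3} \theta_r \, F \cdot 1[\text{rep78} = r]
        + \delta \, F \cdot \text{trunk}
    \]
    % Contrast F = 1 versus F = 0 at the base level rep78 = 3 and trunk = t:
    \[
      \Delta(t) = \beta_F + \delta t ,
      \qquad \Delta(0) = \beta_F ,
      \qquad \Delta(10) = \beta_F + 10\,\delta .
    \]
    % Same logic for the areg model: Hispanic is the base ethnicity, and p_l
    % is whatever value margins assigns to the indicator for level l
    % (l = 1 is the base level).
    \[
      \Delta_{\text{Hispanic}} = \beta_{\text{post1}} + \sum_{l \neq 1} \delta_l \, p_l
    \]
    % Implied contribution of the post1#level terms from the reported numbers:
    \[
      \sum_{l \neq 1} \delta_l \, p_l
        = (322.64 - 331.81) - (-12.83) = -9.17 + 12.83 = 3.66
    \]
    ```

    Read this way, the -margins- contrast equals the post1 coefficient only when level is held at its base value (every p_l = 0), just as the auto example matches only at trunk = 0. As a cross-check on the auto example, running lincom 1.foreign + 10*1.foreign#c.trunk after the second regression should reproduce the trunk = 10 contrast.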


    • #3
      Clyde, that was it! Thank you very much for your detailed explanation.
