  • Decipher a user-written "bogus" program

    Hi everyone

    From this discussion, Daniel Klein wrote a program called "bogus" that is able to correctly apply -margins- to an interaction term calculated manually.

    In that example, the interaction term was created from 2 continuous variables. I am trying to modify the "bogus" program for interaction terms created from 2 or more binary variables.

    As such, could someone please explain in a bit more detail what each line of that program is doing? And is modifying it to accommodate interaction terms built from binary variables a relatively straightforward process?

    Thanks,
    Junran

  • #2
    Could you say a few words on why you (think) you need this? As I have repeatedly stated, this approach really manipulates Stata in a way you probably should not. The reason is that once you mess with Stata on this level, you can no longer be certain that the results you obtain are valid and you can no longer rely on Stata to throw errors and warnings to tell you that results might be questionable or wrong.

    In general, the outlined approach manipulates the coefficient (and covariance) matrices, stored in e(), so that they resemble the matrices produced when factor variables are used. This is rather simple for simple problems (depending on your Stata programming knowledge). It probably gets harder with more complicated problems. For example, if certain levels of factor variables are omitted because of collinearity, you need to keep track of them and make sure you reassemble the matrices correctly.
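To make "resemble those matrices" concrete, here is a minimal sketch. It assumes the auto data and a binary `heavy` indicator defined as weight >= 3100, as used later in this thread. The sketch lists every column of e(b) from a factor-variable interaction. The base (b) and omitted (o) cells are stored with coefficients of 0, and a hand-built matrix would have to reproduce those cells as well, in the same order.

```stata
* Sketch: the factor-variable layout of e(b) that margins relies on
sysuse auto, clear
generate byte heavy = weight >= 3100 if !missing(weight)
quietly regress price i.foreign##i.heavy
matrix b_ref = e(b)

* One line per column: name (b = base, o = omitted markers) and its value
local names : colfullnames b_ref
local j = 0
foreach n of local names {
    local ++j
    display %-24s "`n'" %12.4f b_ref[1, `j']
}
```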

    Best
    Daniel



    • #3
      Thanks Daniel. Yes, ideally I would prefer not to change Stata in such a way. I am asking because I am using a model that does not support factor-variable notation (I've written to Technical Support and understand there are good reasons why factor variables are not allowed in this case), but interpreting the results would be much easier with the -margins- command.

      Specifically, I am using the Hausman-Taylor estimator, -xthtaylor-: the variable of interest is gender, and the model contains interaction terms that are products of the gender variable with marital status, parental status and so on. With this many interaction terms, interpretation becomes difficult, so I would like to create -marginsplot- graphs to help with the presentation. However, -margins- does not work correctly because -xthtaylor- does not support factor-variable notation.

      After scouring the internet, I found that only something similar to your "bogus" program seems to offer a solution. But if there are ways to get the correct margins without manipulating Stata in such a way, I would really appreciate some pointers.

      [Attached image: Screen Shot 2019-11-18 at 11.53.04 pm.png]



      • #4
        If StataCorp has good reasons not to support factor variables, then perhaps those reasons also have implications for margins. I see that xthtaylor stores lots of variable names in e() macros, but I do not know whether any of these macros is relevant to margins.

        Anyway, I lack the time to modify the bogus approach to handle this case. If I wanted this, I would start by estimating your model with regress, using factor-variable notation, and store the e(b) and e(V) matrices. I would then re-run the model using xthtaylor with the manually created interactions, and again store e(b) and e(V).* In the next step, I would replace the coefficients (and covariances) from regress with those from xthtaylor, making sure that each coefficient goes in the correct cell of the respective matrices. That is, I would use the column (and row) names from regress and replace the respective coefficients with those from xthtaylor. The former matrices will be larger because they contain the base and omitted levels, with coefficients set to 0. Last, I would repost e(b) and e(V) into the other e() results from xthtaylor.


        *Actually, if I were to do this for the first time, I would re-run the model using regress but with manually created interactions. I would then follow the remaining steps described above, using regress instead of xthtaylor, and verify that margins with my manipulated regress results (those with manually created interactions) matched the results obtained from margins based on regress with factor-variables. Only then would I turn to xthtaylor again.
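The footnote's check can be sketched as a short do-file. This is a hypothetical sketch that rests on several assumptions:

- The `man` and `fv` lists, which pair each manually created variable with its factor-variable name, are written by hand for this two-binary-variable case.
- `colnumb()` is assumed to find factor-variable names such as `1.foreign#1.heavy` in the template.
- `0 * b_ref` is assumed to keep the template's row and column names.

Only `margins foreign` is compared. Because `ereturn post` discards the other e() results, this checks the matrix bookkeeping and is not a finished tool.

```stata
* Hypothetical sketch: remap regress results and compare margins
sysuse auto, clear
generate byte heavy    = weight >= 3100 if !missing(weight)
generate byte interact = foreign * heavy

* Reference model with factor variables: its matrices serve as templates
quietly regress price i.foreign##i.heavy
matrix b_ref = e(b)
matrix V_ref = e(V)
quietly margins foreign
matrix m_ref = r(b)

* Same model with the manually created interaction
quietly regress price foreign heavy interact
matrix b_man = e(b)
matrix V_man = e(V)
tempvar touse
generate byte `touse' = e(sample)

* Manual names and their factor-variable counterparts, paired by position
local man foreign   heavy   interact          _cons
local fv  1.foreign 1.heavy 1.foreign#1.heavy _cons

* Keep the template's names (base and omitted cells included), zero all values
matrix b_new = 0 * b_ref
matrix V_new = 0 * V_ref

* Copy each coefficient and covariance into the cell with the matching name
local k : word count `man'
forvalues i = 1/`k' {
    local mi : word `i' of `man'
    local fi : word `i' of `fv'
    local src_i = colnumb(b_man, "`mi'")
    local dst_i = colnumb(b_ref, "`fi'")
    * stop if a name is not found (colnumb() returns missing)
    assert `src_i' < . & `dst_i' < .
    matrix b_new[1, `dst_i'] = b_man[1, `src_i']
    forvalues j = 1/`k' {
        local mj : word `j' of `man'
        local fj : word `j' of `fv'
        local src_j = colnumb(b_man, "`mj'")
        local dst_j = colnumb(b_ref, "`fj'")
        matrix V_new[`dst_i', `dst_j'] = V_man[`src_i', `src_j']
    }
}

* Post the remapped matrices and repeat margins
ereturn post b_new V_new, esample(`touse')
quietly margins foreign
matrix m_man = r(b)
display "max relative difference in margins: " mreldif(m_man, m_ref)
```

If the maximum relative difference is essentially zero, you could then try the same bookkeeping with xthtaylor in place of the second regress. Its e(b) column names have not been checked here (they might carry equation names). For that reason, the sketch stops with an error when a name in `man` is not found. Comparing `r(V)` in the same way would also check the standard errors.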

        Best
        Daniel



        • #5
          Many thanks Daniel.

          For my own elucidation, I summarized the procedure you outlined above in point form:

          1. Estimate model with regress, using factor-variable-notation & store the e(b) and e(V) matrices. – model 1
          2. Re-run the model using regress but with manually created interactions & store the e(b) and e(V) matrices. – model 2
          3. Replace the coefficients & covariances from regress (model 2) with those from regress (model 1) – making sure each coefficient goes in the correct cell of the respective matrices. Use the column and row names & replace the respective coefficients with those from regress (model 1). The model 1 matrices will be larger because they will contain the omitted levels with coefficients set to 0.
          4. Verify margins with the manipulated regress (model 2) results - those with manually created interactions - match the results obtained from margins based on regress (model 1) with factor-variables.
          5. Re-run model using xthtaylor, which uses manually created interactions, & store the e(b) and e(V) matrices. – model 3
          6. Replace the coefficients & covariances from xthtaylor (model 3) with those from regress (model 1) – making sure each coefficient goes in the correct cell of the respective matrices. Use the column and row names & replace the respective coefficients with those from regress (model 1). The model 1 matrices will be larger because they will contain the omitted levels with coefficients set to 0.
          7. Repost e(b) and e(V) into the other e() results from xthtaylor (model 3).
          If I do get it to work correctly, I will be sure to cite your authorship from this discussion thread. Thanks again.



          • #6
            I think I was not clear enough. Sorry.

            I have mainly suggested using regress twice because you can use factor-variables with regress. If you use factor variables, you know what the "true" estimates are. You can then more easily check whether you can reproduce those estimates by manipulating the matrices when factor variables are not used. Once you are confident that you have understood what you are doing and that your results match those produced with factor variables, you can turn to xthtaylor.

            Once you turn to xthtaylor, you can use the matrices you get from regress with factor variable notation. These matrices will have the correct column (and row) names that margins requires. The coefficients (and covariance) will be wrong, of course. You do not want any coefficients from regress. That is, you got step 6 wrong! Obviously, step 6 should be the other way round: you want the coefficients from xthtaylor; that is what you are interested in.

            I will still not give you a full example because that would be even more cumbersome than typing these answers. Here is the "problem" in a nutshell:

            Code:
            . sysuse auto
            (1978 Automobile Data)
            
            . regress price i.foreign
            (output omitted)
            
            . matlist e(b)
            
                         |        0b.         1.          
                         |   foreign    foreign      _cons
            -------------+---------------------------------
                      y1 |         0   312.2587   6072.423
            This is what margins expects e(b) to look like.

            Here is how it looks if we omit factor-variable notation:

            Code:
            . regress price foreign
            (output omitted)
            
            . matlist e(b)
            
                         |   foreign      _cons
            -------------+----------------------
                      y1 |  312.2587   6072.423
            See how e(b) looks completely different? There are only two columns now and the names are different, too.

            The second matrix is similar to what you get from xthtaylor. You will have to make it match the first matrix.
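For the single-factor case above, here is a minimal sketch of "making it match." It takes the matrix from `regress price i.foreign` as a template and zeroes it, assuming `0 * b_ref` keeps the column names. It then copies each coefficient from the non-factor-variable model into the cell whose name has the same meaning. The resulting `b_new` has three columns, with the base level 0b.foreign fixed at 0.

```stata
* Sketch: make the two-column e(b) look like the factor-variable one
sysuse auto, clear

* Template from factor-variable notation: 0b.foreign 1.foreign _cons
quietly regress price i.foreign
matrix b_ref = e(b)

* Coefficients without factor-variable notation: foreign _cons
quietly regress price foreign
matrix b_man = e(b)

* Start from the template (names kept, values zeroed); 0b.foreign stays 0
matrix b_new = 0 * b_ref
local c_foreign = colnumb(b_new, "1.foreign")
local c_cons    = colnumb(b_new, "_cons")
matrix b_new[1, `c_foreign'] = el(b_man, 1, colnumb(b_man, "foreign"))
matrix b_new[1, `c_cons']    = el(b_man, 1, colnumb(b_man, "_cons"))
matlist b_new
```

Both regressions estimate the same model, so `b_new` should equal the factor-variable e(b) exactly. That makes this a convenient first test before adding interactions.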

            Best
            Daniel



            • #7
              Thanks Daniel for the additional pointers.

              I think I managed to follow your instructions and got it to work for the simplest case involving the interaction of 2 binary variables.

              Code:
              . // 1. Set up example data
              . sysuse auto, clear
              (1978 Automobile Data)
              
              . generate heavy = 1 if weight >= 3100 & !missing(weight)
              (35 missing values generated)
              
              . replace heavy = 0 if missing(heavy) & !missing(weight)
              (35 real changes made)
              
              . label define heavy 1 "Heavy" 0 "Light"
              . label value heavy heavy
              Code:
              . // 2. The reference model
              . // This reports the correct results from -margins- using factor variables
              . // The aim is to replicate -e(b)- and -e(V)- from this model for other models which do not allow for factor variables
              .
              . regress price i.foreign##i.heavy
              (output omitted)
               
              . margins foreign
              
              Predictive margins                              Number of obs     =         74
              Model VCE    : OLS
              
              Expression   : Linear prediction, predict()
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                   foreign |
                 Domestic  |   5582.678   390.5804    14.29   0.000      4803.69    6361.666
                  Foreign  |   9314.784   1010.475     9.22   0.000     7299.454    11330.11
              ------------------------------------------------------------------------------
              
              . margins, dydx(foreign) 
              
              Average marginal effects                        Number of obs     =         74
              Model VCE    : OLS
              
              Expression   : Linear prediction, predict()
              dy/dx w.r.t. : 1.foreign
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                   foreign |
                  Foreign  |   3732.106   1083.335     3.45   0.001     1571.463    5892.748
              ------------------------------------------------------------------------------
              Note: dy/dx for factor levels is the discrete change from the base level.
              
              . matrix list e(b)
              (output omitted)
              
              . matrix list e(V)
              (output omitted)
              Code:
              . // 3. The to-be-corrected model
              . // Suppose this model cannot use factor variables
              . // We see it cannot estimate -margins foreign-
              . // And the results from -margins, dydx(foreign)- are incorrect
              
              . generate interact = foreign * heavy
               
              . regress price foreign heavy interact
              (output omitted)
              
              . margins foreign
              factor foreign not found in list of covariates
              r(322);
              
              . margins, dydx(foreign) 
              
              Average marginal effects                        Number of obs     =         74
              Model VCE    : OLS
              
              Expression   : Linear prediction, predict()
              dy/dx w.r.t. : foreign
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                   foreign |     1590.1   890.9658     1.78   0.079    -186.8752    3367.075
              ------------------------------------------------------------------------------
              
              . matrix list e(b)
              (output omitted)
              
              . matrix list e(V)
              (output omitted)
              Code:
              . // 4. Create program to align the -e(b)- & -e(V)- matrices' row & column names from the "to-be-corrected" model to the reference model
              . // Reference: https://www.statalist.org/forums/forum/general-stata-discussion/general/316905-mimrgns-interaction-effects?p=318738#post318738
              
              program manual_calculation, eclass
                   tempname b V
                   tempvar sample
                   matrix `b' = e(b)
                   matrix `V' = e(V)
                        
                   matrix rownames `b' = y1
                   matrix colnames `b' = 1.foreign 1.heavy 1.foreign#1.heavy _cons
                        
                   matrix rownames `V' = 1.foreign 1.heavy 1.foreign#1.heavy _cons
                   matrix colnames `V' = 1.foreign 1.heavy 1.foreign#1.heavy _cons
                        
                   generate `sample' = e(sample)
                   ereturn post `b' `V', e(`sample')
               end
              Code:
              . // 5. Repeat the model without factor variables and verify that the margins results are now correct
              
              . regress price foreign heavy interact
              (output omitted)
              
              . manual_calculation
              
              . margins foreign
              Warning: cannot perform check for estimable functions.
              
              Predictive margins                              Number of obs     =         74
              
              Expression   : Linear prediction, predict()
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                   foreign |
                 Domestic  |   5582.678   390.5804    14.29   0.000     4817.154    6348.201
                  Foreign  |   9314.784   1010.475     9.22   0.000     7334.288    11295.28
              ------------------------------------------------------------------------------
              
              . margins, dydx(foreign)
              Warning: cannot perform check for estimable functions.
              (note: continuous option implied because a factor with only one level was specified in the dydx() option)
              
              Average marginal effects                        Number of obs     =         74
              
              Expression   : Linear prediction, predict()
              dy/dx w.r.t. : 1.foreign
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                   foreign |
                  Foreign  |   3732.106   1083.335     3.45   0.001     1608.809    5855.403
              ------------------------------------------------------------------------------
              Can I please ask:

              1) At this point, should I do something to the effect of (cannot use this data as -xthtaylor- needs panel data)?
              Code:
              xthtaylor ....
              
              manual_calculation
              
              margins....
              2) Can you please give me some hints on how to incorporate -regress..., vce(robust)- and therefore -margins..., vce(unconditional)- into your program? I tried to change your program to allow for -vce(unconditional) after running a regression with robust SE but couldn't get it to work.

              Thanks again.



              • #8
                 Originally posted by Junran Cao
                I think I managed to follow your instructions and got it to work for the simplest case involving the interaction of 2 binary variables.
                 I disagree. You got the correct results this time, but you are not really following my instructions. For one thing, you are trying to adjust the column and row names of the e() matrices; I suggested letting regress create those names and plugging in the corresponding coefficients. The reason I suggested doing it this way is that I anticipated what you can now observe: your manual_calculation does not get the matrices right! Please do look at the correct matrices from factor-variable notation, e.g., e(b):

                Code:
                . matlist e(b)
                
                             |        0b.         1.        0b.         1. 0b.fore~n# 0b.fore~n# 1o.fore~n# 1.foreign#          
                             |   foreign    foreign      heavy      heavy   0b.heavy   1o.heavy   0b.heavy    1.heavy      _cons
                -------------+---------------------------------------------------------------------------------------------------
                          y1 |         0     1590.1          0   2654.281          0          0          0   4064.319     4183.8
                Now compare this with the matrices you get from your manual_calculation

                Code:
                .  matlist e(b)
                
                             |         1.         1. 1.foreign#          
                             |   foreign      heavy    1.heavy      _cons
                -------------+--------------------------------------------
                          y1 |    1590.1   2654.281   4064.319     4183.8
                 Those two matrices look completely different! You have simply left out all the base and omitted levels. Those matter to Stata internally. Note that Stata actually complains about this:

                Code:
                (note: continuous option implied because a factor with only one level was specified in the dydx() option)
                 You got lucky that the results still match in this case, but there really is no guarantee that this approach will get you the correct results in another situation. Let me boil my suggestion down one last time: make the manually created matrices look exactly like the ones you would get from factor-variable notation. This is really the minimum that is required; even if you get this right, there might be other problems.
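One way to act on "look exactly like" is to compare the full column stripes before calling margins. This sketch assumes the same auto data and `heavy` indicator as in #7. It shows that the relabeled matrix from manual_calculation has 4 columns where the factor-variable one has 9, so the comparison fails.

```stata
* Sketch: check that a hand-built e(b) has exactly the factor-variable stripe
sysuse auto, clear
generate byte heavy = weight >= 3100 if !missing(weight)

* Reference stripe from factor-variable notation
quietly regress price i.foreign##i.heavy
matrix b_ref = e(b)
local ref_names : colfullnames b_ref

* Stripe produced by relabeling, as manual_calculation in #7 does
matrix b_short = J(1, 4, 0)
matrix colnames b_short = 1.foreign 1.heavy 1.foreign#1.heavy _cons
local short_names : colfullnames b_short

* Same number of columns and identical names in the same order?
display "reference: " colsof(b_ref) " columns, relabeled: " colsof(b_short) " columns"
display "identical stripes: " cond("`ref_names'" == "`short_names'", "yes", "no")
```

A matrix obtained by copying the regress template and overwriting its values keeps the template's names and would pass this check. As noted above, passing it is the minimum, not a guarantee.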

                 Originally posted by Junran Cao
                1) At this point, should I do something to the effect of (cannot use this data as -xthtaylor- needs panel data)?
                I do not quite understand that question. Sorry.

                 Originally posted by Junran Cao
                2) Can you please give me some hints on how to incorporate -regress..., vce(robust)- and therefore -margins..., vce(unconditional)- into your program? I tried to change your program to allow for -vce(unconditional) after running a regression with robust SE but couldn't get it to work.
                 What do you mean by "couldn't get it to work"? Did Stata throw an error? Were the results not what you expected? Honestly, I never thought about such details (which might be one of the reasons why things could still go wrong even if you get the matrices right). My guess is that you would have to preserve all the other original contents of e() (i.e., scalars and macros), which manual_calculation currently wipes out.
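Daniel's guess can be sketched as follows: save the e() scalars, macros and estimation sample before `ereturn post`, then put them back afterwards. This is a hypothetical sketch with several limitations:

- It re-posts the original matrices unchanged, to keep the focus on e().
- It skips e(properties), which `ereturn post` sets itself.
- It does not restore other e() matrices.
- It has not been tested with `margins, vce(unconditional)`.

Restoring e(cmd) and e(predict) also makes Stata treat the results as coming from the original command. That is exactly the kind of manipulation Daniel warns about.

```stata
* Hypothetical sketch: carry e() scalars and macros across -ereturn post-
sysuse auto, clear
generate byte heavy    = weight >= 3100 if !missing(weight)
generate byte interact = foreign * heavy
quietly regress price foreign heavy interact, vce(robust)

* 1. Save what ereturn post would otherwise discard (matrices aside)
local scalar_names : e(scalars)
local macro_names  : e(macros)
foreach s of local scalar_names {
    tempname sv_`s'
    scalar `sv_`s'' = e(`s')
}
foreach m of local macro_names {
    local mv_`m' `"`e(`m')'"'
}
tempvar touse
generate byte `touse' = e(sample)

* 2. The remapping of e(b) and e(V) to the factor-variable stripe would go
*    here; this example re-posts the original matrices unchanged
matrix b_new = e(b)
matrix V_new = e(V)
ereturn post b_new V_new, esample(`touse')

* 3. Restore the saved scalars and macros; e(properties) is left as set
*    by ereturn post
foreach s of local scalar_names {
    ereturn scalar `s' = scalar(`sv_`s'')
}
foreach m of local macro_names {
    if "`m'" != "properties" {
        ereturn local `m' `"`mv_`m''"'
    }
}
ereturn list
```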

                 Although you are making quite a bit of progress, I have now invested more time than I intended in something I am still not convinced is a good idea in the first place, so I will bail out here.

                Best
                Daniel



                • #9
                  Yes I see how my manually created matrices are incorrect. I was rather overjoyed when the -margins- results matched and prematurely concluded that I had stumbled on the right approach.

                   Not a problem, thanks very much for your help, Daniel. I agree this approach is too risky to use for now, although it can perhaps be deemed a "successful negative result": it highlights a potential way around the no-factor-variables restriction, but one that is arguably inadvisable in most circumstances.
