  • Probit Regression Results: Suppressor or Confounder Effects?

    Dear members,

    I am currently analyzing the determinants of sustainable investment using probit regressions and have encountered some perplexing results.

    Initially, when I regressed "sustainable concerns" alone, it exhibited a positive and significant impact. However, when I introduced the variable "recycling," the effect of "sustainable concerns" became negative and insignificant. Subsequently, when I added three more variables to the model, "sustainable concerns" remained negative but turned significant, while "recycling" lost its significance.

    These findings are particularly confusing since "sustainable concerns" is expected to have a positive impact on sustainable investment, or at the very least, no impact. The observed negative effect seems counterintuitive.

    I am wondering if this could be indicative of a suppression effect, with "recycling" acting as the suppressor, or if the results suggest that all my variables might be confounding factors?

    I would greatly appreciate your insights or suggestions on how to interpret these results and any guidance on how to address these issues.

    Thank you,

  • #2
    I think the negative coefficient makes sense. With recycling in the model, the concern coefficient represents the association between concern and SI, controlling for doing something about that concern that is different from, and more accessible than, SI. If someone is concerned but does not recycle, you wouldn’t expect them to invest. If you added more variables that are correlated with recycling (say, donating money to environmental organizations), the model would have trouble separating their effects, so you would lose significance. But if you test all four non-concern coefficients jointly, you will probably be able to reject the null. Jointly, they matter, but you can’t tell which ones are key if they have lots of variation in common.

    I suspect you could confirm this by calculating the marginal effect of concern with recycling set to 1 vs. 0.
    Last edited by Dimitriy V. Masterov; 19 Jul 2024, 06:21.
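
    A minimal sketch of the two checks suggested above: a joint test of the four non-concern coefficients, and the marginal effect of concern with recycling held at 0 and then at 1. The variable names (si, concerns, recycle_hi, x1, x2, x3) are hypothetical placeholders, and recycle_hi is assumed to be a 0/1 indicator; none of this comes from the original post.

    ```stata
    // Hypothetical variable names; substitute your own.
    //   si         : 1 if the respondent invests sustainably, 0 otherwise
    //   concerns   : sustainability concern score
    //   recycle_hi : 1 if the respondent recycles, 0 otherwise (assumed binary)
    //   x1 x2 x3   : the three additional regressors
    probit si c.concerns i.recycle_hi c.(x1 x2 x3), vce(robust)

    // Joint Wald test that the four non-concern coefficients are all zero
    test 1.recycle_hi x1 x2 x3

    // Average marginal effect of concerns with recycle_hi fixed at 0, then at 1
    margins, dydx(concerns) at(recycle_hi = (0 1))
    ```

    If the joint test rejects while the individual coefficients do not, that would be consistent with the regressors sharing a lot of variation.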


    • #3
      Thank you for your insights!

      The correlation between "sustainable concerns" and "recycling" is 0.525.
      The three other variables in my model (importance of sustainable investment, confidence in these investments, and PCE) are moderately correlated with both "sustainable concerns" and "recycling," though more strongly with "sustainable concerns". The highest of these correlations is 0.4166.

      All variables are measured on a 7-point Likert scale. I transformed "recycling" into a dummy variable based on the median and created a new variable with "recycling" set to 1 and 0. I found that the marginal effects of "sustainable concerns" are negative when the dummy variable for "recycling" is 0 and positive when it is 1, but neither effect is statistically significant.
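
      For reference, a median split like the one described could be sketched as follows. The names recycling and recycle_hi are hypothetical, and coding "above the median" as 1 is an assumption, since the post does not say which side of the median received the 1.

      ```stata
      // recycling: the original 7-point item (hypothetical name)
      summarize recycling, detail

      // r(p50) holds the median left behind by summarize, detail.
      // Stata treats missing values as larger than any number, so without
      // the if-qualifier every missing response would be coded as 1.
      generate byte recycle_hi = recycling > r(p50) if !missing(recycling)

      // Check the split against the original scale, including missings
      tabulate recycling recycle_hi, missing
      ```

      With a 7-point item, many responses can sit exactly at the median, so the two groups may be quite unequal in size; the tabulate output shows how they fall.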


      • #4
        Can you use a bigger change? Say concern going from 1 to 7 rather than increasing by 1 from current level.

        Could you also show your code for the marginal effects?
        Last edited by Dimitriy V. Masterov; 19 Jul 2024, 10:28.


        • #5
          How can I do that, if I may ask?
          I don't suppose it's this code: margins, dydx(*) at(sustainable_concerns=7)?

          I run the probit as usual: probit SI concerns recycling control variables if recycling==0, robust (and then the same for recycling==1).
          After each model, I use the command: margins, dydx(*)
          Last edited by Serena Menny; 19 Jul 2024, 10:45.


          • #6
            Try something like this:

            Code:
            . sysuse auto, clear
            (1978 automobile data)
            
            .
            . // Create a binary variable for high mpg (1 if mpg > 20, 0 otherwise)
            . generate high_mpg = mpg > 20
            
            .
            . // Run a probit regression
            . // Dependent variable: foreign (car origin)
            . // Independent variables:
            . //   i.high_mpg: Indicator variable for high mpg
            . //   c.(rep78 price): Continuous variables rep78 and price
            . probit foreign i.high_mpg c.(rep78 price)
            
            Iteration 0:   log likelihood = -42.400729  
            Iteration 1:   log likelihood = -23.534352  
            Iteration 2:   log likelihood = -22.583291  
            Iteration 3:   log likelihood =  -22.57019  
            Iteration 4:   log likelihood = -22.570177  
            Iteration 5:   log likelihood = -22.570177  
            
            Probit regression                                       Number of obs =     69
                                                                    LR chi2(3)    =  39.66
                                                                    Prob > chi2   = 0.0000
            Log likelihood = -22.570177                             Pseudo R2     = 0.4677
            
            ------------------------------------------------------------------------------
                 foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
              1.high_mpg |   1.432869   .4931867     2.91   0.004      .466241    2.399497
                   rep78 |   1.108671   .2692634     4.12   0.000     .5809246    1.636418
                   price |   .0000662   .0000786     0.84   0.399    -.0000878    .0002202
                   _cons |  -5.871357   1.334911    -4.40   0.000    -8.487735   -3.254979
            ------------------------------------------------------------------------------
            
            .
            . // Calculate average marginal effects of high_mpg
            . // at rep78 values of 1 and 5
            . // post: Post results as estimation results
            . // coeflegend: Display internal names of coefficients
            . margins, dydx(high_mpg) at(rep78 = (1 5)) post // coeflegend
            
            Average marginal effects                                    Number of obs = 69
            Model VCE: OIM
            
            Expression: Pr(foreign), predict()
            dy/dx wrt:  1.high_mpg
            1._at: rep78 = 1
            2._at: rep78 = 5
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
            0.high_mpg   |  (base outcome)
            -------------+----------------------------------------------------------------
            1.high_mpg   |
                     _at |
                      1  |   .0021346   .0052399     0.41   0.684    -.0081353    .0124046
                      2  |   .4013341   .1712985     2.34   0.019     .0655952    .7370729
            ------------------------------------------------------------------------------
            Note: dy/dx for factor levels is the discrete change from the base level.
            
            .
            . // Compute linear combination of marginal effects
            . // Subtract marginal effect of high_mpg when rep78=1
            . // from marginal effect when rep78=5
            . lincom _b[1.high_mpg:2._at] - _b[1.high_mpg:1bn._at]
            
             ( 1)  - [1.high_mpg]1bn._at + [1.high_mpg]2._at = 0
            
            ------------------------------------------------------------------------------
                         | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     (1) |   .3991994   .1691118     2.36   0.018     .0677465    .7306524
            ------------------------------------------------------------------------------
            
            .
            end of do-file
            Here, we would reject the null that the ME of high_mpg at rep78 = 5 equals the ME of high_mpg at rep78 = 1 in favor of the alternative that they are not equal.

            Running separate probits is not a good idea for many reasons. The key one is that the difference in the MEs could be driven by other variables (like price in my model) since those may be quite different in the two subsamples.
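
            One way to avoid separate probits is a single pooled model, sketched below. The variable names (si, concerns, recycle_hi, x1, x2, x3) are hypothetical, and the concerns-by-recycling interaction is an assumption of this sketch rather than something the example above includes.

            ```stata
            // One pooled model instead of separate probits by recycling group.
            // c.concerns##i.recycle_hi adds both main effects and their
            // interaction, so the concerns slope can differ between groups.
            probit si c.concerns##i.recycle_hi c.(x1 x2 x3), vce(robust)

            // Average marginal effect of concerns with recycle_hi set to 0 and
            // then to 1 for every observation. The other covariates keep their
            // observed values in both calculations, so differences in x1-x3
            // between the two groups do not drive the comparison.
            margins recycle_hi, dydx(concerns)
            ```

            Because the full sample is used for both marginal effects, the comparison is not affected by the two subsamples having different covariate distributions.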


            • #7
              Thank youuuu!!

              [Attached screenshot: Capture d'écran 2024-07-20 000735.png]

              This is what I got. (beh is recycling)


              • #8
                You wanted the effect of concern going from 1 to 7 while varying binary recycling, so my example was backward.

                Try this:

                Code:
                . sysuse auto, clear
                (1978 automobile data)
                
                . generate high_mpg = mpg > 20
                
                . probit foreign i.high_mpg c.(rep78 price), nolog
                
                Probit regression                                       Number of obs =     69
                                                                        LR chi2(3)    =  39.66
                                                                        Prob > chi2   = 0.0000
                Log likelihood = -22.570177                             Pseudo R2     = 0.4677
                
                ------------------------------------------------------------------------------
                     foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                  1.high_mpg |   1.432869   .4931867     2.91   0.004      .466241    2.399497
                       rep78 |   1.108671   .2692634     4.12   0.000     .5809246    1.636418
                       price |   .0000662   .0000786     0.84   0.399    -.0000878    .0002202
                       _cons |  -5.871357   1.334911    -4.40   0.000    -8.487735   -3.254979
                ------------------------------------------------------------------------------
                
                . margins, at(rep78 = (1 5) high_mpg = (0 1)) post // coeflegend
                
                Predictive margins                                          Number of obs = 69
                Model VCE: OIM
                
                Expression: Pr(foreign), predict()
                1._at: high_mpg = 0
                       rep78    = 1
                2._at: high_mpg = 0
                       rep78    = 5
                3._at: high_mpg = 1
                       rep78    = 1
                4._at: high_mpg = 1
                       rep78    = 5
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         _at |
                          1  |    .000011   .0000432     0.26   0.799    -.0000737    .0000958
                          2  |   .5302059   .1869542     2.84   0.005     .1637823    .8966295
                          3  |   .0021457   .0052769     0.41   0.684    -.0081968    .0124882
                          4  |     .93154   .0536117    17.38   0.000      .826463    1.036617
                ------------------------------------------------------------------------------
                
                . nlcom ///
                > (me_rep78_5_vs_1_at_high_mpg:_b[4._at] - _b[3._at]) ///
                > (me_rep78_5_vs_1_at_low_mpg: _b[2._at] - _b[1._at]) ///
                > (high_low_me_diff:          (_b[4._at] - _b[3._at]) - (_b[2._at] - _b[1._at]))
                
                me_rep78_5~g: _b[4._at] - _b[3._at]
                me_rep78_5~g: _b[2._at] - _b[1._at]
                high_low_m~f: (_b[4._at] - _b[3._at]) - (_b[2._at] - _b[1._at])
                
                ---------------------------------------------------------------------------------------------
                                            | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                ----------------------------+----------------------------------------------------------------
                me_rep78_5_vs_1_at_high_mpg |   .9293943   .0566132    16.42   0.000     .8184346    1.040354
                 me_rep78_5_vs_1_at_low_mpg |   .5301949   .1869614     2.84   0.005     .1637573    .8966324
                           high_low_me_diff |   .3991994   .1691118     2.36   0.018     .0677465    .7306524
                ---------------------------------------------------------------------------------------------
                end of do-file
                
                .
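
                Adapted to the original question, the same pattern might look like the sketch below. The variable names (si, concerns, recycle_hi, x1, x2, x3) are hypothetical. Unlike the example above, each scenario gets its own at() option, so the _at numbering follows the order in which the scenarios are written and does not have to be read off the legend.

                ```stata
                probit si c.concerns i.recycle_hi c.(x1 x2 x3), vce(robust)

                // Predicted probabilities for the four scenarios, numbered in the
                // order the at() options appear:
                //   1: concerns = 1, recycle_hi = 0    2: concerns = 7, recycle_hi = 0
                //   3: concerns = 1, recycle_hi = 1    4: concerns = 7, recycle_hi = 1
                margins, at(concerns = 1 recycle_hi = 0) ///
                         at(concerns = 7 recycle_hi = 0) ///
                         at(concerns = 1 recycle_hi = 1) ///
                         at(concerns = 7 recycle_hi = 1) post

                // Effect of concerns going from 1 to 7 within each recycling
                // level, and the difference between those two effects
                nlcom ///
                    (concern_7_vs_1_recycle0: _b[2._at] - _b[1._at]) ///
                    (concern_7_vs_1_recycle1: _b[4._at] - _b[3._at]) ///
                    (recycle_diff: (_b[4._at] - _b[3._at]) - (_b[2._at] - _b[1._at]))
                ```

                The last nlcom line is the counterpart of high_low_me_diff above: it tests whether the 1-to-7 effect of concern differs between recyclers and non-recyclers.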


                • #9
                  I cannot thank you enough for your patience!
                  This is what I got; I don't know if I used it right.

                  [Attached screenshot: Capture d'écran 2024-07-20 011659.png]
                  [Attached screenshot: Capture d'écran 2024-07-20 011706.png]
