
  • How to show all 10 values of continuous variable (not truly continuous) in interaction with treatment?

    Dear Statalist,

    I have experimental data on how participants set up and manage financial portfolios. In my regression, I would like to understand better the interaction between the return figure and the treatment.
    Four return numbers are randomly drawn from an urn containing ten return numbers (because there are four asset classes per round). The return numbers are stored as double in Stata. The returns are expressed as percentages (i.e., 0.54 is 0.54%). In my regression so far, I used c.return1##i.T_C; however, the output showed only the coefficient for one return figure, not all ten.
    The binary treatment variable indicates whether the participant was assigned to the treatment group (T_C == 1) or the control group (T_C == 0). Action is a binary variable that measures whether an action was taken.

    If I omit the c. part in front of the return, I get an error that factor variables may not contain noninteger values.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double return1
                   .54
                   .14
                  1.63
                  1.19
    .45999999999999996
                  1.19
    -.4699999999999999
                   .14
                  -.31
                  1.63
                   .54
                  1.24
    .45999999999999996
                  1.24
                   .37
                  -.21
                   .37
                  -.31
                  1.24
                  1.63
                  1.24
                  -.21
                   .54
    -.4699999999999999
                  -.21
    -.4699999999999999
                  1.24
                   .14
                   .37
                  -.21
    .45999999999999996
                  1.19
                  1.63
    -.4699999999999999
                   .37
    .45999999999999996
                  1.24
                   .14
                  1.63
    .45999999999999996
                  -.21
                   .37
                   .54
                  1.19
                  -.21
                   .14
                  1.24
                  1.19
                   .14
                   .37
                  1.19
                  -.31
                  1.63
    -.4699999999999999
                   .54
                  -.31
                  -.21
                  1.24
    -.4699999999999999
                   .37
                  -.31
                   .54
                   .14
                  1.24
    .45999999999999996
                  -.21
                  1.19
    -.4699999999999999
                   .54
                  -.21
                  1.19
                  1.24
                   .37
                  -.21
                  -.31
    .45999999999999996
                   .37
    .45999999999999996
                  1.19
                  1.63
                  -.31
                   .54
    -.4699999999999999
                  1.24
                  -.31
    .45999999999999996
                  1.63
                  1.19
                  1.24
                  1.63
                   .54
                   .37
                   .54
                  -.31
                  1.19
                   .14
                   .37
                   .14
    .45999999999999996
                  -.21
    end
    Code:
     
    
     logit action c.return1##i.T_C, vce(cluster CASE)
    
    Iteration 0:   log pseudolikelihood = -511.78245  
    Iteration 1:   log pseudolikelihood = -498.77008  
    Iteration 2:   log pseudolikelihood = -498.51032  
    Iteration 3:   log pseudolikelihood = -498.50994  
    Iteration 4:   log pseudolikelihood = -498.50994  
    
    Logistic regression                                     Number of obs =    920
                                                            Wald chi2(3)  =  18.68
                                                            Prob > chi2   = 0.0003
    Log pseudolikelihood = -498.50994                       Pseudo R2     = 0.0259
    
                                     (Std. err. adjusted for 230 clusters in CASE)
    ------------------------------------------------------------------------------
                 |               Robust
        action | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         return1 |   .3855761   .1633776     2.36   0.018     .0653619    .7057903
                 |
             T_C |
             TG  |  -.6312159   .2128661    -2.97   0.003    -1.048426   -.2140061
                 |
      T_C#c.return1 |
             TG  |  -.2061176   .2225996    -0.93   0.354    -.6424049    .2301696
                 |
           _cons |   1.365024   .1642466     8.31   0.000     1.043106    1.686941
    ------------------------------------------------------------------------------

    Next, I tried restricting the sample with an if qualifier to one particular return number, such as 0.54, but the information most important to me was omitted.

    Code:
     logit action c.return1##i.T_C if return1 == .54, vce(cluster CASE)
    
    note: return1 omitted because of collinearity.
    note: 1.T_C#c.return1 omitted because of collinearity.
    Iteration 0:   log pseudolikelihood = -55.355109  
    Iteration 1:   log pseudolikelihood = -55.312488  
    Iteration 2:   log pseudolikelihood = -55.312477  
    Iteration 3:   log pseudolikelihood = -55.312477  
    
    Logistic regression                                     Number of obs =     97
                                                            Wald chi2(1)  =   0.08
                                                            Prob > chi2   = 0.7715
    Log pseudolikelihood = -55.312477                       Pseudo R2     = 0.0008
    
                                      (Std. err. adjusted for 97 clusters in CASE)
    ------------------------------------------------------------------------------
                 |               Robust
        action  | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         return1 |          0  (omitted)
                 |
             T_C |
             TG  |  -.1356126   .4670154    -0.29   0.772    -1.050946    .7797207
                 |
     T_C#c.return1 |
             TG  |          0  (omitted)
                 |
           _cons |   1.126011   .3339311     3.37   0.001     .4715184    1.780504
    ------------------------------------------------------------------------------

    I would be most grateful for any advice on how to deal with this matter.
    In addition, I would appreciate any guidance on why two coefficients individually are significant, however, their interaction is not.

    Thank you very much!

  • #2
    I would like to understand better the interaction between the return figure and the treatment.
    Code:
    levelsof return1, local(returns)
    logit action c.return1##i.T_C, vce(cluster CASE)
    margins, dydx(T_C) at(return1 = (`returns'))
    marginsplot
    This code will show you, in a table and graphically, the modeled average treatment effect (difference between probability of action in the treatment group and probability of action in the control group) at each level of return1 used in your study.

    I would appreciate any guidance on why two coefficients individually are significant, however, their interaction is not.
    The interaction of two variables is a measure of the extent to which the effect of one variable depends on the value of the other. If the treatment effect is the same regardless of which value of return1 was used, then the interaction is zero.

    Some additional thoughts. Interactions involving continuous variables are somewhat problematic. By using c.return1##i.T_C, you are stipulating that the treatment effect depends linearly on the value of return. This is a rather stringent constraint that may or may not be suitable for your situation. Is that a reasonable model from a conceptual/theoretical perspective? If there is no theoretical or conceptual basis for even answering that question, then you should at least verify that your logistic model is a reasonable fit to the data. -estat gof, group(10) table- will give you a display of how well the predicted probabilities of action match the observed.

    If there is neither theoretical nor empirical support for this model, then perhaps you need to calculate the observed (not modeled) differences in action probability between the treatment and control groups at each level of return1 and graph them to see how you might improve the model with some transform of the return1 variable.
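
    As a rough sketch of that last suggestion (not part of the original advice), the observed difference at each return level could be computed and plotted as follows. The names action0, action1, and diff are created here purely for illustration, and T_C is assumed to be coded 0/1.
    Code:
    preserve
    * mean of action = observed share taking an action in each return-by-group cell
    collapse (mean) action, by(return1 T_C)
    * one row per return level, with columns action0 (control) and action1 (treatment)
    reshape wide action, i(return1) j(T_C)
    gen diff = action1 - action0
    twoway connected diff return1, yline(0) ytitle("Observed difference (treatment - control)")
    restore
    If the plotted differences fall roughly along a straight line, that would be at least consistent with the linear interaction in the model.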


    • #3
      Dear Clyde,

      Thank you so much for your help.

      Code:
      levelsof return1, local(returns)
      -.4699999999999999 -.31 -.21 .14 .37 .46 .54 1.19 1.24 1.63
      Code:
       logit action c.return1##i.T_C, vce(cluster CASE)
      
      Iteration 0:   log pseudolikelihood = -511.78245  
      Iteration 1:   log pseudolikelihood = -498.77008  
      Iteration 2:   log pseudolikelihood = -498.51032  
      Iteration 3:   log pseudolikelihood = -498.50994  
      Iteration 4:   log pseudolikelihood = -498.50994  
      
      Logistic regression                                     Number of obs =    920
                                                              Wald chi2(3)  =  18.68
                                                              Prob > chi2   = 0.0003
      Log pseudolikelihood = -498.50994                       Pseudo R2     = 0.0259
      
                                       (Std. err. adjusted for 230 clusters in CASE)
      ------------------------------------------------------------------------------
                   |               Robust
            action | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           return1 |   .3855761   .1633776     2.36   0.018     .0653619    .7057903
                   |
               T_C |
               TG  |  -.6312159   .2128661    -2.97   0.003    -1.048426   -.2140061
                   |
      T_C#c.return1|
               TG  |  -.2061176   .2225996    -0.93   0.354    -.6424049    .2301696
                   |
             _cons |   1.365024   .1642466     8.31   0.000     1.043106    1.686941
      ------------------------------------------------------------------------------
      When I used
      Code:
       margins, dydx(T_C) at(return1 = (`returns'))
      I received an error "invalid numlist has too few elements r(122);"

      hence, I used
      Code:
       margins, dydx(T_C) at(return1 = (-.4699999999999999 -.31 -.21 .14 .37 .46 .54 1.19 1.24 1.63))
      Code:
      Conditional marginal effects                               Number of obs = 920
      Model VCE: Robust
      
      Expression: Pr(action), predict()
      dy/dx wrt:  1.T_C
      1._at:  return1 = -.47
      2._at:  return1 = -.31
      3._at:  return1 = -.21
      4._at:  return1 =  .14
      5._at:  return1 =  .37
      6._at:  return1 =  .46
      7._at:  return1 =  .54
      8._at:  return1 = 1.19
      9._at:  return1 = 1.24
      10._at: return1 = 1.63
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      0.T_C        |  (base outcome)
      -------------+----------------------------------------------------------------
      1.T_C        |
               _at |
                1  |  -.1087416    .054223    -2.01   0.045    -.2150168   -.0024664
                2  |  -.1131877   .0485318    -2.33   0.020    -.2083082   -.0180672
                3  |  -.1158115   .0453642    -2.55   0.011    -.2047237   -.0268994
                4  |  -.1240605   .0371875    -3.34   0.001    -.1969466   -.0511743
                5  |  -.1286993   .0347403    -3.70   0.000    -.1967891   -.0606095
                6  |  -.1303489   .0344553    -3.78   0.000    -.1978799   -.0628178
                7  |  -.1317381   .0345045    -3.82   0.000    -.1993656   -.0641106
                8  |  -.1404439   .0424476    -3.31   0.001    -.2236396   -.0572483
                9  |  -.1409317   .0433892    -3.25   0.001     -.225973   -.0558905
               10  |  -.1439114   .0512335    -2.81   0.005    -.2443272   -.0434957
      ------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.
      [Image: marginsplot of the treatment effect by return1 (screenshot marginsplot return T_C.png)]

      Regarding the Hosmer–Lemeshow goodness-of-fit test, I obtained the following:
      Code:
      . estat gof, group(10) table
      note: obs collapsed on 10 quantiles of estimated probabilities.
      
      Goodness-of-fit test after logistic model
      Variable: action
      
        Table collapsed on quantiles of estimated probabilities
        +--------------------------------------------------------+
        | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
        |-------+--------+-------+-------+-------+-------+-------|
        |     1 | 0.6633 |    61 |  62.7 |    34 |  32.3 |    95 |
        |     2 | 0.6811 |    66 |  67.3 |    34 |  32.7 |   100 |
        |     3 | 0.6935 |    60 |  58.8 |    25 |  26.2 |    85 |
        |     4 | 0.7206 |    72 |  68.7 |    25 |  28.3 |    97 |
        |     5 | 0.7362 |    65 |  66.4 |    26 |  24.6 |    91 |
        |-------+--------+-------+-------+-------+-------+-------|
        |     6 | 0.7765 |    72 |  71.7 |    21 |  21.3 |    93 |
        |     7 | 0.8052 |    68 |  66.7 |    16 |  17.3 |    84 |
        |     8 | 0.8282 |   108 | 113.7 |    30 |  24.3 |   138 |
        |     9 | 0.8610 |    43 |  40.5 |     4 |   6.5 |    47 |
        |    10 | 0.8801 |    80 |  78.5 |    10 |  11.5 |    90 |
        +--------------------------------------------------------+
      
       Number of observations =    920
             Number of groups =     10
      Hosmer–Lemeshow chi2(8) =   4.05
                  Prob > chi2 = 0.8522
      According to the internet manual (https://www.stata.com/manuals13/restatgof.pdf), I cannot reject my model since the probability is not close to zero. Regarding a possible transformation of the return numbers, they used to be in decimal form, but since I am doing marginal analysis, my supervisor recommended I multiply the return by 100 and replace them, such that the returns are expressed in percentages. I'm unsure if other transformations, such as taking squares or transforming to log will really be useful for returns.

      Please find attached a screenshot of what I believe are the observed actions over treatment and control over the 10 different returns. I used
      Code:
       graph bar action, over(return1) over(T_C) blabel(total)
      [Image: bar chart of mean action by return1 and T_C (screen2.png)]


      Thank you so very much for your guidance and help.


      • #4
        I received an error "invalid numlist has too few elements r(122);"
        I suspect this problem arose because you tried to run the code one line at a time. Because the code involves use of local macros, you can't do that. The code must be run without interruption. A local macro exists only in the space of the program in which it is defined. "Program" in this context also means a line or group of lines that are run by highlighting them in the do-editor and then running that. After the highlighted line(s) have finished running, any local macros defined within them disappear. So if the -levelsof- command, which defines local macro returns, was run either by itself or in a block of lines that ended before the -margins- command, by the time you get to the -margins- command, local macro returns no longer exists, which accounts for the message you received.
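
        A minimal illustration of the point, assuming these lines sit together in the do-file editor and are selected and run as one block. The -display- line is added here only as a check. The -logit- results stay in memory between runs, so it is only the local macro that has to be defined in the same run as -margins-.
        Code:
        levelsof return1, local(returns)
        * confirm that the macro is populated before it is used
        display "`returns'"
        margins, dydx(T_C) at(return1 = (`returns'))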

        The table you got from -estat gof- suggests that your model fits the data rather nicely. So I wouldn't pursue any further transforms like logs or quadratics. (Transforming by multiplying by 100 is, admittedly, a transformation, but a cosmetic one that won't change anything of substance. Feel free to do that.)


        • #5
          Thank you for the information, Clyde!
          Yes, that's true. I entered every line individually.

          May I please kindly ask you two questions regarding the interpretation of the marginal effects analysis and returns?

          Since the effect is significant at every return, can I conclude that, for example, facing a return of -0.47% (the first return line in the margins analysis) and being in the treatment group decreases the likelihood of an action in the portfolio by 11 percentage points compared to the same return in the control group? And so on for the other returns?

          Code:
          Conditional marginal effects                               Number of obs = 920
          Model VCE: Robust
          
          Expression: Pr(action), predict()
          dy/dx wrt:  1.T_C
          1._at:  return1 = -.47
          2._at:  return1 = -.31
          3._at:  return1 = -.21
          4._at:  return1 =  .14
          5._at:  return1 =  .37
          6._at:  return1 =  .46
          7._at:  return1 =  .54
          8._at:  return1 = 1.19
          9._at:  return1 = 1.24
          10._at: return1 = 1.63
          
          ------------------------------------------------------------------------------
                       |            Delta-method
                       |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
          0.T_C        |  (base outcome)
          -------------+----------------------------------------------------------------
          1.T_C        |
                   _at |
                    1  |  -.1087416    .054223    -2.01   0.045    -.2150168   -.0024664
                    2  |  -.1131877   .0485318    -2.33   0.020    -.2083082   -.0180672
                    3  |  -.1158115   .0453642    -2.55   0.011    -.2047237   -.0268994
                    4  |  -.1240605   .0371875    -3.34   0.001    -.1969466   -.0511743
                    5  |  -.1286993   .0347403    -3.70   0.000    -.1967891   -.0606095
                    6  |  -.1303489   .0344553    -3.78   0.000    -.1978799   -.0628178
                    7  |  -.1317381   .0345045    -3.82   0.000    -.1993656   -.0641106
                    8  |  -.1404439   .0424476    -3.31   0.001    -.2236396   -.0572483
                    9  |  -.1409317   .0433892    -3.25   0.001     -.225973   -.0558905
                   10  |  -.1439114   .0512335    -2.81   0.005    -.2443272   -.0434957
          ------------------------------------------------------------------------------
          Note: dy/dx for factor levels is the discrete change from the base level.

          Secondly, I tried including the other three asset classes (and their interaction with treatment/control) in the logit model, which worked. However, the margins analysis failed:

          Code:
          margins, dydx(T_C) at return1 = (-.4699999999999999 -.31 -.21 .14 .37 .46 .54 1.19 1.24 1.63) at return2 = (-18.26 -12.35 2.65 3.55 6.87 9.56 12.51 15.79 25.48)
          option at not allowed
          Would it make sense to produce four individual marginsplots (each with only one return series) and then overlay them, or would this change the results or even the interpretation? In essence, I hope to understand the impact of all four returns per round on action (simultaneously?).



          • #6
            The error message you are getting with -margins- is because of the way you wrote the -at- option. It isn't -at return1 = (-.46999999999 ...)-, it's -at(return1 = (-.46999999999 ...))-. So the variable name goes inside the parentheses, and then the list of values is further enclosed in parentheses.
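
            Written out, the corrected syntax might look like the sketch below. It assumes, purely for illustration, a model that also includes c.return2##i.T_C (the regression actually fitted is not shown in this thread), and it uses a shortened subset of the values posted in #5.
            Code:
            * hypothetical model; substitute the regression actually fitted
            logit action c.return1##i.T_C c.return2##i.T_C, vce(cluster CASE)
            * one at(): every combination of the listed values (3 x 3 = 9 scenarios)
            margins, dydx(T_C) at(return1 = (-.31 .54 1.63) return2 = (-18.26 2.65 25.48))
            * two at() options: each varies one return and leaves the other as observed
            margins, dydx(T_C) at(return1 = (-.31 .54 1.63)) at(return2 = (-18.26 2.65 25.48))
            Listing all ten values of each return inside a single at() would produce a very long table, which is why the values are cut down here.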

            I imagine you are working in the world of finance, and the null hypothesis significance testing paradigm reigns supreme there. So within that framework, your conclusion "in the treatment group decreases the likelihood of an action in the portfolio by 11 percentage points compared to the same return in the control group" is correct.
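
            To see where that figure comes from, the first row of the -margins- table can be reproduced by hand from the coefficients of the model in #1. This is only a sketch: the local macro names are made up for illustration, it relies on return1 and T_C being the only covariates, and the lines must be run together as one block.
            Code:
            logit action c.return1##i.T_C, vce(cluster CASE)
            local r = -.47
            * predicted probability in the control group at return1 = -.47
            local p0 = invlogit(_b[_cons] + _b[return1]*`r')
            * predicted probability in the treatment group at the same return
            local p1 = invlogit(_b[_cons] + _b[1.T_C] + (_b[return1] + _b[1.T_C#c.return1])*`r')
            display `p1' - `p0'
            With the coefficients posted above, this works out to roughly -.109, that is, about 11 percentage points.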

            I don't understand the data design for the four asset-class version. Was each person given the returns on all four assets in a single trial and then had the option, just once, to take or not take an action? If so, then your design makes sense to me. But if each person had four separate trials, one for each asset class, and was just given the return on that asset class in each trial and given, at each of the four trials, an option to take an action, then I don't see this data design as appropriate. And I'm reluctant to advise you on the best use of the -margins- command for this because I don't even know what the design is, and I don't know how you did the regression.


            • #7
              Dear Clyde,
              Thank you so very much for the improvement of my margins command; I am now able to analyze marginal effects for more than one return series.

              Yes, that is correct, I am working in finance - Thank you for confirming the correct interpretation.

              I'm sorry, I should have been clearer. In the experiment, the participants were endowed with capital to be invested in four different scenarios (each scenario is a single trial). Each scenario mimics a phase of the business cycle. In each scenario, the task was to invest the capital fully across the four different asset classes. On the next screen, we simulated (using real returns) how the investments developed and asked participants to decide how to proceed (whether they wanted to take an action).

              So yes to the first part of your statement - four asset classes were presented in a single trial - one action per trial. Then we presented the next scenario, again asked for a portfolio, presented four new asset class returns, simulated the ending value of the investments in each asset class, and asked for any action to be taken. After this, the third scenario was presented, followed by the fourth.

              Thank you very much for all your support, guidance, and knowledge!
