How to show all 10 values of continuous variable (not truly continuous) in interaction with treatment?

Scott Forrester

Join Date: Aug 2023
Posts: 40


29 Aug 2023, 12:07

Dear Statalist,

I have experimental data on how participants set up and manage financial portfolios. In my regression, I would like to better understand the interaction between the return figure and the treatment.
Four return numbers are randomly drawn from an urn containing ten return numbers (because there are four asset classes per round). The return numbers are stored as double in Stata. The returns are expressed as percentages (i.e., 0.54 is 0.54%). In my regression so far, I used c.return1##i.T_C; however, the output showed only a single coefficient for the return figure, not one for each of the ten values.
The binary treatment variable T_C indicates whether the participant was assigned to the treatment group (T_C == 1) or the control group (T_C == 0). action is a binary variable that measures whether an action was taken.

If I omit the c. part in front of the return, I get an error that factor variables may not contain noninteger values.
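A common workaround, not one proposed later in this thread, is to recode the ten return values into consecutive integer codes, which factor-variable notation accepts. This is a minimal sketch assuming the -dataex- data below are in memory; ret_cat is a hypothetical variable name, and the last line additionally assumes the full dataset with action, T_C and CASE. Treating the return as categorical estimates a separate treatment effect at each of the ten values rather than imposing a linear relationship.

```stata
* Map the 10 distinct values of return1 to integer codes 1-10;
* the -label- option labels each code with the original return value
egen ret_cat = group(return1), label

* Check the mapping: each code should correspond to exactly one return value
tabulate return1 ret_cat

* With the full dataset (action, T_C, CASE) in memory: return as a factor,
* giving one treatment-by-return interaction term per return level
logit action i.ret_cat##i.T_C, vce(cluster CASE)
```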

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double return1
               .54
               .14
              1.63
              1.19
.45999999999999996
              1.19
-.4699999999999999
               .14
              -.31
              1.63
               .54
              1.24
.45999999999999996
              1.24
               .37
              -.21
               .37
              -.31
              1.24
              1.63
              1.24
              -.21
               .54
-.4699999999999999
              -.21
-.4699999999999999
              1.24
               .14
               .37
              -.21
.45999999999999996
              1.19
              1.63
-.4699999999999999
               .37
.45999999999999996
              1.24
               .14
              1.63
.45999999999999996
              -.21
               .37
               .54
              1.19
              -.21
               .14
              1.24
              1.19
               .14
               .37
              1.19
              -.31
              1.63
-.4699999999999999
               .54
              -.31
              -.21
              1.24
-.4699999999999999
               .37
              -.31
               .54
               .14
              1.24
.45999999999999996
              -.21
              1.19
-.4699999999999999
               .54
              -.21
              1.19
              1.24
               .37
              -.21
              -.31
.45999999999999996
               .37
.45999999999999996
              1.19
              1.63
              -.31
               .54
-.4699999999999999
              1.24
              -.31
.45999999999999996
              1.63
              1.19
              1.24
              1.63
               .54
               .37
               .54
              -.31
              1.19
               .14
               .37
               .14
.45999999999999996
              -.21
end

Code:

 

 logit action c.return1##i.T_C, vce(cluster CASE)

Iteration 0:   log pseudolikelihood = -511.78245  
Iteration 1:   log pseudolikelihood = -498.77008  
Iteration 2:   log pseudolikelihood = -498.51032  
Iteration 3:   log pseudolikelihood = -498.50994  
Iteration 4:   log pseudolikelihood = -498.50994  

Logistic regression                                     Number of obs =    920
                                                        Wald chi2(3)  =  18.68
                                                        Prob > chi2   = 0.0003
Log pseudolikelihood = -498.50994                       Pseudo R2     = 0.0259

                                 (Std. err. adjusted for 230 clusters in CASE)
------------------------------------------------------------------------------
             |               Robust
    action | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     return1 |   .3855761   .1633776     2.36   0.018     .0653619    .7057903
             |
         T_C |
         TG  |  -.6312159   .2128661    -2.97   0.003    -1.048426   -.2140061
             |
  T_C#c.return1 |
         TG  |  -.2061176   .2225996    -0.93   0.354    -.6424049    .2301696
             |
       _cons |   1.365024   .1642466     8.31   0.000     1.043106    1.686941
------------------------------------------------------------------------------

Next, I tried using an if condition to restrict the sample to a single return value (for example, return1 == .54), but then the information most important to me was omitted.

Code:

 logit action c.return1##i.T_C if return1 == .54, vce(cluster CASE)

note: return1 omitted because of collinearity.
note: 1.T_C#c.return1 omitted because of collinearity.
Iteration 0:   log pseudolikelihood = -55.355109  
Iteration 1:   log pseudolikelihood = -55.312488  
Iteration 2:   log pseudolikelihood = -55.312477  
Iteration 3:   log pseudolikelihood = -55.312477  

Logistic regression                                     Number of obs =     97
                                                        Wald chi2(1)  =   0.08
                                                        Prob > chi2   = 0.7715
Log pseudolikelihood = -55.312477                       Pseudo R2     = 0.0008

                                  (Std. err. adjusted for 97 clusters in CASE)
------------------------------------------------------------------------------
             |               Robust
    action  | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     return1 |          0  (omitted)
             |
         T_C |
         TG  |  -.1356126   .4670154    -0.29   0.772    -1.050946    .7797207
             |
 T_C#c.return1 |
         TG  |          0  (omitted)
             |
       _cons |   1.126011   .3339311     3.37   0.001     .4715184    1.780504
------------------------------------------------------------------------------
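Restricting the sample with if return1 == .54 makes return1 constant, so it and its interaction are perfectly collinear with the constant and are dropped; what remains is a treatment-only model fitted on 97 observations. A sketch of an alternative that keeps the full-sample interaction model and evaluates the treatment effect at one return value, assuming the same variables as above:

```stata
* Fit the interaction model on all observations
logit action c.return1##i.T_C, vce(cluster CASE)

* Average treatment effect (difference in Pr(action), treatment minus control)
* with return1 set to .54 for every observation
margins, dydx(T_C) at(return1 = .54)
```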

I would be most grateful for any advice on how to deal with this matter.
In addition, I would appreciate any guidance on why the two coefficients are individually significant while their interaction is not.

Thank you very much!


Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

29 Aug 2023, 13:26

I would like to understand better the interaction between the return figure and the treatment.

Code:

levelsof return1, local(returns)
logit action c.return1##i.T_C, vce(cluster CASE)
margins, dydx(T_C) at(return1 = (`returns'))
marginsplot

This code will show you, in a table and graphically, the modeled average treatment effect (difference between probability of action in the treatment group and probability of action in the control group) at each level of return1 used in your study.
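Each dydx(T_C) estimate is the difference between two average predicted probabilities. A sketch that displays those two probabilities themselves at each return level, assuming the same variables; because it uses a local macro, it has to be run as one block rather than line by line:

```stata
* Store the distinct return values in a local macro
levelsof return1, local(returns)
logit action c.return1##i.T_C, vce(cluster CASE)

* Average predicted Pr(action) for control (T_C = 0) and treatment (T_C = 1)
* at each return value; the dydx(T_C) results are the differences between them
margins T_C, at(return1 = (`returns'))
marginsplot
```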

I would appreciate any guidance on why two coefficients individually are significant, however, their interaction is not.

The interaction of two variables is a measure of the extent to which the effect of one variable depends on the value of the other. If the treatment effect is the same regardless of which value of return1 was used, then the interaction is zero.
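A related point may explain the pattern: in a c.return1##i.T_C model, the coefficient on 1.T_C is the treatment effect (on the log-odds scale) when return1 equals 0, not an overall treatment effect, and the interaction coefficient is how much that effect changes per one-point increase in return1. The two significance tests therefore answer different questions. A sketch with the same model:

```stata
logit action c.return1##i.T_C, vce(cluster CASE)

* Wald test of the interaction alone (same p-value as its z test in the table)
test _b[1.T_C#c.return1] = 0

* Treatment effect on the log-odds scale at return1 = 0 (the main effect)
lincom _b[1.T_C]

* Treatment effect on the log-odds scale at the highest return used, 1.63
lincom _b[1.T_C] + 1.63*_b[1.T_C#c.return1]
```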

Some additional thoughts. Interactions involving continuous variables are somewhat problematic. By using c.return1##i.T_C, you are stipulating that the treatment effect (on the log-odds scale) depends linearly on the value of return1. This is a rather stringent constraint that may or may not be suitable for your situation. Is that a reasonable model from a conceptual/theoretical perspective? If there is no theoretical or conceptual basis for even answering that question, then you should at least verify that your logistic model is a reasonable fit to the data. -estat gof, group(10) table- will show you how well the predicted probabilities of action match the observed ones.

If there is neither theoretical nor empirical support for this model, then perhaps you need to calculate the observed (not modeled) differences in action probability between the treatment and control groups at each level of return1 and graph them to see how you might improve the model with some transform of the return1 variable.
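One way to compute and plot those observed differences, sketched under the assumption that action and T_C are coded 0/1 with no missing values; p_action and obs_diff are hypothetical variable names. -preserve- and -restore- bring the original data back at the end.

```stata
* Observed (not model-based) share taking action, by return level and arm
preserve
collapse (mean) p_action = action, by(return1 T_C)

* One row per return level: p_action0 = control, p_action1 = treatment
reshape wide p_action, i(return1) j(T_C)
generate obs_diff = p_action1 - p_action0
list return1 p_action0 p_action1 obs_diff, noobs clean

* Plot the observed treatment-minus-control difference against the return
twoway connected obs_diff return1, sort ytitle("Observed Pr(action), treatment minus control") xtitle("return1 (%)")
restore
```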

Scott Forrester


30 Aug 2023, 03:17

Dear Clyde,

Thank you so much for your help.

Code:

levelsof return1, local(returns)
-.4699999999999999 -.31 -.21 .14 .37 .46 .54 1.19 1.24 1.63

Code:

 logit action c.return1##i.T_C, vce(cluster CASE)

Iteration 0:   log pseudolikelihood = -511.78245  
Iteration 1:   log pseudolikelihood = -498.77008  
Iteration 2:   log pseudolikelihood = -498.51032  
Iteration 3:   log pseudolikelihood = -498.50994  
Iteration 4:   log pseudolikelihood = -498.50994  

Logistic regression                                     Number of obs =    920
                                                        Wald chi2(3)  =  18.68
                                                        Prob > chi2   = 0.0003
Log pseudolikelihood = -498.50994                       Pseudo R2     = 0.0259

                                 (Std. err. adjusted for 230 clusters in CASE)
------------------------------------------------------------------------------
             |               Robust
      action | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     return1 |   .3855761   .1633776     2.36   0.018     .0653619    .7057903
             |
         T_C |
         TG  |  -.6312159   .2128661    -2.97   0.003    -1.048426   -.2140061
             |
T_C#c.return1|
         TG  |  -.2061176   .2225996    -0.93   0.354    -.6424049    .2301696
             |
       _cons |   1.365024   .1642466     8.31   0.000     1.043106    1.686941
------------------------------------------------------------------------------

When I used

Code:

 margins, dydx(T_C) at(return1 = (`returns'))

I received an error "invalid numlist has too few elements r(122);"

Hence, I used

Code:

 margins, dydx(T_C) at(return1 = (-.4699999999999999 -.31 -.21 .14 .37 .46 .54 1.19 1.24 1.63))

Code:

Conditional marginal effects                               Number of obs = 920
Model VCE: Robust

Expression: Pr(action), predict()
dy/dx wrt:  1.T_C
1._at:  return1 = -.47
2._at:  return1 = -.31
3._at:  return1 = -.21
4._at:  return1 =  .14
5._at:  return1 =  .37
6._at:  return1 =  .46
7._at:  return1 =  .54
8._at:  return1 = 1.19
9._at:  return1 = 1.24
10._at: return1 = 1.63

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
0.T_C        |  (base outcome)
-------------+----------------------------------------------------------------
1.T_C        |
         _at |
          1  |  -.1087416    .054223    -2.01   0.045    -.2150168   -.0024664
          2  |  -.1131877   .0485318    -2.33   0.020    -.2083082   -.0180672
          3  |  -.1158115   .0453642    -2.55   0.011    -.2047237   -.0268994
          4  |  -.1240605   .0371875    -3.34   0.001    -.1969466   -.0511743
          5  |  -.1286993   .0347403    -3.70   0.000    -.1967891   -.0606095
          6  |  -.1303489   .0344553    -3.78   0.000    -.1978799   -.0628178
          7  |  -.1317381   .0345045    -3.82   0.000    -.1993656   -.0641106
          8  |  -.1404439   .0424476    -3.31   0.001    -.2236396   -.0572483
          9  |  -.1409317   .0433892    -3.25   0.001     -.225973   -.0558905
         10  |  -.1439114   .0512335    -2.81   0.005    -.2443272   -.0434957
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

[Attached image: screenshot marginsplot return T_C.png]

Regarding the Hosmer–Lemeshow goodness-of-fit test, I obtained the following:

Code:

. estat gof, group(10) table
note: obs collapsed on 10 quantiles of estimated probabilities.

Goodness-of-fit test after logistic model
Variable: action

  Table collapsed on quantiles of estimated probabilities
  +--------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.6633 |    61 |  62.7 |    34 |  32.3 |    95 |
  |     2 | 0.6811 |    66 |  67.3 |    34 |  32.7 |   100 |
  |     3 | 0.6935 |    60 |  58.8 |    25 |  26.2 |    85 |
  |     4 | 0.7206 |    72 |  68.7 |    25 |  28.3 |    97 |
  |     5 | 0.7362 |    65 |  66.4 |    26 |  24.6 |    91 |
  |-------+--------+-------+-------+-------+-------+-------|
  |     6 | 0.7765 |    72 |  71.7 |    21 |  21.3 |    93 |
  |     7 | 0.8052 |    68 |  66.7 |    16 |  17.3 |    84 |
  |     8 | 0.8282 |   108 | 113.7 |    30 |  24.3 |   138 |
  |     9 | 0.8610 |    43 |  40.5 |     4 |   6.5 |    47 |
  |    10 | 0.8801 |    80 |  78.5 |    10 |  11.5 |    90 |
  +--------------------------------------------------------+

 Number of observations =    920
       Number of groups =     10
Hosmer–Lemeshow chi2(8) =   4.05
            Prob > chi2 = 0.8522

According to the Stata manual (https://www.stata.com/manuals13/restatgof.pdf), I cannot reject my model, since the p-value is not close to zero. Regarding a possible transformation of the return numbers: they used to be in decimal form, but since I am doing marginal analysis, my supervisor recommended that I multiply the returns by 100 and replace them, so that the returns are expressed in percentages. I'm unsure whether other transformations, such as squaring or taking logs, would really be useful for returns.

Please find attached a screenshot of what I believe are the observed action rates in the treatment and control groups at each of the 10 different returns. I used

Code:

 graph bar action, over(return1) over(T_C) blabel(total)

[Attached image: screen2.png]

Thank you so very much for your guidance and help.


Clyde Schechter

#4

30 Aug 2023, 09:01

I received an error "invalid numlist has too few elements r(122);"

I suspect this problem arose because you tried to run the code one line at a time. Because the code involves use of local macros, you can't do that. The code must be run without interruption. A local macro exists only in the space of the program in which it is defined. "Program" in this context also means a line or group of lines that are run by highlighting them in the do-editor and then running that. After the highlighted line(s) have finished running, any local macros defined within them disappear. So if the -levelsof- command, which defines local macro returns, was run either by itself or in a block of lines that ended before the -margins- command, by the time you get to the -margins- command, local macro returns no longer exists, which accounts for the message you received.
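If running lines one at a time is unavoidable, one workaround (a sketch, not what was suggested above) is to store the levels in a global macro, which persists between separate runs for the rest of the Stata session; the name returns_g is hypothetical. Globals carry the usual risk of being silently overwritten elsewhere, so running the original block in one go remains the simpler option.

```stata
* Run once: a global macro survives after these lines finish executing
levelsof return1
global returns_g `r(levels)'

* These can then be run later, one at a time
logit action c.return1##i.T_C, vce(cluster CASE)
margins, dydx(T_C) at(return1 = ($returns_g))
marginsplot
```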

The table you got from -estat gof- suggests that your model fits the data rather nicely. So I wouldn't pursue any further transforms like logs or quadratics. (Transforming by multiplying by 100 is, admittedly, a transformation, but a cosmetic one that won't change anything of substance. Feel free to do that.)
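A quick check that the percent rescaling is cosmetic, sketched under the assumption that the full dataset is in memory; return1_dec, p_pct and p_dec are hypothetical names. On the decimal scale the coefficients involving the return become 100 times larger, while the treatment main effect, the constant and the fitted probabilities stay the same.

```stata
* Fit on the percent scale and save predicted probabilities
logit action c.return1##i.T_C, vce(cluster CASE)
predict double p_pct, pr

* Decimal-scale copy of the return (0.54% becomes .0054)
generate double return1_dec = return1/100
logit action c.return1_dec##i.T_C, vce(cluster CASE)
predict double p_dec, pr

* Fitted probabilities agree up to convergence tolerance
assert reldif(p_pct, p_dec) < 1e-5 if !missing(p_pct)
```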

Scott Forrester


30 Aug 2023, 10:59

Thank you for the information, Clyde!
Yes, that's true. I entered every line individually.

May I please kindly ask you two questions regarding the interpretation of the marginal effects analysis and returns?

Since every return is significant, can I conclude that, for example, facing a return of -0.47% (first return line in the margins' analysis) and being in the treatment group decreases the likelihood of an action in the portfolio by 11 percentage points compared to the same return in the control group? And so on for the other returns?

Code:

Conditional marginal effects                               Number of obs = 920
Model VCE: Robust

Expression: Pr(action), predict()
dy/dx wrt:  1.T_C
1._at:  return1 = -.47
2._at:  return1 = -.31
3._at:  return1 = -.21
4._at:  return1 =  .14
5._at:  return1 =  .37
6._at:  return1 =  .46
7._at:  return1 =  .54
8._at:  return1 = 1.19
9._at:  return1 = 1.24
10._at: return1 = 1.63

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
0.T_C        |  (base outcome)
-------------+----------------------------------------------------------------
1.T_C        |
         _at |
          1  |  -.1087416    .054223    -2.01   0.045    -.2150168   -.0024664
          2  |  -.1131877   .0485318    -2.33   0.020    -.2083082   -.0180672
          3  |  -.1158115   .0453642    -2.55   0.011    -.2047237   -.0268994
          4  |  -.1240605   .0371875    -3.34   0.001    -.1969466   -.0511743
          5  |  -.1286993   .0347403    -3.70   0.000    -.1967891   -.0606095
          6  |  -.1303489   .0344553    -3.78   0.000    -.1978799   -.0628178
          7  |  -.1317381   .0345045    -3.82   0.000    -.1993656   -.0641106
          8  |  -.1404439   .0424476    -3.31   0.001    -.2236396   -.0572483
          9  |  -.1409317   .0433892    -3.25   0.001     -.225973   -.0558905
         10  |  -.1439114   .0512335    -2.81   0.005    -.2443272   -.0434957
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Secondly, I tried including the other three asset classes (and their interactions with treatment/control) in the logit model, which worked. However, the -margins- command failed:

Code:

margins, dydx(T_C) at return1 = (-.4699999999999999 -.31 -.21 .14 .37 .46 .54 1.19 1.24 1.63) at return2 = (-18.26 -12.35 2.65 3.55 6.87 9.56 12.51 15.79 25.48)
option at not allowed

Would it make sense to create four individual marginsplots (each with only one return series) and then overlay them, or would this change the results or their interpretation? In essence, I hope to understand the impact of all four returns per round on action (simultaneously?).


Clyde Schechter

#6

30 Aug 2023, 11:17

The error message you are getting with -margins- is because of the way you wrote the -at- option. It isn't -at return1 = (-.4699999999999999 ...)-, it's -at(return1 = (-.4699999999999999 ...))-. So the variable name goes inside the parentheses, and the list of values is further enclosed in its own parentheses.
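A sketch of the corrected syntax, assuming the logit model with both return1 and return2 interacted with T_C has just been fitted; the values are copied from the command above, and the /// continuations require running it from a do-file. Listing both variables inside one at() makes -margins- evaluate every combination of the two lists, whereas separate at() options vary one return at a time while the other keeps its observed values.

```stata
* One at() with both variables: every combination (10 x 9 = 90 points)
margins, dydx(T_C) ///
    at(return1 = (-.4699999999999999 -.31 -.21 .14 .37 .46 .54 1.19 1.24 1.63) ///
       return2 = (-18.26 -12.35 2.65 3.55 6.87 9.56 12.51 15.79 25.48))

* Two separate at() options: return1 is varied with return2 as observed,
* then return2 is varied with return1 as observed
margins, dydx(T_C) ///
    at(return1 = (-.4699999999999999 -.31 -.21 .14 .37 .46 .54 1.19 1.24 1.63)) ///
    at(return2 = (-18.26 -12.35 2.65 3.55 6.87 9.56 12.51 15.79 25.48))
```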

I imagine you are working in the world of finance, and the null hypothesis significance testing paradigm reigns supreme there. So within that framework, your conclusion "in the treatment group decreases the likelihood of an action in the portfolio by 11 percentage points compared to the same return in the control group" is correct.

I don't understand the data design for the four asset-class version. Was each person given the returns on all four assets in a single trial and then had the option, just once, to take or not take an action? If so, then your design makes sense to me. But if each person had four separate trials, one for each asset class, and was just given the return on that asset class in each trial and given, at each of the four trials, an option to take an action, then I don't see this data design as appropriate. And I'm reluctant to advise you on the best use of the -margins- command for this because I don't even know what the design is, and I don't know how you did the regression.
Scott Forrester

#7

31 Aug 2023, 03:32

Dear Clyde,
Thank you so very much for the improvement of my margins command; I am now able to analyze marginal effects for more than one return series.

Yes, that is correct: I am working in finance. Thank you for confirming the correct interpretation.

I'm sorry, I should have been clearer. In the experiment, the participants were endowed with capital to be invested in four different scenarios (each scenario is a single trial). Each scenario mimics the business cycle. In each scenario, the task was to invest the capital fully across the four different asset classes. On the next screen, we simulated (using real returns) how the investments developed and asked participants how they wanted to proceed (whether they wanted to take an action).

So yes to the first part of your statement: the four asset classes were presented in a single trial, with one action per trial. Then we presented the next scenario, again asked for a portfolio, presented four new asset-class returns, simulated the ending value of the investments in each asset class, and asked whether any action should be taken. After this, the third scenario was presented, followed by the fourth.
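Given this design (one action decision per trial, made after seeing all four returns), one way to look at each return's role is sketched below. It is not the analysis from this thread, and it assumes the four return variables are named return1 to return4 (only return1 and return2 appear above). Each -margins- call varies one return while the other three keep their observed values, and the four plots are placed side by side with -graph combine- rather than overlaid, since the return series are on very different scales. Run it as one block because it uses local macros.

```stata
* All four returns, each interacted with treatment
logit action i.T_C##c.(return1 return2 return3 return4), vce(cluster CASE)

forvalues k = 1/4 {
    * Distinct values of the k-th return series
    levelsof return`k', local(lev)
    * Treatment effect at each value of return`k', other returns as observed
    margins, dydx(T_C) at(return`k' = (`lev'))
    marginsplot, name(mp`k', replace) title("return`k'")
}

* Show the four plots side by side
graph combine mp1 mp2 mp3 mp4
```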

Thank you very much for all your support, guidance, and knowledge!