  • How to do the Interaction plot with continuous variables and constrain other control variables at mean

    To anyone who can help with this problem:

    Please kindly help! I need to draw an interaction plot to show the moderating effect. I tried commands such as "margins" following the Stata help guide, but they did not work. I also need to hold additional control variables constant but do not know how. The data setup is as follows:

    I use "xtgls" to regress "y x1 x2 x1*x2 x3 x4", where x1 and x2 are the independent variables and x3 and x4 are control variables. x2 moderates the relationship between x1 and y.

    x1 and x2 are continuous variables. x1 ranges from -0.36 to 0.06. x2 ranges from -0.71 to 1.42.

    Apart from x1 and x2, the other variables (x3 and x4) need to be held at their mean values.

    The x-axis should show x1 and the y-axis should show y, with two lines in the plane: one for the relationship between x1 and y at a high level of x2, the other for the relationship at a low level of x2.

    A second figure should show the difference between the two lines of the first figure on the y-axis, with x1 on the x-axis.

    Both figures need the variables labeled in the appropriate places.

    Please help with this. Looking for your guidance!

    Thank you!

  • #2
    Well, everything looks straightforward until you start talking about "high level" and "low level" x2's. You have to say at what value of x2 you make that distinction. Then you will need to generate a new indicator variable that distinguishes high and low values of x2, and interact x1 with that, instead of with x2 itself, in your -xtgls- command. Similarly your -margins- command will have to use that variable. So it will look something like this (I am just making up 0.5 as the cutoff between low and high x2's):

    Code:
    gen x2_high = x2 > 0.5 if !missing(x2)
    xtgls y c.x1##i.x2_high x3 x4
    
    
    margins x2_high, at(x1 =  (-0.36 (0.1) 0.06)) atmeans
    marginsplot, name(plot1, replace)
    
    margins, dydx(x2_high) at(x1 = (-0.36 (0.1) 0.06)) atmeans
    marginsplot, name(plot2, replace)
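
    A variant of the first -margins- call, as a sketch only: -atmeans- fixes every covariate not otherwise specified at its estimation-sample mean, and the same request can be spelled out for x3 and x4 with the (mean) keyword inside -at()-. The value label name x2lbl and its text are made up for illustration; the label is there only so that the plot legend should read "Low x2" and "High x2" rather than 0 and 1.

    Code:
    * hypothetical value label, so the plot legend is readable
    label define x2lbl 0 "Low x2" 1 "High x2"
    label values x2_high x2lbl

    * x3 and x4 held at their means explicitly; x1 varies over the grid
    margins x2_high, at((mean) x3 x4 x1 = (-0.36 (0.1) 0.06))
    marginsplot, name(plot1, replace)

    Note that the grid -0.36 (0.1) 0.06 stops at 0.04, the last step that does not exceed 0.06.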

    • #3
      Thank you so much for your kind and great help Clyde!

      Let me try first to see how it works...

      • #4
        Hi, Clyde.

        The code works. Amazing! Thanks for the great help!

        May I ask a few more things:

        (1) The y-axis in the output is labeled "Fitted Values", ranging from -2 to 1, in plot 1, and "Effects on fitted values", ranging from -1 to 3, in plot 2. These ranges are not the same as the range of the dependent variable y. How should I understand and interpret these "fitted values"?

        (2) How do I change the labels for the x-axis, y-axis, lines, and title in the figure?

        (3) How can I use a different marker for one of the lines in plot1? The original output distinguishes line 1 and line 2 by color, and both lines are marked with small dots at each interval point. Can I replace the dots with small squares on one of the lines, so that the two lines can be clearly told apart when printed in black and white?

        (4) How can I show the CI at each interval as a shaded band while keeping the line inside the shaded area?

        (5) Is it possible to combine the two figures into one?

        (6) How do I export the figures to a Word file?

        Thank you so much for your kind attention!


        • #5
          (1) You can re-title the axes to your liking by adding -xtitle()- and -ytitle()- options to the -marginsplot- command, just as you would with any -twoway- Stata graph. As for the different scales on the vertical axis, that's because in the second graph you are plotting the difference between the values in the first graph. So you could have two lines in the first graph that are up in, say, the 90-100 range, but the differences between them would range from -10 to 10.

          (2) Exactly the same way you would with any other Stata -twoway- graph. -marginsplot- supports -graph twoway- options.

          (3) Same answer as (2). Do it exactly the same way you would with any Stata -twoway- graph.

          (4) I don't know. I've seen graphs like this produced by others, but it's not something I've ever wanted to do, so I've never learned how it's done.

          (5) -help graph combine-.

          (6) Again, this is something I don't do. But in this case I know how it is done. Use -graph export- to save the graphs as .png, or .jpg or .emf or .wmf or .tif. Then within Microsoft Word you can import them as pictures. Or, you can also use the Stata -putdocx image- command to add them to a Microsoft Word document from within Stata.
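
          For (2) to (6), a minimal sketch of how these pieces might fit together, assuming the -margins- commands from #2 and a do-file (the /// line continuations do not work when typed interactively). The titles and file names are placeholders, not anything from the original question, and the color(%30) transparency syntax assumes Stata 15 or later, as does -putdocx-. -recast(line)- and -recastci(rarea)- are -marginsplot- options that draw the estimates as lines and the confidence intervals as shaded areas, which is one way to get the look asked about in (4).

          Code:
          * plot 1: markers and line patterns differ, for black-and-white printing
          margins x2_high, at(x1 = (-0.36 (0.1) 0.06)) atmeans
          marginsplot, plot1opts(msymbol(circle) lpattern(solid)) ///
              plot2opts(msymbol(square) lpattern(dash)) ///
              title("Predicted y by x1") xtitle("x1") ytitle("Predicted y") ///
              name(plot1, replace)

          * same results redrawn as lines with shaded CI bands (no markers)
          marginsplot, recast(line) recastci(rarea) ciopts(color(%30)) ///
              plot1opts(lpattern(solid)) plot2opts(lpattern(dash)) ///
              xtitle("x1") ytitle("Predicted y") name(plot1_band, replace)

          * plot 2: difference between the two lines
          margins, dydx(x2_high) at(x1 = (-0.36 (0.1) 0.06)) atmeans
          marginsplot, recast(line) recastci(rarea) ciopts(color(%30)) ///
              xtitle("x1") ytitle("High minus low x2") name(plot2, replace)

          * combine the graphs, export, and place the image in a Word document
          graph combine plot1_band plot2, cols(1) name(combined, replace)
          graph export "interaction.png", name(combined) replace
          putdocx begin
          putdocx paragraph
          putdocx image "interaction.png"
          putdocx save "interaction.docx", replace

          -marginsplot- can be rerun with different options after a single -margins- call, which is why the first two plots share one set of results.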

          • #6
            Got it.

            I will try to see how it works...

            Thank you so much for your great help Clyde!!!

            • #7
              Hi Clyde, thanks for your help.

              I have tried to refine the figure's labels in the Stata Graph Editor. But there is one question that still confuses me, and I would like your opinion.

              The dependent variable y ranges from 0 to 4.29, with a mean of 0.81.

              The interaction plot from Stata shows "fitted values" of y, ranging from -2 to 1.

              What are the fitted values? How are they related to the values of y?

              • #8
                The fitted values are the model predictions of y, not including any estimate of the error term. They are calculated as the sum of the products of the x variables with their corresponding coefficients. Putting it more algebraically, the -xtgls- model is y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + eps. The fitted value is just the b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 part.

                Because these do not include any estimates of the error term, the range of these fitted values will normally be smaller than the range of y itself. In your case, they also seem to be shifted a bit to the left of the y distribution. This may be due to your decision to set the variables other than x1 at their means. If the distributions of those variables are skewed, then their means will not really be central values, and setting the variables to their means when calculating the fitted values will result in values that are off the center of the y distribution.
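
                To make the arithmetic concrete, here is a sketch that rebuilds one point of plot1 by hand, assuming the model from #2 and a do-file. That model also contains the x2_high main effect and the x1-by-x2_high interaction, so those terms appear in the sum alongside the ones in the equation above. The local macro names m3 and m4 and the variable name yhat are arbitrary choices for illustration.

                Code:
                xtgls y c.x1##i.x2_high x3 x4

                * means of the controls in the estimation sample, as -atmeans- uses
                summarize x3 if e(sample), meanonly
                local m3 = r(mean)
                summarize x4 if e(sample), meanonly
                local m4 = r(mean)

                * fitted value at x1 = -0.36, x2_high = 1, x3 and x4 at their means
                display _b[_cons] + _b[x1]*(-0.36) + _b[1.x2_high] ///
                    + _b[1.x2_high#c.x1]*(-0.36) + _b[x3]*`m3' + _b[x4]*`m4'

                * fitted values for every observation, to compare with the range of y
                predict yhat, xb
                summarize y yhat

                The displayed number should agree, up to rounding, with the -margins- estimate for x2_high = 1 at x1 = -0.36.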

                • #9
                  Dear Clyde,

                  Thank you so much for this clarification. It is very inspiring and helpful!

                  • #10
                    Dear Clyde,

                    Sorry to disturb you again.

                    I hope to learn one more thing from you: how can I use mean + 1 SD as the cutoff for high x2 and mean - 1 SD as the cutoff for low x2 in the example from #2?

                    I calculated the values of mean + 1 SD and mean - 1 SD and labeled them x2_high and x2_low respectively. Since the margins code only contains x2_high, I don't know how to proceed with x2_high and x2_low together. I would again expect two lines in one figure to show the interaction effect.

                    Thank you so much!


                    • #11
                      So, if I understand you, you are now slicing the range of x2 into three parts: one is below mean - 1SD, the next is between mean - 1SD and mean + 1SD, and the last is above mean + 1SD. And you have two indicator variables, x2_low and x2_high, that indicate the lowest and highest slices. So then it's just a minor modification:

                      Code:
                      xtgls y c.x1##i.(x2_high x2_low) x3 x4
                      
                      
                      margins x2_high x2_low, at(x1 = (-0.36 (0.1) 0.06)) atmeans
                      marginsplot, name(plot1, replace)
                      
                      margins, dydx(x2_high x2_low) at(x1 = (-0.36 (0.1) 0.06)) atmeans
                      marginsplot, name(plot2, replace)

                      • #12
                        Thank you so much Clyde!

                        I tried the code and obtained a first figure with four lines. What should I do if the first figure should have only two lines (one for x1 at high x2, the other for x1 at low x2, with x3 and x4 at their means)?

                        Many thanks!

                        • #13
                          Instead of using separate x2_low and x2_high variables, create a new variable x2_cat with 3 levels (low, medium, and high) and run the regression with the medium category as base. Then run the -margins- command. So, like this:
                          Code:
                          gen x2_cat = 0 if !missing(x2)
                          replace x2_cat = 1 if x2_low == 1
                          replace x2_cat = 2 if x2_high == 2
                          label define x2_cat 0 "medium" 1 "low" 2 "high"
                          label values x2_cat x2_cat
                          
                          xtgls y c.x1##i.x2_cat x3 x4
                          margins, at(x2_cat = (1 2) x1 = (-0.36 (0.1) 0.06)) atmeans
                          marginsplot, name(plot1, replace)
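
                          For comparison only, and not what the code above does: a sketch of a different approach that keeps x2 continuous, as in the original question, and evaluates the fitted lines at x2 = mean - 1 SD and x2 = mean + 1 SD instead of cutting x2 into categories. The local macro names lo and hi are arbitrary. Whether this or the categorical version is appropriate depends on what "high" and "low" are meant to represent.

                          Code:
                          xtgls y c.x1##c.x2 x3 x4

                          * mean - 1 SD and mean + 1 SD of x2 in the estimation sample
                          summarize x2 if e(sample)
                          local lo = r(mean) - r(sd)
                          local hi = r(mean) + r(sd)

                          * two lines: predicted y over x1 at low and at high x2, x3 and x4 at means
                          margins, at(x2 = (`lo' `hi') x1 = (-0.36 (0.1) 0.06)) atmeans
                          marginsplot, xdimension(x1) name(plot1, replace)

                          * marginal effect of x2 (per unit of x2) at each value of x1;
                          * the gap between the two lines is this effect times (hi - lo)
                          margins, dydx(x2) at(x1 = (-0.36 (0.1) 0.06)) atmeans
                          marginsplot, name(plot2, replace)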


                          • #14

                            Thank you Clyde!

                            I have a few more questions, so that I understand this better.

                            x2_high >= mean+ 1SD
                            x2_low <= mean-1SD

                            1. In "replace x2_cat = 1 if x2_low == 1" and "replace x2_cat = 2 if x2_high == 2", what does "x2_low == 1" mean? Does the 1 refer to the value of "mean-1SD"? Does it include the situation of x2_low<mean-1SD?
                            2. What does "label values x2_cat x2_cat" mean?
                            3. For "gen x2_cat = 0 if !missing(x2)", does it mean that when the value of x2 is missing, x2_cat=0?

                            • #15
                              The variable x2_high was defined in #2 by -gen x2_high = (x2 > 0.5) if !missing(x2)-, and you were advised to change that to match whatever actual cutoff between high and non-high you intended. I see now that you chose to make the cut off 1 SD above the mean. So presumably you did something like this to create x2_high:
                              Code:
                              summ x2
                              gen x2_high = (x2 > `r(mean)' + `r(sd)') if !missing(x2)
                              That creates a variable that takes the value 1 when x2 is greater than mean + 1 SD, and 0 in all other observations (or missing if x2 is missing). Similarly for x2_low, but this time 1 when x2 is less than mean - 1 SD, etc. So the code

                              Code:
                              replace x2_cat = 1 if x2_low == 1
                              will set x2_cat = 1 in precisely those observations where x2 < mean - 1 SD.

                              The code -replace x2_cat = 2 if x2_high == 2- was an error on my part. I meant to say
                              Code:
                              replace x2_cat = 2 if x2_high == 1
                              so that x2_cat would, after that, be 1 when x2 is less than mean - 1SD, 2 when x2 is greater than mean + 1SD, and 0 for all other non-missing values of x2.

                              What does "label values x2_cat x2_cat" mean?
                              That statement occurs just after the command which defines a value label named x2_cat (I chose that name deliberately to match the name of the variable I planned to use it with). -label values x2_cat x2_cat- then tells Stata to apply the label x2_cat to the variable x2_cat. That means that when the variable x2_cat is -list-ed or shown in the browser/editor, or when one of its values marks an indicator ("dummy") variable in a regression table, it will be shown as "medium", "low", or "high" instead of 0, 1, or 2, respectively. That means you can read the output without having to remember what the 0, 1, and 2 codes stood for. This is a very useful thing to do with categorical variables, all the more so if the ordering of the underlying numbers 0, 1, and 2 does not correspond to a normal ordering of the words! It's not crucial to the calculations at all--and if it makes you uncomfortable you can skip it; but then you'll find the outputs harder to read because you'll have to constantly remind yourself that 0 = medium, 1 = low, and 2 = high.

                              For "gen x2_cat = 0 if !missing(x2)", does it mean that when the value of x2 is missing, x2_cat=0?
                              No, it means just the opposite. The ! operator in Stata is the logical not operator. So it sets x2_cat to 0 precisely when x2 is not missing. (And then, the next two lines of code change it to 1 or 2 if x2_low or x2_high is equal to 1.)
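
                              Putting the pieces of this post together, a sketch that builds x2_cat directly from x2 using the mean - 1 SD and mean + 1 SD cutoffs, without the intermediate x2_low and x2_high variables. The local macro names lo and hi are arbitrary, and the code assumes x2_cat does not already exist (-drop x2_cat- first if it does). The extra !missing(x2) on the "high" line is there because Stata treats missing values as larger than any number, so x2 > `hi' alone would also be true where x2 is missing.

                              Code:
                              summarize x2
                              local lo = r(mean) - r(sd)
                              local hi = r(mean) + r(sd)

                              * 0 = medium wherever x2 is observed, then overwrite the tails
                              gen x2_cat = 0 if !missing(x2)
                              replace x2_cat = 1 if x2 < `lo'
                              replace x2_cat = 2 if x2 > `hi' & !missing(x2)

                              label define x2_cat 0 "medium" 1 "low" 2 "high", replace
                              label values x2_cat x2_cat

                              * check the coding, including how many observations are missing
                              tabulate x2_cat, missing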
