How to test an interaction from a correlation matrix (if possible at all)?

Christophe Haon

Join Date: Apr 2017
Posts: 12

21 Sep 2018, 03:23

Hi,

I am trying to figure out how, if it is possible at all, to test an interaction when I do not have the data, just a correlation matrix.

What I have tried is to simulate data with the same correlational structure, compute the product term, estimate the model, and repeat 1,000 times. To check the soundness of the approach, I worked through an example so that I could compare the results obtained from a dataset with those obtained from the corresponding correlation matrix. The results are inconsistent, as you can see if you run the syntax below ('Step 2' section) and compare the output with the results reported on the webpage whose URL is given in the 'Step 1' section.

Any idea what I am doing wrong? Is the entire approach wrong or can it be fixed? How?

Thanks,
Christophe

Code:

********************************
* Step 1: Working from dataset *
********************************

* A worked example is available here: https://stats.idre.ucla.edu/stata/faq/how-can-i-explain-a-continuous-by-continuous-interaction-stata-12/

* NB: cannot replicate because the dataset is no longer available at the mentioned url


*******************************************
* Step 2: Working from correlation matrix *
*******************************************


* Set up of the steps to be repeated for the simulation in a program
program myprogram, rclass
    * drop of all variables to create an empty dataset
    drop _all
    * creation of a vector that contains the equivalent of a lower triangular correlation matrix
    matrix c = (1, 0.5445, 1, 0.6215, 0.6623, 1)
    
    * drawing of a sample of 1000 cases from a normal distribution with specified correlation structure (by default, means = 0 and s.d. = 1)
    drawnorm X W Y, n(1000) corr(c) cstorage(lower)

    * X refers to 'socst'
    * W refers to 'math'
    * Y refers to read

    * model estimation
    quietly summarize W
    global m=r(mean)
    global s=r(sd)
    capture generate WX=W*X

    sem (X W WX -> Y), standardized nocapslatent level(90)

    return scalar X_on_Y = [Y]_b[X]
    return scalar W_on_Y = [Y]_b[W]
    return scalar WX_on_Y = [Y]_b[WX]
    
    describe *
end

* use the simulate command to rerun myprogram 1000 times
* collect the coefficients (_b) from the sem each time
simulate X_on_Y = [Y]_b[X] W_on_Y = [Y]_b[W] WX_on_Y = [Y]_b[WX], reps(1000) nodots: myprogram

describe *

summarize

ci mean X_on_Y W_on_Y WX_on_Y, level(90)


Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

21 Sep 2018, 11:06

When I go to the url in Step 1, I just get a message that the page no longer exists and am automatically redirected to the UCLA Statistics home page. So I can't compare what you're getting to what they got to see what might be wrong.

But I think I see the problem.

I think that what you are accessing in your -simulate- command are the unstandardized coefficients and you are trying to compare them to standardized ones.

If you run

Code:

sysuse auto, clear
sem (price <- mpg headroom), standardized
display [price]_b[mpg] "  " [price]_b[headroom]

you will see that. In order to collect the standardized coefficients you have to pull them from the matrix e(b_std), not from _b[].

Is that it?

Added: By the way, I'm not sure why you're trying to simulate in this way instead of just using -sem-'s -ssd- options.
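A minimal sketch of the -ssd- route, using the correlations from post #1. The sample size of 200 is an assumption made for illustration only; it is not given anywhere in this thread. Note that this fits main effects only, because a product term cannot be formed from summary statistics.

```stata
* Sketch only: n = 200 is an assumed sample size, not a value from this thread
clear all
ssd init X W Y
ssd set observations 200
* Lower triangle of the correlation matrix, rows separated by backslashes
ssd set correlations 1 \ .5445 1 \ .6215 .6623 1
* Main effects only: WX cannot be computed from summary statistics
sem (Y <- X W), standardized nocapslatent
```

With only correlations supplied, the variables are treated as having unit variance, so the unstandardized and standardized coefficients should coincide here.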

Last edited by Clyde Schechter; 21 Sep 2018, 11:09.
Christophe Haon

Join Date: Apr 2017

Posts: 12
#3

23 Sep 2018, 03:57

Thank you.

The url in Step 1 works for me. You can try http://bit.ly/2DnvixN instead.

I am unclear how to edit my code in order to pull the coefficients from the matrix e(b_std). Could you please show me an example?

The reason why I am simulating this way is that I need to create interaction terms, which I cannot do if all I have is the correlation matrix.

Clyde Schechter

Join Date: Apr 2014
Posts: 30095
#4

23 Sep 2018, 12:51

Code:

clear*

* Set up of the steps to be repeated for the simulation in a program
program myprogram, rclass
    * drop of all variables to create an empty dataset
    drop _all
    * creation of a vector that contains the equivalent of a lower triangular correlation matrix
    matrix c = (1, 0.5445, 1, 0.6215, 0.6623, 1)
    
    * drawing of a sample of 1000 cases from a normal distribution with specified correlation structure (by default, means = 0 and s.d. = 1)
    drawnorm X W Y, n(1000) corr(c) cstorage(lower)

    * X refers to 'socst'
    * W refers to 'math'
    * Y refers to read

    * model estimation
    quietly summarize W
    global m=r(mean)
    global s=r(sd)
    capture generate WX=W*X

    sem (X W WX -> Y), standardized nocapslatent level(90)
    matrix B = e(b_std)

    return scalar X_on_Y = B[1,1]
    return scalar W_on_Y = B[1, 2]
    return scalar WX_on_Y = B[1, 3]
    
    describe *
end

* use the simulate command to rerun myprogram 1000 times
* collect the standardized coefficients that myprogram returns in r()
simulate X_on_Y = r(X_on_Y) W_on_Y = r(W_on_Y) WX_on_Y = r(WX_on_Y), reps(1000) nodots: myprogram

describe *

summarize

ci mean X_on_Y W_on_Y WX_on_Y, level(90)

The changes from your code are the -clear*- at the top, the new -matrix B = e(b_std)- line, the three -return scalar- lines, which now read from B, and the -simulate- call, which now collects the r() results.

You may be wondering how I knew which elements of B to extract. I got that by running a single instance of the -sem- command and then -matrix list e(b_std)-. That showed me which coefficients are in which cells of e(b_std). (There is also a way to access these elements by names instead of numbers, but it involves a bit of local macro manipulation and, for this problem at least, it did not seem worth the trouble.)

Your new link takes me to a working page, but that page does not appear to be relevant to the current problem, unless I am missing something. Anyway, I hope this helps.
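For reference, here is a rough sketch of the name-based lookup on the auto data used in #2. It assumes the columns of e(b_std) are named in the form equation:variable (for example price:mpg); -matrix list- shows the actual names, so check them before relying on this.

```stata
* Sketch using the auto data; colnumb() returns the column number for a given column name
sysuse auto, clear
quietly sem (price <- mpg headroom), standardized
matrix B = e(b_std)
* Inspect the column names before looking anything up
matrix list B
* Assumed naming pattern: equation:variable
local j = colnumb(B, "price:mpg")
display "standardized coefficient of mpg: " B[1, `j']
```

If the name is not found, colnumb() returns missing, so it is worth checking `j' before using it as a subscript.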


Richard Williams

Join Date: Apr 2014

Posts: 4987
#5

23 Sep 2018, 13:13

Is this the original data set you are looking for? If so, this command works fine for me.

use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

I haven't read all of the above posts carefully, but I think in order to test an interaction, the correlation with the interaction has to be included in the correlation matrix. A simulated data set is just one of an infinite number of data sets that will reproduce a correlation matrix. If you try to compute interaction terms, squared terms, log terms, or whatever, it won't work. I discuss these issues on pp. 8-10 of

https://www3.nd.edu/~rwilliam/stats2/OLS-Stata9.pdf

That is IF your goal is to replicate UCLA. You could create a new example with the correlational structure you want.
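The point about simulated data can be checked with a small sketch. All values below (seed, sample size, coefficients of 0.5) are made up for illustration and have nothing to do with the UCLA example. The first data set is built with a real interaction; the second is drawn with -drawnorm- from the first one's correlation matrix, so it reproduces the correlations among x, w and y but carries no information about the product term.

```stata
* Illustration only: seed, n and the 0.5 coefficients are arbitrary choices
clear
set seed 12345
set obs 1000
generate x = rnormal()
generate w = rnormal()
* Data set 1: y truly depends on the product x*w
generate y = 0.5*x + 0.5*w + 0.5*x*w + rnormal()
generate xw = x*w
quietly regress y x w xw
scalar b_real = _b[xw]
* Keep only the correlations among x, w and y
quietly correlate x w y
matrix C = r(C)
* Data set 2: multivariate normal draw with the same correlation structure
drop _all
drawnorm x w y, n(1000) corr(C)
generate xw = x*w
quietly regress y x w xw
scalar b_sim = _b[xw]
display "interaction, original data:  " b_real
display "interaction, simulated data: " b_sim
```

The interaction coefficient in the original data should be near the 0.5 that was built in, while in the simulated data it should be near zero, since a multivariate normal draw has no interaction by construction.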

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#6

23 Sep 2018, 13:25

Richard Williams, as always, makes an excellent point. I ought to have realized that myself.
Richard Williams

Join Date: Apr 2014

Posts: 4987
#7

23 Sep 2018, 13:39

I once had a student complain that, with a simulated data set, she was getting values like .471, .583, .127, etc. for race, which is supposed to be a binary variable. I told her to read my handout on how Stata creates simulated data from a correlation matrix. She said she already understood how Stata worked, but she didn't understand why race had values like .471, .583, etc. I told her to trust me when I said she really really really needed to read the handout. After that she finally got it.

Christophe Haon

Join Date: Apr 2017

Posts: 12
#8

24 Sep 2018, 01:22

Many thanks to both of you, for teaching me how to collect the standardized coefficients and for pointing out the inappropriateness of the approach.
Am I correct if I conclude that interactions cannot be estimated when all you have is a correlation matrix and the correlations with the product term are not in it?