  • How to test an interaction from a correlation matrix (if possible at all)?

    Hi,

    I am trying to figure out how (if it is possible at all) to test an interaction when I do not have the data, just a correlation matrix.

    What I have tried is to simulate data with the same correlational structure, calculate the product term, estimate the model, and repeat 1,000 times. To check the soundness of the approach, I worked through an example, comparing the results from analyzing a dataset with the results from the corresponding correlation matrix. The results are inconsistent, as you can see if you run the syntax below ('Step 2' section) and compare the output to the results reported on the webpage whose URL is in the 'Step 1' section.

    Any idea what I am doing wrong? Is the entire approach wrong or can it be fixed? How?

    Thanks,
    Christophe

    Code:
    ********************************
    * Step 1: Working from dataset *
    ********************************
    
    * A worked example is available here: https://stats.idre.ucla.edu/stata/faq/how-can-i-explain-a-continuous-by-continuous-interaction-stata-12/
    
    * NB: cannot replicate because the dataset is no longer available at the mentioned url
    
    
    *******************************************
    * Step 2: Working from correlation matrix *
    *******************************************
    
    
    * Set up of the steps to be repeated for the simulation in a program
    program myprogram, rclass
        * drop of all variables to create an empty dataset
        drop _all
        * creation of a vector that contains the equivalent of a lower triangular correlation matrix
        matrix c = (1, 0.5445, 1, 0.6215, 0.6623, 1)
        
        * drawing of a sample of 1000 cases from a normal distribution with specified correlation structure (by default, means = 0 and s.d. = 1)
        drawnorm X W Y, n(1000) corr(c) cstorage(lower)
    
        * X refers to 'socst'
        * W refers to 'math'
        * Y refers to read
    
        * model estimation
        quietly summarize W
        global m=r(mean)
        global s=r(sd)
        capture generate WX=W*X
    
        sem (X W WX -> Y), standardized nocapslatent level(90)
    
        return scalar X_on_Y = [Y]_b[X]
        return scalar W_on_Y = [Y]_b[W]
        return scalar WX_on_Y = [Y]_b[WX]
        
        describe *
    end
    
    * use the simulate command to rerun myprogram 1000 times
    * collect the betas (_b) and standard errors (_se) from the sem each time
    simulate X_on_Y = [Y]_b[X] W_on_Y = [Y]_b[W] WX_on_Y = [Y]_b[WX], reps(1000) nodots: myprogram
    
    describe *
    
    summarize
    
    ci mean X_on_Y W_on_Y WX_on_Y, level(90)

  • #2
    When I go to the url in Step 1, I just get a message that the page no longer exists and am automatically redirected to the UCLA Statistics home page. So I can't compare what you're getting to what they got to see what might be wrong.

    But I think I see the problem.

    I think that what you are accessing in your -simulate- command are the unstandardized coefficients and you are trying to compare them to standardized ones.

    If you run
    Code:
    sysuse auto, clear
    sem (price <- mpg headroom), standardized
    * _b[] holds the unstandardized coefficients, even after the standardized option
    display [price]_b[mpg] "  " [price]_b[headroom]
    you will see that. In order to collect the standardized coefficients you have to pull them from the matrix e(b_std), not from _b[].

    Is that it?

    Added: By the way, I'm not sure why you're trying to simulate in this way instead of just using -sem-'s -ssd- options.
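
    For illustration, a minimal sketch of the -ssd- route for the main-effects model, using the three correlations from #1. The number of observations is a placeholder (the thread does not give the real sample size), and a product term cannot be added this way unless its correlations with X, W, and Y are also supplied.
    Code:
    clear
    * declare the variables described by the summary statistics
    ssd init X W Y
    * placeholder N: replace with the sample size behind the published matrix
    ssd set observations 1000
    * lower-triangular correlations, rows separated by \
    ssd set correlations 1 \ 0.5445 1 \ 0.6215 0.6623 1
    * fit the main-effects model directly from the summary statistics
    sem (X W -> Y), standardized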


    Last edited by Clyde Schechter; 21 Sep 2018, 11:09.



    • #3
      Thank you.

      The url in Step 1 works for me. You can try http://bit.ly/2DnvixN instead.

      I am unclear on how to edit my code to pull the coefficients from the matrix e(b_std). Could you please show me an example?

      The reason why I am simulating this way is that I need to create interaction terms, which I cannot do if all I have is the correlation matrix.



      • #4
        Code:
        clear*
        
        * Set up of the steps to be repeated for the simulation in a program
        program myprogram, rclass
            * drop of all variables to create an empty dataset
            drop _all
            * creation of a vector that contains the equivalent of a lower triangular correlation matrix
            matrix c = (1, 0.5445, 1, 0.6215, 0.6623, 1)
            
            * drawing of a sample of 1000 cases from a normal distribution with specified correlation structure (by default, means = 0 and s.d. = 1)
            drawnorm X W Y, n(1000) corr(c) cstorage(lower)
        
            * X refers to 'socst'
            * W refers to 'math'
            * Y refers to read
        
            * model estimation
            quietly summarize W
            global m=r(mean)
            global s=r(sd)
            capture generate WX=W*X
        
            sem (X W WX -> Y), standardized nocapslatent level(90)
            matrix B = e(b_std)
        
            return scalar X_on_Y = B[1,1]
            return scalar W_on_Y = B[1, 2]
            return scalar WX_on_Y = B[1, 3]
            
            describe *
        end
        
        * use the simulate command to rerun myprogram 1000 times
        * collect the standardized coefficients that myprogram returns in r() each time
        simulate X_on_Y = r(X_on_Y) W_on_Y = r(W_on_Y) WX_on_Y = r(WX_on_Y), reps(1000) nodots: myprogram
        
        describe *
        
        summarize
        
        ci mean X_on_Y W_on_Y WX_on_Y, level(90)
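        The changes from your code are the -matrix B = e(b_std)- line, the three -return scalar- lines, and the r() expressions in the -simulate- command.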

        You may be wondering how I knew which elements of B to extract. I got that by running a single instance of the -sem- command and then -matrix list e(b_std)-. That showed me which coefficients are in which cells of e(b_std). (There is also a way to access these elements by names instead of numbers, but it involves a bit of local macro manipulation and, for this problem at least, it did not seem worth the trouble.)
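
        A possible name-based lookup, sketched here on the auto data from #2. It assumes the columns of e(b_std) are labeled equation:variable (for example price:mpg), which -matrix list- should confirm before you rely on it.
        Code:
        sysuse auto, clear
        quietly sem (price <- mpg headroom), standardized
        matrix B = e(b_std)
        * inspect the column names before looking anything up
        matrix list B
        * colnumb() returns the index of the named column (missing if not found)
        scalar b_mpg = B[1, colnumb(B, "price:mpg")]
        display b_mpg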

        Your new link takes me to a working page, but that page does not appear to be relevant to the current problem, unless I am missing something. Anyway, I hope this helps.



        • #5
          Is this the original data set you are looking for? If so, this command works fine for me.

          use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

          I haven't read all of the above posts carefully, but I think in order to test an interaction, the correlation with the interaction has to be included in the correlation matrix. A simulated data set is just one of an infinite number of data sets that will reproduce a correlation matrix. If you try to compute interaction terms, squared terms, log terms, or whatever, it won't work. I discuss these issues on pp. 8-10 of

          https://www3.nd.edu/~rwilliam/stats2/OLS-Stata9.pdf

          That is IF your goal is to replicate UCLA. You could create a new example with the correlational structure you want.
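
          One way to see the problem, as a sketch using the matrix from #1 (the seed and the large n are arbitrary choices made here): -drawnorm- draws from a multivariate normal with zero means, and for such data the expected correlation of the product W*X with each of X, W, and Y is zero, whatever interaction the original data contained.
          Code:
          clear
          set seed 12345
          matrix c = (1, 0.5445, 1, 0.6215, 0.6623, 1)
          drawnorm X W Y, n(100000) corr(c) cstorage(lower)
          generate WX = W*X
          * the WX row should be close to zero apart from sampling noise
          correlate X W Y WX
          So the simulated product term carries no information about the interaction in the real data.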
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Richard Williams, as always, makes an excellent point. I ought to have realized that myself.



            • #7
              I once had a student complain that, with a simulated data set, she was getting values like .471, .583, .127, etc. for race, which is supposed to be a binary variable. I told her to read my handout on how Stata creates simulated data from a correlation matrix. She said she already understood how Stata worked, but she didn't understand why race had values like .471, .583, etc. I told her to trust me when I said she really really really needed to read the handout. After that she finally got it.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology



              • #8
                Many thanks to both of you, for teaching me how to collect the standardized coefficients and for pointing out the inappropriateness of the approach.
                Am I correct to conclude that interactions cannot be estimated when all you have is a correlation matrix and the correlations with the product term are not in it?
