I am trying to draw a normally distributed sample of observations on 40 variables with an underlying correlation structure (and standard deviation structure). All of the drawnorm examples I have found use at most 3 or 4 variables, so defining the correlation matrix by hand is easy. I need to create 40 variables and define the correlation for every unique pair, that is, (40*39)/2 = 780 pairs.

I want the correlations to fall in the interval [-0.5, 0.5], so my understanding was that I could do this with mkmat and pass the resulting 1x820 vector directly to drawnorm. My understanding was that if I specified the 820 elements in the "correct" order (780 unique rho's plus 40 diagonal elements equal to 1), I would not have to define the 780 mirrored rho's (the full matrix has 40x40 = 1600 elements, and 1600 - 820 = 780).

Maybe I am misunderstanding how Stata reads correlation matrices, but I have tried giving Stata the full 1600, the 820, and the 780 elements, and the error keeps coming back as "corr() incorrectly specified, diagonal elements should be 1". If I define the vector with any other number of elements, Stata tells me that, with 40 variables, I must supply a 1x820 vector. This might be a trivial issue, but hopefully someone has a suggestion that goes beyond -help matrix-, since I could not find much there.

My understanding is that the code for the standard deviations should be more basic (which is why it is in the code but not in my question): since it is the standard deviation between pairs (from the sum of both), it would consist of only 780 elements instead of 820.
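Which order counts as "correct" depends on the cstorage() option. As documented in -help drawnorm-, cstorage(lower) expects the lower triangle row by row (C11 C21 C22 C31 C32 C33 ...), while cstorage(upper) expects the upper triangle row by row (C11 C12 ... C1k C22 C23 ... C2k ...). The minimal sketch below, assuming k = 40 and using hypothetical local macro names, lists where the 40 diagonal 1s fall in the 820-element vector under each layout. Under cstorage(upper) they land at 1, 41, 80, 118, ..., 820, the same positions as the foreach list in the code, whereas under cstorage(lower) they land at 1, 3, 6, 10, ..., 820.

```stata
* Sketch: where the diagonal elements C[i,i] sit inside a packed
* 1 x k(k+1)/2 correlation vector, for the two packed cstorage() layouts.
* lowerpos and upperpos are hypothetical macro names used for illustration.
local k = 40
local lowerpos
local upperpos
forvalues i = 1/`k' {
    * cstorage(lower): C11 C21 C22 C31 C32 C33 ..., so C[i,i] is element i(i+1)/2
    local lowerpos `lowerpos' `=`i'*(`i'+1)/2'
    * cstorage(upper): C11 C12 ... C1k C22 ... C2k ..., so row i starts
    * after (i-1) earlier rows of lengths k, k-1, ..., k-i+2
    local upperpos `upperpos' `=1 + (`i'-1)*(`k'+1) - (`i'-1)*`i'/2'
}
display "cstorage(lower) diagonal positions: `lowerpos'"
display "cstorage(upper) diagonal positions: `upperpos'"
display "vector length: `=`k'*(`k'+1)/2'"
```

The last line prints the required vector length, 40*41/2 = 820, which is why Stata insists on a 1x820 vector for 40 variables.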
I found an example somewhere that accepted a correlation matrix as a 1xn vector, with the diagonal elements specified correctly but the mirrored rho's excluded, but maybe I am misunderstanding it. Must a correlation matrix always be defined as an nxn matrix? Thanks in advance for your help!
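On whether the matrix must be nxn: -help drawnorm- describes corr() as accepting either a full symmetric matrix (the default, cstorage(full)) or a packed k(k+1)/2-element vector together with cstorage(lower) or cstorage(upper). A minimal sketch with three variables and made-up correlations (the matrix names Cfull and cvec and the values are illustrative only) passes the same structure both ways:

```stata
* Sketch: one correlation structure, two ways of passing it to drawnorm.
* The correlation values are arbitrary illustrative numbers.
clear
set seed 2015

* (1) full, symmetric 3x3 matrix: the default cstorage(full)
matrix Cfull = (1, .3, -.2 \ .3, 1, .4 \ -.2, .4, 1)
drawnorm x1 x2 x3, n(1000) corr(Cfull)
correlate x1 x2 x3

* (2) the same structure packed as the lower triangle, row by row:
*     C11 C21 C22 C31 C32 C33
clear
matrix cvec = (1, .3, 1, -.2, .4, 1)
drawnorm x1 x2 x3, n(1000) corr(cvec) cstorage(lower)
correlate x1 x2 x3
```

Both calls should give sample correlations close to .3 (x1, x2), -.2 (x1, x3) and .4 (x2, x3); only the storage of the input differs.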
Code (Stata 13.1). My method of defining the diagonal elements is elementary, but I believe it is error-free; I am not very good at defining sequences (I was a liberal arts major, after all):
```stata
** Create a simulated dataset
// This needs to create a dataset with underlying correlations between variables (pairs)
// (40*39)/2 = 780 pairs
clear
set obs 820
gen double u = (0.5-(-0.5))*runiform() + -0.5 // creates 820 unique rho's in [-0.5,0.5]
sort u
foreach i in 1 41 80 118 155 191 226 260 293 325 356 386 415 443 470 496 ///
    521 545 568 590 611 631 650 668 685 701 716 730 743 755 766 776 ///
    785 793 800 806 811 815 818 820 {
    replace u=1 if _n==`i'
}
mkmat u
matrix c = u'

clear
set obs 780
gen double s=(.85-.05)*runiform() + .05
mkmat s
matrix sd = s'

* draw a sample of 3000 cases from a normal distribution with specified correlation structure
* and specified means and standard deviations
drawnorm ps1-ps40, n(3000) corr(c) cstorage(lower) sds(sd) clear
```
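A hedged sketch of one way to get a version of the code above that runs, under three stated assumptions. (a) The correlation matrix is built as a full 40x40 matrix and passed with the default cstorage(full), which avoids the packed-vector ordering altogether. (b) sds() gets one standard deviation per variable, i.e. a 1x40 vector, since -help drawnorm- describes sds() as the standard deviations of the generated variables. (c) drawnorm needs a positive (semi)definite matrix, and 780 independent uniform draws in [-0.5,0.5] are very unlikely to give one for 40 variables, so the correlations here come from a one-factor structure, C[i,j] = l_i*l_j, with hypothetical loadings l_i drawn uniformly from [-0.7,0.7]. This keeps every off-diagonal value within [-0.49,0.49] and guarantees positive definiteness, but it is a swapped-in technique, not the original method, and it restricts the pattern of correlations.

```stata
* Sketch (Stata 13-compatible): 40 correlated normal variables with
* |correlations| <= 0.49 and standard deviations in [.05, .85].
* ASSUMPTION: a one-factor structure C[i,j] = l_i*l_j (i != j) is used to
* guarantee a positive-definite matrix; it is not the original method.
clear all
set seed 12345
local k = 40

* hypothetical factor loadings, uniform on [-0.7, 0.7]
matrix L = J(`k', 1, 0)
forvalues i = 1/`k' {
    matrix L[`i', 1] = 1.4*runiform() - 0.7
}

* full k x k correlation matrix: off-diagonal l_i*l_j, diagonal set to 1;
* C = L*L' + diag(1 - l_i^2) with every 1 - l_i^2 >= .51, so C is positive definite
matrix C = L*L'
forvalues i = 1/`k' {
    matrix C[`i', `i'] = 1
}

* one standard deviation per variable (1 x k), uniform on [.05, .85]
matrix sd = J(1, `k', 0)
forvalues i = 1/`k' {
    matrix sd[1, `i'] = (.85 - .05)*runiform() + .05
}

* C is a full matrix, so the default cstorage(full) applies
drawnorm ps1-ps`k', n(3000) corr(C) sds(sd) clear
summarize ps1-ps5
correlate ps1-ps5
```

If a less structured pattern of correlations is needed, the one-factor step can be replaced by another construction, as long as the final matrix is symmetric, has exact 1s on the diagonal, and is positive semidefinite.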