Acceptable correlation matrix structures for drawnorm?

Jonathan Karver

Join Date: Nov 2014

Posts: 11
#1


18 Jun 2015, 07:52

I am trying to draw a normally distributed sample of observations on 40 variables with an underlying correlation structure (and standard deviation structure). All of the examples I have found using drawnorm make use of at most 3 or 4 variables, so manually defining a correlation matrix is pretty easy. I need to create 40 variables and define the correlation for every unique pair, that is, (40*39)/2 = 780 pairs. I want the correlations to fall in the interval [-0.5, 0.5], so my understanding was that I should (could) do this with mkmat and use this vector (1x820) directly in the drawnorm code. My understanding was that if I defined the 820 parameters in the "correct" order (780 unique rho's + 40 diagonal elements equal to 1), I could achieve this without having to define the 780 mirrored rho's (the full matrix has 40x40 = 1600 elements).

Maybe I am misunderstanding how Stata reads correlation matrices, but I have tried giving Stata the full 1600, the 820, and the 780 elements, and the error keeps showing up as "corr() incorrectly specified, diagonal elements should be 1". If I define the vector with any other number of elements, it tells me that, based on the 40 variables, I must supply a vector of 1x820. This might be a trivial issue, but hopefully someone has a suggestion that goes beyond -help matrix-, since I could not find much there.

My understanding is that the code for the standard deviations should be more basic (which is why it is in the code but not in my question). Since it is the standard deviation of pairs (from the sum of both), it would consist of only 780 elements instead of 820.

Code (my method of defining the diagonal elements is elementary but I believe without errors...I am not very good at defining sequences. I was a liberal arts major, after all) (version 13.1):

Code:

** Create a simulated dataset
// This needs to create a dataset with underlying correlations between variables (pairs)
// (40*39)/2 = 780 pairs
clear
set obs 820
gen double u = (0.5-(-0.5))*runiform() + -0.5 // creates 820 unique rho's in [-0.5,0.5]
sort u
foreach i in 1 41 80 118 155 191 226 260 293 325 356 386 415 443 470 496 ///
    521 545 568 590 611 631 650 668 685 701 716 730 743 755 766 776 ///
    785 793 800 806 811 815 818 820 {
    replace u=1 if _n==`i'
}
mkmat u
matrix c = u'
clear
set obs 780
gen double s=(.85-.05)*runiform() + .05
mkmat s
matrix sd = s'
* draw a sample of 3000 cases from a normal distribution with specified correlation structure
* and specified means and standard deviations
drawnorm ps1-ps40, n(3000) corr(c) cstorage(lower) sds(sd) clear

I found an example (somewhere) that accepted a correlation matrix as a 1xn vector with the diagonal elements specified correctly (but excluding the mirrored rho's), but maybe I am understanding this wrong. Must a correlation matrix always be defined as an nxn matrix? Thanks in advance for your help!
Richard Williams

Join Date: Apr 2014

Posts: 5008
#2

18 Jun 2015, 11:12

I don't think you are assigning the ones correctly. According to the help for the cstorage(lower) option of corr2data, the order of the entries should be

C(11) C(21) C(22) C(31) C(32) C(33) ... C(k1) C(k2) ... C(kk)

So, the ones would be in entries 1, 3, 6, 10, 15, etc. I haven't checked the rest of your code.
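
To make the ordering concrete: with cstorage(lower), the diagonal element C(ii) lands at position i(i+1)/2 of the vector, so for 40 variables the ones belong at 1, 3, 6, ..., 820. A minimal sketch, assuming a Stata version that allows a matrix with 820 columns (older versions may need a larger matsize). The off-diagonal entries are left at 0 purely as placeholders, and the local names k, n, and pos are just for illustration:

```stata
* Sketch: put the ones where cstorage(lower) expects the diagonal
local k = 40                      // number of variables
local n = `k'*(`k' + 1)/2         // 820 entries in the lower triangle
matrix c = J(1, `n', 0)           // placeholder: all correlations set to 0
forvalues i = 1/`k' {
    local pos = `i'*(`i' + 1)/2   // position of C(ii): 1, 3, 6, 10, ...
    matrix c[1, `pos'] = 1
}
drawnorm ps1-ps40, n(3000) corr(c) cstorage(lower) clear
```

Replacing the zeros with actual rho's keeps the ones in the right place, as long as the other entries fill the remaining positions in the same row-by-row order.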

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 5008
#3

18 Jun 2015, 11:15

Also if there are 40 variables there should be 40 standard deviations.
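
In other words, sds() takes one standard deviation per variable, so the vector is 1 x 40. A minimal sketch reusing the range from the original code (the seed is arbitrary; corr() is left out here, in which case drawnorm generates uncorrelated variables):

```stata
clear
set seed 2015                                 // arbitrary seed, for reproducibility
set obs 40
gen double s = (.85 - .05)*runiform() + .05   // one sd per variable, in (.05, .85)
mkmat s
matrix sd = s'                                // 1 x 40 row vector
drawnorm ps1-ps40, n(3000) sds(sd) clear
summarize ps1-ps40
```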

Jonathan Karver

Join Date: Nov 2014

Posts: 11
#4

18 Jun 2015, 11:45

Thanks, Richard. Your first comment makes sense, and I believe the matrix structure should be the same for drawnorm as for corr2data, so I will try this.

As it relates to the standard deviations, I am trying to generate the standard deviations from the sums of the pairs (vari+varj), not from the individual elements themselves, which is why I assumed this would be 780. But this reminds me that drawnorm does not know what I am trying to do, so I will keep it at 40 as you suggest. After all, the underlying variance of the data is with respect to the variables to be generated, not some idea I have in my head about what I want to do. Thanks for pointing this out... I have no idea how I let that slip...
Jonathan Karver

Join Date: Nov 2014

Posts: 11
#5

18 Jun 2015, 12:10

So I made the changes to the code, and the new error Stata gives me is that the matrix is "not positive (semi) definite". It has been a while, but I understand this to be related to the symmetry of the matrix or the non-negativity of the rhos (I need both positive and negative correlations in the interval I defined). However, Stata tells me that the vector should be 820 elements, not 1600 (1560), so the symmetry is not being addressed here. Maybe I need to go back to Schaum's...
Richard Williams

Join Date: Apr 2014

Posts: 5008
#6

18 Jun 2015, 13:23

I might be tempted to do something like click on Data > Matrices, ado language > Input matrix by hand. Then enter numbers between -.5 and .5. You could still have problems with the matrix, but you could tweak it by hand.
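
Whichever way the numbers get entered, it may help to check the matrix before passing it to drawnorm. Keeping every entry in [-.5, .5] is not enough on its own: the matrix also needs all of its eigenvalues to be nonnegative. A small hand-entered illustration (the 4 x 4 matrix is made up for this example, not taken from the code above):

```stata
* Every off-diagonal entry is -.5, yet the matrix is not positive semidefinite
matrix C = (1, -.5, -.5, -.5 \ -.5, 1, -.5, -.5 \ -.5, -.5, 1, -.5 \ -.5, -.5, -.5, 1)
matrix symeigen X v = C      // X: eigenvectors, v: eigenvalues (largest first)
matrix list v
display "smallest eigenvalue: " v[1, 4]
```

Here the eigenvalues are 1.5 (three times) and -.5, so this matrix fails the check even though each correlation looks harmless. With 40 variables and rho's drawn independently, a negative eigenvalue is very likely.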

Or, create a few vars with drawnorm, and then generate the others using the already existing variables. If correlations go out of range, you could tweak the variances of the error terms. But if vars are generated from other vars, the matrix should be positive definite.
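
A rough sketch of this second idea, under assumptions made up for illustration: three base variables, weights drawn from (-.5, .5), and unit-variance noise. The matrix B, the weights b1 to b3, and the seed are all hypothetical choices, and the resulting correlations are not guaranteed to stay inside [-.5, .5]; a larger error sd pulls them toward zero.

```stata
set seed 2015                                      // arbitrary seed
matrix B = (1, .3, -.2 \ .3, 1, .4 \ -.2, .4, 1)   // small, valid correlation matrix
drawnorm ps1 ps2 ps3, n(3000) corr(B) clear
forvalues j = 4/40 {
    * draw the weights once per variable, not once per observation
    local b1 = runiform() - .5
    local b2 = runiform() - .5
    local b3 = runiform() - .5
    gen double ps`j' = `b1'*ps1 + `b2'*ps2 + `b3'*ps3 + rnormal(0, 1)
}
correlate ps1-ps40
```

Because each new variable is built from existing ones plus its own noise, the correlation matrix that correlate reports (also available afterwards as r(C)) should be positive definite, and could in turn be passed to corr() in a later drawnorm call.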

I am not sure why anybody would want to do this in the first place but I assume you have your reasons!

