Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Trouble generating random, *large* correlation matrices

    Hi all,

    I've written an .ado file to randomly generate a correlation matrix, in order to draw correlated vectors using drawnorm. (I looked for existing code doing this, but couldn't find anything.) I realize that for just a few vectors, it's easy to simply write in a made-up correlation matrix. But I would like to draw about 100 vectors, each standard normals, and all correlated with one another in a random fashion. It turns out that this task is much more difficult than I thought.

    I have posted the code below, first for the code that uses my randcorr function, and then for randcorr.ado, which is basically just the mata sub-routine. And it works, for small correlation matrices... up to about 40x40. Diagonals are all 1, and correlation coefficients (off-diags) fall between -0.4 and 0.4, drawn from a normal distribution. But here is the problem: certain, randomly drawn correlation matrices will NOT be positive semi-definite, even though they are symmetric, full rank, and have all off-diagonal values between -1 and 1. They look like correlation matrixes, but they are not.

    I dealt with this problem by checking the eigenvalues, after the construction of the correlation matrix, within the mata sub-routine of my .ado file. If the smallest eigenvalue is less than 0, I draw a new correlation matrix, so that the final correlation matrix handed back to randcorr will be PSD. That works... but as the matrix gets bigger (my parameter N) or as the vectors become more collinear (my parameter s, set in the code to be 0.1), the mata sub-routine has to draw more and more matrices before it finds a PSD one.

    So, if you run the code below with N=10 through N=40, it runs quite quickly. If you run the code w/ N=43 it takes a bit, and N=45 goes forever. I suspect the non-PSD problem relates to the transitiveness of correlations... the 43rd correlation coefficient is surely defined largely by the 1st-42rd correlation coefficients, in a way that I am not modeling.

    I'm wondering 2 things. (1) Has somebody else already written such a user command, such that I can give up my own effort on the matter? (2) If not, does anyone have ideas on how I can better generate the random correlation matrix, either in terms of better values (more mathematically correct) or just faster (so that I can more quickly go through all the non-PSD matrices).

    Thanks! Code below, and randcorr.ado below that.

    Code:
    clear
    clear all
    set more off
    set matsize 10000
    
    local N = 10            /* set number of IVs / size of corr matrix */
    matrix M = J(1,`N',0)    /* vector of means (all IVs centered at 0) */
    randcorr `N' .1            /* correlation matrix generated by randcorr.ado */
    
    global Z z1             /* global for all IVs to be drawn */
    forval i = 2/`N' {
        global Z $Z z`i'
    }
    
    ** Draw the N IVs, according to correlation matrix and mean vector
    drawnorm $Z , n(1000) corr(corr) means(M)

    Code:
    program randcorr
    version 14.2
        args n s
        mata: myfunction(`n',`s')
    
    end
    
    version 14.2
    mata:
    void myfunction(n,s)
          {
        iok=0
        while (iok==0) {
            corr = diag(J(1,n,1))
            for (j=1; j<=n; j++) {
                    for (i=j+1; i<=n; i++) {
                        corr[i,j]= rnormal(1,1,0,s)
                        corr[j,i]= corr[i,j];
                    }
            }
        
            rnk = rank(corr)
                eigenv =symeigenvalues(corr)
            imn = minmax(eigenv)[1,1]
            if (rnk ==n & imn>0) {
                iok=1
            }
        }
        st_matrix("corr", corr)
          }
    end

  • #2
    Update: after posting, I found this thread here addressing the same question. All methods for this seem to come out of Lewandowski, Kurowicka, and Joe (2009). So I'll lean on their paper, and perhaps previous python code, to tackle this problem more carefully.

    Comment


    • #3
      Adopting MATLAB code to Mata is usually easy; here is the Mata version of vineBeta() from the linked StackExchange thread.

      Code:
      real matrix vineBeta(n, b)
      {
      
          real matrix P, S, p
          real scalar i, k, m
      
          P = J(n,n,0)
          S = I(n)
          
          for (k=1; k<n; k++) {
              for (i=k+1; i<=n; i++) {
                  P[k,i] = rbeta(1,1,b,b)
                  P[k,i] = (P[k,i]-0.5)*2
                  p = P[k,i];
                  for (m=k-1; m>=1; m--) {
                      p = p * sqrt((1-P[m,i]^2)*(1-P[m,k]^2)) + P[m,i]*P[m,k];
                  }
                  S[k,i] = S[i,k] = p
              }
          }
          
          // implement randperm() by hand
          p = runiform(n,1) , (1::n)
          _sort(p,1)
          p = p[.,2]
          S = S[p,p]
          
          return(S)
      }
      Last edited by Rafal Raciborski (StataCorp); 23 May 2018, 10:34.

      Comment


      • #4
        Hi -- thank you so much for doing the translation, this works so well! I will figure out, later, how to map the correlation matrix into a pretty square figure like the ones in the stack exchange forum. But for now, it's clear that the matrices are always PSD, and the correlation coefficients can be easily controlled by the parameter b. I've posted the ado file I made below. I'm not sure if it's written in a standard way (this is my first time writing an ado file), but it works!

        Code:
        program randcorr
        version 14.2
            args n b 
            mata: vineBeta(`n',`b')
        
        end
        
        version 14.2
        mata: 
        void vineBeta(n, b) {
        
            P = J(n,n,0)
            S = I(n)
            
            for (k=1; k<n; k++) {
                for (i=k+1; i<=n; i++) {
                    P[k,i] = rbeta(1,1,b,b)
                    P[k,i] = (P[k,i]-0.5)*2
                    p = P[k,i];
                    for (m=k-1; m>=1; m--) {
                        p = p * sqrt((1-P[m,i]^2)*(1-P[m,k]^2)) + P[m,i]*P[m,k];
                    }
                    S[k,i] = S[i,k] = p
                }
            }
            
            // implement randperm() by hand
            p = runiform(n,1) , (1::n)
            _sort(p,1)
            p = p[.,2]
            S = S[p,p]
        
            st_matrix("S", S)
            
        }
        end

        Comment

        Working...
        X