  • Getting incorrect variance-covariance matrix with -drawnorm-?

    Hi. I want to generate data that follow a multivariate normal distribution. I do the following:

    Code:
    clear all
    mat M = (4\5\2\2.8\6)
    mat C_a = ( .42250, .11430, .04225, .01521, .16900 \ ///
                .11430, .67640, .06764, .02435, .27056 \ ///
                .04225, .06764, .25000, .00900, .10000 \ ///
                .01521, .02435, .00900, .09000, .03600 \ ///
                .16900, .27056, .10000, .03600, 1.0000 )
    Then, to check that I've entered the above correctly, I run:

    Code:
    mat list M
    mat list C_a
    And get:

    Code:
    . mat list M
    
    M[5,1]
         c1
    r1    4
    r2    5
    r3    2
    r4  2.8
    r5    6
    
    . mat list C_a
    
    symmetric C_a[5,5]
            c1      c2      c3      c4      c5
    r1   .4225
    r2   .1143   .6764
    r3  .04225  .06764     .25
    r4  .01521  .02435    .009     .09
    r5    .169  .27056      .1    .036       1
    Which is great. However, if I now run

    Code:
    set seed 55555
    drawnorm a_1 a_2 a_3 a_4 a_5, n(1000) means(M) cov(C_a)
    mean a_*
    mat list e(V)
    I get:

    Code:
    . set seed 55555
    
    . 
    . drawnorm a_1 a_2 a_3 a_4 a_5, n(1000) means(M) cov(C_a)
    (obs 1,000)
    
    . 
    . mean a_*
    
    Mean estimation                   Number of obs   =      1,000
    
    --------------------------------------------------------------
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
             a_1 |   4.024775   .0202762      3.984986    4.064563
             a_2 |   5.040562   .0264706      4.988617    5.092506
             a_3 |    2.02859   .0158169      1.997552    2.059628
             a_4 |   2.797371   .0098233      2.778094    2.816647
             a_5 |   6.001783    .031926      5.939133    6.064432
    --------------------------------------------------------------
    
    . mat list e(V)
    
    symmetric e(V)[5,5]
               a_1        a_2        a_3        a_4        a_5
    a_1  .00041113
    a_2  .00012935  .00070069
    a_3  .00002468  .00006461  .00025018
    a_4   .0000123  .00002889  6.978e-06   .0000965
    a_5  .00015814  .00027018  .00008073  .00002877  .00101927
    Can anybody tell me why the variance-covariance matrix in my data is so different from C_a, the matrix I pass to drawnorm? It is roughly C_a/1000. I feel like I must be missing something extremely obvious here.

    Thanks for any help!



  • #2
    I just realised that I have probably been extremely silly. Is it just because variance is calculated as the sum of (X-mu)^2 divided by N? I suppose this is one of the benefits of working with generated data: actually seeing in practice things you know in theory but perhaps don't take to heart! I mean, I know variance decreases in N, but I wasn't thinking... embarrassing!



    • #3
      This is because e(V) is not the covariance matrix of your data. The diagonal elements of e(V) are the squares of the standard errors of the means, not the squares of the standard deviations of the variables. Analogously, the off-diagonal terms are the estimated covariances of the joint sampling distribution of the means of your variables, not the covariances of the variables themselves.
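      The relationship can be written out. For independent draws x_1, ..., x_N with covariance matrix Sigma (here Sigma = C_a and N = 1,000), the covariance matrix of the vector of sample means is Sigma/N, and e(V) after mean reports the estimate S/N, where S is the sample covariance matrix. That is why e(V) looks like C_a/1000. The numbers in this thread are consistent with this: the sample variance of a_1 reported by corr a_*, cov in the output below is .411125, and .411125/1000 = .000411125, which matches e(V)[1,1] = .00041113 from #1.

```latex
\operatorname{Var}(\bar{\mathbf{x}})
  = \operatorname{Var}\!\Big(\frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i\Big)
  = \frac{1}{N^{2}}\sum_{i=1}^{N}\operatorname{Var}(\mathbf{x}_i)
  = \frac{\Sigma}{N},
\qquad
\widehat{\operatorname{Var}}(\bar{\mathbf{x}}) = \frac{S}{N},
\qquad
S = \frac{1}{N-1}\sum_{i=1}^{N}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^{\top}
```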

      What you want is (code and output):

      Code:
      . clear all
      
      . mat M = (4\5\2\2.8\6)
      
      . mat C_a = ( .42250, .11430, .04225, .01521, .16900 \ ///
      >             .11430, .67640, .06764, .02435, .27056 \ ///
      >             .04225, .06764, .25000, .00900, .10000 \ ///
      >             .01521, .02435, .00900, .09000, .03600 \ ///
      >             .16900, .27056, .10000, .03600, 1.0000 )
      
      .                        
      . set seed 55555
      
      . drawnorm a_1 a_2 a_3 a_4 a_5, n(1000) mean(M) cov(C_a)
      (obs 1,000)
      
      .
      . summ a_*
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
               a_1 |      1,000    4.024775    .6411906   2.157554   6.157671
               a_2 |      1,000    5.040562    .8370743   2.334469   7.364285
               a_3 |      1,000     2.02859    .5001752   .6888211   3.562246
               a_4 |      1,000    2.797371    .3106413    1.85871     3.8358
               a_5 |      1,000    6.001783    1.009589   3.066305   9.202579
      
      . corr a_*, cov
      (obs=1,000)
      
                   |      a_1      a_2      a_3      a_4      a_5
      -------------+---------------------------------------------
               a_1 |  .411125
               a_2 |  .129352  .700693
               a_3 |  .024681  .064612  .250175
               a_4 |    .0123  .028895  .006978  .096498
               a_5 |  .158141  .270177  .080731  .028766  1.01927
      Added: Crossed with #2. Your reasoning there is not right. The formula for variance you give is correct, but as you increase the sample size, the sum of the (X-mu)^2 terms also increases in proportion to N, because there are more terms, each of about the same magnitude. This balances the growth of N in the denominator. So, in fact, the variance estimate does not decrease as N increases. (Standard errors of sample means do, but that's because they have an additional factor of sqrt(N) in their denominator, with nothing in the numerator to counterbalance it.)
      Last edited by Clyde Schechter; 12 Mar 2018, 23:07.
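      The point about N can be checked numerically. Below is a small sketch in Python rather than Stata (standard library only). The function name variance_and_se and the seed are illustrative choices, not anything from the thread. It draws N values from a standard normal for increasing N and prints the sample variance, which stays near 1, and the standard error of the mean, which shrinks roughly like 1/sqrt(N).

```python
import math
import random
import statistics


def variance_and_se(n, sigma=1.0, seed=55555):
    """Draw n values from N(0, sigma^2); return (sample variance, SE of the mean)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    var = statistics.variance(xs)  # sum((x - xbar)^2) / (n - 1)
    se = math.sqrt(var / n)        # standard deviation / sqrt(n)
    return var, se


results = {}
for n in (100, 1_000, 10_000, 100_000):
    var, se = variance_and_se(n)
    results[n] = (var, se)
    # The variance column stays near sigma^2 = 1; the SE column keeps shrinking.
    print(f"N={n:>7}  variance={var:.4f}  SE of mean={se:.5f}")
```

      Both quantities use the same sum of squared deviations. Only the SE divides by N a second time, which is exactly the extra sqrt(N) factor described above.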



      • #4
        Ah, thank you so much Clyde. Thought it was maybe something like that and then confused myself more. Really appreciate it.



        • #5
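          mean is an estimation command, and e(V) is the variance-covariance matrix of the estimated means (its diagonal holds the squared standard errors), not the covariance matrix of the data. Try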
          Code:
          correlate a_*, covariance
          instead.
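          The whole experiment can also be mirrored outside Stata. The Python sketch below (standard library only) draws 1,000 observations from a multivariate normal with the thread's M and C_a. It uses the common Cholesky construction mean + L z with z ~ N(0, I). That construction is an assumption here, not a claim about how drawnorm works internally, and Python's random stream will not reproduce Stata's draws for seed 55555. The helper names cholesky, draw_mvn and sample_cov are hypothetical. S plays the role of correlate a_*, covariance (close to C_a), and S/N plays the role of e(V) after mean a_* (close to C_a/1000).

```python
import math
import random

# Mean vector M and covariance matrix C_a copied from the Stata example in #1.
M = [4.0, 5.0, 2.0, 2.8, 6.0]
C_a = [
    [.42250, .11430, .04225, .01521, .16900],
    [.11430, .67640, .06764, .02435, .27056],
    [.04225, .06764, .25000, .00900, .10000],
    [.01521, .02435, .00900, .09000, .03600],
    [.16900, .27056, .10000, .03600, 1.0000],
]


def cholesky(a):
    """Lower-triangular L with L L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def draw_mvn(mean, cov, n, seed=55555):
    """Return n rows, each mean + L z with z ~ N(0, I), so each row has covariance cov."""
    rng = random.Random(seed)
    L = cholesky(cov)
    k = len(mean)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        rows.append([mean[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(k)])
    return rows


def sample_cov(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


N = 1000
data = draw_mvn(M, C_a, N)
S = sample_cov(data)                                          # analogue of corr a_*, cov
V_means = [[S[i][j] / N for j in range(5)] for i in range(5)]  # analogue of e(V) after mean
print("S[0][0] =", round(S[0][0], 4), "  C_a[0][0] =", C_a[0][0])
print("V_means[0][0] =", round(V_means[0][0], 6))
```

          Dividing S by N is the only step that separates the two matrices, so multiplying e(V) by the number of observations gives back the sample covariance of the data.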
