Getting incorrect variance-covariance matrix with -drawnorm-?

Oliver Kendrick

12 Mar 2018, 22:48

Hi. I want to generate data that follow a multivariate normal distribution. I do the following:

Code:

clear all
mat M = (4\5\2\2.8\6)
mat C_a = ( .42250, .11430, .04225, .01521, .16900 \ ///
            .11430, .67640, .06764, .02435, .27056 \ ///
            .04225, .06764, .25000, .00900, .10000 \ ///
            .01521, .02435, .00900, .09000, .03600 \ ///
            .16900, .27056, .10000, .03600, 1.0000 )

Then, to check that I've entered the above correctly, I do:

Code:

mat list M
mat list C_a

And get:

Code:

. mat list M

M[5,1]
     c1
r1    4
r2    5
r3    2
r4  2.8
r5    6

. mat list C_a

symmetric C_a[5,5]
        c1      c2      c3      c4      c5
r1   .4225
r2   .1143   .6764
r3  .04225  .06764     .25
r4  .01521  .02435    .009     .09
r5    .169  .27056      .1    .036       1

Which is great. However, if I now enter

Code:

set seed 55555
drawnorm a_1 a_2 a_3 a_4 a_5, n(1000) means(M) cov(C_a)
mean a_*
mat list e(V)

I get:

Code:

. set seed 55555

. 
. drawnorm a_1 a_2 a_3 a_4 a_5, n(1000) means(M) cov(C_a)
(obs 1,000)

. 
. mean a_*

Mean estimation                   Number of obs   =      1,000

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         a_1 |   4.024775   .0202762      3.984986    4.064563
         a_2 |   5.040562   .0264706      4.988617    5.092506
         a_3 |    2.02859   .0158169      1.997552    2.059628
         a_4 |   2.797371   .0098233      2.778094    2.816647
         a_5 |   6.001783    .031926      5.939133    6.064432
--------------------------------------------------------------

. mat list e(V)

symmetric e(V)[5,5]
           a_1        a_2        a_3        a_4        a_5
a_1  .00041113
a_2  .00012935  .00070069
a_3  .00002468  .00006461  .00025018
a_4   .0000123  .00002889  6.978e-06   .0000965
a_5  .00015814  .00027018  .00008073  .00002877  .00101927

Can anybody tell me why the variance-covariance matrix of my data is so different from C_a, the matrix I pass to -drawnorm-? Its entries are often roughly 1/1000 of those in C_a. I feel like I must be missing something extremely obvious here.

Thanks for any help!

Oliver Kendrick

#2

12 Mar 2018, 23:00

I just realised that I have probably been extremely silly. Is it just because variance is calculated as sum((X - mu)^2)/N? I suppose this is one of the benefits of working with generated data: actually seeing in practice things you know in theory but perhaps don't take to heart! I mean, I know variance decreases in N but wasn't thinking... embarrassing!

Clyde Schechter

12 Mar 2018, 23:02

This is because e(V) is not the covariance matrix of your data. The diagonal elements of e(V) are the squares of the standard errors of the means, not the squares of the standard deviations of the variables. Analogously, the off-diagonal elements are estimated covariances of the joint sampling distribution of the means of your variables, not the covariances of the variables themselves.
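To put numbers on this (a worked check added for illustration, assuming independent draws, which is what -drawnorm- produces): if each observation $x_i$ has covariance matrix $\Sigma$ (here C_a), the vector of sample means $\bar{x}$ has covariance

$$
\operatorname{Var}(\bar{x}) = \operatorname{Var}\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) = \frac{1}{N^{2}}\sum_{i=1}^{N}\Sigma = \frac{\Sigma}{N}.
$$

-mean- estimates this by substituting the sample covariance matrix $S$ for $\Sigma$, so $\mathrm{e(V)} = S/N$. For a_1 with $N = 1000$, the reported standard error gives $0.0202762^{2} \approx 0.00041112$, matching e(V)[1,1] = .00041113; multiplying back by 1,000 gives a sample variance of about 0.411, close to C_a[1,1] = 0.4225. That is why e(V) looks like roughly C_a/1000.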

What you want is (code and output):

Code:

. clear all

. mat M = (4\5\2\2.8\6)

. mat C_a = ( .42250, .11430, .04225, .01521, .16900 \ ///
>             .11430, .67640, .06764, .02435, .27056 \ ///
>             .04225, .06764, .25000, .00900, .10000 \ ///
>             .01521, .02435, .00900, .09000, .03600 \ ///
>             .16900, .27056, .10000, .03600, 1.0000 )

.                        
. set seed 55555

. drawnorm a_1 a_2 a_3 a_4 a_5, n(1000) mean(M) cov(C_a)
(obs 1,000)

.
. summ a_*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         a_1 |      1,000    4.024775    .6411906   2.157554   6.157671
         a_2 |      1,000    5.040562    .8370743   2.334469   7.364285
         a_3 |      1,000     2.02859    .5001752   .6888211   3.562246
         a_4 |      1,000    2.797371    .3106413    1.85871     3.8358
         a_5 |      1,000    6.001783    1.009589   3.066305   9.202579

. corr a_*, cov
(obs=1,000)

             |      a_1      a_2      a_3      a_4      a_5
-------------+---------------------------------------------
         a_1 |  .411125
         a_2 |  .129352  .700693
         a_3 |  .024681  .064612  .250175
         a_4 |    .0123  .028895  .006978  .096498
         a_5 |  .158141  .270177  .080731  .028766  1.01927
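The -mean- output in #1 and the -corr, cov- output here are linked by a factor of N. A minimal do-file sketch of that check, assuming (as documented) that -mean- stores the sample size in e(N) and that -correlate, covariance- leaves the covariance matrix in r(C); the matrix names S_from_V and S_corr are illustrative only:

```stata
clear all
set seed 55555
matrix M = (4\5\2\2.8\6)
matrix C_a = ( .42250, .11430, .04225, .01521, .16900 \ ///
               .11430, .67640, .06764, .02435, .27056 \ ///
               .04225, .06764, .25000, .00900, .10000 \ ///
               .01521, .02435, .00900, .09000, .03600 \ ///
               .16900, .27056, .10000, .03600, 1.0000 )
drawnorm a_1 a_2 a_3 a_4 a_5, n(1000) means(M) cov(C_a)

* e(V) from -mean- is the covariance matrix of the estimated means
quietly mean a_*
local N = e(N)

* Scaling e(V) by N recovers the sample covariance matrix of the data
matrix S_from_V = `N' * e(V)

* Sample covariance matrix computed directly from the variables
quietly correlate a_*, covariance
matrix S_corr = r(C)

* Maximum relative difference between the two; should be essentially zero
display mreldif(S_from_V, S_corr)
matrix list S_from_V
```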

Added: Crossed with #2. Your reasoning there is not right. The formula for variance you give is correct, but as you increase the sample size, the sum of the (X-mu)² terms also grows in proportion to N, because there are more terms, each of about the same magnitude. This balances the growth of N in the denominator. So, in fact, the variance estimate does not decrease as N increases. (Standard errors of sample means do, but that's because they have an additional factor of sqrt(N) in their denominator, with nothing in the numerator to counterbalance it.)
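Written out as a short derivation (with $\mu$ and $\sigma^{2}$ the true mean and variance, and independent draws assumed), the argument is:

$$
E\left[\sum_{i=1}^{N}(X_i-\mu)^{2}\right] = N\sigma^{2}
\quad\Longrightarrow\quad
E\left[\frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)^{2}\right] = \sigma^{2},
$$

which does not depend on $N$, whereas

$$
\operatorname{SE}(\bar{X}) = \sqrt{\operatorname{Var}(\bar{X})} = \sqrt{\frac{\sigma^{2}}{N}} = \frac{\sigma}{\sqrt{N}}.
$$

With the numbers above: a_1 has Std. Dev. .6411906, and $0.6411906/\sqrt{1000} \approx 0.0202762$, which is the Std. Err. that -mean- reports for a_1 in #1.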

Last edited by Clyde Schechter; 12 Mar 2018, 23:07.

Oliver Kendrick

#4

12 Mar 2018, 23:04

Ah, thank you so much, Clyde. Thought it was maybe something like that and then confused myself more. Really appreciate it.

Joseph Coveney

#5

12 Mar 2018, 23:14

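-mean- is an estimation command, and its e(V) is the variance-covariance matrix of the estimated means (the diagonal holds the squared standard errors), not the covariance matrix of the variables. Try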

Code:

correlate a_*, covariance

instead.
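To compare the result of that suggestion with C_a itself, here is a minimal do-file sketch using the setup from #1, assuming -correlate- stores the covariance matrix in r(C) as documented. With 1,000 draws the sample covariance matrix should match C_a only up to sampling error, so the differences are small but not zero; the names S and D are illustrative only:

```stata
clear all
set seed 55555
matrix M = (4\5\2\2.8\6)
matrix C_a = ( .42250, .11430, .04225, .01521, .16900 \ ///
               .11430, .67640, .06764, .02435, .27056 \ ///
               .04225, .06764, .25000, .00900, .10000 \ ///
               .01521, .02435, .00900, .09000, .03600 \ ///
               .16900, .27056, .10000, .03600, 1.0000 )
drawnorm a_1 a_2 a_3 a_4 a_5, n(1000) means(M) cov(C_a)

* Sample covariance matrix of the generated variables
quietly correlate a_*, covariance
matrix S = r(C)
matrix list S

* Element-wise difference from the target covariance matrix
matrix D = S - C_a
matrix list D

* Largest relative difference: small, but not zero, because of sampling error
display mreldif(S, C_a)
```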