  • Why doesn't the covariance matrix match the results of -sum-?

    Hi Stata masters!

    I generated simulated data using a covariance matrix. I specified all of the variances, including the error variance, as 1 in the covariance matrix below (the diagonal elements are all 1).

    Because I also specified my true model by setting the values of the beta slopes, the variability of Y should depend only on the error term.
    (In other words, in the equation y = 0.1 + 0.6*d + 0.6*w1 + 0.6*w2 + 0.6*w3 + 0.6*w4 + 0.6*w5 + 0.6*w6 + 0.6*w7 + 0.6*w8 + 0.6*w9 + 0.6*w10 + 0.6*w11 + 0.6*w12 + 0.6*w13 + 0.6*w14 + 0.6*w15 + 0.6*w16 + u, every term is a constant I defined except for the error (u), so the variance of Y should be the same as the variance of the error.)

    However, when I double-checked the descriptive statistics with -sum-, the standard deviation of Y came out at roughly 2 (2.58 in the output below; when I ran the simulation again it was about 2.07). It seems that the more covariates I include, the larger the SD of Y becomes, and it is nowhere near the 1 specified in the covariance matrix.

    Did Stata generate the wrong descriptive statistics? Did Stata miscalculate the SD of Y, or is something wrong with my syntax? Why doesn't the covariance matrix match the descriptive statistics in terms of the standard deviation of Y?

    The syntax I used is below. Could you please tell me what I should do so that the diagonal value of 1 in the covariance matrix matches the SD of Y in the descriptive statistics?


    ****************** Simulation code ******************
    *Define covariance matrix
    mat P = (1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0)
    matrix P = P\ (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1)
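
    * Side note: since P is just the 18 x 18 identity matrix, the 18 row
    * definitions above could equivalently be replaced by a single line using
    * Stata's built-in I() matrix function (shown commented out so the original
    * definition above is left untouched):
    * matrix P = I(18)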



    *Draw from normal dist.
    drawnorm x u w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14 w15 w16, n(4000) cov(P)
    *Define treatment variable
    generate d=(x>0)
    *define outcome variable
    generate y = 0.1 + 0.6*d + 0.6*w1 + 0.6*w2 + 0.6*w3 + 0.6*w4 + 0.6*w5 + 0.6*w6 + 0.6*w7 + 0.6*w8 + 0.6*w9 + 0.6*w10 + 0.6*w11 + 0.6*w12 + 0.6*w13 + 0.6*w14 + 0.6*w15 + 0.6*w16 + u
    *save dataset
    save decideK16, replace
    *decideK16 descriptive statistics to check SD(y) and SD(d)*
    use decideK16, clear
    sum y d w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14 w15 w16

    . sum y d w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14 w15 w16

    Variable | Obs Mean Std. Dev. Min Max
    -------------+---------------------------------------------------------
    y | 4,000 .4063912 2.583068 -8.888543 11.30081
    d | 4,000 .4905 .4999722 0 1
    w1 | 4,000 .0130231 .9805118 -3.231501 3.674521
    w2 | 4,000 .0148694 .9850495 -4.455201 3.480071
    w3 | 4,000 .0190679 1.015951 -3.161212 3.477742
    -------------+---------------------------------------------------------
    w4 | 4,000 -.0083475 1.020264 -3.481928 3.343177
    w5 | 4,000 .00874 1.003124 -4.002687 3.944189
    w6 | 4,000 .0016545 .9941556 -3.615002 3.591007
    w7 | 4,000 .0006976 1.01449 -3.297943 3.256187
    w8 | 4,000 -.0037587 1.014113 -3.831774 3.498365
    -------------+---------------------------------------------------------
    w9 | 4,000 -.0095989 1.028091 -3.08722 3.859863
    w10 | 4,000 .0187556 1.006128 -3.444188 3.649128
    w11 | 4,000 -.002984 .9957664 -3.700773 3.406081
    w12 | 4,000 .0078398 1.018744 -3.593562 4.082495
    w13 | 4,000 -.0156626 1.010529 -3.729325 3.57018
    -------------+---------------------------------------------------------
    w14 | 4,000 .001853 1.006388 -3.269533 3.674261
    w15 | 4,000 -.0226012 1.004162 -3.367424 3.241826
    w16 | 4,000 -.0289284 .9867342 -3.406581 4.05419

    .
    end of do-file
    Last edited by Jiwon Nam; 24 Aug 2017, 19:03.

  • #2
    "Because I also specified my true model by setting the values of the beta slopes, the variability of Y should depend only on the error term."
    No, that's not true. Given the way you have defined y in your -gen y = ...- command, the variance of Y is:

    Code:
    var(y) = 0.6^2*var(w1) + 0.6^2*var(w2) + ... + 0.6^2*var(w16) + 1^2*var(u)
    Now, by design the variance of each w* is supposed to be 1: in the actual sample it will be close to 1, but not exactly so. Let's just ignore that and say var(w*) = 1 for all of the w's. Similarly we can say var(u) = 1. So our variance for y will be 16*0.6^2 + 1 = 5.76 + 1 = 6.76, which means the standard deviation of y should be sqrt(6.76) = 2.6. The standard deviation you got from Stata is, instead, 2.58; that small discrepancy is due to the differences between the actual sample variances of the w's and u and their nominal value of 1. So what Stata gave you is quite correct.

    Added: I do not include any cross-product terms in the formula for var(y) because the covariances of all the w's and u are 0 by design.
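
    For anyone who wants to check this arithmetic against the simulated data, here is a minimal sketch (using the dataset and variable names from post #1):

    Code:
    * theoretical SD of y: 16 slopes of 0.6 on unit-variance covariates
    * plus a unit-variance error; this displays 2.6
    display sqrt(16*0.6^2 + 1)

    * sample SD of y from the simulated data (close to, but not exactly, 2.6)
    use decideK16, clear
    summarize y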



    • #3
      Hello Dr. Schechter,

      Thank you for your kind explanation. I have another question.
      After I changed betaD from 0.6 to 0 in the outcome variable above, that is, in
      "generate y = 0.1 + 0.6*d + 0.6*w1 + 0.6*w2 + 0.6*w3 + 0.6*w4 + 0.6*w5 + 0.6*w6 + 0.6*w7 + ... + 0.6*w16 + u", I ran a multiple regression and saved the dataset.

      Using the same dataset, I ran another regression, but this time I applied propensity score matching (PSM) using -psmatch2-.
      In detail, I have three sub-questions about this PSM:


      1. I ran "psmatch2 d, out(y) pscore(phat) caliper(0.01) noreplacement common".

      The following table was generated after PSM:

      ----------------------------------------------------------------------------------------
      Variable Sample | Treated Controls Difference S.E. T-stat
      ----------------------------+-----------------------------------------------------------
      y Unmatched | .093732179 .1406256 -.046893422 .06425678 -0.73
      ATT | .173365067 .101062943 .072302124 .06584661 1.10
      ----------------------------+-----------------------------------------------------------


      I know what the "y Unmatched" row in the table above is: it is the t-test result based on the raw (unmatched) data. That value is the same as the t-test result I get in SPSS on the raw data, so the two packages (SPSS and Stata) produce the same numbers for the t-test on the raw data.

      After PSM, however, I was hoping that the ATT row (and its T-stat) in the table above would be the same as the independent-samples t-test I run in SPSS on the matched data, but they are different. Why is the ATT not the same as the t-test result in SPSS? Because the two disagree, I do not feel I can trust the ATT in Stata, so I cannot use it. Whenever I report a t-test after PSM, I currently pull the matched (balanced) observations into SPSS and run the t-test there to report the group mean difference after matching. This takes a lot of time, so I would like to know what to do in order to see the post-matching t-test result directly in Stata.

      Is the ATT above the same thing as a t-test? What exactly is the ATT (average treatment effect on the treated) in Stata, and did I miss something? Before using the ATT (e.g., the difference of 0.0723 with a t-stat of 1.10), I wanted to check whether that difference is the same as the mean difference from the t-test in SPSS after propensity score matching, but they were different. Please tell me why.
      (In detail, I read the matched data into SPSS to test for a group mean difference (treatment effect) after propensity score matching. I ran the t-test in SPSS and compared its values with the ATT in Stata. They were different, and I have no idea why. Where in Stata is the t-test result for the groups made equivalent by matching on the propensity score?)


      2. I used caliper(0.01). What caliper value do empirical researchers most commonly use, and could you give me a citation for it?


      3. The following table was generated after I used PSM.

      psmatch2:  |  psmatch2: Common support
      Treatment  |
      assignment | Off support  On support |     Total
      -----------+-------------------------+----------
       Untreated |           0       1,982 |     1,982
         Treated |         137       1,881 |     2,018
      -----------+-------------------------+----------
           Total |         137       3,863 |     4,000

      If I do not use the caliper, the treated and untreated groups end up with the same number of observations. However, when I use the caliper, only observations from the treated group are dropped as off support. I understand that the caliper reduces the number of matched observations, but why are observations dropped only from the treated group?

      My understanding is that each observation in one group is matched to an observation in the other group within the specified caliper (the maximum allowed distance), and that if no match can be found within the caliper, the pair is dropped.
      Is that correct? If so, why does the table above show that the 137 dropped observations come only from the treated group?
      Could you explain the procedure that decides which observations are dropped to satisfy the common support assumption?

      Thanks, Jiwon
      Last edited by Jiwon Nam; 31 Aug 2017, 12:35.



      • #4
        In general, even when replying to a response from a particular person, posts here should be addressed to the community at large, not the person who responded, unless there are several people who have responded and you need to identify which person's response you are addressing. (Even then, probably referring to the post number in the thread is better.)

        In this particular case, I seldom use -psmatch2-, and haven't used it in a long time now. I don't know the answers to your questions. Hopefully somebody else will respond.

        I also haven't used SPSS in many years now, so even if I were conversant with -psmatch2-, I would not be able to explain why SPSS's results differ from Stata's. I will say this: there are people on the forum who remain active users of both Stata and SPSS and may know the answer to your question. But they won't be able to help you unless you post the actual complete commands and output you got from both Stata and SPSS.

        Finally, if there is a difference between Stata and SPSS, I don't know why you would automatically distrust Stata's results and trust SPSS's. The most likely explanation is that the code you used in Stata asks for something different from what your SPSS code asked for. Another possibility is that the propensity score matching involves random selection to break ties, in which case you should not expect two packages to produce identical results, though they should be in fairly close agreement. If, in fact, one of the two programs is doing something incorrectly, I would say that it is just as likely to be a problem with SPSS as with Stata.



        • #5
          Dr. Schechter,

          "you need to identify which person's response you are addressing."<---In these posts, you and I are the only participants; therefore, I am not sure what you mean "which person's response". Maybe is this misunderstanding from my broken English? I do not know.

          I am a doctoral student who has just started to learn Stata. I am not sure how the "ATT" shown in Stata relates to the t-test shown in SPSS. Of course, I know that ATT means the average treatment effect on the treated, but I do not know how the ATT shown in Stata is computed. The logic of choosing which observations to remove by propensity score is clear to me, and I read the new (matched) data into SPSS to double-check. When I then run a t-test in SPSS, the result should be the same as the one shown in Stata, but in my case it is not. I thought the ATT in Stata was a t-test, but maybe I am wrong; I do not know.

          I did not do any other data manipulation in SPSS, and the procedure there is straightforward because I used the pull-down menus. Therefore, I assume the discrepancy comes from my ignorance about the ATT shown in Stata. Perhaps the way I described my problem in the earlier post was not clear enough, but hopefully many users will understand what I am trying to say here. Let me simply run a t-test on the matched data in Stata, instead of PSM. If the result is the same as the t-test in SPSS, then there is no problem; it would imply that the ATT in Stata is something different from a plain t-test on the matched data. This is my conjecture, and I think it is logical enough.
          Last edited by Jiwon Nam; 31 Aug 2017, 13:51.



          • #6
            My apologies for not reading every detail here and not having given it much thought, but here are a couple of quick suggestions:

            1. What happens when you run a t-test in Stata? Does the result match the ATT?

            2. Do the results of the t-test in Stata match those of SPSS?

            3. If propensity score matching is possible in SPSS, does SPSS report the same ATT as Stata?

            If you answer question 1 with "no" and question 2 with "yes", then it is probably a misunderstanding of what psmatch2 estimates rather than a difference between Stata and SPSS.
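
            In case it helps with question 1, a minimal sketch of how the matched-sample t-test might be run in Stata follows. It assumes that psmatch2 (the SSC version) leaves behind its usual generated variables _support and _weight, and that with 1:1 matching without replacement _weight is 1 for matched observations and missing otherwise; please verify that in your own data before relying on it.

            Code:
            * run after:
            *   psmatch2 d, out(y) pscore(phat) caliper(0.01) noreplacement common
            * (this presumes the propensity score phat was created beforehand,
            *  e.g. from a logit of d on the covariates)
            * restrict the two-sample t-test to the matched, on-support observations
            ttest y if _support == 1 & _weight < ., by(d)

            If the observations kept by this condition are the same ones you export to SPSS, the two t-tests should agree up to rounding.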

            Also, psmatch2 is a user-written command (probably from SSC, but you are supposed to tell us). Have you tried official Stata's teffects psmatch routine? The latter is fully documented so you can read further on what it actually estimates.
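
            For reference, a minimal sketch of what that could look like for the simulated dataset from post #1 is below; the covariate list in the treatment model is only an assumption for illustration, since the propensity score model actually used with psmatch2 was not shown.

            Code:
            * ATET from official Stata's teffects, matching on the propensity score;
            * the treatment-model covariates w1-w16 are assumed for illustration
            use decideK16, clear
            teffects psmatch (y) (d w1-w16), atet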

            Best
            Daniel
            Last edited by daniel klein; 31 Aug 2017, 13:55.



            • #7
              Dr. Klein, this is exactly what I want to check right now. I just came to school from work, so I will post the answers to your questions tonight. Thank you so very much! I have used psmatch2 so far because I did not know about teffects; let me check that out as well. Thanks, Jiwon
