Hi all --
I've been trying to simulate power analysis for regression and had good luck using corr2data and drawnorm for generating data. I wanted to create data by constructing an equation with random variables rather than use corr2data or drawnorm. I create three random, standardized (m=0, sd=1) variables: z1, z2, and ze. I then create my regression equation as
generate y = .30*z1 + .40*z2 + ze
I expected that the correlation between Y and z1 would be .30 and Y and z2 to be .40, However, when I correlate these data, the correlations are always, on average, lower than what I expect at r = .26 (instead of .30) and r = .35 (instead of .40).
I set replications between 500 and 5000 and still I get correlations of the magnitude noted. Changed seed every time, still similar results.
No doubt that I am overlooking something that will be obvious to others, but I'm not seeing the problem,
Any help that can be offered would be welcomed.
Below is my simulation code. Using Version 12.1 of Stata.
Bryan
***********************************
clear all
set more off
set seed 1681
local alpha = .05
local samplesize = 92
local reps = 1500
postfile coefsv1 n_size beta1 se1 t1 pv1 beta2 se2 t2 pv2 bcons Model_R2 power_pv1 power_pv2 cor_z1z2 cor_yz1 cor_yz2 fz_z1z2 fz_yz1 fz_yz2 using version1, replace every(1)
tempname betamatv1
local rep = 0
quietly forv i=1/`reps'{
drop _all
set obs `samplesize'
generate x1= invnorm(uniform())
egen z1 = std(x1)
generate x2= invnorm(uniform())
egen z2 = std(x2)
generate e = invnorm(uniform())
egen ze = std(e)
generate y = .30*z1 + .40*z2 + ze
/* Ger correlations of this magnitude consistently */
*r y Z1 = .26
*r y Z2 = .35
reg y z1 z2
matrix betamatv1=e(b)
matrix semat=e(V)
local beta1 = betamatv1[1,1]
local se1 = semat[1,1]^.5
local t1 = `beta1'/`se1'
local pv1 = ttail(e(df_r),`t1')*2
local beta2 = betamatv1[1,2]
local se2 = semat[2,2]^.5
local t2 = `beta2'/`se2'
local pv2 = ttail(e(df_r),`t2')*2
local bcons = betamatv1[1,3]
local Model_R2 = e(r2)
local n_size = e(N)
if `pv1' < `alpha' {
local power_pv1 = 1
}
else {
local power_pv1 = 0
}
if `pv2' < `alpha' {
local power_pv2 = 1
}
else {
local power_pv2 = 0
}
corr y z1 z2 ze
mat C = r(C)
matrix list r(C)
local cor_z1z2 = C[2,3]
local cor_yz1 = C[1,2]
local cor_yz2 = C[1,3]
local fz_z1z2 = atanh(C[2,3])
local fz_yz1 = atanh(C[1,2])
local fz_yz2 = atanh(C[1,3])
noisily if `rep' == 50 {
display "Replications completed: " `i'
local rep = 1
}
noisily else if `rep' == 0 {
display " "
display " "
display " "
display "Replications commencing..."
local rep = 2
}
else {
local rep = `rep' +1
}
post coefsv1 (`n_size') (`beta1') (`se1') (`t1') (`pv1') (`beta2') (`se2') (`t2') (`pv2') (`bcons') (`Model_R2') (`power_pv1') (`power_pv2') (`cor_z1z2') (`cor_yz1') (`cor_yz2') (`fz_z1z2') (`fz_yz1') (`fz_yz2')
}
postclose coefsv1
use version1, clear
summ
I've been trying to simulate power analysis for regression and had good luck using corr2data and drawnorm for generating data. I wanted to create data by constructing an equation with random variables rather than use corr2data or drawnorm. I create three random, standardized (m=0, sd=1) variables: z1, z2, and ze. I then create my regression equation as
generate y = .30*z1 + .40*z2 + ze
I expected that the correlation between Y and z1 would be .30 and Y and z2 to be .40, However, when I correlate these data, the correlations are always, on average, lower than what I expect at r = .26 (instead of .30) and r = .35 (instead of .40).
I set replications between 500 and 5000 and still I get correlations of the magnitude noted. Changed seed every time, still similar results.
No doubt that I am overlooking something that will be obvious to others, but I'm not seeing the problem,
Any help that can be offered would be welcomed.
Below is my simulation code. Using Version 12.1 of Stata.
Bryan
***********************************
clear all
set more off
set seed 1681
local alpha = .05
local samplesize = 92
local reps = 1500
postfile coefsv1 n_size beta1 se1 t1 pv1 beta2 se2 t2 pv2 bcons Model_R2 power_pv1 power_pv2 cor_z1z2 cor_yz1 cor_yz2 fz_z1z2 fz_yz1 fz_yz2 using version1, replace every(1)
tempname betamatv1
local rep = 0
quietly forv i=1/`reps'{
drop _all
set obs `samplesize'
generate x1= invnorm(uniform())
egen z1 = std(x1)
generate x2= invnorm(uniform())
egen z2 = std(x2)
generate e = invnorm(uniform())
egen ze = std(e)
generate y = .30*z1 + .40*z2 + ze
/* Ger correlations of this magnitude consistently */
*r y Z1 = .26
*r y Z2 = .35
reg y z1 z2
matrix betamatv1=e(b)
matrix semat=e(V)
local beta1 = betamatv1[1,1]
local se1 = semat[1,1]^.5
local t1 = `beta1'/`se1'
local pv1 = ttail(e(df_r),`t1')*2
local beta2 = betamatv1[1,2]
local se2 = semat[2,2]^.5
local t2 = `beta2'/`se2'
local pv2 = ttail(e(df_r),`t2')*2
local bcons = betamatv1[1,3]
local Model_R2 = e(r2)
local n_size = e(N)
if `pv1' < `alpha' {
local power_pv1 = 1
}
else {
local power_pv1 = 0
}
if `pv2' < `alpha' {
local power_pv2 = 1
}
else {
local power_pv2 = 0
}
corr y z1 z2 ze
mat C = r(C)
matrix list r(C)
local cor_z1z2 = C[2,3]
local cor_yz1 = C[1,2]
local cor_yz2 = C[1,3]
local fz_z1z2 = atanh(C[2,3])
local fz_yz1 = atanh(C[1,2])
local fz_yz2 = atanh(C[1,3])
noisily if `rep' == 50 {
display "Replications completed: " `i'
local rep = 1
}
noisily else if `rep' == 0 {
display " "
display " "
display " "
display "Replications commencing..."
local rep = 2
}
else {
local rep = `rep' +1
}
post coefsv1 (`n_size') (`beta1') (`se1') (`t1') (`pv1') (`beta2') (`se2') (`t2') (`pv2') (`bcons') (`Model_R2') (`power_pv1') (`power_pv2') (`cor_z1z2') (`cor_yz1') (`cor_yz2') (`fz_z1z2') (`fz_yz1') (`fz_yz2')
}
postclose coefsv1
use version1, clear
summ
Comment