Hello everyone,
I have a question because I am running into trouble with my syntax for detecting outliers in a meta-analysis.
In principle, I followed the method suggested by Daniel J. Beal, David M. Corey, and William P. Dunlap (2002, JAP).
I expected to find at most about 10% outliers among the sample studies; instead, I was shocked that my syntax flagged 40-50% of cases as outliers, which makes no sense.
Given that, if someone here is familiar with this procedure, I would like your opinion on whether
1) my syntax below has some problem, or
2) the Beal et al. (2002) procedure (called SAMD) has some problem in detecting outliers; indeed, the correlations in our meta-analyses vary widely and come from studies with very different sample sizes. To make the question concrete, I have written out the exact quantity my syntax computes just below this list.
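For reference, here is my reading of what the syntax computes for each study i within a given x-y pair (this is a restatement of my own code, not a quotation of Beal et al.'s formulas): z_i is the Fisher z of study i's mean correlation, \bar z_{(-i)} is the mean z of the other k-1 studies for that pair, and n_i is study i's (mean) sample size.

\[
\bar z_{(-i)} = \frac{\sum_{j=1}^{k} z_j - z_i}{k-1}, \qquad
v_i = \frac{\bigl(1 - \bar z_{(-i)}^{2}\bigr)^{2}}{n_i - 1}, \qquad
v_m = \frac{\bigl(1 - \bar z_{(-i)}^{2}\bigr)^{2}}{(n_i - 1)(k - 1)},
\]
\[
\mathrm{SAMD}_i = \frac{z_i - \bar z_{(-i)}}{\sqrt{v_i + v_m}},
\qquad \text{flagged as an outlier if } \lvert \mathrm{SAMD}_i \rvert > 3 .
\]

Two things I am unsure about and would like checked: (a) both variance terms use (1 - z^2)^2/(n - 1), which I know as the usual large-sample approximation for the variance of r rather than of Fisher z (whose variance is commonly taken as 1/(n - 3)), and (b) as written, both terms use the focal study's own n_i rather than the other studies' sample sizes. I do not know whether either point matches Beal et al.'s intention.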
========================
*average the duplicate effect sizes and sample sizes within each study and variable pair*
by studyid x_new1 y_new1, sort: egen r_bar=mean(r)
by studyid x_new1 y_new1, sort: egen n_mean=mean(n)
by studyid x_new1 y_new1, sort: egen x_rel=mean(xrel1)
by studyid x_new1 y_new1, sort: egen y_rel=mean(yrel1)
sort x_new1 y_new1
*keep one row per study and variable pair*
duplicates drop studyid x_new1 y_new1, force
*******************************
*Calculate SAMD to identify outliers*
*******************************
*k = number of studies contributing to each x-y pair*
by x_new1 y_new1, sort: gen k=_N
*Fisher Z transformation of correlation*
gen z=(1/2)*ln((1+r_bar)/(1-r_bar))
*mean Fisher z of the other k-1 studies (leave-one-out mean)*
by x_new1 y_new1, sort: egen z_sum=total(z)
gen zwo=(z_sum-z)/(k-1)
*variance terms; n_mean is used instead of raw n because the n kept by -duplicates drop- is arbitrary*
gen vari=(1-zwo^2)^2/(n_mean-1)
gen varr=(1-zwo^2)^2/((n_mean-1)*(k-1))
gen samdi=(z-zwo)/sqrt(vari+varr)
gen abss=abs(samdi)
*compare SAMD with the critical cutoff*
*exclude missing SAMD (e.g. pairs with k==1): Stata treats missing as larger than any number, so -abs(samdi)>3- alone would flag every missing value*
gen outlier=0 if !missing(samdi)
replace outlier=1 if abs(samdi)>3 & !missing(samdi)
*number and share of flagged studies per x-y pair*
by x_new1 y_new1, sort: egen outlier_n=total(outlier)
gen outlier_100=outlier_n/k
*drop if outlier==1
drop k
==========================
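As a first diagnostic, I would also run something like the following right after the block above (just a sketch; it uses only the variables the syntax already creates) to see whether the 40-50% comes from missing SAMD values, from one or two x-y pairs, or from genuinely extreme studies:
========================
*how many SAMD values are missing vs. genuinely above the cutoff?*
count if missing(samdi)
count if abs(samdi)>3 & !missing(samdi)
*distribution of SAMD and of the flagged share per x-y pair*
summarize samdi, detail
tabstat outlier_100, by(x_new1) statistics(mean min max)
*inspect the most extreme cases by hand (adjust 1/10 to your data)*
gsort -abss
list studyid x_new1 y_new1 r_bar n_mean samdi in 1/10
==========================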