I have a question about triple standardization and weighting. I came across a published article that standardizes the same data three times. My statistical knowledge is limited, but I am wondering whether this makes any sense. Here is the situation: imagine you have a dataset with 40 variables measuring, let's say, different forms of well-being. The authors of this article first standardize each of the 40 variables as (var - mean)/sd. Then they group the 40 variables into 5 indexes and standardize the indexes. Finally, they combine the five indexes into a single measure and standardize it again. It goes approximately like this:
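Written as formulas, the three steps look like this. The notation is mine, not the article's: $w_i$ is respondent $i$'s weight, $G_k$ is the set of items in index $k$, $\sigma_k$ is the sign with which index $k$ enters the composite, and every mean and standard deviation $s$ is weighted by $w_i$, as `summ ... [aw=weight]` does.

```latex
\begin{aligned}
\bar{x}_{j} &= \frac{\sum_i w_i x_{ij}}{\sum_i w_i}, &
z_{ij} &= \frac{x_{ij} - \bar{x}_{j}}{s_{j}}, && j = 1,\dots,40 \\
I_{ik} &= \sum_{j \in G_k} z_{ij}, &
I^{*}_{ik} &= \frac{I_{ik} - \bar{I}_{k}}{s_{I_k}}, && k = 1,\dots,5 \\
C_i &= I^{*}_{i1} + I^{*}_{i2} + I^{*}_{i3} + I^{*}_{i4} - I^{*}_{i5}, &
C^{*}_i &= \frac{C_i - \bar{C}}{s_{C}} \\
\Rightarrow\; C^{*}_i &= a + \sum_{k=1}^{5} \sum_{j \in G_k} \frac{\sigma_k}{s_j\, s_{I_k}\, s_C}\, x_{ij}, &
\sigma_1 &= \dots = \sigma_4 = +1,\ \sigma_5 = -1
\end{aligned}
```

Substituting each step into the next shows that the final measure is a linear combination of the raw items plus a constant $a$. The last divisor $s_C$ is the same for every item, so step 3 only changes units. The divisors $s_j$ and $s_{I_k}$ differ across items and indexes, so steps 1 and 2 do change how much each item counts in the composite.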
1. Standardize the raw items (which are all on the same scale)
foreach var of varlist x1-x40 {
    * weighted mean and SD of each item, then z-score it in place
    summ `var' [aw=weight]
    replace `var' = (`var' - r(mean)) / r(sd)
}
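A minimal check of step 1, under stated assumptions: the data below are simulated (40 integer-valued items driven by a hypothetical common factor `latent`, and a hypothetical weight `weight`), not the authors' data. After the loop, every item should have a weighted mean of 0 and a weighted SD of 1 when summarized with the same `[aw=weight]`.

```stata
* Sketch of step 1 on SIMULATED data (hypothetical variables: weight, latent)
clear
set seed 2024
set obs 500
gen double weight = 0.5 + 1.5*runiform()   // hypothetical population weight
gen double latent = rnormal()              // hypothetical common factor
forvalues j = 1/40 {
    * integer-valued items on a common scale, all driven by latent
    gen double x`j' = round(3 + latent + rnormal())
}

* Step 1: weighted z-score of every item
foreach var of varlist x1-x40 {
    quietly summarize `var' [aw=weight]
    quietly replace `var' = (`var' - r(mean)) / r(sd)
}

* Check: with the same weights, each item now has mean 0 and SD 1
foreach var of varlist x1-x40 {
    quietly summarize `var' [aw=weight]
    assert abs(r(mean)) < 1e-8 & abs(r(sd) - 1) < 1e-8
}
```

Because the items are already on the same scale, this step is not needed to make them comparable. What it does is give every item the same variance, so items with a larger spread count for less than they would in a plain sum of the raw answers.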
2. Combine the standardized variables into 5 indexes and then standardize the indexes again
egen Index1 = rowtotal(x1-x8)
…
egen Index5 = rowtotal(x32-x40)
foreach var of varlist Index1-Index5 {
    * weighted z-score of each index
    summ `var' [aw=weight]
    replace `var' = (`var' - r(mean)) / r(sd)
}
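A sketch of step 2 on the same kind of simulated data. The grouping is an assumption (eight consecutive items per index, x1-x8 through x33-x40), since the real grouping is elided in the question. Printing each index's weighted SD before restandardizing shows what this step does: a sum of eight positively correlated z-scores has an SD well above 1, and how far above depends on how strongly the items in that index correlate, so restandardizing gives every index the same weight in anything built from them.

```stata
* Sketch of step 2 on SIMULATED data; the grouping of items is hypothetical
clear
set seed 2024
set obs 500
gen double weight = 0.5 + 1.5*runiform()   // hypothetical population weight
gen double latent = rnormal()              // hypothetical common factor
forvalues j = 1/40 {
    gen double x`j' = round(3 + latent + rnormal())
}
* Step 1, as in the question
foreach var of varlist x1-x40 {
    quietly summarize `var' [aw=weight]
    quietly replace `var' = (`var' - r(mean)) / r(sd)
}

* Step 2: assumed grouping of 8 consecutive items per index
forvalues k = 1/5 {
    local first = 8*(`k' - 1) + 1
    local last  = 8*`k'
    egen double Index`k' = rowtotal(x`first'-x`last')
}

* Restandardize; the SD printed first shows how far each raw index is from 1
foreach var of varlist Index1-Index5 {
    quietly summarize `var' [aw=weight]
    display "`var': weighted SD before restandardizing = " %6.3f r(sd)
    quietly replace `var' = (`var' - r(mean)) / r(sd)
}
```

One detail worth checking in real data: `egen rowtotal()` treats missing values as 0, so a respondent with skipped items gets a sum pulled toward the item means. Its `missing` option only returns missing when all items in the row are missing.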
3. After presenting some descriptive evidence using the 5 indexes, the authors combine them into a single measure (with Index5 entering negatively) and then standardize it again.
gen Composite = Index1 + Index2 + Index3 + Index4 - Index5
foreach var of varlist Composite {
    summ `var' [aw=weight]
    replace `var' = (`var' - r(mean)) / r(sd)
}
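Step 3 is a pure linear rescaling with a positive divisor: `Composite` is the weighted z-score of Index1 + Index2 + Index3 + Index4 - Index5, so it is perfectly correlated with the unstandardized sum. A sketch on simulated data; `Composite_raw`, `latent` and the item grouping are hypothetical.

```stata
* Sketch of step 3 on SIMULATED data; Composite_raw is a hypothetical helper
clear
set seed 2024
set obs 500
gen double weight = 0.5 + 1.5*runiform()   // hypothetical population weight
gen double latent = rnormal()              // hypothetical common factor
forvalues j = 1/40 {
    gen double x`j' = round(3 + latent + rnormal())
}
* Steps 1 and 2 (assumed grouping of 8 consecutive items per index)
foreach var of varlist x1-x40 {
    quietly summarize `var' [aw=weight]
    quietly replace `var' = (`var' - r(mean)) / r(sd)
}
forvalues k = 1/5 {
    local first = 8*(`k' - 1) + 1
    local last  = 8*`k'
    egen double Index`k' = rowtotal(x`first'-x`last')
}
foreach var of varlist Index1-Index5 {
    quietly summarize `var' [aw=weight]
    quietly replace `var' = (`var' - r(mean)) / r(sd)
}

* Step 3: combine (Index5 enters negatively) and restandardize
gen double Composite_raw = Index1 + Index2 + Index3 + Index4 - Index5
gen double Composite = Composite_raw
quietly summarize Composite [aw=weight]
quietly replace Composite = (Composite - r(mean)) / r(sd)

* A positive linear rescaling: the two versions are perfectly correlated
correlate Composite Composite_raw [aw=weight]
assert abs(r(rho) - 1) < 1e-8
```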
My questions are: Does this procedure make sense? Does the triple standardization influence the results (in terms of significance)? And is it correct to use population weights three times?
I am asking these questions because I have a dataset similar to the authors' and would like to follow their procedure, but I am not 100% convinced that it is correct.
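On the significance question, a sketch under stated assumptions (simulated data, a hypothetical outcome `y`, the same hypothetical grouping as in the earlier sketches): because the final standardization is a linear transformation with a positive divisor, regressing an outcome on `Composite` or on its unstandardized version gives the same t-statistic; only the coefficient's units change. This argument covers step 3 only. Steps 1 and 2 change how much each item counts, so a composite built with them can give different results from, say, a plain sum of the raw items.

```stata
* Sketch on SIMULATED data: does the final standardization change the t-statistic?
clear
set seed 2024
set obs 500
gen double weight = 0.5 + 1.5*runiform()   // hypothetical population weight
gen double latent = rnormal()              // hypothetical common factor
forvalues j = 1/40 {
    gen double x`j' = round(3 + latent + rnormal())
}
* Steps 1 and 2 (assumed grouping of 8 consecutive items per index)
foreach var of varlist x1-x40 {
    quietly summarize `var' [aw=weight]
    quietly replace `var' = (`var' - r(mean)) / r(sd)
}
forvalues k = 1/5 {
    local first = 8*(`k' - 1) + 1
    local last  = 8*`k'
    egen double Index`k' = rowtotal(x`first'-x`last')
}
foreach var of varlist Index1-Index5 {
    quietly summarize `var' [aw=weight]
    quietly replace `var' = (`var' - r(mean)) / r(sd)
}
* Step 3, keeping the unstandardized composite for comparison
gen double Composite_raw = Index1 + Index2 + Index3 + Index4 - Index5
gen double Composite = Composite_raw
quietly summarize Composite [aw=weight]
quietly replace Composite = (Composite - r(mean)) / r(sd)

* Hypothetical outcome related to the common factor
gen double y = 0.3*latent + rnormal()

* Same regression with the composite before and after the final standardization
quietly regress y Composite_raw [aw=weight]
scalar t_raw = _b[Composite_raw] / _se[Composite_raw]
quietly regress y Composite [aw=weight]
scalar t_std = _b[Composite] / _se[Composite]
display "t unstandardized = " %8.4f scalar(t_raw) "   t standardized = " %8.4f scalar(t_std)
assert reldif(scalar(t_raw), scalar(t_std)) < 1e-6
```

On the weights, one reasonable reading: using the same `[aw=weight]` in every `summ` means each mean and SD refers to the same weighted population. The weights only enter the centering and scaling constants; the data are not multiplied by the weights three times.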