Hello Statalisters,
I am trying to create a final binary socioeconomic (SE) measure out of multiple binary socioeconomic indicators (occupation, participant education, crowding in the house, presence or absence of windows, drinking water, wall material, etc.). I guess PCA is the way to go rather than factor analysis, since I am trying to summarize these variables into a single SE measure. Am I correct? I have learned that because my variables are binary (and I have already fixed which indicators to use, based on descriptive analysis), I cannot run PCA on them directly but must first compute a polychoric correlation matrix (a tetrachoric correlation matrix, to be precise, since all the variables are binary). The steps I need to take are:
1) compute the tetrachoric correlation matrix;
2) use this matrix to extract the components;
3) rotate the components;
4) decide how many components to retain;
5) compute the score(s) for the retained component(s) using -predict-;
6) dichotomize the predicted score to get the final binary SE measure (I will use this binary measure in other analyses).
Please correct me if anything in these steps is wrong.
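Read literally, the six steps map onto Stata commands roughly as below. This is only a sketch that assumes the example data posted further down are in memory, that two components are kept at step 4, and that the first rotated score is split at its median at step 6; the matrix name `C`, the variables `pc1`, `pc2` and `pc1_hi`, and the median cut-off are illustrative assumptions, not a recommendation.

```stata
* Assumes the six 0/1 indicators from the example data are in memory

* Step 1: tetrachoric correlation matrix of the binary indicators
tetrachoric Occup crowd water window wall edu
matrix C = r(Rho)

* Steps 2 and 4: components of the matrix; keeping two is an assumed choice
* n() is the number of observations (102 in the example data)
pcamat C, n(102) components(2)

* Step 3: orthogonal (varimax) rotation of the retained components
rotate, varimax
estat loadings

* Step 5: scores for the two rotated components
predict pc1 pc2

* Step 6: split the first score at its median (hypothetical cut-off).
* The sign of a component is arbitrary, so check the loadings (and the
* 0/1 coding of each indicator) before deciding which group is "low" SE.
summarize pc1, detail
generate byte pc1_hi = pc1 > r(p50) if !missing(pc1)
tabulate pc1_hi
```

Because the indicators are binary, `pc1` can take only a limited number of distinct values (at most one per response pattern), so a median split will not necessarily produce two groups of equal size.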
Getting into the analysis: I can run a straightforward -pca- in Stata 13, but I am totally confused about which of -polychoric-, -polychoricpca-, -tetrachoric- and -pcamat- to use, and how to proceed once the correlation matrix has been created.
1) Is the following, which I tried with the -tetrachoric- command, the way to go?
Code:
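tetrachoric Occup crowd water window wall edu
matrix C = r(Rho)   // tetrachoric stores the correlation matrix in r(Rho)
pcamat C, n(102) components(2)   // 102 observations in the sample data set; keep 2 components
rotate, varimax     // rotate only the retained components
predict pc1 pc2     // scores for the two rotated components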
2) I used varimax here, but I have also seen quartimin and promax rotations used for creating final SE scores. How can I decide which one to use in my case?
3) How would the whole approach differ if one or two of the indicators were ordinal categorical variables instead of binary?
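For questions 2 and 3, a hedged sketch rather than an answer. The first part refits a two-component solution and displays the loadings under an orthogonal rotation (varimax) and two oblique ones (promax and oblique quartimin), so their interpretability can be compared side by side; rotation only changes anything when two or more components are retained. The second part assumes the user-written -polychoric- command (by Stas Kolenikov, locate it with -search polychoric-) is installed and that it leaves its correlation matrix in r(R); that stored-result name is an assumption to verify with -return list-. The variable names from the example data are placeholders for whichever indicators are actually ordinal, and the matrix name `P` is hypothetical.

```stata
* Part 1: compare rotations of a two-component solution
tetrachoric Occup crowd water window wall edu
matrix C = r(Rho)
pcamat C, n(102) components(2)

* Orthogonal rotation: rotated components remain uncorrelated
rotate, varimax
estat rotatecompare          // rotated vs unrotated loadings

* Oblique rotations: rotated components are allowed to correlate
rotate, promax
estat rotatecompare
rotate, quartimin oblique
estat rotatecompare

* Return to the unrotated solution
rotate, clear

* Part 2: ordinal indicators (assumes -polychoric- is installed)
* Replace the names below with the ordinal versions of the indicators
polychoric Occup crowd water window wall edu
return list                  // check where the matrix is stored
matrix P = r(R)              // assumed stored-result name
pcamat P, n(102) components(2)
```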
Given below is an example data set produced by -dataex-.
Thank you
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Occup edu crowd) byte(wall window water)
1 0 1 1 1 1
1 1 1 1 1 1
0 0 0 0 0 0
1 1 0 1 0 0
1 1 1 0 0 0
0 0 0 0 0 0
1 1 1 1 1 1
1 0 0 1 1 0
0 0 1 0 0 0
1 1 1 1 1 1
0 0 0 0 0 0
1 0 1 0 0 0
1 0 0 0 0 0
1 1 0 1 1 1
0 0 0 0 0 0
1 0 0 0 0 0
1 1 1 1 1 1
1 0 0 0 0 1
0 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 1
0 1 1 1 1 0
1 0 1 0 0 0
1 1 0 1 0 0
1 0 0 1 0 1
1 1 1 1 1 1
1 0 1 1 1 1
0 0 0 1 0 0
0 0 0 0 0 0
1 1 1 1 1 1
0 0 0 0 0 0
0 0 0 0 0 1
1 1 1 1 1 1
1 0 0 0 0 0
1 1 0 1 1 1
1 0 0 0 0 1
1 1 1 1 1 1
1 1 1 1 1 1
0 0 0 0 0 0
1 1 1 1 1 1
1 1 1 1 1 1
1 0 0 1 1 1
1 1 0 1 1 1
1 1 1 1 1 1
1 0 1 1 1 1
1 0 1 1 0 1
1 1 1 1 1 1
1 1 0 1 1 1
1 1 1 1 1 1
1 1 1 1 1 0
1 1 0 1 0 1
1 1 1 1 1 1
1 1 1 1 1 0
0 0 0 1 0 1
1 1 0 1 0 0
1 1 1 1 1 1
1 1 0 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 0 1 1 1 1
0 0 0 0 0 0
1 1 1 1 1 1
0 0 0 0 0 0
1 0 0 0 0 0
1 0 0 0 0 0
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
0 0 0 0 0 1
1 1 1 0 0 1
0 0 0 0 0 0
1 1 0 1 1 1
1 0 0 0 0 0
1 1 1 1 1 1
1 1 1 1 0 0
1 1 1 1 1 0
0 0 0 0 0 0
1 0 0 1 1 1
1 0 0 0 0 0
1 0 1 1 1 0
1 0 1 1 0 0
0 1 0 1 0 1
0 0 1 1 0 1
1 1 1 1 1 1
1 0 0 0 0 1
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 0 0
1 1 0 1 1 1
0 0 1 0 0 0
1 1 0 1 1 0
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 0
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
end
label values edu Edu
label def Edu 0 "high", modify
label def Edu 1 "low", modify
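With the example data above loaded, step 4 (how many components to keep) is commonly judged from the eigenvalues of the correlation matrix. The sketch below assumes two widespread rules of thumb, keeping components whose eigenvalues exceed about 1 (a Kaiser-style rule) and looking for an elbow in the scree plot; neither rule comes from the post, and the interpretability of the loadings should weigh in as well.

```stata
* Assumes the dataex example above has been run, so the data are in memory
tetrachoric Occup crowd water window wall edu
matrix C = r(Rho)

* Fit without components() so that all six components are reported
pcamat C, n(102)

* Eigenvalues are stored in e(Ev)
matrix list e(Ev)

* Scree plot with a reference line at eigenvalue 1
screeplot, yline(1)

* Refit, keeping only components that pass the mineigen(1) threshold
pcamat C, n(102) mineigen(1)
display "components retained: " e(f)
estat loadings
```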