Dear Statalisters,
I am using Stata 13. I am analysing the Survey of Health, Ageing and Retirement in Europe (SHARE) data, waves 1, 2, 4, 5 and 6, to investigate the effects of health on the labour force participation of older workers. The main explanatory variable, sph, is an ordinal variable coded 1 = Excellent, 2 = Very good, 3 = Good, 4 = Fair and 5 = Poor. To address the "state-dependent reporting bias" in self-reported health, I computed a health index by first running a generalised ordered probit regression (goprobit) of self-reported health on a set of quasi-objective health indicators, i.e. self-reports of chronic conditions. From the goprobit results I calculate the disability weight for each condition, then subtract the total disability weights from 1 to obtain a health index. The health index, z_index, is a continuous variable ranging from 0 to 1 after normalisation.
Given the health index variable, I want to calculate the country-specific thresholds [as the exact quantiles of the country-specific health index distribution that correspond to the proportion of respondents that report up to a specific health level] (Jurges 2004). As I understand it, I need to tabulate the original self-reported health variable (sph) by country and wave to obtain the cumulative percentage of each country's population reporting each health category, and then run _pctile on z_index using these cumulative percentages. However, the stored results after tabulate contain only two scalars, r(N) for the number of observations and r(r) for the number of categories of the tabulated variable, and not the cumulative percentages.
As there are between 12 and 28 countries in each wave, doing tab and _pctile by hand would take a long time, since it involves typing a lot of numbers. I therefore think it would be quicker to write a program and loop it over each country and each wave.
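For a single country and wave, the by-hand version of this procedure would look roughly like the sketch below. The percentages are copied from the Austrian tabulation in the output shown later in this post, and they are only meaningful if the _pctile call uses the same subsample as the tabulation. The top category is left out because it always cumulates to 100.

```stata
// Step 1: tabulate sph for one country and wave and read off the "Cum." column
tabulate sph if austria == 1 & wave == 1

// Step 2: type the cumulative percentages (0-100 scale) into _pctile;
// the numbers here are copied by hand from the tabulation output
_pctile z_index if austria == 1 & wave == 1, percentiles(7.96 31.54 67.21 92.33)

// Step 3: the thresholds are stored in r(r1), r(r2), r(r3) and r(r4)
return list
```

Repeating these three steps for every country and wave is what makes the manual approach slow.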
My trial code is shown in the Code block below.
When I ran the program for the first country (austria, a dummy variable coded 1 for Austria and 0 otherwise) and the first wave (wave==1), it returned error r(198): option p() incorrectly specified.
I know that my program may look very basic to many of you here, but it is the best I can do as a beginner Stata user. Could you please suggest how I should fix the program, and whether there are better ways to work around the issue?
Your comments are very welcome.
Thanks,
Tho
Code:
```stata
capture program drop pcal
program define pcal
    // tabulate sph to get frequencies of each sph category & save the frequency matrix
    tab sph if `1'==1 & `2'==1, matcell(A)
    // extract frequency of each category and put in scalars
    scalar n=r(N)
    forval i=1/5 {
        scalar r`i'=A[`i',1]
    }
    // generate scalars as cumulative percentages
    scalar c1=r1/n
    forval i=2/5 {
        scalar c`i'=c`i-1'+r`i'/n
    }
    // store scalar values
    forval i=1/5 {
        scalar ce`i'=e(c`i')
    }
    // calculate percentiles of z_index based on determined cumulative percentages of sph
    _pctile z_index, p(`ce1', `ce2', `ce3', `ce4', `ce5')
    return list
    // drop all generated scalars and matrices
    scalar drop _all
    matrix drop _all
end

// Trial execution of the program pcal
pcal austria 1

Self-percei |
 ved health |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      1,304        7.96        7.96
          2 |      3,863       23.58       31.54
          3 |      5,844       35.67       67.21
          4 |      4,115       25.12       92.33
          5 |      1,257        7.67      100.00
------------+-----------------------------------
      Total |     16,383      100.00

option p() incorrectly specified
r(198);
```
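One possible way to repair the program is sketched below. It is a suggestion, not a tested solution for the SHARE data. The likely cause of the p() error is that `ce1' to `ce5' are local macro references, while the program only creates scalars, so the macros expand to nothing and p() receives an empty list. In addition, e(c`i') does not refer to anything the program has stored, `i-1' is read as a macro named "i-1" rather than as arithmetic, the values are proportions rather than percentages on the 0 to 100 scale, and the condition `2'==1 becomes 1==1 when the program is called as pcal austria 1, so the wave is never used. The name pcal2, the returned names t1, t2, ... and the assumption that the data contain a numeric variable called wave are hypothetical choices made for this illustration.

```stata
capture program drop pcal2
program define pcal2, rclass
    // hypothetical replacement for pcal
    // country: name of a 0/1 country dummy; wave: wave number (assumes a variable named wave)
    args country wave

    // frequencies of each sph category in this country and wave
    quietly tabulate sph if `country' == 1 & wave == `wave', matcell(A)
    local n = r(N)
    local k = r(r)

    // cumulative percentages on the 0-100 scale, collected in a local macro;
    // the last category always cumulates to 100, so it is left out
    local cum = 0
    local plist
    forvalues i = 1/`=`k'-1' {
        local cum = `cum' + 100 * A[`i', 1] / `n'
        local plist `plist' `cum'
    }

    // percentiles of z_index in the same subsample
    _pctile z_index if `country' == 1 & wave == `wave', percentiles(`plist')

    // return the thresholds as r(t1), r(t2), ...
    forvalues i = 1/`=`k'-1' {
        return scalar t`i' = r(r`i')
    }
    matrix drop A
end
```

Under these assumptions, `pcal2 austria 1` followed by `return list` would display the thresholds for Austria in wave 1. Using r(r) instead of a fixed 5 keeps the program from failing when a category is empty in some country and wave.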