  • Calculate country-specific thresholds using continuous health index and country health distribution

    Dear Statalisters,

    I am using Stata 13 to analyse waves 1, 2, 4, 5 and 6 of the Survey of Health, Ageing and Retirement in Europe (SHARE), in order to investigate the effect of health on the labour force participation of older workers. The main explanatory variable, sph, is ordinal, coded 1 for Excellent, 2 for Very good, 3 for Good, 4 for Fair, and 5 for Poor. To address the "state-dependent reporting bias" in self-reported health, I computed a health index: I first ran a generalised ordered probit regression (goprobit) of self-reported health on a set of quasi-objective health indicators, i.e. self-reports of chronic conditions. From the goprobit results I calculated a disability weight for each condition, then subtracted the total of the disability weights from 1 to obtain the health index. After normalisation, the health index, z_index, is a continuous variable ranging from 0 to 1.

    Given the health index, I want to calculate the country-specific thresholds [as the exact quantiles of the country-specific health index distribution that correspond to the proportion of respondents that report up to a specific health level] (Jurges 2004). As I understand it, I need to tabulate the original self-reported health variable (sph) by country and wave to obtain the cumulative percentage of each country's respondents reporting each category, and then run _pctile on z_index using these cumulative percentages. However, the stored results after tabulate contain only two scalars, r(N) for the number of observations and r(r) for the number of categories of the tabulated variable, and not the cumulative percentages.

    Each wave has between 12 and 28 countries, so running tabulate and _pctile by hand would take a long time and involve typing a lot of numbers. I therefore think it would be quicker to write a program and loop it over each country and each wave.

    My trial code is as follows:

    Code:
    capture program drop pcal
    program define pcal
    
    // tabulate sph to get frequencies of each sph category & save the frequency matrix
    tab sph if `1'==1 & `2'==1, matcell(A)
    
    // extract frequency of each category and put in scalars
    scalar n=r(N)
    forval i=1/5 {
    scalar r`i'=A[`i',1]
    }
    
    // generate scalars as cumulative percentages
    scalar c1=r1/n
    forval i=2/5 {
    scalar c`i'=c`i-1'+r`i'/n
    }
    
    // store scalar values
    forval i=1/5 {
    scalar ce`i'=e(c`i')
    }
    
    // calculate percentiles of z_index based on determined cumulative percentages of sph
    _pctile z_index, p(`ce1', `ce2', `ce3', `ce4', `ce5')
    return list
    
    // drop all generated scalars and matrices
    scalar drop _all
    matrix drop _all
    end
    
    // Trial execution of the program pcal
    pcal austria 1
    Self-percei |
     ved health |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |      1,304        7.96        7.96
              2 |      3,863       23.58       31.54
              3 |      5,844       35.67       67.21
              4 |      4,115       25.12       92.33
              5 |      1,257        7.67      100.00
    ------------+-----------------------------------
          Total |     16,383      100.00
    option p() incorrectly specified
    r(198);
    When I ran the program for the first country (austria==1, where austria is a dummy variable coded 1 for Austria and 0 otherwise) and the first wave (wave==1), it returned error r(198): option p() incorrectly specified.

    I know my program may look very basic to many of you, but it is the best I can do as a beginner in Stata. Could you please suggest how I should fix the program, or whether there is a better way to work around the issue?

    Your comments are very welcome.

    Thanks,
    Tho

  • #2
    Well, -pcal- is not part of official Stata, nor does -search pcal- turn up any information about it. (I also could find nothing about it with a Google search.) But -pcal- is the program that is throwing the error message. Since you apparently found this program somewhere and have it installed, I suggest you read its help file (-help pcal-) to see what option p() is supposed to be, and then add it, properly specified in accordance with whatever the help file says, to your -pcal- command.

    If that doesn't solve your problem you have two choices. You can wait for a day or two to see if some other Forum member who is familiar with -pcal- chooses to respond to your questions. Or, you can contact the author of -pcal- directly for advice. (Most authors of community-contributed Stata programs put their names and contact information in the help file.)

    • #3
      Dear Clyde,

      Many thanks for your quick response. Actually, -pcal- is a program that I wrote myself to solve the problem described above, and I am asking for comments on it. I know that -pcal- fails because something is wrong with the p() option. I suppose that the locals, i.e. `ce1', `ce2' and so on within the parentheses, are not accepted as a numlist for p(), but I do not know how to fix it.

      Any further advice would be appreciated.

      Kind regards,
      Tho

      • #4
        The only p() option I see is the call to _pctile. Since in
        Code:
        _pctile z_index, p(`ce1', `ce2', `ce3', `ce4', `ce5')
        your arguments were created as scalars, not locals, you should instead code
        Code:
        _pctile z_index, p(ce1, ce2, ce3, ce4, ce5)
        See help scalar and the associated PDF documentation for more details. For example,
        Code:
        . scalar a = 4
        
        . display sqrt(a)
        2
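
The distinction between scalars and local macros drawn above can be illustrated with a minimal sketch, meant to be run from a do-file. The names shown and pct1 are hypothetical and used only for illustration; the value 7.96 is taken from the first row of the tabulation earlier in the thread.

```stata
* A scalar and a local macro with the same name are unrelated objects.
scalar ce1 = 7.96

* No local macro called ce1 was ever defined, so the macro reference
* expands to an empty string, even though a scalar called ce1 exists.
local shown "`ce1'"
display "local macro ce1 expands to: [`shown']"

* The scalar itself is referred to by its bare name inside an expression.
display "scalar ce1 holds: " ce1

* To obtain the number as text, evaluate the scalar into a local macro.
local pct1 = scalar(ce1)
display "local macro pct1 expands to: [`pct1']"
```

An undefined local macro expands to nothing without any warning, so an option written with such macros and commas in between reaches the command as a list of bare commas.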

        • #5
          Dear William,

          Thank you. I revised the program as you suggested, but Stata still returns the same error.

          Code:
          option p() incorrectly specified
          r(198);
          The revised program is as follows. I also rewrote the second step (// generate scalars as cumulative percentages) slightly to make it clearer.

          Code:
          capture program drop pcal
          program define pcal
          
          // tabulate sph to get frequencies of each sph category & save the frequency matrix
          tab sph if `1'==1 & `2'==1, matcell(A)
          
          // extract frequency of each category and put in scalars
          scalar n=r(N)
          forval i=1/5 {
          scalar r`i'=A[`i',1]
          }
          
          // generate scalars as cumulative percentages 
          scalar c1=r1/n
          scalar c2=c1+r2/n
          scalar c3=c2+r3/n
          scalar c4=c3+r4/n
          scalar c5=c4+r5/n
          
          // store scalar values
          forval i=1/5 {
          scalar ce`i'=e(c`i')
          }
          
          // calculate percentiles of z_index based on determined cumulative percentages of sph
          _pctile z_index, p(ce1, ce2, ce3, ce4, ce5)
          
          // drop all generated scalars and matrices
          scalar drop _all
          matrix drop _all
          end
          
          // Trial execution of the program pcal
          pcal austria 1
          return list
          
          Self-percei |
           ved health |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |      1,304        7.96        7.96
                    2 |      3,863       23.58       31.54
                    3 |      5,844       35.67       67.21
                    4 |      4,115       25.12       92.33
                    5 |      1,257        7.67      100.00
          ------------+-----------------------------------
                Total |     16,383      100.00
          option p() incorrectly specified
          r(198);
          I wonder whether one of the commands before _pctile might be wrong.

          • #6
            Let us look at this command, from the middle of a loop with i running from 1 to 5.
            Code:
            scalar ce`i'=e(c`i')
            Now suppose we're on the third pass through the loop, so i is 3 and this command becomes
            Code:
            scalar ce3=e(c3)
            where c3 will be a cumulative percentage, per the comment in the program, and, as computed there, a number between 0 and 1. Suppose it is 0.42; the command is then equivalent to
            Code:
            scalar ce3=e(0.42)
            What does that mean? What is it supposed to accomplish?

            My thought is that the loop creating the ce scalars is not necessary, and your _pctile command should be
            Code:
            _pctile z_index, p(c1, c2, c3, c4, c5)
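
A few further points may be worth checking; this is a hedged sketch, not a tested fix for the SHARE data. As far as I know, the p() option of _pctile takes a numlist, that is, literal numbers on a 0 to 100 percentage scale, so scalar names may need to be evaluated into numbers (for example with `=exp' macro expansion) and proportions multiplied by 100. Also, e(name) looks up a stored estimation result by name, so e(c3) would be missing unless an estimation command had stored something called c3. Finally, in `2'==1 the second argument is the literal 1, so that condition is always true; a wave filter would need something like wave==`2', and _pctile would need the same if condition as tabulate. The sketch below uses the auto data shipped with Stata, with rep78 standing in for sph, price for z_index and foreign for the country dummy; the program name pcal2 and its argument names are hypothetical.

```stata
capture program drop pcal2
program define pcal2, rclass
    // arguments: 0/1 group dummy, ordinal variable, continuous index
    args dummy ordvar index

    // one common sample for both tabulate and _pctile
    local cond `dummy' == 1 & !missing(`ordvar', `index')

    // frequencies of each category go into a temporary matrix
    tempname F
    quietly tabulate `ordvar' if `cond', matcell(`F')
    local n = r(N)
    local k = r(r)
    local km1 = `k' - 1

    // cumulative percentages (0-100 scale) for all but the top category,
    // collected as literal numbers in a local macro
    local run = 0
    local plist
    forvalues i = 1/`km1' {
        local run = `run' + `F'[`i',1]
        local plist `plist' `=100*`run'/`n''
    }

    // percentiles of the index at those cumulative percentages
    _pctile `index' if `cond', percentiles(`plist')
    forvalues i = 1/`km1' {
        return scalar cut`i' = r(r`i')
    }
    return local plist "`plist'"
end

sysuse auto, clear
pcal2 foreign rep78 price
return list
```

The top category is left out because its cumulative percentage is 100 by construction, so k categories give k-1 thresholds. The sketch also assumes that the ordinal variable and the index run in the same direction; with sph coded 1 for Excellent and an index where higher values presumably mean better health, the percentages would have to be taken from the other end of the distribution, for example as 100 minus each value.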
