
  • Calculating design weights for survey data in order to calculate Disability-Adjusted Life Years (DALYs)

    Dear all,

    I would like to calculate design weights for my survey data. In my research, I want to calculate DALYs = YLDs + YLLs, where
    DALYs: Disability-Adjusted Life Years;
    YLDs: Years Lost due to Disability;
    YLLs: Years of Life Lost due to premature death.
    My data has two parts: one part for YLDs and one part for YLLs.

    Let me make this clear with the following example.

    My target population has size N = 10,000 and contains 30 clusters. I select 10 of the 30 clusters for sampling (the 10 selected clusters contain 5,000 people in total).
    For the YLD sample, I use an equal-probability selection method to choose 3,000 subjects (300 from each selected cluster).
    For the YLL sample, I include all deaths in the 10 selected clusters during the study period.

    The sample size is n = 3,000, and I used the Probability Proportional to Size (PPS) sampling method. Then I can calculate the design weight as N/n = 6.7.

    The YLDs and YLLs calculated from the sample are 1,000 and 100, respectively.

    I then calculate the DALYs of the target population as 1000*6.7 (YLDs) + 100*6.7 (YLLs).

    Finally, I use the -svyset- command:
    Code:
    gen dw=20000/3000
    gen fpc=20000
    svyset cluster [pweight=dw],fpc(fpc)
    Am I right? Thank you all in advance.

    Last edited by Thong Nguyen; 11 Jul 2016, 22:32.

  • #2
    You've asked two versions of this question, in this post and an earlier one, but the details differ. In this post, the population size is 100,000; in the other it is 10,000 in one place and 20,000 in another. You say that you studied all deaths "in 10 clusters"; I assume that these are the same clusters that were selected PPS.

    To answer your question: your calculations of design weights and fpcs are not correct.

    Let \(N\) be the population size and \(N_j\) be the size of the j-th selected cluster. Then the probability of selection for the j-th cluster is \(f_{1j} = N_j/N\).

    In each sampled cluster, the conditional probability that a dead person is selected for study is \(f_{2j} = 1\). The probability of selecting a living person for study is \(f_{2j} = 300/N_j\).

    The overall probability of selecting a person is the product \(f = f_{1j} \times f_{2j}\). For a living person, \(f = (N_j/N) \times (300/N_j) = 300/N\), a constant (an equal-probability method of selection); for a dead person in cluster j, the probability of selection is \(f = (N_j/N) \times 1 = N_j/N\). The design (sampling) weight is \(1/f\): this is \(N/300\) for living people and \(N/N_j\) for dead people.
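The arithmetic above can be checked with a short Python sketch (Python rather than Stata, only to make the numbers concrete). It follows this post's first-stage probability \(f_{1j} = N_j/N\) as written; post #9 later revises it to include the number of sampled clusters. The helper name `selection_probs`, the population size, and the cluster sizes are hypothetical.

```python
from fractions import Fraction

def selection_probs(N, N_j, n_live=300):
    """Overall selection probabilities in cluster j, following post #2:
    f1 = N_j/N for the cluster, f2 = n_live/N_j for a living person,
    f2 = 1 for a death (all deaths are taken)."""
    f1 = Fraction(N_j, N)
    f_live = f1 * Fraction(n_live, N_j)   # simplifies to n_live/N
    f_dead = f1 * 1                        # simplifies to N_j/N
    return f_live, f_dead

N = 10_000                      # hypothetical population size
for N_j in (400, 500, 650):     # hypothetical cluster sizes
    f_live, f_dead = selection_probs(N, N_j)
    # design weight = 1/f for a living person and for a death
    print(N_j, 1 / f_live, 1 / f_dead)
```

Whatever the cluster size, the living-person weight stays at N/300, while the weight for deaths, N/N_j, changes from cluster to cluster.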

    For more on calculations like these, I suggest that you consult Lohr (2009) and the Stata manual entry for svyset.

    Here's an outline of code to do the analysis.

    Code:
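    * Nclus: population size N_j of each record's cluster (must already be in the data)
    * alive: hypothetical 0/1 indicator, 1 = living respondent, 0 = death
    local N = 10000                      // population size N (example value; use your own)
    gen stratum = 1 if alive == 1        // living people
    replace stratum = 2 if alive == 0    // deaths
    gen dwt = `N'/300 if stratum==1
    replace dwt = `N'/Nclus if stratum==2
    gen fpc1 = 1/3                       // 10 of 30 clusters
    gen fpc2 = 300/Nclus if stratum==1
    replace fpc2 = 1 if stratum==2
    svyset psu [pw = dwt], fpc(fpc1) || _n, strata(stratum) fpc(fpc2)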
    Define a new variable x equal to an individual's YLD (if alive) and to the individual's YLL (if dead). You want the total and mean of x.

    Then, to estimate the mean and the total of YLL + YLD:
    Code:
    gen x = YLD if stratum==1
    replace x = YLL if stratum == 2
    svy: mean x
    svy: total x
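For intuition about the point estimates: the survey mean that -svy: mean- reports is the ratio \(\sum_i w_i x_i / \sum_i w_i\), and the corresponding estimate of the population total is the weighted sum \(\sum_i w_i x_i\). The Python sketch below computes both for a few made-up records (the function names, weights, and x values are hypothetical); it does not reproduce the standard errors, which depend on the full design.

```python
def weighted_total(x, w):
    """Weighted estimate of a population total: sum of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))

def weighted_mean(x, w):
    """Ratio estimate of a population mean: weighted total / sum of weights."""
    return weighted_total(x, w) / sum(w)

# Hypothetical records: x is YLD for living respondents and YLL for deaths.
x = [0.5, 0.0, 1.2, 20.0]       # per-person YLD or YLL
w = [40.0, 40.0, 40.0, 25.0]    # design weights (e.g. N/300 alive, N/N_j dead)

total = weighted_total(x, w)
mean = weighted_mean(x, w)
print(total, mean)
```

Because x holds YLD for living people and YLL for deaths, the weighted total estimates population DALYs with both components combined.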
    Reference: Lohr, Sharon L. 2009. Sampling: Design and Analysis. Boston, MA: Cengage Brooks/Cole.

    Last edited by Steve Samuels; 17 Jul 2016, 20:40.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Dear Steve,
      Firstly, thank you very much for your help and code.

      However, when I read the Stata Survey Data Reference Manual (Release 13), on page 172, the way to set -fpc- is not what you described above. If I understood correctly, fpc1 = 30 for both strata (alive and dead), fpc2 = Nclus for the alive stratum, and fpc2 = 1 for the dead stratum.

      In this case, the only difference between the two strata is the design weight; that is, the variable -dw- would be calculated as
      Code:
      gen dw = N/300 if stratum==1         // alive (N = population size)
      replace dw = N/Nclus if stratum==2   // dead

      And if, at the second stage, I choose a different number of living individuals in each cluster (unequal probabilities), then
      Code:
      replace dw = N/nj if stratum==1      // alive
      because in this case dw equals
      Code:
      (N/Nj) * (Nj/nj)
      where nj is the number of living individuals I chose in cluster j.
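Written out with the first-stage probability \(N_j/N\) used in post #2 (a sketch of the algebra only), the cluster size cancels:

\[
w_j = \frac{1}{f_{1j}}\cdot\frac{1}{f_{2j}} = \frac{N}{N_j}\cdot\frac{N_j}{n_j} = \frac{N}{n_j},
\]
so with \(n_j = 300\) in every cluster this is again \(N/300\).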

      Am I right, Steve?
      Last edited by Thong Nguyen; 17 Jul 2016, 21:52.



      • #4
        You have not read the complete description of fpc in the manual:
        "fpc(varname) requests a finite population correction for the variance estimates. If varname has values less than or equal to 1, it is interpreted as a stratum sampling rate fh = nh/Nh, where nh = number of units sampled from stratum h and Nh = total number of units in the population belonging to stratum h."
        Your second equation for dw of live subjects is the same as the first, because you specified that nj = 300 for all clusters. Note that even if the dead subjects were counted in the Nj used to sample the clusters, the design weight for live subjects is still N/300.
        Steve Samuels



        • #5
          Dear Steve,

          I did read the description of the fpc() option in the manual, including the part you mentioned. However, I do not understand what "values of varname" means.

          For example, when we use -svyset- with the option fpc(fpc1), varname is fpc1. I am confused: how do I determine whether fpc1 has values less than or equal to 1?

          Another question concerns the probability of selection for each cluster at the first stage. Is it
          Code:
          f1j=Nj/N
          or
          Code:
          f1j=10*(Nj/N)
          Last edited by Thong Nguyen; 17 Jul 2016, 23:35.



          • #6
            I also have another question for you.
            When I tried to figure this out by myself, I found some documents that confused me. You can see them in the links below.

            In a UCLA workshop, the fpc is calculated with the formula
            Code:
            fpc = sqrt((N-n)/(N-1))
            where N is the population size and n is the sample size.
            By contrast, in a presentation at the 2010 Mexican Stata Users Group meeting, the fpc is the proportion of PSUs sampled within each stratum (only for sampling without replacement), as you mentioned earlier in post #2.

            My guess is that I need to use the formula to calculate the fpc and then, if it is less than or equal to 1, declare it to Stata as a sampling rate. In this case, the fpc would be
            Code:
            fpc = nh/Nh = 10/30 = 1/3
            Am I right?


            Could you help me clarify this? Thank you again, Steve.
            Last edited by Thong Nguyen; 18 Jul 2016, 03:31.



            • #7
              fpc1 = 10/30 = 1/3 is a correct specification, as I stated in my first post. How you got a 10 in your formula for f1j is a mystery to me. You don't need to use the theoretical formula sqrt((N-n)/(N-1)), because Stata will calculate the correction once you specify an fpc variable. At this point, I'm going to bow out of the discussion, since I really have nothing to add. Good luck!
              Steve Samuels



              • #8
                Let me correct one error: in the fpc formula \(\sqrt{1 - f}\) (Survey manual chapter "Variance Estimation"; equation 2 in the Stratified Two-Stage Design section), Stata uses the definition \(f = \frac{n}{N}\) instead of \(f = \frac{n-1}{N-1}\). This makes no practical difference: in your example, the first-stage fpc is about 0.82 with the first definition and 0.83 with the second, to two decimal places.
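A quick numeric check of the two definitions for the first-stage example (n = 10 of N = 30 clusters), as a Python sketch; the function names are made up for illustration. `fpc_stata` uses f = n/N, and `fpc_ucla` is the UCLA-style formula, which equals \(\sqrt{1 - f}\) with f = (n-1)/(N-1).

```python
import math

def fpc_stata(n, N):
    """sqrt(1 - f) with f = n/N."""
    return math.sqrt(1 - n / N)

def fpc_ucla(n, N):
    """sqrt((N - n)/(N - 1)), i.e. sqrt(1 - f) with f = (n - 1)/(N - 1)."""
    return math.sqrt((N - n) / (N - 1))

n, N = 10, 30
print(round(fpc_stata(n, N), 4), round(fpc_ucla(n, N), 4))  # 0.8165 0.8305
```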
                Last edited by Steve Samuels; 18 Jul 2016, 10:21.



                • #9
                  Correction: The "10" in your formula is no longer a mystery. In fact, it's a good idea with PPS sampling. The design weights as I originally specified them would be okay for estimating statistics other than totals. For estimating totals, the correct factor to use in the first stage is

                  \[
                  f_{1j} = n \frac{N_j}{N}
                  \]
                  or with n = 10
                  \[
                  f_{1j} = 10 \frac{N_j}{N}
                  \]

                  This is the probability that cluster j will be selected by a proper PPS sampling algorithm. If, however, \(\frac{N_j}{N}>1/10\) for any cluster, you would have been forced to treat that cluster as a certainty unit and to restart the algorithm with the reduced n.
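One common way to handle the certainty-unit step just described is sketched below in Python: any cluster whose PPS probability \(n N_j / N\) would reach 1 is taken with certainty and removed, the remaining probabilities are recomputed with the reduced n, and this repeats until no new certainty units appear. The function name and the cluster sizes are hypothetical; this illustrates the idea rather than any particular sampling package.

```python
def pps_inclusion_probs(sizes, n):
    """First-stage inclusion probabilities n*N_j/N for PPS sampling without
    replacement, with certainty units (probability 1) peeled off iteratively."""
    probs = [0.0] * len(sizes)
    remaining = list(range(len(sizes)))
    n_left = n
    while remaining and n_left > 0:
        total = sum(sizes[i] for i in remaining)
        certain = [i for i in remaining if n_left * sizes[i] / total >= 1]
        if not certain:
            break
        for i in certain:
            probs[i] = 1.0                       # certainty unit
        remaining = [i for i in remaining if i not in certain]
        n_left -= len(certain)                   # restart with the reduced n
    if remaining and n_left > 0:
        total = sum(sizes[i] for i in remaining)
        for i in remaining:
            probs[i] = n_left * sizes[i] / total
    return probs

# Hypothetical frame: nine small clusters and one very large one, n = 3.
print(pps_inclusion_probs([100] * 9 + [1000], 3))
```

The probabilities always sum to the original n, which is a useful check on any implementation.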

                  One thing unclear from your description is whether deaths were included in the population \(N_j\) and \(N\) totals. If so, then for living people

                  \[
                  f_{2j} = \frac{300}{N_j - D_j}
                  \]
                  where \(D_j\) is the number of deaths in cluster j.

                  Then form the design weight

                  \[
                  W_j = (1/f_{1j})(1/f_{2j})
                  \]
                  as before.
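Putting the corrected first stage together with the second-stage rates, here is a Python sketch of the design weights for one selected cluster. It assumes, as in the question, n = 10 sampled clusters and 300 living people per cluster, and it uses \(f_{2j} = 300/(N_j - D_j)\) only when deaths are counted in \(N_j\); deaths keep a second-stage probability of 1 because all of them are included. The function name and every numeric value are hypothetical.

```python
def design_weights(N, N_j, n_clusters=10, n_live=300, D_j=0, deaths_in_frame=False):
    """Return (weight for a living person, weight for a death) in cluster j.

    f1j = n_clusters * N_j / N                       (PPS first stage, post #9)
    f2j = n_live / (N_j - D_j) if deaths are counted in N_j, else n_live / N_j
    Deaths are all taken, so their second-stage probability is 1.
    """
    f1 = n_clusters * N_j / N
    living_pop = N_j - D_j if deaths_in_frame else N_j
    f2_live = n_live / living_pop
    w_live = 1 / (f1 * f2_live)
    w_dead = 1 / f1
    return w_live, w_dead

# Hypothetical values: N = 10,000 people; a selected cluster of 500 with 5 deaths.
print(design_weights(10_000, 500))                               # deaths not in N_j
print(design_weights(10_000, 500, D_j=5, deaths_in_frame=True))  # deaths in N_j
```

When deaths are not part of the frame counts, the living-person weight reduces to N/(10 × 300) in every cluster, so the living sample is self-weighting.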
                  Last edited by Steve Samuels; 19 Jul 2016, 14:11.



                  • #10
                    Dear Steve,
                    To clarify: at the first stage, I chose 10 of the 30 clusters; then, within the 10 selected clusters, I included all deaths that occurred in each cluster's population. I think fpc2 = 1 for stratum 2 (dead) is right, because the sampling rate for deaths is 100%.

                    However, I read in the Survey Data Reference Manual, on page 186: "The factor (1 - fh) is the FPC, and fh is the sampling rate for the first stage of sampling. The factor (1 - fhi) is the FPC, and fhi is the sampling rate for PSU (h, i). The sampling rate fhi is derived in the same manner as fh."

                    So I think the fpc for any stage of sampling is calculated from the sampling rate at that stage.

                    I am still confused and do not understand this passage: "fpc(varname) requests a finite population correction for the variance estimates. If varname has values less than or equal to 1, it is interpreted as a stratum sampling rate fh = nh/Nh, where nh = number of units sampled from stratum h and Nh = total number of units in the population belonging to stratum h."

                    I also read the following document from Stanford University, which I attached below (page 7).

                    On page 173 of the Stata manual: "Rather than having a variable that represents the total number of PSUs per stratum in the sampling frame, we sometimes have a variable that represents a sampling rate fh = nh/Nh. The syntax for svyset is the same whether the FPC variable contains Nh or fh. The survey variance-estimation routines in Stata are smart enough to identify what type of FPC information has been specified. If the FPC variable is less than or equal to 1, it is interpreted as a sampling rate; if it is greater than or equal to nh, it is interpreted as containing Nh. It is an error for the FPC variable to have values between 1 and nh or to have a mixture of sampling rates and stratum sizes."

                    To sum up: can I always declare the value of the -fpc- variable to Stata as a sampling rate? If not, in which cases must I declare -fpc- as Nh? Could you help me with an example based on my first-stage sampling, using a range of sampling rates such as 1/30, 5/30, 10/30, and so on?
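The manual rule quoted above can be mimicked in a few lines of Python to see why a rate and a count lead to the same \(f_h\). This is a sketch of the documented interpretation, not Stata's source code, and the function name is hypothetical: a value at most 1 is taken as the rate \(f_h\); a value at least \(n_h\) is taken as \(N_h\) and converted to \(f_h = n_h/N_h\); anything in between is an error. (The manual also forbids mixing rates and counts; this sketch does not check that.) The loop uses the first-stage example with 1, 5, and 10 of 30 clusters.

```python
from fractions import Fraction

def sampling_rate(fpc_value, n_h):
    """Return the stratum sampling rate f_h implied by an FPC value.

    fpc_value <= 1    -> already a sampling rate f_h
    fpc_value >= n_h  -> a stratum size N_h, so f_h = n_h / N_h
    otherwise         -> error (values between 1 and n_h are not allowed)
    """
    if fpc_value <= 1:
        return Fraction(fpc_value)
    if fpc_value >= n_h:
        return Fraction(n_h) / Fraction(fpc_value)
    raise ValueError("FPC value lies between 1 and n_h")

N_h = 30                        # clusters in the frame (first-stage example)
for n_h in (1, 5, 10):          # clusters sampled
    as_rate = sampling_rate(Fraction(n_h, N_h), n_h)
    as_count = sampling_rate(N_h, n_h)
    print(n_h, as_rate, as_count, as_rate == as_count)
```

In this sketch the two forms agree because `n_h` is the number of clusters actually sampled; if a count were paired with a smaller `n_h`, it would imply a smaller rate than the design used.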
                    Thank you, Steve, and sorry if I am bothering you.
                    Last edited by Thong Nguyen; 19 Jul 2016, 16:57.



                    • #11
                      I tried specifying the -fpc- option both as a sampling rate and as Nh, and I got the same results. Going back to the Stata manual, I conclude that either form can be specified and gives the same results.
                      Code:
                      .         gen psu=thon
                      
                      .         bysort thon: gen ts1=1827786/(dsthon*32)
                      
                      .         gen fpc1=32/2467
                      
                      .         bysort thon: gen ts2=dsthon/cases_m
                      
                      .         bysort thon: gen ts=ts1*ts2
                      
                      .         gen fpc2=cases_m/dsthon
                      
                      .         tab gioi
                      
                        gioi tinh |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                              Nam |      2,309       47.73       47.73
                               Nu |      2,529       52.27      100.00
                      ------------+-----------------------------------
                            Total |      4,838      100.00
                      
                      .         
                      .         svyset psu [pweight=ts], fpc(fpc1) || _n, fpc(fpc2)
                      
                            pweight: ts
                                VCE: linearized
                        Single unit: missing
                           Strata 1: <one>
                               SU 1: psu
                              FPC 1: fpc1
                           Strata 2: <one>
                               SU 2: <observations>
                              FPC 2: fpc2
                      
                      .         svy: proportion gioi
                      (running proportion on estimation sample)
                      
                      Survey: Proportion estimation
                      
                      Number of strata =       1         Number of obs    =     4838
                      Number of PSUs   =      32         Population size  =  1827786
                                                         Design df        =       31
                      
                      --------------------------------------------------------------
                                   |             Linearized
                                   | Proportion   Std. Err.     [95% Conf. Interval]
                      -------------+------------------------------------------------
                      gioi         |
                               Nam |   .4755179   .0095807      .4560254    .4950853
                                Nu |   .5244821   .0095807      .5049147    .5439746
                      --------------------------------------------------------------
                      
                      . restore
                      
                      . 
                      . preserve
                      
                      .         use CD_Thon_YLD, clear
                      (FILE SÔ LIÊU)
                      
                      .         gen psu=thon
                      
                      .         bysort thon: gen ts1=1827786/(dsthon*32)
                      
                      .         gen fpc1=2467
                      
                      .         bysort thon: gen ts2=dsthon/cases_m
                      
                      .         bysort thon: gen ts=ts1*ts2
                      
                      .         gen fpc2=dsthon
                      
                      .         tab gioi
                      
                        gioi tinh |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                              Nam |      2,309       47.73       47.73
                               Nu |      2,529       52.27      100.00
                      ------------+-----------------------------------
                            Total |      4,838      100.00
                      
                      .         
                      .         svyset psu [pweight=ts], fpc(fpc1) || _n, fpc(fpc2)
                      
                            pweight: ts
                                VCE: linearized
                        Single unit: missing
                           Strata 1: <one>
                               SU 1: psu
                              FPC 1: fpc1
                           Strata 2: <one>
                               SU 2: <observations>
                              FPC 2: fpc2
                      
                      .         svy: proportion gioi
                      (running proportion on estimation sample)
                      
                      Survey: Proportion estimation
                      
                      Number of strata =       1         Number of obs    =     4838
                      Number of PSUs   =      32         Population size  =  1827786
                                                         Design df        =       31
                      
                      --------------------------------------------------------------
                                   |             Linearized
                                   | Proportion   Std. Err.     [95% Conf. Interval]
                      -------------+------------------------------------------------
                      gioi         |
                               Nam |   .4755179   .0095807      .4560254    .4950853
                                Nu |   .5244821   .0095807      .5049147    .5439746
                      --------------------------------------------------------------
                      
                      . restore
                      .
                      However, I have another problem with the dead stratum: several clusters have no deaths. For example, in this case I have two clusters like that, and after the svyset command Stata shows that I have just 8 PSUs. Does this cause any error in later analyses? Also, when a survey data set has some PSUs with no observations, specifying the fpc as a sampling rate or as Nh gives different standard errors and CIs, although the point estimates are the same.

                      Code:
                      . use CD_Appended_Data, clear
                      
                      . 
                      . * Create the file for calculating YLLs for the population
                      .         drop if nhom==1
                      (4838 observations deleted)
                      
                      .         gen c=[_N]-[_n]
                      
                      .         collapse (count) c,by(thon)
                      
                      .         ren c cases_c
                      
                      .         tab cases_c
                      
                        (count) c |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                                1 |          3       11.54       11.54
                                2 |          3       11.54       23.08
                                3 |          2        7.69       30.77
                                4 |          2        7.69       38.46
                                5 |          3       11.54       50.00
                                6 |          4       15.38       65.38
                                7 |          3       11.54       76.92
                                9 |          3       11.54       88.46
                               14 |          2        7.69       96.15
                               16 |          1        3.85      100.00
                      ------------+-----------------------------------
                            Total |         26      100.00
                      
                      .         la var cases_c "Tong so ca chet o tung thon"
                      
                      .         replace cases_c=0 if thon==3 | thon==8 | thon==15 | thon==18 | thon==19 | thon==26
                      (0 real changes made)
                      
                      .         save cases_YLL_thon, replace
                      file cases_YLL_thon.dta saved
                      
                      . 
                      .         use CD_Appended_Data, clear
                      (FILE SÔ LIÊU)
                      
                      .         drop if nhom==1
                      (4838 observations deleted)
                      
                      .         merge m:1 thon using cases_YLL_thon
                      (label thon already defined)
                      
                          Result                           # of obs.
                          -----------------------------------------
                          not matched                             0
                          matched                               154  (_merge==3)
                          -----------------------------------------
                      
                      .         drop _merge
                      
                      .         
                      .         save CD_Thon_YLL, replace
                      file CD_Thon_YLL.dta saved
                      
                      .         
                      .         tab1 nhom gioi
                      
                      -> tabulation of nhom  
                      
                             Nhom |
                          benh/tu |
                             vong |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                          Tu vong |        154      100.00      100.00
                      ------------+-----------------------------------
                            Total |        154      100.00
                      
                      -> tabulation of gioi  
                      
                        gioi tinh |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                              Nam |         86       55.84       55.84
                               Nu |         68       44.16      100.00
                      ------------+-----------------------------------
                            Total |        154      100.00
                      
                      .         gen psu=thon
                      
                      .         gen ts=1827786/(dsthon*32)
                      
                      .         gen fpc1=2467
                      
                      .         gen fpc2=1
                      
                      .         svyset psu [pweight=ts], fpc(fpc1) || _n, fpc(fpc2)
                      
                            pweight: ts
                                VCE: linearized
                        Single unit: missing
                           Strata 1: <one>
                               SU 1: psu
                              FPC 1: fpc1
                           Strata 2: <one>
                               SU 2: <observations>
                              FPC 2: fpc2
                      
                      .         svy: proportion gioi
                      (running proportion on estimation sample)
                      
                      Survey: Proportion estimation
                      
                      Number of strata =       1          Number of obs    =     154
                      Number of PSUs   =      26          Population size  = 11132.1
                                                          Design df        =      25
                      
                      --------------------------------------------------------------
                                   |             Linearized
                                   | Proportion   Std. Err.     [95% Conf. Interval]
                      -------------+------------------------------------------------
                      gioi         |
                               Nam |   .5778172   .0322671      .5103484    .6425017
                                Nu |   .4221828   .0322671      .3574983    .4896516
                      --------------------------------------------------------------
                      
                      . 
                      . 
                      . *
                      . preserve
                      
                      .         use CD_Appended_Data, clear
                      (FILE SÔ LIÊU)
                      
                      .         drop if nhom==1
                      (4838 observations deleted)
                      
                      .         merge m:1 thon using cases_YLL_thon
                      (label thon already defined)
                      
                          Result                           # of obs.
                          -----------------------------------------
                          not matched                             0
                          matched                               154  (_merge==3)
                          -----------------------------------------
                      
                      .         drop _merge
                      
                      .         save CD_Thon_YLL, replace
                      file CD_Thon_YLL.dta saved
                      
                      .         
                      .         tab1 nhom gioi
                      
                      -> tabulation of nhom  
                      
                             Nhom |
                          benh/tu |
                             vong |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                          Tu vong |        154      100.00      100.00
                      ------------+-----------------------------------
                            Total |        154      100.00
                      
                      -> tabulation of gioi  
                      
                        gioi tinh |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                              Nam |         86       55.84       55.84
                               Nu |         68       44.16      100.00
                      ------------+-----------------------------------
                            Total |        154      100.00
                      
                      .         gen psu=thon
                      
                      .         gen ts=1827786/(dsthon*32)
                      
                      .         gen fpc1=32/2467
                      
                      .         gen fpc2=1
                      
                      .         svyset psu [pweight=ts], fpc(fpc1) || _n, fpc(fpc2)
                      
                            pweight: ts
                                VCE: linearized
                        Single unit: missing
                           Strata 1: <one>
                               SU 1: psu
                              FPC 1: fpc1
                           Strata 2: <one>
                               SU 2: <observations>
                              FPC 2: fpc2
                      
                      .         svy: proportion gioi
                      (running proportion on estimation sample)
                      
                      Survey: Proportion estimation
                      
                      Number of strata =       1          Number of obs    =     154
                      Number of PSUs   =      26          Population size  = 11132.1
                                                          Design df        =      25
                      
                      --------------------------------------------------------------
                                   |             Linearized
                                   | Proportion   Std. Err.     [95% Conf. Interval]
                      -------------+------------------------------------------------
                      gioi         |
                               Nam |   .5778172   .0322274      .5104321    .6424247
                                Nu |   .4221828   .0322274      .3575753    .4895679
                      --------------------------------------------------------------
                      
                      . restore

                      Do you have any idea or suggestion, Steve?
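One possible reading of the differing standard errors in the two YLL logs above, offered as a hedged check rather than a definitive account: all 32 sampled villages appear in the YLD data (Number of PSUs = 32), so fpc1 = 2467 and fpc1 = 32/2467 imply the same rate there, but only 26 villages contain deaths. If a count is converted to \(f_h = n_h/N_h\) using the 26 PSUs present, the two YLL runs use different first-stage rates, while the factor \(n_h/(n_h-1)\) is the same in both. Because fpc2 = 1 removes the second-stage term, the SEs should then differ only by \(\sqrt{(1 - 26/2467)/(1 - 32/2467)}\). The Python sketch below applies that factor to the reported SE.

```python
import math

# Values taken from the YLL logs above.
N_h = 2467           # villages in the frame (fpc1 = 2467)
n_design = 32        # villages sampled (fpc1 = 32/2467)
n_present = 26       # villages with at least one death (Number of PSUs = 26)
se_rate = 0.0322274  # SE of the proportions with fpc1 = 32/2467

# Assumption: when fpc1 holds N_h, the rate is taken as n_present / N_h.
f_from_count = n_present / N_h
f_from_rate = n_design / N_h

# fpc2 = 1 removes the second-stage term, so only (1 - f) differs between runs.
se_count = se_rate * math.sqrt((1 - f_from_count) / (1 - f_from_rate))
print(round(se_count, 7))  # prints 0.0322671, the SE reported with fpc1 = 2467
```

The point estimates agree because the weights are identical in both runs; only the first-stage correction changes.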

                      Last edited by Thong Nguyen; 23 Jul 2016, 08:07.



                      • #12
                        I don't think I can add anything to what I've said before. I suggest that you start a new topic with your question about strata.
                        Last edited by Steve Samuels; 24 Jul 2016, 15:34.



                        • #13
                          I don't really follow your code and results, Thong. (Your last svyset lost six PSUs, not two.) With only living people in some PSUs, the strata() option in the second stage won't work, and it should be dropped, as you've done. This implicitly creates a single stratum for the second stage. However, you must then also drop the fpc2 option, as the fpc must be constant within a stratum (and clearly it's not). So my final advice: drop fpc2 and strata() in the second stage and end svyset with:

                          Code:
                          || _n
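The constancy rule above can be illustrated with a small Python check (a sketch, not a Stata feature; the function name and records are hypothetical). It groups records by PSU, and optionally by second-stage stratum, and reports whether fpc2 takes a single value in each group. Treating second-stage strata as nested within PSUs is an assumption of this sketch. With fpc2 = 300/Nclus for living people and 1 for deaths, the values agree within (PSU, stratum) cells but not within a PSU once the stratum distinction is dropped.

```python
from collections import defaultdict

def fpc_constant(records, keys):
    """True if fpc2 has exactly one value within every group defined by keys."""
    seen = defaultdict(set)
    for rec in records:
        seen[tuple(rec[k] for k in keys)].add(rec["fpc2"])
    return all(len(values) == 1 for values in seen.values())

# Hypothetical records: two PSUs, living people (stratum 1) and deaths (stratum 2).
records = [
    {"psu": 1, "stratum": 1, "fpc2": 300 / 500},
    {"psu": 1, "stratum": 2, "fpc2": 1.0},
    {"psu": 2, "stratum": 1, "fpc2": 300 / 400},
    {"psu": 2, "stratum": 1, "fpc2": 300 / 400},
]

print(fpc_constant(records, ["psu", "stratum"]))  # True: strata kept
print(fpc_constant(records, ["psu"]))             # False: strata dropped
```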
                          Last edited by Steve Samuels; 25 Jul 2016, 08:21.
                          Steve Samuels
                          Statistical Consulting
                          [email protected]

                          Stata 14.2



                          • #14
                            Thank you very much for your advice and your time, Steve.
