
  • error using svy with khb

    Hello,

    I am trying to use the khb command with complex survey data, and I am getting an error message that I cannot understand. From my research, it seems that khb can handle svy data, so I'm a little stumped as to the problem.

    I am using Stata 13.

    My data is svyset with the following code:

    Code:
     svyset IDNUMR [pw=WEIGHT_CATI], strata(stratacross)
    The model I'm trying to run is the following:

    Code:
    svy, subpop(DX_aut_11 if nomiss==1): khb logit causes_genetic ///
    hispanic black otherrace parented_aboveHS FPL100to199 FPL200to399 FPL400andup ///
    || all_limit sdq_CATI_high DX_dev_11 DX_int_11 pervasivedevdis autisticdis ///
    multipledx autyp_unknown, concomitant(male AGE) disentangle ape summary or
    The error message I get is: "khb is not supported by svy with vce(linearized); see help svy estimation for a list of Stata estimation commands that are supported by svy."

    I've tried changing the vce type and get the same message, with the new type in place of "linearized". For example, when I change to vce(cluster), I get the exact same error message, except that it says "khb is not supported by svy with vce(cluster); see help svy...".

    When I run the exact same model without the svy prefix, but still subsetting the data and including the pweight, it runs without any problem. That code looks like this:

    Code:
    khb logit causes_genetic ///
    hispanic black otherrace parented_aboveHS FPL100to199 FPL200to399 FPL400andup ///
    || all_limit sdq_CATI_high DX_dev_11 DX_int_11 pervasivedevdis autisticdis ///
    multipledx autyp_unknown if DX_aut_11==1  & nomiss==1 [pweight=WEIGHT_CATI], concomitant(male AGE) disentangle ape summary or
    I know that subsetting the data this way instead of using subpop, and using just the pweight instead of pweight + strata + cluster, affect only my standard errors and not the coefficients, but I'm having trouble understanding why the code for the first model above won't work (svy + subpop + khb). Any help is much appreciated. I've been through most of the khb posts on here and didn't see anything similar, so if I missed something, please point me in that direction. Thank you!

    Lydia

  • #2
    Welcome to the Stata Forum / Statalist,

    I have no experience with the user-written khb, whose author is Ulrich Kohler.

    That said, according to the help file, you may use probability weights, which is why your second command worked fine. However, there is no indication that khb can be used with the svy prefix, which adds extra dimensions (strata, clustering, subpopulation estimation).
    Best regards,

    Marcos



    • #3
      Originally posted by Lydia DiSabatino
      I know that subsetting the data this way instead of subpop & using just the pweight instead of pweight + strata + cluster are affecting only my standard errors and not the coefficients, but I'm having trouble understanding why the code for the first model above won't work (svy + subpop + khb).
      The answer is trivial from a technical point of view: khb does not work with svy because its authors did not set it up to work with svy. From a theoretical/statistical point of view, I cannot really give a good answer, because I honestly do not know whether khb should work with svy, and if so, how. As for the standard errors that khb reports for the differences/decomposition, they are always based on the delta method and the original vce() that was specified for the underlying estimation command. We have discussed this issue before in another context.

      Best
      Daniel



      • #4
        Thank you both for your helpful responses! I believed that I could use svy + khb because I have seen several studies that appear to do so, including the following two articles, both of which explicitly mention using svy and khb:

        https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779738/

        https://academic.oup.com/psychsocger.../1/140/2632088

        I think my next step is to ask those authors whether and how they combined these two commands. Thanks again for your help!

        Lydia



        • #5
          Thank you for your feedback, Lydia. You may also contact the author of the program and ask whether anything is in the works on this.
          Best regards,

          Marcos



          • #6
            Did anyone ever find a solution to this problem? I have the same dilemma: I am trying to use multiply imputed data with khb and the subpop() option of the svy prefix.



            • #7
              Welcome to Statalist, Rebecca!

              I wish you had read the FAQ, especially FAQ 12, and told us about your model and survey setup. Without that information, I can only sketch a solution.

              You can come very close to a svy analysis. Specify the sampling weight with [pw = ] and the PSU with vce(cluster). You'll lose the advantage of survey stratification. However, if the strata have descriptive characteristics like "region" or "urban/rural", perhaps you could add those to the model somewhere. You'll be forced to use if instead of subpop(). This means that standard errors will be slightly too small.

              Imputation: khb is an estimation command, so you can estimate coefficients with Stata's multiple-imputation facility. After you have mi set your data and registered the imputed variables, your command will look like this:

              Code:
              mi estimate: khb <your model> [pw = sampling_weight] if <your subpop definition>, vce(cluster psuid)
              If, by chance, your survey data set is designed for replicate-weight analysis, then this method won't work. Unfortunately, I don't know how to combine mi and bootstrapping.

              Please note the long-standing practice on Statalist of using first and last names. See FAQ 6 and point 3 of this advice. Reading and following the FAQ, especially FAQ 12, will greatly enhance your chance of good answers.
              Steve Samuels
              Statistical Consulting

              Stata 14.2



              • #8
                Sorry for the missing bracket in the pweight option, which should have read
                Code:
                 [pw = sampling_weight]


                • #9
                  Thanks for responding so quickly with this. The command I have been working with without the survey weights was:

                  Code:
                  mim2, cat(fit): khb logit property ib2.pubertycat || dating6, c(male ib14.age i.race i.highestpareduc4 i.career4 singleparent4 bmin4 parentkessler4 external4 lowbirthweight) verbose or

                  This command runs but does not accept the addition of [pweight=weight] at any point in the model.
                  Alternatively, without imputation you can add the pweights with this code:

                  Code:
                  khb logit property ib2.pubertycat || dating6 [pweight=FOVWT2], c(male ib14.age i.race i.highestpareduc4 i.career4 singleparent4 bmin4 parentkessler4 external4 lowbirthweight) verbose or

                  I am currently using Stata 15, but also ran the code you offered on 14 to no avail. Any help is greatly appreciated and I apologize for not having read the FAQs before posting.
                  Thanks!
                  Rebecca



                  • #10
                    The template that I laid out works for me after I add the option cmdok to mi estimate. You said running my code was "to no avail". That isn't useful information, and it is for that reason that FAQ 12 asks posters to show all the code and results, even if the results consist of an error message.
                    Code:
                    . cap net get st0236.pkg  // get sample data set
                    . use dlsy_khb, clear
                    . gen psu = mod(_n, 10) +1  // create 10  clusters
                    . gen wt =10                // sampling weight
                    
                    . replace abil = . in 1/2   //create missing data
                    (2 real changes made, 2 to missing)
                    
                    . mi set mlong
                    . mi register imputed abil
                    (2 m=0 obs. now marked as incomplete)
                    
                    . mi impute regress abil = edu upsec univ fgroup fses intact boy, add(5)
                    note: univ omitted because of collinearity
                    
                    Univariate imputation                       Imputations =        5
                    Linear regression                                 added =        5
                    Imputed: m=1 through m=5                        updated =        0
                    
                    ------------------------------------------------------------------
                                       |               Observations per m            
                                       |----------------------------------------------
                              Variable |   Complete   Incomplete   Imputed |     Total
                    -------------------+-----------------------------------+----------
                                  abil |       1894            2         2 |      1896
                    ------------------------------------------------------------------
                    (complete + incomplete = total; imputed is the minimum across m
                     of the number of filled-in observations.)
                    
                    . mi estimate, cmdok: khb logit univ fses  ||abil [pw=wt], ///
                    >              vce(cluster psu)  concomitant(intact boy)
                    
                    Multiple-imputation estimates                   Imputations       =          5
                    Decomposition                                   Number of obs     =      1,896
                                                                    Average RVI       =     0.0003
                                                                    Largest FMI       =     0.0005
                    DF adjustment:   Large sample                   DF:     min       =   1.40e+07
                                                                            avg       =   4.18e+08
                    Within VCE type:       Robust                           max       =   1.19e+09
                    
                    ------------------------------------------------------------------------------
                            univ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         Reduced |   .5467557   .0735491     7.43   0.000     .4026022    .6909092
                            Full |   .3815932   .0752828     5.07   0.000     .2340417    .5291447
                            Diff |   .1651625   .1335212     1.24   0.216    -.0965342    .4268593
                    ------------------------------------------------------------------------------
                    Last edited by Steve Samuels; 30 Oct 2018, 18:41.