  • Imputing several variables with hotdeck stata command

    Hi all,

    I'm currently working with the national household survey of my country and have run into the problem that the income variables often have missing values. For this reason, I've decided to use the hot deck method to impute these missing values, and I found the hotdeck command in STATA. To make it easier to explain, here is the command's syntax:

    hotdeck [varlist] [using] [if exp] [in exp] , [ by(varlist) store impute(varlist) noise keep(varlist) command(command) parms(varlist) seed(#) infiles(filename filename ...) ]

    where varlist is the variable(s) I'd like to impute.

    I have a question about how to use the command. Specifically, I'm not sure whether I should run it on all the variables I want to impute at once or variable by variable. If I've understood the ado file correctly, a unit with at least one missing value among the variables in varlist is treated as a missing observation, so the command imputes all variables in varlist for that unit, even those with observed values. So my first thought was to run the command variable by variable. But then I started reading about the method, and now I'm no longer sure which approach is right.
    I would be extremely grateful for your guidance.
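
The two options described above can be illustrated with a minimal sketch. It is written in plain Python rather than Stata, and it is not the code of the hotdeck ado file: the function names `hotdeck_joint` and `hotdeck_by_variable` and the toy data are hypothetical. The joint version follows the poster's reading of the ado file (a row with any missing value takes every listed variable from one complete donor row); the other version fills only the missing cells.

```python
import random


def hotdeck_joint(rows, variables, seed=0):
    """Row-wise random hot deck: a row with ANY missing value in `variables`
    takes all of those variables from one randomly drawn complete donor row."""
    rng = random.Random(seed)
    donors = [r for r in rows if all(r[v] is not None for v in variables)]
    if not donors:
        raise ValueError("no complete donor rows available")
    out = []
    for r in rows:
        new = dict(r)
        if any(r[v] is None for v in variables):
            donor = rng.choice(donors)
            for v in variables:
                new[v] = donor[v]  # observed values are overwritten as well
        out.append(new)
    return out


def hotdeck_by_variable(rows, variables, seed=0):
    """Variable-by-variable random hot deck: only missing cells are filled,
    each from a randomly drawn row in which that variable was observed."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]
    for v in variables:
        donors = [r[v] for r in rows if r[v] is not None]  # originally observed values only
        if not donors:
            raise ValueError(f"no observed values for {v}")
        for new in out:
            if new[v] is None:
                new[v] = rng.choice(donors)
    return out


# Toy data: the third household reports wage but not rent (None = missing).
rows = [
    {"wage": 100, "rent": 10},
    {"wage": 200, "rent": 20},
    {"wage": 300, "rent": None},
]

joint = hotdeck_joint(rows, ["wage", "rent"])
separate = hotdeck_by_variable(rows, ["wage", "rent"])

print(joint[2])     # both values come from a single donor row
print(separate[2])  # wage stays 300; only rent is imputed
```

In this sketch the joint version keeps each imputed row internally consistent, because both values come from the same donor, but it discards the observed wage of 300. The variable-by-variable version keeps every observed value, but the imputed rent is drawn without regard to the household's own wage.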

  • #2
    Caterina:
    why not consider -mi-?
    As an aside, the FAQ kindly requests that you specify the usage of community-contributed Stata (not STATA, please, as per FAQ again) commands, for sound practical reasons. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Carlo Lazzaro, thanks for your comment. To answer your question: I'm trying to reproduce the methodology that my country's national institute of statistics previously used to impute variables, and they opted for the random hot deck method. That's why, for the time being, I'm not considering other methods.

      Regarding your comment on community-contributed Stata commands (thanks for the correction!), I now specify the usage of the hotdeck command:

      I'm using hotdeck (from SSC) in Stata 15.1; the command imputes missing values using the hot deck method.

      A final comment, to be more precise about my problem: I need to impute 21 income variables with missing values. Hence, I wonder whether I should impute all 21 at once or impute them one by one.


      • #4
        Caterina:
        I would favor imputing them all at once (if computationally feasible).
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Thanks for your answer, Carlo Lazzaro!


          • #6
            Caterina, I'm a bit late here and this is off the top of my head, but there are two other community-contributed Stata programs I'm aware of, and both are available via "ssc install":

            whotdeck -- Also by Adrian Mander, but I think more recent than hotdeck. I'm not sure of all the differences between hotdeck & whotdeck. The "w" is for weighted but note that it does not refer to frequency or survey weights but rather importance weights w.r.t. the hotdeck itself.

            hotdeckvar -- by Mattias Schonlau. This appears to be a simpler program with simpler syntax. For your purposes, I think this is probably all you need, in which case simpler is probably better. It will hotdeck all the variables at the same time, not separately.

            I've also been working on a hotdeck command of my own, with the added feature of allowing frequency or survey weights to affect the selection of donor observations. If you care about weights, let me know. If not, I think my command is otherwise comparable to "hotdeckvar".
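
As a rough illustration of what "weights affecting the selection of donor observations" can mean, here is a minimal Python sketch in which each donor is drawn with probability proportional to its weight. This is an assumption made for illustration only and is not taken from wtd_hotdeck or any of the programs named above; the function name `weighted_hotdeck` and the data are hypothetical.

```python
import random


def weighted_hotdeck(values, weights, seed=0):
    """Fill missing entries (None) by drawing a donor value with probability
    proportional to the donor's weight. Observed entries are left unchanged."""
    rng = random.Random(seed)
    donor_values = [v for v in values if v is not None]
    donor_weights = [w for v, w in zip(values, weights) if v is not None]
    if not donor_values:
        raise ValueError("no observed values to use as donors")
    return [
        v if v is not None
        else rng.choices(donor_values, weights=donor_weights, k=1)[0]
        for v in values
    ]


# The second donor has weight 0 so that the outcome of this toy example
# does not depend on the random seed: it can never be selected.
incomes = [500, 900, None, None]
svy_weights = [1.0, 0.0, 2.0, 1.5]

imputed = weighted_hotdeck(incomes, svy_weights)
print(imputed)
```

With equal donor weights this reduces to an ordinary random hot deck, which is consistent with the remark that the weighted and unweighted programs should otherwise be comparable.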


            • #7
              John Eiler, I only just read your message. Thanks for the advice! I will take a look at the hotdeckvar program, as you suggested. For the time being I'm not including frequency weights, but I will keep it in mind. Thanks again!


              • #8
                Caterina Brest Lopez No problem. Btw, I've since uploaded my program (wtd_hotdeck) and its help file here: https://github.com/johne13/wtd_hotdeck

                You can also use wtd_hotdeck without weights, but hotdeckvar has been around much longer than mine, so it would be the safer choice for sure. If you do try out wtd_hotdeck, I'd appreciate any comments, especially if you notice any differences in results compared to hotdeckvar. (I've been meaning to do this comparison myself but haven't got around to it yet.)

                Btw, I think hotdeck & whotdeck are good programs too. I used them a couple of years ago and remember struggling with the syntax, but once I got it right they seemed to work well.
                Last edited by John Eiler; 01 May 2020, 10:39.
