  • HotDeck results not reproducible

    Hi everyone,

    I'm imputing data using HotDeck.

    gen income1 = income if treat == 0
    hotdeck income1, store by(eligible) seed(1) impute(5) keep(income1 householdid)
    merge 1:1 householdid using "$dtemp/imp2.dta", gen(_mnewx) keepusing(income1)
    drop _mnewx

    It somehow didn't give me reproducible results when I ran the do file several times. I did get the same results when I quit Stata and started all over again. Does anyone know the solution to this?

    Thanks so much,
    Last edited by ZH ZH; 15 May 2019, 15:44.

  • #2
    The hotdeck command (user written, from Stata Journal or SSC) uses a bootstrapping technique, which means random numbers are being generated. Read the output of help set seed for advice on gaining reproducible results from random processes. Beyond that, the output of help hotdeck tells us

    seed(#) specifies the random number generator seed. When using the seed option the hotdeck command must be used in the correct way. The key point is that ALL variables in the analysis command must be in the variable list, this ensures that the correlations between the variables are maintained post imputation.

    Since you don't describe how your results are not reproducible, it's hard to say more about what went wrong.
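
    As a minimal sketch of the general advice in help set seed (not of hotdeck itself), the lines below show that resetting the seed before a random step reproduces the same draws. The variable names id, draw1 and draw2 and the seed value are made up for illustration and do not come from the original do-file.

    ```stata
    * Minimal sketch: the same seed reproduces the same random draws.
    * id, draw1, draw2 and the seed value are hypothetical, not from the thread.
    clear
    set obs 100
    generate id = _n

    set seed 20190515          // arbitrary seed, set before the first random step
    generate draw1 = runiform()

    set seed 20190515          // resetting the seed restarts the same sequence
    generate draw2 = runiform()

    assert draw1 == draw2      // stops with an error if any draw differs
    ```

    In a do-file, a single set seed near the top means every re-run in the same Stata session starts from the same random-number state. Whether that alone resolves the hotdeck behaviour described in #1 is an assumption that would need checking.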
    Last edited by William Lisowski; 15 May 2019, 16:40.

    • #3
      Thank you so much, William! Essentially for income1, when I summarize it, it shows different summary stats. Obs, mean, min and max are the same. However, I get different standard deviations if I re-run the do file.

      I'm also not using the analysis command, so I'm not sure what went wrong based on the help file...

      • #4
        By "analysis command" the help hotdeck output means whatever command(s) you use to analyze the data produced by the hotdeck command. In your case, that is apparently the summarize command. This suggests, if I understand correctly, that you should give hotdeck all the variables you summarize in its variable list, not just income1. I understand that the help file talks about correlations, and you are not computing correlations, but perhaps other measures are affected as well.

        You might try this by summarizing income1 and just one or two of your other variables. See if the problem persists with just income1 in the variable list; then put the other summarized variables in the list and see if the problem goes away. This is all deeper than my dated knowledge of hot deck techniques, so perhaps I'm wrong, but it shouldn't take much to try it.
