
  • MI estimate

    Dear all,

    I have the following survey data, where LTV is the Loan-to-Value ratio of each household in the dataset. I want to find the median LTV for each year: for example, a single value for 2011 based on the observations falling into 2011, and likewise for every other year. My very inefficient way of going about it is:

    mi estimate, vceok: svy: medianize LTV if (year==2010)

    medianize is an ado file.

    My question is whether the above command will give me the correct results.
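
    If the single-year command above does what is intended, the year-by-year repetition could be automated with a loop. This is only a sketch: it assumes the data are already mi set and svyset, that the years run from 2010 to 2013 as in the excerpt below, and that medianize accepts the if qualifier exactly as posted. It does not settle whether the command itself is correct for the HFCS design.

    ```stata
    * Sketch only: repeats the posted command once per survey year.
    * Assumption: medianize (user-written) accepts -if- as in the post.
    forvalues y = 2010/2013 {
        display as text _newline "Year `y'"
        mi estimate, vceok: svy: medianize LTV if (year==`y')
    }
    ```

    With official svy estimation commands, subpop() rather than if is the documented way to restrict estimation to a subpopulation. Whether medianize supports subpop() is not established in this thread.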

    Year Loan-to-value
    2010 78.85555
    2011
    2011 75
    2011 70.10143
    2011
    2011
    2012
    2012 34.29438
    2012
    2012
    2012
    2012
    2012
    2012
    2012 83.78378
    2013
    2013
    2013
    2013 97.29729
    Thanks
    Ilias

  • #2
    Welcome to Statalist, Ilias.

    It's worth noting that searching the internet for "medianize" leads to references to the Household Finance and Consumption Survey (HFCS), which "medianize" was apparently written to help Stata users analyze. It's also worth noting that vceok is an undocumented option of the mi estimate command.

    From one of the links below, "Using the HFCS is not trivial, due to the large size of the files, the use of multiple imputation (MI), and bootstrap replicate weights."

    I expect that the answer to your question depends more on understanding the design of the HFCS than on understanding Stata syntax.

    https://groups.google.com/forum/#!forum/hfcs-users
    https://groups.google.com/forum/#!to...rs/gTfF6ikyKr4



    • #3
      Ilias:
      welcome to this forum.
      Setting aside for now the -svy- structure of your dataset (as William wisely noted, you provide no details about the HFCS to feed the -mi svy- prefix), the MI of your data example can be performed as follows:
      Code:
      . input Year Loan_to_value
      
                Year  Loan_to~e
        1. 2010 78.85555
        2. 2011 .
        3. 2011 75
        4. 2011 70.10143
        5. 2011 .
        6. 2011 .
        7. 2012 .
        8. 2012 34.29438
        9. 2012 .
       10. 2012 .
       11. 2012 .
       12. 2012 .
       13. 2012 .
       14. 2012 .
       15. 2012 83.78378
       16. 2013 .
       17. 2013 .
       18. 2013 .
       19. 2013 97.29729
       20. end
      
      . mi set flong
      
      . set seed 12345
      
      
      . mi register impute Loan_to_value
      (13 m=0 obs. now marked as incomplete)
      
      . mi impute regress Loan_to_value= Year, add(20)
      
      Univariate imputation                       Imputations =       20
      Linear regression                                 added =       20
      Imputed: m=1 through m=20                       updated =        0
      
      ------------------------------------------------------------------
                         |               Observations per m            
                         |----------------------------------------------
                Variable |   Complete   Incomplete   Imputed |     Total
      -------------------+-----------------------------------+----------
           Loan_to_value |          6           13        13 |        19
      ------------------------------------------------------------------
      (complete + incomplete = total; imputed is the minimum across m
       of the number of filled-in observations.)
      
      . misum Loan_to_value, d
      
      m=1/20 data
      
          Variable |      Mean         SD        min        max          N
      -------------+-------------------------------------------------------
      Loan_to_va~e |  72.62088   31.44061   8.367192   128.2861         19
      
      . return list
      
      scalars:
        r(Loan_to_value_p99) =  128.2861404418945
        r(Loan_to_value_p95) =  128.2861404418945
        r(Loan_to_value_p90) =  111.1736999511719
        r(Loan_to_value_p75) =  91.79726181030273
        r(Loan_to_value_p25) =  53.17518768310547
        r(Loan_to_value_p10) =  31.43407810926437
         r(Loan_to_value_p5) =  8.367191761732101
         r(Loan_to_value_p1) =  8.367191761732101
        r(Loan_to_value_kur
          tosis)             =  3.249060808000262
        r(Loan_to_value_ske
          wness)             =  -.2647977815045786
        r(Loan_to_value_p50) =  74.88788089752197
        r(Loan_to_value_sum) =  1379.796798998118
        r(Loan_to_value_sum
          _w)                =  19
          r(Loan_to_value_N) =  19
        r(Loan_to_value_max) =  128.2861404418945
        r(Loan_to_value_min) =  8.367191761732101
        r(Loan_to_value_Var) =  988.5122034019607
         r(Loan_to_value_sd) =  31.44061391579307
        r(Loan_to_value_mea
          n)                 =  72.62088415779564
      
      .
      Credits for the useful user-written command -misum- go to Daniel Klein (-search misum-).
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        A word of caution: misum blindly combines estimates according to Rubin's rules. Usually, results for the median are reliable. An alternative, though slower, approach is

        Code:
        mi estimate : qreg Loan_to_value
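
        To get one pooled median per year, as the original question asks, the same intercept-only median regression could be run year by year. A minimal sketch, assuming the imputed toy data from #3 are in memory (variables Year and Loan_to_value) and ignoring the survey design entirely:

        ```stata
        * Sketch only: pooled median of Loan_to_value within each year.
        * The _cons coefficient of an intercept-only qreg is the median.
        levelsof Year, local(years)
        foreach y of local years {
            display as text _newline "Year `y'"
            * capture noisily lets the loop continue if one year fails
            capture noisily mi estimate : qreg Loan_to_value if Year == `y'
        }
        ```

        Years with very few observations (2010 has a single row in the toy data) may produce an error or unusable standard errors, which is why each call is wrapped in capture noisily.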
        Best
        Daniel



        • #5
          Thanks to both of you for pointing this command out. I wasn't aware of misum, and I bet it would be useful in my own work. I also appreciate the caution about the quantile estimates and Rubin's rules. Daniel, do you think that the standard deviation of MI data is appropriately estimated using Rubin's Rules?

          The motivation for my question is that this has come up fairly often in my own work. I have a survey of quality of life among cognitively impaired nursing residents. Many have one or two of the questions missing, but only about 8% of the total information is missing. Having a better sense of the SD would be useful for thinking about effect sizes. I had been estimating the SD from the standard error of the MI-estimated mean (multiplying it by sqrt(N)), but the scores are skewed and have a probable ceiling effect, so I would like to check this another way.
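
          It may help to write out what that back-calculation returns. The following is a sketch under assumptions that are not stated in the post: simple random sampling, the same N in every imputed dataset, and Rubin's rules used for the variance of the mean. Here s_m^2 is the sample variance in imputed dataset m and B is the between-imputation variance of the M dataset-specific means.

          ```latex
          % Complete data: the SE of the mean determines the SD
          \[ \mathrm{SE}(\bar{y}) = \frac{s}{\sqrt{N}} \quad\Longrightarrow\quad s = \sqrt{N}\,\mathrm{SE}(\bar{y}) \]
          % After MI: the squared SE is Rubin's total variance T = W + (1 + 1/M) B
          \[ \mathrm{SE}_{\mathrm{MI}}^{2} = \frac{1}{M}\sum_{m=1}^{M}\frac{s_m^{2}}{N} + \left(1 + \frac{1}{M}\right) B \]
          % Multiplying back by sqrt(N) therefore carries the between-imputation term along
          \[ \sqrt{N}\,\mathrm{SE}_{\mathrm{MI}} = \sqrt{\frac{1}{M}\sum_{m=1}^{M} s_m^{2} + N\left(1 + \frac{1}{M}\right) B} \;\ge\; \sqrt{\frac{1}{M}\sum_{m=1}^{M} s_m^{2}} \]
          ```

          Under these assumptions the back-calculated value is never smaller than the square root of the average within-dataset variance, and the gap grows with the amount of missing information.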
          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



          • #6
            Weiwen:
            Rubin's rule is the only correct approach for estimating standard deviation after MI, as it combines both within and between variance (for more details see: http://eu.wiley.com/WileyCDA/WileyTi...471655740.html).
            You may want to take a look at: https://s3.amazonaws.com/academia.ed...in_the_pre.pdf for a short but comprehensive coverage of this topic.
            As an aside, my personal experience with Daniel's -misum- statistics is fully satisfying.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Further clarification seems to be needed. Sorry for still not being clear enough.

              Carlo wrote in #6: "Rubin's rule is the only correct approach for estimating standard deviation after MI, as it combines both within and between variance"
              But this is not what misum does, and it is not what I think it should do. misum treats any statistic returned by summarize as a point estimate. The reported standard deviation is just the square root of the mean of the M dataset-specific variances. The issue of combining within- and between-dataset variances is relevant when the variance of a point estimate is estimated. Since summarize is a descriptive command, it does not estimate the variance of the statistics (e.g., the mean or the SD), and neither does misum.
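
              The distinction can be written out. The notation below is not from this thread and is introduced only for illustration: s_m^2 is the sample variance of the variable in imputed dataset m, \hat{\theta}_m is a point estimate from dataset m, and \hat{V}_m is its estimated sampling variance.

              ```latex
              % What misum reports as the SD, according to the description above
              \[ s_{\text{misum}} = \sqrt{\frac{1}{M}\sum_{m=1}^{M} s_m^{2}} \]
              % Rubin's rules: pooled point estimate and its total variance
              \[ \bar{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m, \qquad
                 W = \frac{1}{M}\sum_{m=1}^{M}\hat{V}_m, \qquad
                 B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_m - \bar{\theta}\right)^{2} \]
              \[ T = W + \left(1 + \frac{1}{M}\right) B \]
              ```

              T is the variance of the estimator \bar{\theta}, not a measure of the spread of the data, so it answers a different question from the SD that misum reports.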

              Weiwen asked in #5: "Daniel, do you think that the standard deviation of MI data is appropriately estimated using Rubin's Rules?"
              White et al. (2011: 389) suggest that some transformation may have to be applied when combining standard deviations. Whether this is the preferred way of obtaining effect sizes (of which I am not a great fan, personally) is not clear. The mibeta command (from http://www.stata.com/users/ymarchenko), for example, combines standardized regression coefficients (after applying an appropriate transformation). The standardization is based on the dataset-specific standard deviations, not on a combined standard deviation over the M datasets.
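
              For anyone who wants to inspect the dataset-specific standard deviations directly, here is a minimal sketch. It assumes the flong data with 20 imputations from #3, where the system variable _mi_m indexes the imputed datasets. The scalar name sumvar is arbitrary.

              ```stata
              * Sketch only: list the SD in each imputed dataset, then combine
              * them as the square root of the mean dataset-specific variance.
              scalar sumvar = 0
              forvalues m = 1/20 {
                  quietly summarize Loan_to_value if _mi_m == `m'
                  display "m = `m': SD = " r(sd)
                  scalar sumvar = scalar(sumvar) + r(Var)
              }
              display "combined SD = " sqrt(scalar(sumvar)/20)
              ```

              If misum works as described in this post, the last line should agree with the r(Loan_to_value_sd) value it returns for the same data.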

              Best
              Daniel


              White, I. R., Royston, P., Wood, A. M. 2011. Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 30(4), pp. 377-399.
              Last edited by daniel klein; 17 Nov 2017, 14:19.



              • #8
                Daniel:
                thanks for your relevant clarifications.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Carlo and Daniel, thanks for the input! Carlo, the link to the Clark and Altman article you posted doesn't seem to work. For those interested in reading, the link to the article on the Journal of Clinical Epidemiology is here, but you will need a library subscription.


                  • #10
                    Weiwen:
                    the link seems to work when clicked from Google Scholar results page.
                    Kind regards,
                    Carlo
                    (Stata 19.0)

