MI estimate

Ilias Geo

#1

17 Nov 2017, 00:47

Dear all,

I have the following survey data, where Loan-to-Value (LTV) is the LTV ratio of each household in the dataset. I want to find the median LTV for each year. For example, for 2011 I want a single LTV value based on the observations falling into 2011, and the same for every other year. My very inefficient way of going about it is:

mi estimate, vceok: svy: medianize LTV if (year==2010)

medianize is a user-written ado-file.

My question is whether the above command will give me the correct results.

Year  Loan-to-value
2010  78.85555
2011  .
2011  75
2011  70.10143
2011  .
2011  .
2012  .
2012  34.29438
2012  .
2012  .
2012  .
2012  .
2012  .
2012  .
2012  83.78378
2013  .
2013  .
2013  .
2013  97.29729
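To make the question concrete, here is a minimal Python sketch of a plain per-year median computed on the observed values only (missing values coded as None). It deliberately ignores the survey weights, the bootstrap replicate weights and the multiple imputation, so it is not what medianize or mi estimate compute. It only shows how few observations each yearly complete-case median rests on in this example.

```python
# Minimal sketch: per-year median of the *observed* LTV values only.
# Assumptions: None marks a missing value; survey weights, replicate
# weights and multiple imputation are all ignored here.
from collections import defaultdict
from statistics import median

# The example data from the post: (year, LTV), None = missing
data = [
    (2010, 78.85555),
    (2011, None), (2011, 75.0), (2011, 70.10143), (2011, None), (2011, None),
    (2012, None), (2012, 34.29438), (2012, None), (2012, None), (2012, None),
    (2012, None), (2012, None), (2012, None), (2012, 83.78378),
    (2013, None), (2013, None), (2013, None), (2013, 97.29729),
]


def median_by_year(rows):
    """Return {year: median of non-missing LTV} for years with any observed value."""
    groups = defaultdict(list)
    for year, ltv in rows:
        if ltv is not None:
            groups[year].append(ltv)
    return {year: median(values) for year, values in sorted(groups.items())}


if __name__ == "__main__":
    for year, med in median_by_year(data).items():
        print(year, med)
```

With at most two observed values per year, each of these medians is fragile, which is why the replies below turn to imputation.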

Thanks
Ilias
William Lisowski

#2

17 Nov 2017, 06:08

Welcome to Statalist, Ilias.

It's worth noting that searching the internet for "medianize" leads to references to the Household Finance and Consumption Survey (HFCS); "medianize" was apparently written to help Stata users analyze that survey. It's also worth noting that vceok is an undocumented option of the mi estimate command.

From one of the links below, "Using the HFCS is not trivial, due to the large size of the files, the use of multiple imputation (MI), and bootstrap replicate weights."

I expect that the answer to your question depends more on understanding the design of the HFCS than on understanding Stata syntax.

https://groups.google.com/forum/#!forum/hfcs-users
https://groups.google.com/forum/#!to...rs/gTfF6ikyKr4

Carlo Lazzaro


17 Nov 2017, 06:29

Ilias:
welcome to this forum.
setting aside for a while the -svy- structure of your dataset (as William wisely noted, you do not provide any details of the HFCS design to feed the -mi estimate: svy:- prefix), the multiple imputation of your data example can be performed as follows:

Code:

. input Year Loan_to_value

          Year  Loan_to~e
  1. 2010 78.85555
  2. 2011 .
  3. 2011 75
  4. 2011 70.10143
  5. 2011 .
  6. 2011 .
  7. 2012 .
  8. 2012 34.29438
  9. 2012 .
 10. 2012 .
 11. 2012 .
 12. 2012 .
 13. 2012 .
 14. 2012 .
 15. 2012 83.78378
 16. 2013 .
 17. 2013 .
 18. 2013 .
 19. 2013 97.29729
 20. end

. mi set flong

. set seed 12345


. mi register impute Loan_to_value
(13 m=0 obs. now marked as incomplete)

. mi impute regress Loan_to_value= Year, add(20)

Univariate imputation                       Imputations =       20
Linear regression                                 added =       20
Imputed: m=1 through m=20                       updated =        0

------------------------------------------------------------------
                   |               Observations per m            
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
     Loan_to_value |          6           13        13 |        19
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled-in observations.)

. misum Loan_to_value, d

m=1/20 data

    Variable |      Mean         SD        min        max          N
-------------+-------------------------------------------------------
Loan_to_va~e |  72.62088   31.44061   8.367192   128.2861         19

. return list

scalars:
       r(Loan_to_value_p99) =  128.2861404418945
       r(Loan_to_value_p95) =  128.2861404418945
       r(Loan_to_value_p90) =  111.1736999511719
       r(Loan_to_value_p75) =  91.79726181030273
       r(Loan_to_value_p25) =  53.17518768310547
       r(Loan_to_value_p10) =  31.43407810926437
        r(Loan_to_value_p5) =  8.367191761732101
        r(Loan_to_value_p1) =  8.367191761732101
  r(Loan_to_value_kurtosis) =  3.249060808000262
  r(Loan_to_value_skewness) =  -.2647977815045786
       r(Loan_to_value_p50) =  74.88788089752197
       r(Loan_to_value_sum) =  1379.796798998118
     r(Loan_to_value_sum_w) =  19
         r(Loan_to_value_N) =  19
       r(Loan_to_value_max) =  128.2861404418945
       r(Loan_to_value_min) =  8.367191761732101
       r(Loan_to_value_Var) =  988.5122034019607
        r(Loan_to_value_sd) =  31.44061391579307
      r(Loan_to_value_mean) =  72.62088415779564

.

Credit for the useful user-written command -misum- goes to Daniel Klein (-search misum-).
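For readers who want to see what the imputation step in the log does, here is a hypothetical Python sketch of one "proper" univariate regression imputation of y on a single predictor x. It follows the textbook Bayesian recipe (draw the residual variance, draw the coefficients, then impute prediction plus noise), which is assumed to be in the spirit of -mi impute regress-; it is not Stata's code and ignores the survey design. The function name impute_regress is illustrative only.

```python
# Hypothetical sketch of "proper" regression imputation, assumed to be in the
# spirit of -mi impute regress- (textbook Bayesian draw, not Stata's code).
import math
import random


def impute_regress(x, y, m, seed=12345):
    """Return m completed copies of y, imputing None entries from y = a + b*x + e."""
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n, k = len(obs), 2  # k = number of coefficients (intercept, slope)
    if n <= k:
        raise ValueError("need more complete cases than coefficients")
    xbar = sum(xi for xi, _ in obs) / n
    ybar = sum(yi for _, yi in obs) / n
    sxx = sum((xi - xbar) ** 2 for xi, _ in obs)
    b_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in obs) / sxx
    rss = sum((yi - ybar - b_hat * (xi - xbar)) ** 2 for xi, yi in obs)

    completed = []
    for _ in range(m):
        # 1. Draw the residual variance: sigma^2 = RSS / chi2(n - k)
        chi2 = rng.gammavariate((n - k) / 2, 2.0)  # a chi-square(n - k) draw
        sigma = math.sqrt(rss / chi2)
        # 2. Draw coefficients from their posterior. With x centred at xbar,
        #    the intercept (at xbar) and the slope are uncorrelated.
        c = rng.gauss(ybar, sigma / math.sqrt(n))
        b = rng.gauss(b_hat, sigma / math.sqrt(sxx))
        # 3. Fill each missing y with a prediction plus a residual draw;
        #    observed values are copied unchanged.
        completed.append([
            yi if yi is not None else c + b * (xi - xbar) + rng.gauss(0.0, sigma)
            for xi, yi in zip(x, y)
        ])
    return completed


if __name__ == "__main__":
    years = [2010, 2011, 2011, 2011, 2011, 2011, 2012, 2012, 2012, 2012,
             2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013]
    ltv = [78.85555, None, 75.0, 70.10143, None, None, None, 34.29438, None,
           None, None, None, None, None, 83.78378, None, None, None, 97.29729]
    imputations = impute_regress(years, ltv, m=20)
    print(len(imputations), "completed datasets")
```

Drawing sigma and the coefficients afresh for every imputation is what lets the between-imputation spread reflect uncertainty about the regression itself, not just residual noise.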

Kind regards,
Carlo
(Stata 19.0)


daniel klein

#4

17 Nov 2017, 07:05

A word of caution: misum blindly combines estimates according to Rubin's rules, so its results for the median are not necessarily reliable. An alternative, though slower, approach is

Code:

mi estimate: qreg Loan_to_value
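The point of the qreg route is that each completed dataset yields a median together with a standard error, and only then are the M results combined with Rubin's rules. The Python sketch below shows that combination arithmetic. As an assumption, a simple bootstrap stands in for qreg's own standard error, and survey weights and degrees-of-freedom adjustments are ignored; the function names are illustrative.

```python
# Minimal sketch of Rubin's rules applied to a median. Assumptions: the
# per-imputation variance comes from a simple bootstrap (standing in for
# qreg's own SE); survey weights and df adjustments are ignored.
import math
import random
from statistics import mean, median, variance


def bootstrap_var_median(values, reps=200, rng=None):
    """Bootstrap estimate of the sampling variance of the median of values."""
    rng = rng or random.Random(0)
    meds = [median(rng.choices(values, k=len(values))) for _ in range(reps)]
    return variance(meds)


def rubin_pool(estimates, variances):
    """Combine M point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, math.sqrt(t)


def pooled_median(completed, reps=200, seed=0):
    """Median of each completed dataset, pooled across the M datasets."""
    rng = random.Random(seed)
    meds = [median(d) for d in completed]
    vars_ = [bootstrap_var_median(d, reps, rng) for d in completed]
    return rubin_pool(meds, vars_)


if __name__ == "__main__":
    # Hypothetical completed datasets, for illustration only
    completed = [[60.2, 70.1, 75.0, 78.9, 83.8], [55.4, 70.1, 72.3, 78.9, 97.3]]
    est, se = pooled_median(completed)
    print(f"pooled median = {est:.2f}, SE = {se:.2f}")
```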

Best
Daniel
Weiwen Ng

#5

17 Nov 2017, 10:19

Thanks to both of you for pointing this command out. I wasn't aware of misum, and I bet it would be useful in my own work. I also appreciate the caution about the quantile estimates and Rubin's rules. Daniel, do you think that the standard deviation of MI data is appropriately estimated using Rubin's Rules?

The motivation for my question is that this has come up fairly often in my own work. I have a survey of quality of life among cognitively impaired nursing home residents. Many residents are missing one or two of the questions, but only about 8% of the total information is missing. A better sense of the SD would help me think about effect sizes. I had been estimating the SD from the standard error of the MI-estimated mean (multiplied by sqrt(N)), but the scores are skewed and probably have a ceiling effect, so I would like to check this another way.
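The back-calculation described above is SD = SE × sqrt(N). The Python sketch below applies it to a Rubin's-rules standard error of the pooled mean, assuming every completed dataset has the same N. Because the total variance adds a between-imputation term to the average within-imputation variance, the SD obtained this way is, by construction, at least the square root of the mean of the per-dataset variances. Function names are illustrative.

```python
# Sketch: backing out an SD from the standard error of an MI-pooled mean.
# Assumption: every completed dataset has the same number of observations.
import math
from statistics import mean, variance


def sd_from_se(se, n):
    """SD implied by the standard error of a mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def mi_mean_se(datasets):
    """Rubin's-rules SE of the pooled mean over M completed datasets."""
    m, n = len(datasets), len(datasets[0])
    means = [mean(d) for d in datasets]
    within = mean(variance(d) / n for d in datasets)  # average squared SE
    between = variance(means)                         # spread of the M means
    return math.sqrt(within + (1 + 1 / m) * between)


if __name__ == "__main__":
    # Hypothetical completed datasets, for illustration only
    data = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
    print(sd_from_se(mi_mean_se(data), len(data[0])))
```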

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Carlo Lazzaro

#6

17 Nov 2017, 11:49

Weiwen:
Rubin's rule is the only correct approach for estimating standard deviation after MI, as it combines both within and between variance (for more details see: http://eu.wiley.com/WileyCDA/WileyTi...471655740.html).
You may want to take a look at: https://s3.amazonaws.com/academia.ed...in_the_pre.pdf for a short but comprehensive coverage of this topic.
As an aside, my personal experience with Daniel's -misum- statistics is fully satisfying.

Kind regards,
Carlo
daniel klein

#7

17 Nov 2017, 14:16

Further clarification seems to be needed. Sorry for still not being clear enough.

Carlo wrote: "Rubin's rule is the only correct approach for estimating standard deviation after MI, as it combines both within and between variance"

But this is not what misum does, and it is not what I think it should do. misum treats any statistic returned by summarize as a point estimate. The reported standard deviation is just the square root of the mean of the M dataset-specific variances. The issue of combining within- and between-dataset variances is relevant when the variance of a point estimate is estimated. Since summarize is a descriptive command, it does not estimate the variance of the statistics (e.g., the mean, the SD), and neither does misum.
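Daniel's description can be written out directly. The Python sketch below is hypothetical and based only on that description, not on misum's source: every statistic is averaged across the M datasets as a point estimate, and the SD is the square root of the mean of the dataset-specific variances. No within- or between-imputation variance of these statistics is computed.

```python
# Hypothetical sketch of misum-style pooling of descriptive statistics,
# based on the description in this thread (not misum's source code).
import math
from statistics import mean, median, variance


def misum_like(datasets):
    """Average each per-dataset statistic; SD = sqrt(mean of the M variances)."""
    return {
        "mean": mean(mean(d) for d in datasets),
        "p50": mean(median(d) for d in datasets),
        "sd": math.sqrt(mean(variance(d) for d in datasets)),
    }


if __name__ == "__main__":
    m1, m2 = [1, 2, 3], [2, 4, 6]  # hypothetical completed datasets
    print(misum_like([m1, m2]))
    # For contrast: the SD of the two datasets stacked together differs
    print(math.sqrt(variance(m1 + m2)))
```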

Weiwen asked: "Daniel, do you think that the standard deviation of MI data is appropriately estimated using Rubin's Rules?"

White et al. (2011: 389) suggest that some transformation may have to be applied when combining standard deviations. Whether this is the preferred way of obtaining effect sizes (of which I am, personally, not a great fan) is not clear. The mibeta command (from http://www.stata.com/users/ymarchenko), for example, combines standardized regression coefficients after applying an appropriate transformation. The standardization is based on the dataset-specific standard deviations, not on a combined standard deviation over the M datasets.
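To illustrate "standardize within each dataset, then combine", here is a hypothetical Python sketch for a single predictor. In that case the standardized slope equals Pearson's r, so Fisher's z (atanh) is used as an example transformation before averaging, followed by back-transforming with tanh. That choice of transformation is an assumption for illustration, not a description of what mibeta does.

```python
# Hypothetical sketch: standardize within each completed dataset using that
# dataset's own SDs, transform, average, back-transform. One predictor only,
# so the standardized slope equals Pearson's r; Fisher's z is an example
# transformation (an assumption, not mibeta's code).
import math
from statistics import mean, stdev


def std_slope(x, y):
    """Standardized OLS slope of y on x using this dataset's own SDs."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return (sxy / sxx) * stdev(x) / stdev(y)


def pooled_std_slope(datasets):
    """Average dataset-specific standardized slopes on the Fisher z scale."""
    z = [math.atanh(std_slope(x, y)) for x, y in datasets]
    return math.tanh(mean(z))


if __name__ == "__main__":
    d1 = ([1, 2, 3, 4], [1, 3, 2, 4])  # hypothetical completed datasets
    d2 = ([1, 2, 3, 4], [2, 1, 4, 3])
    print(pooled_std_slope([d1, d2]))
```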

Best
Daniel

White, I. R., Royston, P., and Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 30(4), 377-399.

Last edited by daniel klein; 17 Nov 2017, 14:19.
Carlo Lazzaro

#8

18 Nov 2017, 07:11

Daniel:
thanks for your relevant clarifications.

Kind regards,
Carlo
Weiwen Ng

#9

20 Nov 2017, 07:25

Carlo and Daniel, thanks for the input! Carlo, the link to the Clark and Altman article you posted doesn't seem to work. For those interested in reading it, the article is published in the Journal of Clinical Epidemiology, but you will need a library subscription to access it.

Carlo Lazzaro

#10

20 Nov 2017, 08:55

Weiwen:
the link seems to work when clicked from the Google Scholar results page.

Kind regards,
Carlo