I'd like to come up with a way of generating random draws from the right tail of a normal distribution, that is, the part between cumulative probabilities 0.84 and 1, with a mean and standard deviation that I provide.
Some background in case it helps. I'm working with survey data in Stata 15. There is an income variable, income, that is top-coded at a fixed nominal level of $100,000. Records below this are coded as-is; records above this are all coded at $100,000. The survey has a probability weight, weight. I've created a dummy variable tc equal to 1 if a record is top-coded. About 16% of records are so top-coded.
I would like to test the implications of imputing income for top-coded records under different distributional assumptions. The first imputation I would like to test is under the assumption that income is log-normally distributed. I've created a lincome variable for this purpose. Assume that the mean of lincome is 10.5 and the standard deviation is 1.5.
To do the imputation, I need a strategy for randomly generating a normal variable with a mean of 10.5 and a standard deviation of 1.5, but restricted to values between the 84th and 100th percentiles of that distribution.
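One approach I can think of is inverse-CDF sampling: draw a uniform quantile on (0.84, 1) and push it through the inverse normal CDF. Below is a minimal sketch of that idea, not a tested solution. The names u, lincome_imp, and income_imp are placeholders I made up for illustration, and the sketch assumes the 0.84 cutoff and the mean and standard deviation stated above.

```stata
* Sketch: draw from the top 16% of a normal(10.5, 1.5) via the inverse CDF.
* u, lincome_imp, and income_imp are illustrative names, not survey variables.
clear all
set seed 10016
set obs 1000

* Assumed parameters: mean, standard deviation, and share below the top-code
local mu    = 10.5
local sigma = 1.5
local p     = 0.84

* Uniform draws on (0.84, 1): quantiles restricted to the right tail
gen double u = runiform(`p', 1)

* Map the quantiles through the inverse standard normal CDF, then rescale
gen double lincome_imp = `mu' + `sigma' * invnormal(u)
gen double income_imp  = exp(lincome_imp)

summarize lincome_imp income_imp
```

In real data the draws would presumably be kept only where tc == 1. If the draws need to be guaranteed to exceed the top-code itself, the lower bound 0.84 could instead be computed as normal((ln(100000) - mu) / sigma), since the 84th percentile of the assumed distribution need not coincide exactly with ln(100000).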
Here's code that generates a synthetic version of my data.
Code:
clear all
set seed 10016
set obs 1000
gen lincome = rnormal(10, 1.5)
gen income = exp(lincome)
gen weight = 100
gen tc = income > 100000
replace income = 100000 if income > 100000
replace lincome = ln(100000) if lincome > ln(100000)