  • Random normal variable between two PDF p-values

    I'd like a way to generate random numbers that follow the right tail of a normal distribution with a mean and standard deviation that I provide, specifically the part of the distribution between the 84th and 100th percentiles (cumulative probabilities from 0.84 to 1).

    Some background in case it helps. I'm working with survey data in Stata 15. There is an income variable, income, that is top-coded at a fixed nominal level of $100,000. Records below this are coded as-is; records above this are all coded at $100,000. The survey has a probability weight, weight. I've created a dummy variable tc equal to 1 if a record is top-coded. About 16% of records are so top-coded.

    I would like to test the implications of imputing income for top-coded records under different distributional assumptions. The first imputation I would like to test assumes that income is log-normally distributed. I've created a variable lincome (the natural log of income) for this purpose. Assume that the mean of lincome is 10.5 and its standard deviation is 1.5.

    To do the imputation, I need a strategy for randomly generating a normal variable with a mean of 10.5 and a standard deviation of 1.5, restricted to values between the 84th and 100th percentiles of that distribution.
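In other words, this is sampling from a truncated normal distribution. One standard way to do it (inverse-CDF sampling, offered as a sketch rather than the only option) is to draw a percentile u uniformly over the wanted range and map it back through the normal quantile function Φ⁻¹, with μ = 10.5 and σ = 1.5 as assumed above:

$$u \sim \mathrm{Uniform}(0.84,\ 1), \qquad x = \mu + \sigma\,\Phi^{-1}(u)$$

Because Φ⁻¹ is increasing, x always falls above μ + σΦ⁻¹(0.84) ≈ μ + 0.994σ, and its density on that range is the N(μ, σ²) density divided by 1 − 0.84 = 0.16.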

    Here's code that generates a synthetic version of my data.

    Code:
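    clear all
    set seed 10016
    set obs 1000
    gen lincome = rnormal(10, 1.5)      // log income ~ N(10, 1.5^2)
    gen income = exp(lincome)
    gen weight = 100                    // constant probability weight
    
    gen tc = income > 100000            // top-code flag; about 16% of draws in expectation
    
    replace income = 100000 if income > 100000            // apply the top code
    replace lincome = ln(100000) if lincome > ln(100000)  // same top code on the log scale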
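As a quick check on where the 0.84 comes from: with the mean of 10 used in this synthetic data (not the 10.5 assumed above), the top code ln(100000) sits at about the 84th percentile of N(10, 1.5²), matching the roughly 16% of top-coded records. A minimal sketch that recreates the data above and compares the theoretical percentile with the weighted sample share; the scalar name p_tc is hypothetical:

```stata
* Recreate the synthetic data from the post above
clear all
set seed 10016
set obs 1000
gen lincome = rnormal(10, 1.5)
gen income = exp(lincome)
gen weight = 100
gen tc = income > 100000
replace income = 100000 if income > 100000
replace lincome = ln(100000) if lincome > ln(100000)

* Theoretical percentile of the top code under N(10, 1.5^2)
* (p_tc is a hypothetical scalar name used only for this check)
scalar p_tc = normal((ln(100000) - 10) / 1.5)
display "Percentile of the top code: " %6.4f scalar(p_tc)

* Weighted share of top-coded records in the sample
summarize tc [aweight = weight]
display "Sample top-coded share:     " %6.4f r(mean)
```

With the mean of 10.5 stated above, the same calculation, normal((ln(100000) - 10.5) / 1.5), gives about 0.75 instead, so the choice of mean determines where the cutoff falls.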
    Last edited by Ernesto Vincenti; 05 Jul 2019, 15:04.

  • #2
    Here's one approach: Create a column vector of such values, save it as a variable, and use the values as needed.

    Code:
    mat newinc = J(`=_N', 1, -1)  // -1 is out of range
    forval i = 1/`=_N' {
       while newinc[`i', 1] < 0.84 {
            mat newinc[`i',1] = rnormal(0,1) 
       }
    }
    svmat newinc  
    replace lincome = 10.5 + 1.5 * newinc1 if !tc
    drop newinc
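Note that the draws in this loop are z-scores, so the 0.84 cutoff is applied on the z scale rather than as a percentile. A quick check of the two scales with Stata's normal() and invnormal():

```stata
* Cumulative probability at z = 0.84: about 0.7995, roughly the 80th percentile
display normal(0.84)

* z-score at the 84th percentile: about 0.9945
display invnormal(0.84)
```

So keeping only z ≥ 0.84 samples from roughly the top 20% of the distribution, not the top 16%.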

    • #3
      Thanks, Mike Lacy. This is very close. The rnormal() call should be wrapped in normal(), so that the 0.84 cutoff applies to the cumulative probability rather than to the z-score, and lincome should be replaced if tc, not if !tc. Also, to ensure that the imputed values stay above the top code, I changed the final replace call so that it adds to the top-code value the draw's distance above the 84th-percentile z-score, scaled by the standard deviation of 1.5.
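The adjustment can be checked algebraically, assuming the top code c = ln(100000) sits exactly at the 84th percentile of N(μ, σ²) with σ = 1.5, and writing u for the drawn percentile newinc1:

$$c = \mu + \sigma\,\Phi^{-1}(0.84) \;\Longrightarrow\; c + \sigma\left[\Phi^{-1}(u) - \Phi^{-1}(0.84)\right] = \mu + \sigma\,\Phi^{-1}(u), \qquad 0.84 \le u < 1$$

So each imputed value is the N(μ, σ²) quantile at percentile u, which is never below c. If the top code is not exactly at the 84th percentile, the imputed values are shifted by the gap between c and μ + σΦ⁻¹(0.84).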

      Code:
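      mat newinc = J(`=_N', 1, -1)  // -1 is out of range
      forval i = 1/`=_N' {
          // normal(rnormal(0,1)) is a uniform percentile; redraw until it is at least 0.84
          while newinc[`i', 1] < 0.84 {
              mat newinc[`i', 1] = normal(rnormal(0, 1))
          }
      }
      svmat newinc                  // creates variable newinc1
      // shift the top code by the draw's z-distance above the 84th-percentile z-score, times sd 1.5
      replace lincome = lincome + (invnormal(newinc1) - invnormal(0.84)) * 1.5 if tc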
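Because normal(rnormal(0,1)) is uniform on (0,1), the rejection loop amounts to drawing a uniform percentile between 0.84 and 1, keeping about one draw in six. A minimal sketch of the same imputation that draws that percentile directly with runiform(0.84, 1) (available since Stata 14), so no loop or matrix is needed. It recreates the synthetic data from the first post, and the variable name u is hypothetical:

```stata
* Recreate the synthetic data from the first post
clear all
set seed 10016
set obs 1000
gen lincome = rnormal(10, 1.5)
gen income = exp(lincome)
gen weight = 100
gen tc = income > 100000
replace income = 100000 if income > 100000
replace lincome = ln(100000) if lincome > ln(100000)

* Draw a percentile uniformly on (0.84, 1) for top-coded records only
* (u is a hypothetical helper variable; it stays missing where tc == 0)
gen double u = runiform(0.84, 1) if tc

* Same adjustment as in the code above: shift the top code by the draw's
* z-distance above the 84th-percentile z-score, scaled by sd = 1.5
replace lincome = lincome + (invnormal(u) - invnormal(0.84)) * 1.5 if tc

* Put the imputed values back on the income scale
replace income = exp(lincome) if tc
drop u
```

Restricting the draw to `if tc` leaves u missing elsewhere, so only top-coded records change.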
