  • Heavily censored data.

    Hello all,

    I am trying to run a regression of an interest rate spread (think of a mortgage rate minus the Treasury rate of comparable maturity) on a list of control variables. I have about 300,000 observations. The problem is that, for legal reasons, I only observe the rate spread if it is above a certain threshold (3% in my case). The threshold is so high that the rate spread is censored (i.e., missing) for about 96% of my observations. My question is: what is the best econometric method to use? I tried Tobit and the Heckman two-step estimator, but I am concerned about the strong assumptions they require, particularly normality, since the interest rate spread is by no means normally distributed. Because it is already a rate, I don't think taking logs is a good idea here.
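The Tobit model mentioned above treats the spread as a latent normal variable that is only observed above the threshold. A minimal sketch of its log-likelihood, using only the Python standard library and assuming left-censoring at c = 3 with censored rows coded as `None` (the function name and argument layout are hypothetical), shows exactly where the normality assumption enters: in both the censored and the uncensored contributions.

```python
import math
from statistics import NormalDist


def tobit_loglik(beta, sigma, X, y, c=3.0):
    """Log-likelihood of a Tobit model left-censored at c (illustrative sketch).

    X     -- list of regressor rows (put a 1.0 first for the intercept)
    y     -- observed spreads; None marks a censored row, i.e. the latent
             spread y* = x'beta + e was at or below c
    sigma -- standard deviation of e, assumed N(0, sigma^2) (must be > 0)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for row, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, row))
        latent = NormalDist(mu=xb, sigma=sigma)  # the normality assumption
        if yi is None:
            # Censored row: we only know y* <= c, so it adds log P(y* <= c).
            ll += math.log(latent.cdf(c))
        else:
            # Uncensored row: it adds the log normal density at the observed y.
            ll += math.log(latent.pdf(yi))
    return ll
```

Tobit estimates are the `beta` and `sigma` that maximize this function (in practice via a numerical optimizer). If the true error distribution is skewed or fat-tailed, both terms are misspecified, which is the concern raised in the question.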

    Another, more general concern is the heavy censoring rate. Can I trust Tobit (or any method) at all if over 96% of my observations are censored from below?
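To put the 96% figure in perspective, here is a small sketch that assumes the latent spread is normal (the very assumption in question) and, for simplicity, treats the censored share as if it held at a single value of the controls. It locates the threshold in standard-deviation units and computes the inverse Mills ratio (the term the Heckman two-step adds to the second stage), i.e. the mean of the standardized spread among the observed rows.

```python
from statistics import NormalDist


def tail_summary(censored_share):
    """Assuming a normal latent spread, return (z, mills):
    z     -- censoring threshold in standard deviations above the mean
    mills -- inverse Mills ratio phi(z) / (1 - Phi(z)), i.e. E[Z | Z > z]
    """
    std = NormalDist()
    z = std.inv_cdf(censored_share)
    mills = std.pdf(z) / (1.0 - censored_share)
    return z, mills


for share in (0.5, 0.96):
    z, mills = tail_summary(share)
    print(f"censored {share:.0%}: threshold at {z:.2f} SD, E[Z | observed] = {mills:.2f}")
```

With 96% censoring the threshold sits roughly 1.75 standard deviations above the mean, so every uncensored observation lies in the far upper tail, and what the model implies for the other 96% depends almost entirely on the assumed normal shape.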

    Thank you so much for the help.

    Best
    Hua

  • #2
    Dear Hua,

    I do not know why you are doing this, but if at all possible, I would simply focus on the sub-population with a spread above 3%. For this sub-population there is no censoring, and your model may reveal something interesting. You may also want to model the probability that the spread is above 3%.
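The suggestion above amounts to a two-part (hurdle-style) model: a binary model for whether the spread exceeds 3%, fitted on all observations, plus a regression of the spread on the controls within the uncensored sub-population. A minimal standard-library sketch with a single hypothetical regressor `x`; the data-generating process in `simulate` is invented for illustration. Note that part 2 estimates how E[spread | x, spread > 3%] varies with x, not the latent relationship Tobit targets.

```python
import math
import random


def fit_logit(x, d, iters=25):
    """Logit of d (1 = spread above threshold) on a constant and x (Newton-Raphson)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += di - p            # score for the intercept
            g1 += (di - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: theta += I^{-1} g, with I the 2x2 information matrix
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_part(x, spread):
    """Part 1: logit for P(spread observed, i.e. above threshold | x), all rows.
    Part 2: OLS for E[spread | x, spread above threshold], uncensored rows only."""
    d = [0 if s is None else 1 for s in spread]
    part1 = fit_logit(x, d)
    kept = [(xi, s) for xi, s in zip(x, spread) if s is not None]
    part2 = fit_ols([k[0] for k in kept], [k[1] for k in kept])
    return part1, part2


def simulate(n=10_000, threshold=3.0, seed=0):
    """Hypothetical data: latent spread = 1 + 0.8 x + N(0, 1), observed only above threshold."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    latent = [1.0 + 0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]
    spread = [s if s > threshold else None for s in latent]
    return x, spread


x, spread = simulate()
(a, b), (c0, c1) = two_part(x, spread)
print(f"logit: {a:.2f} + {b:.2f} x;  OLS on uncensored rows: {c0:.2f} + {c1:.2f} x")
```

Neither part needs a distributional assumption to link the observed tail to the censored 96%, which is the appeal of this approach; the price is that it says nothing about the spread below the threshold.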

    Trying to make inferences about the whole population from the 4% of the data in the upper tail does not look sensible or credible to me.

    Best wishes,

    Joao


    • #3
      Dear Joao,

      Thanks a lot for the response. Yes, I did run a logit and got the expected result. I also ran the regression on the sub-sample but found no significant effect for the parameter of interest. When I run Tobit or the Heckman two-step, I do get the highly significant result I want to see. But, as you said, reliability is a big concern.
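One possible (not certain) reason the sub-sample and Tobit results differ is that they target different quantities. OLS on the uncensored rows estimates how E[spread | x, spread > 3%] moves with x, and under a normal model truncation makes that relationship much flatter than the latent slope Tobit reports; Tobit, in turn, recovers the latent slope only if its normality assumption holds. A small simulation with a hypothetical normal data-generating process (all parameter values are invented) illustrates the attenuation:

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def truncation_demo(n=100_000, beta=0.8, threshold=3.0, seed=1):
    """Compare the OLS slope on the full latent data with the slope on the
    rows whose latent spread exceeds the threshold (hypothetical DGP)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    kept = [(xi, yi) for xi, yi in zip(x, y) if yi > threshold]
    full = ols_slope(x, y)
    trunc = ols_slope([k[0] for k in kept], [k[1] for k in kept])
    share = len(kept) / n
    return full, trunc, share


full, trunc, share = truncation_demo()
print(f"full-sample slope {full:.2f}, slope above threshold {trunc:.2f} ({share:.1%} of rows kept)")
```

A weak or insignificant sub-sample coefficient is therefore compatible with a nonzero latent effect, but a significant Tobit coefficient is only as trustworthy as the distributional assumption that links the observed tail to the censored rows.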

      Cheers
      Hua
