  • Interval Regression Dependent Variables- 2 Questions

    Hello everyone!

    I am a graduate student using interval regressions for the first time. The data I am using are from the Behavioral Risk Factor Surveillance System (collected by the CDC). My dependent variable is income, which is collected from respondents in ordinal categories (i.e., it is censored at collection). Following the Stata help file, I have created two dependent variables for my interval regressions: Depvar(1), equal to the lower bound of each level of the original income variable, and Depvar(2), equal to the upper bound of each level. Here is an example:

    Original level 3 of the income variable "_incomg1" is collected as "$25,000 to < $35,000." For level 3, I have set Depvar(1) equal to 25,000 and Depvar(2) equal to 34,999.

    I have two questions for the group that I would greatly appreciate help with:

    1. Coming directly from the survey language, the highest income category is top coded (i.e., "$200,000 or more"). I am curious how to decide on the value I should set for the Depvar(2) for this top coded category since one is not originally provided. Please find the original variable levels below:

    1: Less than $15,000
    2: $15,000 to < $25,000
    3: $25,000 to < $35,000
    4: $35,000 to < $50,000
    5: $50,000 to < $100,000
    6: $100,000 to < $200,000
    7: $200,000 or more
    9: Don’t know/Not sure/Missing

    2. Given the benefits of using log-transformed income rather than income itself as a dependent variable, I would prefer to use ln(income) for my project. Given the ordinal categories and the two-variable structure that interval regression requires, I am wondering how to do this properly. Is it as simple as generating a new Depvar(1) and Depvar(2) equal to the natural log of the original Depvar(1) and Depvar(2), as I would if income were continuous? E.g., gen logDepvar1 = ln(Depvar1)

    ---

    Using the system "auto" dataset as an example, I have recoded the price variable into ordinal categories, with the top category being "$12,000 and more." I have sorted the prices into categories of roughly equal size, as shown below:

    sysuse auto
    recode price (min/3999 = 1) (4000/4399 = 2) (4400/4899 = 3) (4900/5799 = 4) (5800/8999 = 5) (9000/11999 = 6) (12000/max = 7), into(pricecats)
    recode price (min/3999 = 0) (4000/4399 = 4000) (4400/4899 = 4400) (4900/5799 = 4900) (5800/8999 = 5800) (9000/11999 = 9000) (12000/max = 12000), into(lowprice)
    recode price (min/3999 = 3999) (4000/4399 = 4399) (4400/4899 = 4899) (4900/5799 = 5799) (5800/8999 = 8999) (9000/11999 = 11999) (12000/max = ?), into(highprice)

    My questions, then, are:

    1. If I wanted to use this ordinal price variable for interval regressions, and I didn't have the original price values, how should I code the upper bound of the top-coded level 7 for Depvar(2)? (Marked with ? in the code above.)

    2. If I wanted to set lowprice (i.e., Depvar1) and highprice (i.e., Depvar2) to the log(price), how would I go about doing this?

    ---

    I hope I have provided all the information needed to help me with these questions. I appreciate any and all help you all can provide to me.

    Thanks,

    Hannah

  • #2
    I'm not sure why you are recoding your variable at all - why not use -intreg- and treat the highest category as right-censored and all other categories as interval-censored?

    not sure what you mean by "benefits of using log transformed income" but yes you could log the category boundaries if you insisted - I am not at all sure how the censoring and the log-transform would work together, however
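
    A minimal sketch of how the censoring and the log transform can fit together, using the auto-data bands from #1. It assumes intreg's convention that a missing lower bound marks a left-censored observation and a missing upper bound marks a right-censored one. The names band, lo, hi, loglow, and loghigh, and the covariates mpg and weight, are illustrative only.

    ```stata
    sysuse auto, clear

    * Band index 1..7 from the cut points used in #1
    generate byte band = 1 + irecode(price, 3999, 4399, 4899, 5799, 8999, 11999)

    * Look up each band's bounds; "." leaves the top band's upper bound missing
    generate double lo = real(word("0 4000 4400 4900 5800 9000 12000", band))
    generate double hi = real(word("3999 4399 4899 5799 8999 11999 .", band))

    * Log the bounds: ln(0) and ln(.) both evaluate to missing
    generate double loglow  = ln(lo)
    generate double loghigh = ln(hi)

    * Bottom band is now left-censored, top band right-censored
    intreg loglow loghigh mpg weight
    ```

    Because ln(0) and ln(.) are both missing in Stata, the bottom band becomes left-censored and the top band stays right-censored on the log scale, so no upper value has to be invented for the open-ended category.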


    • #3
      Hi Rich,

      Thanks for the response. From what I understand, the interval regression syntax requires recoding the dependent variable, as its form is:

      intreg depvar1 depvar2 [indepvars] [if] [in] [weight] [, options]

      Can you please elaborate on how I would just treat the highest category as right-censored and all the others as interval censored? If there is a simpler way to do this (without recoding), I am all for it!

      As for the log of income, as I understand it, it is beneficial over using a regular income variable for a couple reasons-- the log of income is usually more normally distributed and it makes interpretation slightly easier, as the numbers are in a generally smaller range (and then you can always convert back to dollars if needed). I am just trying to figure out if it's possible to log income with the ordinal nature of the variable and with the two dependent variable setup of intreg.

      Thanks again,

      Hannah


      • #4
        re: your first question, see the example at the bottom of the help file
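
        A minimal sketch of the right-censoring idea from #2, reusing the recode commands from #1. It assumes intreg's convention that a missing upper bound marks a right-censored observation, so the top band's upper bound is set to missing (.) rather than to a number. The covariates mpg and weight are chosen only for illustration.

        ```stata
        sysuse auto, clear

        * Lower bound of each band, as in #1
        recode price (min/3999 = 0) (4000/4399 = 4000) (4400/4899 = 4400) (4900/5799 = 4900) (5800/8999 = 5800) (9000/11999 = 9000) (12000/max = 12000), generate(lowprice)

        * Upper bound of each band; missing for the open-ended top band
        recode price (min/3999 = 3999) (4000/4399 = 4399) (4400/4899 = 4899) (4900/5799 = 5799) (5800/8999 = 8999) (9000/11999 = 11999) (12000/max = .), generate(highprice)

        * Bands 1-6 enter as interval-censored, band 7 as right-censored
        intreg lowprice highprice mpg weight
        ```

        The two bound variables are still needed, since they are how intreg is told where each interval starts and ends; only the top band's upper bound is left unspecified.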

        forgot to respond to #2 - normality of the data is not important (certainly as compared with conditional normality of the residuals and that, in my opinion, is not all that important either) and my clients, at least, do not know how to interpret logs (and, after I explain, they quickly forget anyway) - but maybe yours do
        Last edited by Rich Goldstein; 30 Nov 2023, 12:29.


        • #5
          Thanks!
