Interval Regression Dependent Variables: 2 Questions

Hannah Jacobs

#1

30 Nov 2023, 08:29

Hello everyone!

I am a graduate student using interval regression for the first time. My data come from the Behavioral Risk Factor Surveillance System (BRFSS, collected by the CDC). My dependent variable is income, which respondents report in ordinal categories (i.e., it is interval-censored at collection). Following the Stata help file, I have created two dependent variables for my interval regressions: Depvar(1), equal to the lower bound of each level of the original income variable, and Depvar(2), equal to the upper bound of each level. Here is an example:

Level 3 of the original income variable "_incomg1" is collected as "$25,000 to < $35,000." For level 3, I have set Depvar(1) equal to 25,000 and Depvar(2) equal to 34,999.

I have two questions for the group that I would greatly appreciate help with:

1. Following the survey wording, the highest income category is top-coded (i.e., "$200,000 or more"). How should I decide what value to set for Depvar(2) in this top-coded category, since the survey provides no upper bound? The original variable levels are:

1: Less than $15,000
2: $15,000 to < $25,000
3: $25,000 to < $35,000
4: $35,000 to < $50,000
5: $50,000 to < $100,000
6: $100,000 to < $200,000
7: $200,000 or more
9: Don’t know/Not sure/Missing

2. Given the benefits of using log-transformed income rather than income itself as the dependent variable, I would prefer to use ln(income) for my project. Given the ordinal categories and the two-variable structure of the dependent variable in interval regression, I am wondering how to do this properly. Is it as simple as generating new Depvar(1) and Depvar(2) equal to the natural log of the original bounds, as I would if income were continuous? e.g., gen logDepvar1 = ln(Depvar1)

---

Using the built-in "auto" dataset as an example, I have recoded the price variable into ordinal categories, with the top category being "$12,000 and more." I tried to make the categories roughly equal in size, as shown below:

sysuse auto
recode price (min/3999 = 1) (4000/4399 = 2) (4400/4899 = 3) (4900/5799 = 4) (5800/8999 = 5) (9000/11999 = 6) (12000/max = 7), into(pricecats)
recode price (min/3999 = 0) (4000/4399 = 4000) (4400/4899 = 4400) (4900/5799 = 4900) (5800/8999 = 5800) (9000/11999 = 9000) (12000/max = 12000), into(lowprice)
recode price (min/3999 = 3999) (4000/4399 = 4399) (4400/4899 = 4899) (4900/5799 = 5799) (5800/8999 = 8999) (9000/11999 = 11999) (12000/max = ?), into(highprice)

My questions, then, are:

1. If I wanted to use this ordinal price variable for interval regression, and I didn't have the original price values, what value should I assign to Depvar(2) for the top-coded level 7? (This is the ? in the last recode line above.)

2. If I wanted lowprice (i.e., Depvar1) and highprice (i.e., Depvar2) on the log scale, i.e., ln(price), how would I go about doing this?

---

I hope I have provided all the information needed to answer these questions. I appreciate any help you can provide.

Thanks,

Hannah
Rich Goldstein

#2

30 Nov 2023, 11:34

I'm not sure why you are recoding your variable at all. Why not use -intreg- and treat the highest category as right-censored and all other categories as interval-censored?
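A minimal Stata sketch of that suggestion, applied to the auto example above. It assumes only the category codes in pricecats are available, and it uses mpg and weight as purely illustrative regressors (they are not part of the original question). The open-ended top category gets a missing upper bound, which -intreg- reads as right-censored, so no artificial top value is needed.

```stata
sysuse auto, clear

* Category codes, as in the original question
recode price (min/3999 = 1) (4000/4399 = 2) (4400/4899 = 3) (4900/5799 = 4) ///
    (5800/8999 = 5) (9000/11999 = 6) (12000/max = 7), into(pricecats)

* Lower bound of each category; prices cannot be negative, so category 1 starts at 0
recode pricecats (1 = 0) (2 = 4000) (3 = 4400) (4 = 4900) (5 = 5800) ///
    (6 = 9000) (7 = 12000), into(lowprice)

* Upper bound: the next category's lower bound, and missing (.) for the
* open-ended top category, which -intreg- treats as right-censored
recode pricecats (1 = 4000) (2 = 4400) (3 = 4900) (4 = 5800) (5 = 9000) ///
    (6 = 12000) (7 = .), into(highprice)

* mpg and weight are illustrative regressors only
intreg lowprice highprice mpg weight
```

Using the next category's lower bound (e.g., 4,000 rather than 3,999) as the upper bound treats each category as the interval [a, b); with whole-dollar amounts the difference is negligible, so either choice seems defensible.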

Not sure what you mean by "benefits of using log transformed income," but yes, you could log the category boundaries if you insisted. I am not at all sure how the censoring and the log transform would work together, however.
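One way to see how censoring and the log transform interact: because ln() is strictly increasing, a value known to lie in [a, b) on the dollar scale lies in [ln a, ln b) on the log scale, so the bounds can simply be logged. In Stata, ln(0) and ln(.) both return missing, so a lower bound of 0 becomes left-censored and a missing top bound stays right-censored. The sketch below rebuilds the same auto bounds, assumes normal errors on the log scale (a lognormal model for price), and again uses mpg and weight only as illustrative regressors; the names lnlow and lnhigh are hypothetical.

```stata
sysuse auto, clear
recode price (min/3999 = 1) (4000/4399 = 2) (4400/4899 = 3) (4900/5799 = 4) ///
    (5800/8999 = 5) (9000/11999 = 6) (12000/max = 7), into(pricecats)
recode pricecats (1 = 0) (2 = 4000) (3 = 4400) (4 = 4900) (5 = 5800) ///
    (6 = 9000) (7 = 12000), into(lowprice)
recode pricecats (1 = 4000) (2 = 4400) (3 = 4900) (4 = 5800) (5 = 9000) ///
    (6 = 12000) (7 = .), into(highprice)

* ln(0) is missing in Stata: category 1 becomes left-censored, (-inf, ln 4000]
generate double lnlow  = ln(lowprice)
* ln(.) is missing: category 7 stays right-censored, [ln 12000, +inf)
generate double lnhigh = ln(highprice)

* Interval regression on the log scale (normal errors on ln(price))
intreg lnlow lnhigh mpg weight

* Percent change in price per one-unit increase in mpg, with delta-method SE
nlcom 100*(exp(_b[mpg]) - 1)
```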
Hannah Jacobs

#3

30 Nov 2023, 12:11

Hi Rich,

Thanks for the response. From what I understand, the interval regression syntax requires recoding the dependent variable, as its form is:

intreg depvar1 depvar2 [indepvars] [if] [in] [weight] [, options]

Can you please elaborate on how I would just treat the highest category as right-censored and all the others as interval censored? If there is a simpler way to do this (without recoding), I am all for it!

As for the log of income, as I understand it, it has a couple of advantages over income in dollars: the log of income is usually closer to normally distributed, and it makes interpretation slightly easier because the numbers fall in a smaller range (you can always convert back to dollars if needed). I am just trying to figure out whether it is possible to log income given the ordinal nature of the variable and the two-dependent-variable setup of intreg.
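For reference, the standard back-transformation when the logged bounds are modeled with normally distributed errors on the log scale (the usual lognormal setup) is sketched below; it is a general result, not something specific to this thread's data.

```latex
\begin{aligned}
\ln y_i &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \\
\%\Delta y &= 100\,\bigl(e^{\beta_k} - 1\bigr) \quad \text{per one-unit increase in } x_k \\
\operatorname{median}(y_i \mid x_i) &= e^{x_i'\beta} \\
E(y_i \mid x_i) &= e^{x_i'\beta + \sigma^2/2}
\end{aligned}
```

So exponentiating the linear prediction gives a median in dollars, while the mean needs the extra \sigma^2/2 term.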

Thanks again,

Hannah
Rich Goldstein

#4

30 Nov 2023, 12:25

Re: your first question, see the example at the bottom of the -intreg- help file.
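A sketch of how the censoring convention described in the -intreg- help file could be applied to the BRFSS _incomg1 codes listed in #1. To run on its own it creates one placeholder observation per code; with the real data you would skip the input step. The names inc_lo and inc_hi are hypothetical, and using 35,000 rather than 34,999 as the upper bound of "$25,000 to < $35,000" is a choice, not a requirement. Code 1 could alternatively be treated as left-censored (missing lower bound).

```stata
* -intreg- encoding of each observation (see -help intreg-):
*   interval [a, b]           depvar1 = a   depvar2 = b
*   left-censored (-inf, b]   depvar1 = .   depvar2 = b
*   right-censored [a, +inf)  depvar1 = a   depvar2 = .
*   both missing              treated as missing
clear
input _incomg1
1
2
3
4
5
6
7
9
end

* Hypothetical lower-bound variable; 9 = Don't know/Not sure/Missing
recode _incomg1 (1 = 0) (2 = 15000) (3 = 25000) (4 = 35000) (5 = 50000) ///
    (6 = 100000) (7 = 200000) (9 = .), into(inc_lo)

* Hypothetical upper-bound variable; "$200,000 or more" is right-censored
recode _incomg1 (1 = 15000) (2 = 25000) (3 = 35000) (4 = 50000) ///
    (5 = 100000) (6 = 200000) (7 = .) (9 = .), into(inc_hi)

list, noobs
```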

Forgot to respond to #2: normality of the data is not important (certainly not compared with conditional normality of the residuals, and that, in my opinion, is not all that important either). Also, my clients, at least, do not know how to interpret logs (and, after I explain, they quickly forget anyway), but maybe yours do.

Last edited by Rich Goldstein; 30 Nov 2023, 12:29.
Hannah Jacobs

#5

30 Nov 2023, 13:15

Thanks!