Hello everyone!
I am a graduate student using interval regression for the first time. The data I am using are from the Behavioral Risk Factor Surveillance System (BRFSS), collected by the CDC. My dependent variable is income, which respondents report in ordinal categories (i.e., it is interval-censored by the way it was collected). Following the Stata help file for intreg, I have created two dependent variables for my interval regressions: Depvar(1), equal to the lower bound of each level of the original income variable, and Depvar(2), equal to the upper bound of each level. Here is an example:
Level 3 of the original income variable "_incomg1" is collected as "$25,000 to < $35,000." For level 3, I have set Depvar(1) equal to 25,000 and Depvar(2) equal to 34,999.
I have two questions for the group that I would greatly appreciate help with:
1. Following the survey wording, the highest income category is top-coded (i.e., "$200,000 or more"). How should I decide what value to set for Depvar(2) in this top-coded category, since the survey provides no upper bound? The original variable levels are:
1: Less than $15,000
2: $15,000 to < $25,000
3: $25,000 to < $35,000
4: $35,000 to < $50,000
5: $50,000 to < $100,000
6: $100,000 to < $200,000
7: $200,000 or more
9: Don’t know/Not sure/Missing
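A minimal sketch of how these levels could map onto the two bounds, assuming intreg's documented convention that a missing upper bound marks an open-ended (right-censored) interval and that observations with both bounds missing are dropped. The toy dataset is a hypothetical stand-in for the BRFSS file, with one row per _incomg1 code, and the upper bounds follow the 34,999 convention used above.

```stata
* Hypothetical toy stand-in for the BRFSS file: one row per _incomg1 code
clear
input _incomg1
1
2
3
4
5
6
7
9
end

* Lower bounds; code 9 (don't know/not sure/missing) gets no bound
recode _incomg1 (1=0) (2=15000) (3=25000) (4=35000) (5=50000) ///
    (6=100000) (7=200000) (9=.), generate(Depvar1)

* Upper bounds; code 7 is left missing, so intreg reads it as [200000, +inf)
recode _incomg1 (1=14999) (2=24999) (3=34999) (4=49999) (5=99999) ///
    (6=199999) (7=.) (9=.), generate(Depvar2)

* Code 9 has both bounds missing, so intreg would exclude those rows
list _incomg1 Depvar1 Depvar2, noobs
```

Leaving the top bound missing avoids picking an arbitrary ceiling; the model then only uses the information that income is at least $200,000.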
2. Given the advantages of using log-transformed income rather than income in levels as a dependent variable, I would prefer to use ln(income) for my project. Given the ordinal categories and the two-variable structure of the dependent variable in interval regression, I am wondering how to do this properly. Is it as simple as generating new versions of Depvar(1) and Depvar(2) equal to the natural log of the originals, as I would if income were continuous? For example: gen logDepvar1 = ln(Depvar1)
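Because ln() is strictly increasing, an interval [a, b] on the income scale corresponds to [ln a, ln b] on the log scale, so a sketch of the log version simply applies the generate line to both bounds. Stata's ln() returns missing for zero and for missing input, so a lower bound of 0 becomes missing (which intreg reads as left-censored) and a missing top bound stays missing (right-censored). The four-row dataset and the regressors x1 and x2 below are hypothetical.

```stata
* Hypothetical bounds: bottom, middle, top-coded, and missing categories
clear
input Depvar1 Depvar2
0 14999
25000 34999
200000 .
. .
end

* ln(0) and ln(.) are both missing in Stata
generate logDepvar1 = ln(Depvar1)
generate logDepvar2 = ln(Depvar2)
list, noobs

* With real data and regressors (x1, x2 are hypothetical names):
* intreg logDepvar1 logDepvar2 x1 x2
```

Fitting intreg on the log bounds amounts to assuming that income is lognormal conditional on the regressors, which is the usual motivation for logging income in the first place.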
---
Using Stata's built-in "auto" dataset as an example, I have recoded the price variable into ordinal categories, with the top category as "$12,000 and more." I have sorted the prices into roughly equal-sized categories, as shown below:
sysuse auto
recode price (min/3999 = 1) (4000/4399 = 2) (4400/4899 = 3) (4900/5799 = 4) (5800/8999 = 5) (9000/11999 = 6) (12000/max = 7), into(pricecats)
recode price (min/3999 = 0) (4000/4399 = 4000) (4400/4899 = 4400) (4900/5799 = 4900) (5800/8999 = 5800) (9000/11999 = 9000) (12000/max = 12000), into(lowprice)
recode price (min/3999 = 3999) (4000/4399 = 4399) (4400/4899 = 4899) (4900/5799 = 5799) (5800/8999 = 8999) (9000/11999 = 11999) (12000/max = ?), into(highprice)
My questions, then, are:
1. If I wanted to use this ordinal price variable in an interval regression and did not have the original price values, what should I use as the Depvar(2) value for the top-coded level 7 (the ? in the code above)?
2. If I wanted lowprice (i.e., Depvar1) and highprice (i.e., Depvar2) to be on the log(price) scale, how would I go about doing this?
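A sketch of both questions on the auto data, assuming the same intreg censoring convention (missing upper bound = open-ended top category). Since auto still contains the true prices, the interval-regression estimates on the log bounds can be set against an ordinary regression on the actual ln(price). The regressors mpg and weight are chosen only for illustration.

```stata
sysuse auto, clear

* Same cut points as above; the open top category gets a missing upper bound
recode price (min/3999 = 0) (4000/4399 = 4000) (4400/4899 = 4400) ///
    (4900/5799 = 4900) (5800/8999 = 5800) (9000/11999 = 9000) ///
    (12000/max = 12000), generate(lowprice)
recode price (min/3999 = 3999) (4000/4399 = 4399) (4400/4899 = 4899) ///
    (4900/5799 = 5799) (5800/8999 = 8999) (9000/11999 = 11999) ///
    (12000/max = .), generate(highprice)

* Log bounds: ln(0) is missing (bottom category becomes left-censored)
* and ln(.) is missing (top category stays right-censored)
generate lnlow  = ln(lowprice)
generate lnhigh = ln(highprice)

* Interval regression on the log scale
intreg lnlow lnhigh mpg weight

* Benchmark only possible here because the true prices are known
generate lnprice = ln(price)
regress lnprice mpg weight
```

With BRFSS the benchmark step is not available, but the auto comparison gives a rough sense of how much information the categorization loses.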
---
I hope I have provided all the information needed for these questions. I appreciate any help you can provide.
Thanks,
Hannah