Estimation of censored value

Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#1

12 Feb 2019, 14:53

Dear All,

Situation: I have some biological variables containing left-censored observations (values below the detection limit).
I would like to use the gsem command to estimate these censored values.
One clarification: I have repeated measures on the same individuals.
I am using Stata 14.0.

Question: Could someone please explain the gsem command and how I can run it to estimate the censored values?
I found an example on this page, but I don't really understand how this command would help me estimate the censored observations: https://www.stata.com/features/overv...ored-outcomes/

Thank you for your help

Best
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

12 Feb 2019, 18:14

Radhouene,

If you are talking about just one outcome at a time, and you're trying to estimate the relationship of one variable to some independent variables, then you can just use the xttobit command. You can run an equivalent command in gsem, but you don't need to. If you want to simultaneously do this for multiple outcomes, then you use gsem.

Can you show us in xttobit syntax what you are trying to do?

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

13 Feb 2019, 08:29

I found this thread you posted earlier.

So, you have calorie intake as a dependent variable. I'm not sure if calorie intake is a censored variable. If you have only one dependent variable that's censored, then you can use tobit for cross sectional regression, and xttobit for repeated measures.

If you have multiple dependent variables, some of which are censored, then I suppose you can learn gsem. It's not required, and the syntax is more complex. However, the Stata page you linked to does give you two equivalent commands:

Code:

tobit income education age, ul(150000)
gsem income <- education age, family(gaussian, rcensored(150000))

Again, these commands will do the same thing. In the options for gsem, the code says that the outcome family is gaussian, and that it's right censored (same as upper limit). Because the syntax for gsem is more complex, I would just use xttobit.
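
For left censoring, which is what a detection limit gives you, I'd expect the same pair to look something like this. It's an untested sketch: y, x1 and x2 are placeholder variable names, and 0.5 is a made-up detection limit.

```stata
* Left-censored counterpart of the pair above (placeholder names, made-up limit)
tobit y x1 x2, ll(0.5)
gsem (y <- x1 x2), family(gaussian, lcensored(0.5))
```

Here ll() in tobit plays the role that lcensored() plays inside family() in gsem.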

It's not clear from your other post, but it sounds like you may have some independent variables that are censored. Tobit regression is only useful (as far as I know) when your dependent variable is censored. If your independent variables are censored, then I don't really see an alternative. You could just use them as normal, but note that there are detection limits involved in the methods. How acceptable this is will depend on the context.

Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#4

13 Feb 2019, 15:31

Dear Weiwen,

First of all, I would like to thank you for your efforts to help me resolve this research question.

Below I detail my situation:

1] Sample: n=60
Dependent variables: CRP, IL-6, IL-1 (biological markers; these are continuous variables)
Independent variables: age, body mass index...
Protocol: repeated measures of the biomarkers at three time points in the same subjects; each time point corresponds to a phase of the intervention
Time coded: 1, 2 and 3

2] For example, for IL-1 and IL-6, I have some left-censored observations (below the detection limit).

How can I estimate these censored values using Tobit regression? I would like to know the specific command to use in my case.

3] In general, is Tobit regression usable when censored data represent less than 20% of the total values? If the frequency is high, say above 50%, what is the best way to impute the censored observations?

Best regards.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

13 Feb 2019, 16:36

I don't mean to pick on your English, but I want to clear something up just to be sure we are talking about the same thing. You are looking to estimate the relationship between 3 dependent variables (2 of which have left censoring) and some independent variables. Correct?

Plenty of people would probably run 3 separate regressions. It could look like this (with fictitious lower limits):

Code:

xtset id time
xtreg crp i.time bmi age
xttobit il1 i.time bmi age, ll(0.5)
xttobit il6 i.time bmi age, ll(1.2)

You can simultaneously estimate 3 regressions in gsem if you wish:

Code:

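local indepvars bmi age
* One equation per outcome; M1[id], M2[id], M3[id] are subject-level random intercepts
gsem (crp <- i.time `indepvars' M1[id], family(gaussian)) ///
     (il1 <- i.time `indepvars' M2[id], family(gaussian, lcensored(0.5))) ///
     (il6 <- i.time `indepvars' M3[id], family(gaussian, lcensored(1.2))), ///
     covstructure(_LEx, unstructured)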

Syntax not tested, but you need to type the random effects in each equation (e.g. M1[id]), and I think that substantively, you'd want to let the random effects be correlated (I believe the default is that they're uncorrelated). I think that they're latent exogenous variables (hence _LEx in the covstructure option). This is why I said you should avoid the gsem syntax unless you need it and you understand what you're doing. It's a bit complicated. The lcensored option tells the command that the outcome variable is left censored. The link you found earlier indicates that it's equivalent to the corresponding limit option in tobit.

I don't know what proportion of observations can be censored before Tobit regression becomes inadvisable. However, if over 50% of observations are censored, I think that would be very bad. If one of three variables has a censoring proportion of 50%, then you could present it and warn readers, who can then make their own judgments, but I doubt the estimates would be very precise.
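
To see how much censoring you actually have at each time point, something like the following might help. It is an untested sketch that assumes censored values are recorded at the detection limit; the limits 0.5 and 1.2 are the same fictitious ones as above, and il1_cens and il6_cens are names I made up.

```stata
* Flag observations at or below the (fictitious) detection limits;
* missing biomarker values stay missing in the flag
generate byte il1_cens = (il1 <= 0.5) if !missing(il1)
generate byte il6_cens = (il6 <= 1.2) if !missing(il6)

* Share of censored observations at each time point (row percentages)
tabulate time il1_cens, row
tabulate time il6_cens, row
```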

Last, you mentioned imputation. I don't think that Tobit regression really imputes the censored dependent variables. And I don't think there is any good way to proceed if you have 50% of data missing or censored.
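
To make that concrete, here is a sketch of the log likelihood for a left-censored Tobit model. It is simplified to the pooled case, so it ignores the subject-level random effect that xttobit adds. The notation is mine, not from the Stata page: y_i^* is the latent biomarker value, L is the detection limit, and \phi and \Phi are the standard normal density and distribution function.

```latex
y_i^{*} = x_i'\beta + \varepsilon_i, \qquad
\varepsilon_i \sim N(0, \sigma^2), \qquad
y_i = \max(y_i^{*}, L)

\ln \mathcal{L}(\beta, \sigma)
  = \sum_{i:\, y_i > L} \ln\!\left[ \frac{1}{\sigma}\,
      \phi\!\left( \frac{y_i - x_i'\beta}{\sigma} \right) \right]
  + \sum_{i:\, y_i \le L} \ln \Phi\!\left( \frac{L - x_i'\beta}{\sigma} \right)
```

The censored observations only contribute the probability of being at or below L, so the model estimates \beta and \sigma without ever filling in a value for them.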

Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#6

14 Feb 2019, 12:27

Thank you again, Weiwen, for taking the time to respond.

You have answered all of my questions.

Best regards.