Is the normalized dependent variable suitable for fractional model or tobit model?

Johnny Cash

Join Date: May 2020

Posts: 19
#1

26 Nov 2020, 22:57

The range of my dependent variable Y is [0,inf). Since it is left-censored at zero, I believe the tobit model should be used. However, because the values of Y are huge, I want to remove the scale by min-max normalization: Y_new = (Y - Y_min)/(Y_max - Y_min). After the transformation, the range of Y_new is [0,1]. The dataset is a panel.

Under this circumstance, which model should be used for regression, fractional model or tobit model?

Note: fracreg -- Fractional response regression
Tags: panel data, regression, tobit
Maarten Buis

Join Date: Mar 2014

Posts: 3256
#2

27 Nov 2020, 01:38

To answer your question: No, because the max in your original variable is not bounded.

I don't think that your transformation is the best way forward for you. The reason is that it is highly dependent on a single value, the maximum, and that value can change wildly from sample to sample. This would make your results basically incomparable with results from other datasets. If possible, a better solution is to just change the unit. Say your original y is measured in euros; you can create a new variable measured in 1000s of euros by dividing by 1000 (or millions of euros by dividing by 1,000,000, etc.). That way your results are exactly comparable with results from other datasets, and the effects you find have an intuitive meaning. The range could also be large because of a very long tail, in which case my suggestion does not solve that. Instead you could take the logarithm, or better yet, you can use a log link function: https://blog.stata.com/2011/08/22/us...tell-a-friend/
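A minimal sketch of the sample-dependence point, written in Python rather than Stata and using made-up numbers (the two samples and the helper names `min_max` and `rescale_units` are assumptions for illustration only). The same raw observation of 5,000 gets a very different min-max value depending on the largest value that happens to be in the sample, while a change of unit gives the same number in both.

```python
def min_max(values):
    """Min-max normalization: depends on the sample minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rescale_units(values, unit=1000.0):
    """Change of unit (e.g. euros -> 1000s of euros): sample-independent."""
    return [v / unit for v in values]


# Two hypothetical samples that differ only in their largest observation.
sample_a = [0.0, 2_000.0, 5_000.0, 40_000.0]
sample_b = [0.0, 2_000.0, 5_000.0, 900_000.0]

mm_a = min_max(sample_a)
mm_b = min_max(sample_b)
units_a = rescale_units(sample_a)
units_b = rescale_units(sample_b)

# The third observation (5,000) under each approach.
print("min-max:", mm_a[2], mm_b[2])
print("1000s of units:", units_a[2], units_b[2])
```

Coefficients estimated on the min-max scale would inherit that dependence on the sample maximum; coefficients on the rescaled variable are simply the original ones divided by the unit.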

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox

Join Date: Mar 2014

Posts: 33605
#3

27 Nov 2020, 01:49

I agree with Maarten Buis and was pleased to find that my mental draft of a reply was made largely unnecessary by his post.

There's more.

"Since it is left censored at zero" is not explained here. It's not the language I would use, for example, if negative outcomes are impossible.

If outcomes are zero or positive but not otherwise bounded, Y = Xb is rarely a good functional form anyway, but Y = exp(Xb) is often a much better idea, as Maarten also implies. In many cases Tobin [sic] has been superseded by Poisson (for all that Poisson gets retrospective credit for a procedure he never discussed or invented);
Johnny Cash

Join Date: May 2020

Posts: 19
#4

27 Nov 2020, 06:43

Originally posted by Maarten Buis

To answer your question: No, because the max in your original variable is not bounded.

I don't think that your transformation is the best way forward for you. The reason is that it is highly dependent on a single value, the maximum, and that value can change wildly from sample to sample. This would make your results basically incomparable with results from other datasets. If possible, a better solution is to just change the unit. Say your original y is measured in euros; you can create a new variable measured in 1000s of euros by dividing by 1000 (or millions of euros by dividing by 1,000,000, etc.). That way your results are exactly comparable with results from other datasets, and the effects you find have an intuitive meaning. The range could also be large because of a very long tail, in which case my suggestion does not solve that. Instead you could take the logarithm, or better yet, you can use a log link function: https://blog.stata.com/2011/08/22/us...tell-a-friend/

Thank you very much, Maarten. I think I will divide Y by 10^n, so that I can still calculate the cross-sectional sum of Y at each t, which is another variable I need. Summing log Y would be meaningless, because the sum of the logs is the log of the product of the Y_i at each t.
Johnny Cash

Join Date: May 2020

Posts: 19
#5

27 Nov 2020, 06:54

Originally posted by Nick Cox

I agree with Maarten Buis and was pleased to find that my mental draft of a reply was made largely unnecessary by his post.

There's more.

"Since it is left censored at zero" is not explained here. It's not the language I would use, for example, if negative outcomes are impossible.

If outcomes are zero or positive but not otherwise bounded, Y = Xb is rarely a good functional form anyway, but Y = exp(Xb) is often a much better idea, as Maarten also implies. In many cases Tobin [sic] has been superseded by Poisson (for all that Poisson gets retrospective credit for a procedure he never discussed or invented);

Thank you very much, Nick.

In my dataset, negative Y means nothing in economics. Y could be negative, but treating negative Y as zero is common in practice. I am still confused about why the Y = exp(Xb) model is recommended. I thought the tobit model would be suitable if I use Y divided by 10^n as the dependent variable.

Please tell me if I'm wrong. Thanks.

Last edited by Johnny Cash; 27 Nov 2020, 07:48.
Maarten Buis

Join Date: Mar 2014

Posts: 3256
#6

27 Nov 2020, 08:28

A tobit model is very specific. Just saying that negative values are meaningless is not a sufficient reason for using tobit. If I had to name a default model for such a case I would say that it would be the log link function (Y = exp(Xb) model). The reasons for that are discussed in the link I posted in #2 (and the references therein).
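To make the Y = exp(Xb) idea concrete, here is a minimal sketch in Python (not Stata) of a Poisson pseudo-likelihood fit with a log link, solved by Newton's method for one predictor. The data, the function name `fit_log_link`, and the coefficient values 0.5 and 0.3 are all made up for illustration; nothing here comes from the poster's dataset. The outcome is deliberately non-integer and half of the observations are exact zeros, to show that the estimating equations only need the conditional mean to be exp(Xb), not a count outcome.

```python
import math


def fit_log_link(x, y, iterations=25):
    """Fit E[y|x] = exp(b0 + b1*x) by maximizing the Poisson pseudo-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the constant-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log pseudo-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + H^{-1} g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: at each x, one observation is 0 and one is twice the
# conditional mean, so the mean of y given x is exactly exp(0.5 + 0.3*x).
x, y = [], []
for xi in range(10):
    mean = math.exp(0.5 + 0.3 * xi)
    x += [float(xi), float(xi)]
    y += [0.0, 2.0 * mean]

b0_hat, b1_hat = fit_log_link(x, y)
print(round(b0_hat, 4), round(b1_hat, 4))
```

Because the data are constructed so that the score is zero at (0.5, 0.3), the fit recovers those values despite the zeros and the non-integer outcomes. This sketch gives point estimates only; standard errors for such a model would need a robust (sandwich) calculation, which is not shown.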

Nick Cox

Join Date: Mar 2014

Posts: 33605
#7

27 Nov 2020, 08:41

negative Y means nothing in economics

Goodness knows what this means. Negative profits mean a lot to a firm and to everyone who deals with it or works for it. Expenditure exceeding income means a lot to anyone concerned.

What counts is what your outcome variable is -- and whether negative values are impossible or possible -- and if possible what is done about them.

That's a concrete question and not answering it makes good discussion of your situation even harder.

https://en.wikipedia.org/wiki/Johnny_Cash If that's your real name, fine. If not, please note our request to use your full real name.
Johnny Cash

Join Date: May 2020

Posts: 19
#8

28 Nov 2020, 02:26

Originally posted by Nick Cox

Goodness knows what this means. Negative profits mean a lot to a firm and to everyone who deals with it or works for it. Expenditure exceeding income means a lot to anyone concerned.

What counts is what your outcome variable is -- and whether negative values are impossible or possible -- and if possible what is done about them.

That's a concrete question and not answering it makes good discussion of your situation even harder.

https://en.wikipedia.org/wiki/Johnny_Cash If that's your real name, fine. If not, please note our request to use your full real name.

I am really sorry about that. I made up this name for myself because, as a Chinese student, I don't have an English name. I certainly did not mean to violate the rules.

It is a valuable opportunity for me to ask for advice, especially from experts like you.

To answer the question, the dependent variable Y is the capital shortfall, calculated as Y = CS = max(0, k*A - W), where k*A is the capital required by the regulatory agency and W is the firm's capital. If k*A - W is negative, the firm's capital exceeds the required capital, i.e., there is no capital shortfall and the firm functions properly. Because the firm then has a capital surplus, we set CS to zero. If k*A - W is positive, the firm's capital is less than the required capital, and CS is the size of the shortfall.
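As a small illustration of this definition, here is a sketch in Python with invented numbers (the function name `capital_shortfall`, the ratio k = 0.08, and the three firms are assumptions, not values from the actual dataset):

```python
def capital_shortfall(k, assets, capital):
    """CS = max(0, k*A - W): the shortfall is floored at zero."""
    return max(0.0, k * assets - capital)


# Hypothetical firms: (assets A, capital W), with a required ratio k of 8%.
k = 0.08
firms = [(1000.0, 100.0), (1000.0, 80.0), (1000.0, 50.0)]

shortfalls = [capital_shortfall(k, a, w) for a, w in firms]
share_zero = sum(1 for cs in shortfalls if cs == 0.0) / len(shortfalls)

print(shortfalls)  # surplus or exactly adequate capital is recorded as 0
print(share_zero)  # fraction of firms recorded with zero shortfall
```

Note that in this construction the underlying quantity k*A - W is itself computed before being floored at zero, so the zeros come from the definition of CS rather than from values that could not be measured.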

I hope this explanation helps you reach a conclusion. Thanks a lot.
Johnny Cash

Join Date: May 2020

Posts: 19
#9

28 Nov 2020, 02:33

Originally posted by Maarten Buis

A tobit model is very specific. Just saying that negative values are meaningless is not a sufficient reason for using tobit. If I had to name a default model for such a case I would say that it would be the log link function (Y = exp(Xb) model). The reasons for that are discussed in the link I posted in #2 (and the references therein).

I have now explained the dependent variable Y further in #8. Any suggestions would be more than welcome. Thanks a lot.
Nick Cox

Join Date: Mar 2014

Posts: 33605
#10

28 Nov 2020, 04:25

Thanks for the explanation. I would still gravitate to Poisson regression not Tobit if this were my problem.

I can't imagine any objection to Chinese names here, whether transliterated or not.
Johnny Cash

Join Date: May 2020

Posts: 19
#11

28 Nov 2020, 06:12

Originally posted by Nick Cox

Thanks for the explanation. I would still gravitate to Poisson regression not Tobit if this were my problem.

I can't imagine any objection to Chinese names here, whether transliterated or not.

Thanks for your advice Nick.

While reading about Poisson regression over the past few days, I found that negative binomial regression is often discussed alongside it. After using kdensity to check the distribution of Y, I found that it is right-skewed, which is prominent in the graph.

Negative binomial regression seems suitable for this kind of dependent variable. However, although Poisson regression assumes E(Y) = Var(Y), robust standard errors can correct for a violation of that assumption, which is an advantage of Poisson regression.

So, comparing the two: since negative binomial regression is designed for overdispersion, while Poisson regression with robust standard errors applies to more general cases, can I say that Poisson regression is more suitable than negative binomial regression? Thanks.
Last edited by Johnny Cash; 28 Nov 2020, 06:20.
Johnny Cash

Join Date: May 2020

Posts: 19
#12

29 Nov 2020, 02:24

I checked the Poisson distribution and found that when lambda, which equals the mean of Y, is huge, the density looks very similar to a normal distribution, which is definitely not what the distribution of Y looks like. However, when I generated a negative binomial sample with "g gamma=rgamma(1,Ymean)" and "g nbtest=rpoisson(gamma)", the distribution of "nbtest" was very similar to the distribution of Y.
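A rough Python analogue of the two Stata commands above, for anyone who wants to reproduce the comparison without Stata. This is a sketch under stated assumptions: the mean of 20 and the sample size are made up (the real Ymean is described as huge), `poisson_draw` is a hand-written helper using Knuth's multiplication method because the Python standard library has no Poisson generator, and `random.gammavariate(1.0, y_mean)` is used as the counterpart of `rgamma(1,Ymean)` (shape 1, scale equal to the mean).

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; fine for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(12345)
y_mean = 20.0  # hypothetical stand-in for the mean of Y
n = 20000

# Plain Poisson with a fixed mean.
poisson_sample = [poisson_draw(rng, y_mean) for _ in range(n)]
# Poisson whose mean is itself gamma distributed (shape 1, scale y_mean).
mixture_sample = [poisson_draw(rng, rng.gammavariate(1.0, y_mean)) for _ in range(n)]

for name, sample in [("poisson", poisson_sample), ("gamma-poisson", mixture_sample)]:
    share_zero = sum(1 for v in sample if v == 0) / n
    print(name, round(statistics.mean(sample), 2),
          round(statistics.pvariance(sample), 1), round(share_zero, 4))
```

With shape 1 the gamma distribution is an exponential, so the mixture is a geometric distribution (a negative binomial with shape 1): the mean stays near y_mean, but the variance is close to y_mean + y_mean^2 instead of y_mean, and zeros are far more common. That accounts for the strongly right-skewed shape of "nbtest" compared with the near-normal shape of a Poisson with a large mean.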

Is this the case where I should use negative binomial regression rather than Poisson regression?

Last edited by Johnny Cash; 29 Nov 2020, 02:27.
Nick Cox

Join Date: Mar 2014

Posts: 33605
#13

29 Nov 2020, 03:45

Sorry, but I find it hard to add much of substance to previous answers.

0. We can't tell what will work well for your dataset without any access to it.

1. The marginal distribution of the outcome is always something you should look at, but it doesn't determine what kind of regression makes most sense. Even the distribution of the response conditional on the predictors can be of less importance in choosing a regression method than getting the functional form right.

2. What is true for counted variables isn't always true or even comparable with what is true for measured variables. Capital shortfall as defined in #8 doesn't look to me like a variable for which the negative binomial is obviously relevant.
Johnny Cash

Join Date: May 2020

Posts: 19
#14

30 Nov 2020, 22:12

Thanks for your help. It means a lot to me.