Log Transformation instead of Fractional Regression

Prateek Bedi

#1

21 Sep 2018, 07:09

Hi,

In my unbalanced panel dataset, the dependent variable is a proportion (it lies between 0 and 1 by definition) with no values at 0 or 1. The purpose of my analysis is to ascertain the relationship between the independent variables and the dependent variable, i.e. I am concerned with the sign and significance of the coefficients. I was earlier advised to use fractional regression. Now it has been suggested that I take the log of my dependent variable, which converts all the fractional values into negative values ranging from minus infinity to 0.

Is it fine to use a log transformation of the dependent variable in place of fractional regression? If not, what are the implications?

Thanks and Regards
Prateek Bedi
Maarten Buis

#2

21 Sep 2018, 08:54

The problems with first log transforming the dependent variable are:
You are no longer modeling the mean proportion, but the mean log(proportion).

There is no guarantee that the predictions from your model will respect the upper bound of 1.

So, I don't find that "solution" convincing. Depending on the exact circumstances of your data and model, it may not be completely horrible, but why settle for second best if you can easily do better?
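Here is a minimal Python sketch of both points, using made-up data (the five x/y values are hypothetical, not from the thread). It fits an ordinary linear regression of log(y) on x and back-transforms with exp. The first comparison shows that exp(mean log y) is the geometric mean, which sits below the mean proportion. The second shows that the back-transformed fitted values are not capped at 1.

```python
import math


def ols(xs, ys):
    """Simple OLS of ys on xs with an intercept; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: proportions strictly inside (0, 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.20, 0.35, 0.55, 0.75, 0.90]

# Point 1: exp(mean of log y) is the geometric mean, not the mean proportion.
mean_y = sum(ys) / len(ys)
geo_mean = math.exp(sum(math.log(y) for y in ys) / len(ys))

# Point 2: a linear model for log(y), back-transformed with exp, has no cap at 1.
a, b = ols(xs, [math.log(y) for y in ys])
fitted = [math.exp(a + b * x) for x in xs]

print(f"mean y = {mean_y:.3f}, exp(mean log y) = {geo_mean:.3f}")
print(f"back-transformed fit at x = 4: {fitted[-1]:.3f}")
```

With these numbers the mean proportion is 0.55, while exp(mean log y) is about 0.48. The back-transformed fitted value at x = 4 is about 1.02, even though every observed y is below 1.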

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
John Mullahy

#3

22 Sep 2018, 07:21

Prateek: I fully concur with Maarten's recommendation. I would simply add that if you feel you must transform your measure, a logit or probit transformation would seem much more natural than a log transformation.
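A small sketch of why logit or probit is the more natural choice (Python, standard library only; the function names are illustrative). The log maps (0, 1) only onto (-inf, 0), so any fitted value above 0 on the log scale comes back as a "proportion" above 1. Logit and probit map (0, 1) onto the whole real line, and their inverses always return values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def logit(p):
    """Log-odds transform: maps (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Logistic function: maps any real z back into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)


STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


def probit(p):
    """Inverse standard normal CDF: also maps (0, 1) onto the real line."""
    return STD_NORMAL.inv_cdf(p)


def inv_probit(z):
    """Standard normal CDF: maps any real z back into (0, 1)."""
    return STD_NORMAL.cdf(z)


# On the log scale, a fitted value of 0.5 back-transforms to exp(0.5), about 1.65.
print(math.exp(0.5))

# Under logit/probit, any linear predictor maps back inside (0, 1).
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(z, round(inv_logit(z), 4), round(inv_probit(z), 4))
```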
Prateek Bedi

#4

23 Sep 2018, 09:10

Firstly, thanks a lot Maarten Buis and John Mullahy. I have two follow-up queries:

1. I agree that I would be modelling mean log(proportion). However, since I am interested in studying the relationship between dependent and independent variables, would it make any difference to the direction and significance of my coefficients?
2. I understand that some of the model's predictions may not respect the upper bound of 1. I would like to know whether we should really worry about this issue, considering that our purpose is not forecasting.

Thanks and Regards
Prateek Bedi
John Mullahy

#5

23 Sep 2018, 12:02

Prateek: I don't have a good reply to your #2.

For your #1, it is possible for the sign of an estimated coefficient or marginal effect to be different in models for transformed and untransformed outcomes. Significance (e.g. the width of a confidence interval or a p-value) will almost certainly differ in the two cases; by how much it is impossible to say ex ante.
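Below is a contrived example of the sign reversal, using hypothetical data with one binary regressor, so each OLS slope is just a difference in group means. The x = 1 group has a slightly higher mean proportion. It also contains one value close to 0, which drags its mean log and mean logit down.

```python
import math


def ols_slope_binary(xs, ys):
    """With one 0/1 regressor plus an intercept, the OLS slope is the
    difference in group means: mean(y | x = 1) - mean(y | x = 0)."""
    g1 = [y for x, y in zip(xs, ys) if x == 1]
    g0 = [y for x, y in zip(xs, ys) if x == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)


def logit(p):
    """Log-odds transform, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical data: the x = 1 group has a slightly higher mean proportion
# but is much more spread out, with one value close to 0.
xs = [0, 0, 1, 1]
ys = [0.40, 0.40, 0.01, 0.90]

slope_raw = ols_slope_binary(xs, ys)                          # outcome: y
slope_log = ols_slope_binary(xs, [math.log(y) for y in ys])   # outcome: log(y)
slope_logit = ols_slope_binary(xs, [logit(y) for y in ys])    # outcome: logit(y)

print(f"y: {slope_raw:+.3f}  log(y): {slope_log:+.3f}  logit(y): {slope_logit:+.3f}")
```

Here the slope is about +0.055 for y, but about -1.44 for log(y) and about -0.79 for logit(y). Real data will rarely be this extreme, but nothing rules it out.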
Prateek Bedi

#6

23 Sep 2018, 12:14

Alright John Sir. Really appreciate your response. Thanks a lot!
Richard Williams

#7

23 Sep 2018, 15:02

Unfortunately, there is no xtfracreg or mefracreg command. I have asked for one.

If you are going to do a transformation, I agree with John that using logit or probit seems best.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Maarten Buis

#8

24 Sep 2018, 02:42

To answer #2: when that happens, you have mis-specified the functional form of the relationship. How bad the mis-specification is, is an empirical question.

Nick Cox

#9

24 Sep 2018, 03:25

In addition to the excellent points already made, let's underline that a log transformation will not work if you have any observed zeros.
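A quick Python illustration with hypothetical values. `math.log(0.0)` raises a `ValueError`, and the logit transform likewise fails at both 0 and 1. The Bernoulli quasi-log-likelihood that fractional logit/probit maximizes, y*log(p) + (1 - y)*log(1 - p), stays finite at y = 0 and y = 1 as long as the fitted p is strictly inside (0, 1).

```python
import math


def log_or_none(y):
    """Return log(y), or None where the log is undefined (math.log(0.0) raises ValueError)."""
    try:
        return math.log(y)
    except ValueError:
        return None


def bernoulli_quasi_loglik(y, p):
    """One observation's Bernoulli quasi-log-likelihood, as used by fractional
    logit/probit. Finite for any y in [0, 1] (0 and 1 included) when 0 < p < 1."""
    return y * math.log(p) + (1.0 - y) * math.log(1.0 - p)


ys = [0.0, 0.25, 0.60, 1.0]  # hypothetical proportions, including a 0 and a 1

print([log_or_none(y) for y in ys])                   # the first entry is None
print([bernoulli_quasi_loglik(y, 0.3) for y in ys])   # all finite
```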
Andrew Musau

#10

24 Sep 2018, 05:43

The standard reference for fractional logit (at least in economics) is the Papke and Wooldridge paper. Leslie Papke has the Stata do-files on her web page at Michigan State, so you can download these if you need to implement the procedure. However, from the paper you will note that if your interest is simply in marginal effects, there are no material differences between fractional logit and a linear estimator such as fixed effects, which models the outcome as continuous.
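Here is a minimal Python sketch of what fractional logit computes, under simplifying assumptions: one regressor, no panel structure, and hypothetical data that include a 0 and a 1. It maximizes the Bernoulli quasi-log-likelihood with E[y|x] = logistic(b0 + b1*x) by Newton-Raphson, then computes the average marginal effect of x. This is neither Papke's do-files nor Stata's `fracreg`. In particular, it omits the robust (sandwich) standard errors that quasi-likelihood inference requires.

```python
import math


def inv_logit(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def frac_logit(xs, ys, tol=1e-10, max_iter=100):
    """Bernoulli quasi-MLE for E[y|x] = inv_logit(b0 + b1*x), with y in [0, 1].

    Newton-Raphson on the quasi-log-likelihood
    sum(y*log(p) + (1 - y)*log(1 - p)); point estimates only.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = inv_logit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system information @ step = score.
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


def average_marginal_effect(xs, b0, b1):
    """Average over the sample of dE[y|x]/dx = p*(1 - p)*b1 (logit link)."""
    total = 0.0
    for x in xs:
        p = inv_logit(b0 + b1 * x)
        total += p * (1.0 - p) * b1
    return total / len(xs)


# Hypothetical data, including an exact 0 and an exact 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.2, 0.5, 0.7, 1.0]

b0, b1 = frac_logit(xs, ys)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, AME = {average_marginal_effect(xs, b0, b1):.4f}")
```

The average marginal effect, rather than b1 itself, is the number to compare with the slope of a linear model.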
Nick Cox

#11

24 Sep 2018, 05:56

FWIW:

The logit (Berkson's name) as a link function (in later terminology) for binary responses came long after the use of logistic curves as a model for bounded responses (historically, population sizes).

Wedderburn in 1974 https://www.jstor.org/stable/2334725...o_tab_contents deserves more citation than it seems to get from people in some disciplines. Biometrika isn't exactly a marginal journal!
Sagnik Bagchi

#12

24 Sep 2018, 06:02

Hello All,

Adding to this discussion:

I posted this a long time ago: https://www.statalist.org/forums/for...ced-panel-data

What should one do when a substantial number of observations of the dependent variable are zeros and the panel is unbalanced? Papke and Wooldridge (2008) is for balanced panels.

Can I use the fracreg command and add cross-section and time dummies to the model to control for unobserved heterogeneity?

thanks,

Sagnik
Nick Cox

#13

24 Sep 2018, 08:27

xtgee would seem to be an answer for panel data.
Stephen Jenkins

#14

24 Sep 2018, 08:47

Papke and Wooldridge have written about fractional regression methods for panel data: "Panel data methods for fractional response variables with an application to test pass rates", Journal of Econometrics, 145 (2008) 121–133. [This builds on the classic Papke-Wooldridge paper that Andrew Musau cites in #10.] Although framed for the balanced panel set-up (as #12 points out), I conjecture their proposed methods for panel data would work well as long as the data were not too unbalanced.
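As I read the 2008 paper, it handles the unobserved effect with the Chamberlain-Mundlak device: the unit-level time averages of the covariates are added as extra regressors, and a pooled fractional probit (or GEE) is fitted. The Python sketch below only builds those averages, using a hypothetical panel and hypothetical field names. For an unbalanced panel it averages over the periods each unit is actually observed. That is an assumption of this sketch, and it is exactly where "not too unbalanced" matters. In Stata the equivalent step would typically be `bysort id: egen x_bar = mean(x)`, followed by something like `fracreg probit y x x_bar, vce(cluster id)`.

```python
from collections import defaultdict


def add_unit_means(rows, unit_key, x_key):
    """Return copies of rows with a new '<x_key>_bar' field holding the unit's
    average of x over the periods in which that unit is observed."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[unit_key]] += row[x_key]
        counts[row[unit_key]] += 1
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left unchanged
        new_row[x_key + "_bar"] = sums[row[unit_key]] / counts[row[unit_key]]
        out.append(new_row)
    return out


# Hypothetical unbalanced panel: unit 1 observed 3 times, unit 2 only twice.
panel = [
    {"id": 1, "t": 1, "x": 2.0, "y": 0.30},
    {"id": 1, "t": 2, "x": 4.0, "y": 0.45},
    {"id": 1, "t": 3, "x": 6.0, "y": 0.50},
    {"id": 2, "t": 1, "x": 1.0, "y": 0.10},
    {"id": 2, "t": 3, "x": 3.0, "y": 0.20},
]

for row in add_unit_means(panel, "id", "x"):
    print(row)
```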
Prateek Bedi

#15

26 Sep 2018, 04:30

First, thanks a lot to everyone for providing such helpful and insightful guidance. Based on the discussion above, I have a few more queries.

1. After a logit transformation of the dependent variable (which originally lies between 0 and 1, with no values equal to 0 or 1), can I directly employ FE/dynamic panel data estimation (since I have severe endogeneity issues in my model) and interpret the results as usual? I would also like to know whether the sign and significance of the coefficients will remain the same for the original dependent variable and the transformed one.
2. If a logit transformation works well in the case of a fractional dependent variable, what is the additional advantage of going for fractional regression?
3. Since I have severe endogeneity issues in my model (in the form of omitted variable bias and simultaneity) along with the fractional nature of the dependent variable, which estimation methodology is advisable, keeping in view its implementation in Stata?
4. Please let me know if there's a good text/source for a conceptual understanding of issues like dynamic panel data estimation and fractional regression.

Hope to get some more useful inputs...

Thanks and Regards
Prateek Bedi