
  • Log Transformation instead of Fractional Regression

    Hi,

    In my unbalanced panel dataset, the dependent variable is a proportion (it lies between 0 and 1 by definition) with no values at 0 or 1. The purpose of my analysis is to ascertain the relationship between the independent variables and the dependent variable, i.e. I am concerned with the sign and significance of the coefficients. I was earlier advised to use fractional regression. Now it has been suggested that I take the log of my dependent variable, which converts all fractional values into negative values ranging from minus infinity to 0.

    Is it fine to use a log transformation of the dependent variable in place of fractional regression? If not, what are the implications?

    Thanks and Regards
    Prateek Bedi



  • #2
    The problems with first log transforming the dependent variable are:
    • You are no longer modeling the mean proportion, but the mean log(proportion).
    • There is no guarantee that the predictions from your model will respect the upper bound of 1.
    So, I don't find that "solution" convincing. Depending on the exact circumstances of your data and model, it may not be completely horrible, but why settle for second best if you can easily do better?
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
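
The two bullet points in #2 can be illustrated numerically. The sketch below is a minimal illustration in plain Python with five made-up proportions (`x` and `y` are hypothetical, not data from this thread). It shows that back-transforming the mean of log(y) gives the geometric mean rather than the mean proportion, and that an OLS fit to log(y) can give fitted values above 1 once they are exponentiated.

```python
import math

# Hypothetical data: five proportions strictly between 0 and 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.10, 0.50, 0.80, 0.90, 0.95]
log_y = [math.log(v) for v in y]

# Point 1: exp(mean of logs) is the geometric mean, not the mean proportion.
mean_y = sum(y) / len(y)
geometric_mean_y = math.exp(sum(log_y) / len(log_y))

# Point 2: simple OLS of log(y) on x, then back-transform the fitted values.
x_bar = sum(x) / len(x)
log_y_bar = sum(log_y) / len(log_y)
slope = sum((xi - x_bar) * (li - log_y_bar) for xi, li in zip(x, log_y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = log_y_bar - slope * x_bar
fitted = [math.exp(intercept + slope * xi) for xi in x]

print(f"mean of y: {mean_y:.3f}, exp(mean of log y): {geometric_mean_y:.3f}")
print("fitted proportions:", [round(f, 3) for f in fitted])
```

With these numbers the largest fitted value is above 1 even though every observed proportion is below 1. How much this matters in a real application depends on the data.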



    • #3
      Prateek: I fully concur with Maarten's recommendation. I would simply add that if you feel you must transform your measure, a logit or probit transformation would seem much more natural than a log transformation.
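
One way to see why the logit is the more natural choice: it maps (0, 1) onto the whole real line, so any linear prediction maps back into (0, 1), whereas the log only removes the lower bound. The following is a minimal sketch in plain Python; the function names and the `linear_predictions` values are hypothetical.

```python
import math

def logit(p):
    """Map a proportion strictly between 0 and 1 to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of logit: map a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear predictions on the transformed scale.
linear_predictions = [-3.0, -0.5, 0.0, 0.7, 3.0]

# Back-transforming a logit-scale prediction lands inside (0, 1) ...
from_logit_scale = [inv_logit(z) for z in linear_predictions]
# ... whereas back-transforming a log-scale prediction only guarantees a positive value.
from_log_scale = [math.exp(z) for z in linear_predictions]

print([round(p, 3) for p in from_logit_scale])
print([round(p, 3) for p in from_log_scale])
```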



      • #4
        Firstly, thanks a lot Maarten Buis and John Mullahy. I have two follow-up queries:

        1. I agree that I would be modelling mean log(proportion). However, since I am interested in studying the relationship between dependent and independent variables, would it make any difference to the direction and significance of my coefficients?
        2. I understand that some of the predictions of the model may not respect the upper bound of 1. I would like to know if we should really bother about this issue considering that our purpose is not forecasting.

        Thanks and Regards
        Prateek Bedi



        • #5
          Prateek: I don't have a good reply to your #2.

          For your #1, it is possible for the sign of an estimated coefficient or marginal effect to be different in models for transformed and untransformed outcomes. Significance (e.g. the width of a confidence interval or a p-value) will almost certainly differ in the two cases; by how much is impossible to say ex ante.
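
A small constructed example of such a sign change, as a sketch in plain Python. The two groups of proportions are made up for illustration. With a single binary regressor the OLS slope equals the difference in group means, so no regression routine is needed.

```python
import math

# Hypothetical proportions for two groups defined by a binary regressor x.
y_x0 = [0.50, 0.50]   # observations with x = 0
y_x1 = [0.10, 0.95]   # observations with x = 1

def mean(values):
    return sum(values) / len(values)

# With a single binary regressor, the OLS slope is the difference in group means.
slope_raw = mean(y_x1) - mean(y_x0)
slope_log = mean([math.log(v) for v in y_x1]) - mean([math.log(v) for v in y_x0])

print(f"slope for y:      {slope_raw:+.3f}")
print(f"slope for log(y): {slope_log:+.3f}")
```

Here the slope is positive for y and negative for log(y): the log gives the small value 0.10 much more weight than it has on the original scale.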



          • #6
            Alright John Sir. Really appreciate your response. Thanks a lot!



            • #7
              Unfortunately, there is no xtfracreg or mefracreg command. I have asked for one.

              If you are going to do a transformation, I agree with John that using logit or probit seems best.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam



              • #8
                To answer your question 2: when predictions fall outside the bounds, you have mis-specified the functional form of that relationship. How bad the mis-specification is, is an empirical question.
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------



                • #9
                  In addition to the excellent points already made, let's underline that a log transformation will not work if you have any observed zeros.
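
A short sketch of the point about zeros, in plain Python with made-up values: the log of an exact zero is undefined, so such observations cannot be transformed at all.

```python
import math

# Hypothetical proportions, one of which is an exact zero.
proportions = [0.25, 0.0, 0.60]

undefined = []
for p in proportions:
    try:
        math.log(p)
    except ValueError:
        # math.log raises ValueError for 0, because log(0) is undefined.
        undefined.append(p)

print("values with no log:", undefined)
```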



                  • #10
                    The standard reference for fractional logit (at least in economics) is the Papke and Wooldridge paper. Leslie Papke has the Stata do-files on her web page at Michigan State, so you can download these if you need to implement the procedure. However, from the paper, you will note that if your interest is simply in marginal effects, there are no material differences between fractional logit and a linear estimator such as fixed effects, which models the outcome as continuous.
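
For readers unfamiliar with the estimator: fractional logit models the conditional mean as E[y|x] = G(a + b*x), with G the logistic function, and estimates the coefficients by maximizing the Bernoulli quasi-log-likelihood. The sketch below is a minimal plain-Python illustration with one regressor and Newton's method. It is neither the do-file code mentioned above nor Stata's `fracreg`; the function names and the data are hypothetical, and it returns point estimates only (inference would additionally need robust standard errors).

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fractional_logit(x, y, iterations=25):
    """Fit E[y|x] = inv_logit(a + b*x) by Bernoulli quasi-maximum likelihood."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and negative Hessian (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            m = inv_logit(a + b * xi)
            w = m * (1.0 - m)
            g0 += yi - m
            g1 += (yi - m) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (negative Hessian) * delta = score.
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data generated exactly from a = 0.5, b = 1.0.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [inv_logit(0.5 + 1.0 * xi) for xi in x]

a_hat, b_hat = fractional_logit(x, y)
fitted = [inv_logit(a_hat + b_hat * xi) for xi in x]
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

Because the fitted mean is an inverse logit, every fitted value lies between 0 and 1 by construction, and the outcome never has to be transformed.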



                    • #11
                      FWIW:

                      The logit (Berkson's name) as a link function (in later terminology) for binary responses came long after the use of logistic curves as a model for bounded responses (historically, population sizes).

                      Wedderburn in 1974 https://www.jstor.org/stable/2334725...o_tab_contents deserves more citation than it seems to get from people in some disciplines. Biometrika isn't exactly a marginal journal!



                      • #12
                        Hello All,

                        Adding to this discussion:

                        I posted this a long time ago: https://www.statalist.org/forums/for...ced-panel-data


                        What should one do when a number of successive observations of the dependent variable are zeros and the panel is unbalanced? The Papke and Wooldridge (2008) approach is for balanced panels.

                        Can I use the fracreg command and add cross-section and time dummies in the model to control for the unobserved heterogeneity?

                        thanks,

                        Sagnik



                        • #13
                          xtgee would seem to be an answer for panel data.



                          • #14
                            Papke and Wooldridge have written about fractional regression methods for panel data: "Panel data methods for fractional response variables with an application to test pass rates", Journal of Econometrics, 145 (2008) 121–133. [This builds on the classic Papke-Wooldridge paper that Andrew Musau cites in #10.] Although framed for the balanced panel set-up (as #12 points out), I conjecture their proposed methods for panel data would work well as long as the data were not too unbalanced.



                            • #15
                              First, thanks a lot everyone for providing such helpful and insightful guidance. Based on the discussion above, I have a few more queries.

                              1. After a logit transformation of the dependent variable (which originally lies between 0 and 1 with no values equal to 0 or 1), can I directly employ FE or dynamic panel data estimators (since I have severe endogeneity issues in my model) and interpret the results as usual? I would also like to know whether the sign and significance of the coefficients will remain the same for the original dependent variable and the transformed one.
                              2. If a logit transformation works well in the case of a fractional dependent variable, what is the additional advantage of going for fractional regression?
                              3. Since I have severe endogeneity issues in my model (in the form of omitted variable bias and simultaneity) along with the fractional nature of the dependent variable, which estimation method is advisable, keeping in mind that it must be implementable in Stata?
                              4. Please let me know if there's a good text/source for conceptual understanding of issues like dynamic panel data estimation and fractional regression.

                              Hope to get some more useful inputs...

                              Thanks and Regards
                              Prateek Bedi




