  • Transformation of variable to log in panel data

    I want to transform a variable in my panel data set to a log variable. The common thing to do is gen logvar = log(var). However, I am working with panel data and am not sure if this is the right command. Can anyone help me with this?

  • #2
    Yes, it works the same way in panel data. The log is the log. Of course, if your variable takes on zero or negative values then you can't do this (whether panel data or not). And whenever I see someone starting to log transform data, I always wonder why they are doing it. Sometimes there are good reasons, but there tends to be a lot of overuse of log transformation in contexts where either nothing is needed, or something else would be better. But again, there is nothing special about panel data in this connection.
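
    A minimal sketch of the zero-or-negative check mentioned above, assuming the variable is called var as in the question (a placeholder name, not a real data set). In Stata, log() is a synonym for ln(), and both return missing for arguments that are zero or negative, so it may help to count such observations before transforming:

    ```stata
    * "var" and "logvar" are placeholder names taken from the question.
    * Count observations for which the log is undefined.
    count if var <= 0

    * ln() already returns missing for var <= 0; the if qualifier only
    * makes that restriction explicit.
    generate logvar = ln(var) if var > 0
    ```

    If the count is not zero, those observations will be missing in logvar and will drop out of any later estimation.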


    • #3
      Thanks for your answer. I just found someone else using this command: by id: gen logrisk = log(risk). How does this differ from a simple log transformation? It seems that Stata is doing something separately for every id (in this case, countries).


      • #4
        The by: prefix makes no difference here. It's like taking logarithms on your calculator or laptop in your kitchen and your car. Same calculation. Where you do it is immaterial.
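
        One way to check this directly is sketched below, assuming the nlswork example data (loaded here with webuse) and variable names chosen only for illustration. Note that by id: requires the data to be sorted by id, otherwise Stata stops with a "not sorted" error; bysort sorts first.

        ```stata
        webuse nlswork, clear

        * Same calculation with and without the by: prefix.
        bysort idcode: generate double ln_hours_by = ln(hours)
        generate double ln_hours_all = ln(hours)

        * assert exits with an error if any observation differs.
        assert ln_hours_by == ln_hours_all

        * Contrast: by: does matter when the expression refers to
        * position within the group, as with a within-person lag.
        bysort idcode (year): generate hours_lag = hours[_n-1]
        ```

        The by: prefix changes the result only when the expression depends on the group, for example through _n, _N, or subscripts. A plain ln() of each observation does not.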


        • #5
          I still do not understand why there is no difference. When I run my xt regression with the log variable calculated
          by the command: by id: gen logvar = log(var)

          I get different results from the xt regression I run with the log variable calculated the other way,
          by the command: gen logvar = log(var)

          Why do I get different results if there is no difference?


          • #6
            Fabian:
            I find it difficult to follow your point.
            Perhaps posting what you typed and what Stata gave you back (as per FAQ) would make things easier.
            However, when in the following toy-example an independent variable is logged following both your approaches, Stata returns the same results (as expected):
            Code:
            . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
            (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
            
            . bysort idcode: gen ln_hours=ln(hours)
            (67 missing values generated)
            
            . gen ln_hours_2=ln(hours)
            (67 missing values generated)
            
            . su ln_hours ln_hours_2
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                ln_hours |     28,467    3.536863    .4218092          0   5.123964
              ln_hours_2 |     28,467    3.536863    .4218092          0   5.123964
            
            . xtreg ln_wage ln_hours
            
            Random-effects GLS regression                   Number of obs     =     28,467
            Group variable: idcode                          Number of groups  =      4,710
            
            R-sq:                                           Obs per group:
                 within  = 0.0002                                         min =          1
                 between = 0.0282                                         avg =        6.0
                 overall = 0.0060                                         max =         15
            
                                                            Wald chi2(1)      =      29.68
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                 ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                ln_hours |   .0302355   .0055498     5.45   0.000     .0193581    .0411129
                   _cons |   1.549805   .0204337    75.85   0.000     1.509756    1.589855
            -------------+----------------------------------------------------------------
                 sigma_u |  .37864723
                 sigma_e |  .32039218
                     rho |  .58276109   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            
            . xtreg ln_wage ln_hours_2
            
            Random-effects GLS regression                   Number of obs     =     28,467
            Group variable: idcode                          Number of groups  =      4,710
            
            R-sq:                                           Obs per group:
                 within  = 0.0002                                         min =          1
                 between = 0.0282                                         avg =        6.0
                 overall = 0.0060                                         max =         15
            
                                                            Wald chi2(1)      =      29.68
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                 ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
              ln_hours_2 |   .0302355   .0055498     5.45   0.000     .0193581    .0411129
                   _cons |   1.549805   .0204337    75.85   0.000     1.509756    1.589855
            -------------+----------------------------------------------------------------
                 sigma_u |  .37864723
                 sigma_e |  .32039218
                     rho |  .58276109   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            
            .
            Kind regards,
            Carlo
            (Stata 18.0 SE)


            • #7
              Fabian: I think you need to show us evidence for your claim. I am confident something else is responsible for whatever difference you observe.


              • #8
                I already solved the problem. I am not sure what the problem was, but I get the same results now. Thank you all for your help.


                • #9
                  Hello, I want to know why we should transform variables into logs before starting the estimation. When must we transform them? Does anyone have an answer?


                  • #10
                    dada gh:
                    please note the strong preference on this forum for real full names (as per FAQ);
                    please start a new thread;
                    please note that your questions are widely covered by any decent econometrics textbook.
                    Thanks.
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)


                    • #11
                      Dear Carlo Lazzaro, Nick Cox, Clyde Schechter

                      Hi, I am new to this site as well as new to Stata. I am doing my Master's degree research, using a gravity model to study the impact of infrastructure investment on trade, as an empirical investigation for Sri Lanka. I am using 30 years of panel data on Sri Lanka's ten major export destinations.

                      Sri Lanka's export values to those countries are the dependent variable of my model, and GDP (of Sri Lanka and the trade partner countries), capital stock (of Sri Lanka and the trade partner countries), and the distance between the two capital cities are the independent variables.

                      My regression model is as follows.

                      $$\log(X_{1j,t}) = \alpha + \beta_1 \log(Y_{1,t}) + \beta_2 \log(Y_{j,t}) + \beta_3 \log(GG_{1,t}) + \beta_4 \log(GG_{j,t}) + \beta_5 D_{1j} + U_{1j,t}$$

                      where $X_{1j,t}$ are exports from country 1 (Sri Lanka) to country $j$ (trading partner) at time $t$; $Y_{1,t}$ and $Y_{j,t}$ are the GDPs of country 1 (Sri Lanka) and country $j$ (trading partner), respectively, at time $t$; $GG_{1,t}$ and $GG_{j,t}$ are the general government capital stocks of country 1 (Sri Lanka) and country $j$ (trading partner), respectively, at time $t$; and $D_{1j}$ is the distance between the capital cities of the two countries.

                      My problems:

                      1. How can I incorporate the distance data into my main data set? (I have already combined the GDP, capital stock, and export data in Stata format, run basic commands, and obtained summaries of all my data other than the distance data.)

                      2. What kind of variables should I create to get output for the above regression?

                      I would be grateful if you could tell me how to run my regression and get output that includes the distance data.

                      Kind regards

                      Kuloja
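
                      The thread gives no answer to these two questions. A rough sketch of one common approach follows, under stated assumptions: every file and variable name here (trade_panel.dta, distance.dta, partner, year, exports, gdp_lk, gdp_partner, gg_lk, gg_partner, dist) is hypothetical, the main file is assumed to hold one row per partner country and year, and the distance file one row per partner country with partner stored as a string.

                      ```stata
                      * All file and variable names are hypothetical placeholders.
                      use trade_panel, clear

                      * Distance does not vary over time, so attach it with a
                      * many-to-one merge on the partner identifier.
                      merge m:1 partner using distance
                      assert _merge == 3      // every row should find a match
                      drop _merge

                      * Logged variables as they appear in the model above.
                      generate ln_exports     = ln(exports)
                      generate ln_gdp_lk      = ln(gdp_lk)
                      generate ln_gdp_partner = ln(gdp_partner)
                      generate ln_gg_lk       = ln(gg_lk)
                      generate ln_gg_partner  = ln(gg_partner)

                      * xtset needs a numeric panel identifier.
                      encode partner, generate(partner_id)
                      xtset partner_id year

                      * Random-effects fit; distance enters unlogged, as in the model.
                      xtreg ln_exports ln_gdp_lk ln_gdp_partner ln_gg_lk ln_gg_partner dist, re
                      ```

                      Random effects are used in this sketch because distance is constant within each partner country, so a fixed-effects fit (xtreg, fe) would drop it. Whether random effects suit these data is a separate question that the sketch does not settle.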
