Transformation of variable to log in panel data

Fabian Rohrbach

Join Date: Oct 2016

Posts: 6
#1

29 Oct 2016, 17:49

I want to transform a variable in my panel data set to a log variable. The common thing to do is gen logvar = log(var). However, I am working with panel data and am not sure if this is the right command. Can anyone help me with this?
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#2

29 Oct 2016, 18:44

Yes, it works the same way in panel data. The log is the log. Of course, if your variable takes on zero or negative values then you can't do this (whether panel data or not). And whenever I see someone starting to log transform data, I always wonder why they are doing it. Sometimes there are good reasons, but there tends to be a lot of overuse of log transformation in contexts where either nothing is needed, or something else would be better. But again, there is nothing special about panel data in this connection.
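A minimal sketch of the point about zero and negative values, using a made-up four-observation dataset. The variable names x and logx are illustrative assumptions and do not come from the thread:

```stata
* Illustrative data only: x and logx are hypothetical names
clear
input double x
-1
0
1
10
end

* log() is the natural logarithm in Stata (a synonym for ln())
generate double logx = log(x)

* Non-positive values cannot be logged, so logx is missing for x = -1 and x = 0
list x logx
count if missing(logx)
assert r(N) == 2
```

Stata does not stop with an error here. It silently generates missing values, so those observations would drop out of any later estimation that uses logx.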
Fabian Rohrbach

Join Date: Oct 2016

Posts: 6
#3

30 Oct 2016, 01:35

Thanks for your answer. I just found someone else using this command: by id: gen logrisk = log(risk). What is the difference between this command and a simple log transformation? It seems that Stata is doing something separately for every id (in this case, countries).
Nick Cox

Join Date: Mar 2014

Posts: 35646
#4

30 Oct 2016, 03:04

The by: prefix makes no difference here. It's like taking logarithms on your calculator or laptop in your kitchen and your car. Same calculation. Where you do it is immaterial.
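One way to check this directly is sketched below. It assumes Stata's shipped auto dataset, with foreign standing in for the panel identifier; the names log_by and log_plain are hypothetical:

```stata
* Assumption: the auto dataset stands in for a panel, with foreign as the group id
sysuse auto, clear

* Log computed under the by: prefix (bysort sorts by foreign first)
bysort foreign: generate double log_by = log(price)

* Log computed without any prefix
generate double log_plain = log(price)

* The two variables agree in every observation, so assert passes silently
assert log_by == log_plain
```

The by: prefix changes a result only when the expression depends on the observation's position within the group, for example through _n, _N, or explicit subscripts. A plain log() uses only the value in the current observation.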
Fabian Rohrbach

Join Date: Oct 2016

Posts: 6
#5

30 Oct 2016, 03:24

I still do not understand why it makes no difference. When I run my xt regression with the log variable created by the command

by id: gen logvar = log(var)

I get different results from the xt regression I run with the log variable created by the command

gen logvar = log(var)

Why do I get different results if the two commands are equivalent?

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#6

30 Oct 2016, 03:44

Fabian:
I find it difficult to follow your point.
Perhaps posting what you typed and what Stata gave you back (as per the FAQ) would make things easier.
However, in the following toy example, where an independent variable is logged using both of your approaches, Stata returns the same results (as expected):

Code:

. use "http://www.stata-press.com/data/r14/nlswork.dta", clear
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. bysort idcode: gen ln_hours=ln(hours)
(67 missing values generated)

. gen ln_hours_2=ln(hours)
(67 missing values generated)

. su ln_hours ln_hours_2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    ln_hours |     28,467    3.536863    .4218092          0   5.123964
  ln_hours_2 |     28,467    3.536863    .4218092          0   5.123964

. xtreg ln_wage ln_hours

Random-effects GLS regression                   Number of obs     =     28,467
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.0002                                         min =          1
     between = 0.0282                                         avg =        6.0
     overall = 0.0060                                         max =         15

                                                Wald chi2(1)      =      29.68
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ln_hours |   .0302355   .0055498     5.45   0.000     .0193581    .0411129
       _cons |   1.549805   .0204337    75.85   0.000     1.509756    1.589855
-------------+----------------------------------------------------------------
     sigma_u |  .37864723
     sigma_e |  .32039218
         rho |  .58276109   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg ln_wage ln_hours_2

Random-effects GLS regression                   Number of obs     =     28,467
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.0002                                         min =          1
     between = 0.0282                                         avg =        6.0
     overall = 0.0060                                         max =         15

                                                Wald chi2(1)      =      29.68
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  ln_hours_2 |   .0302355   .0055498     5.45   0.000     .0193581    .0411129
       _cons |   1.549805   .0204337    75.85   0.000     1.509756    1.589855
-------------+----------------------------------------------------------------
     sigma_u |  .37864723
     sigma_e |  .32039218
         rho |  .58276109   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(Stata 19.0)

Nick Cox

Join Date: Mar 2014

Posts: 35646
#7

30 Oct 2016, 04:33

Fabian: I think you need to show us evidence for your claim. I am confident something else is responsible for whatever difference you observe.
Fabian Rohrbach

Join Date: Oct 2016

Posts: 6
#8

30 Oct 2016, 14:16

I already solved the problem. I am not sure what the problem was, but I get the same results now. Thank you all for your help.
dada gh

Join Date: Jan 2017

Posts: 2
#9

02 Jan 2017, 12:40

Hello, I want to know why we should transform variables into logs before starting the estimation. When must we transform them? Does anyone have an answer?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#10

02 Jan 2017, 23:51

dada gh:
please note the strong preference on this forum for real full names (as per FAQ);
please start a new thread;
please note that your questions are widely covered by any decent econometrics textbook.
Thanks.

Kind regards,
Carlo
(Stata 19.0)
Kuloja Peiris

Join Date: Aug 2018

Posts: 8
#11

10 Aug 2018, 22:17

Dear Carlo Lazzaro, Nick Cox, Clyde Schechter

Hi, I am new to this site and to Stata. For my Master's degree research I am using a gravity model to study the impact of infrastructure investment on trade, as an empirical investigation for Sri Lanka. I am using panel data covering 30 years for ten major export partners of Sri Lanka.

Sri Lanka's export values to those countries are the dependent variable of my model. The independent variables are GDP (of Sri Lanka and the trade partner countries), capital stock (of Sri Lanka and the trade partner countries), and the distance between the two capital cities.

My regression model is as follows.

log(X_{1j,t}) = α + β₁ log(Y_{1,t}) + β₂ log(Y_{j,t}) + β₃ log(GG_{1,t}) + β₄ log(GG_{j,t}) + β₅ D_{1j} + U_{1j,t}

where X_{1j,t} are exports from country 1 (Sri Lanka) to country j (trading partner) at time t; Y_{1,t} and Y_{j,t} are the GDPs of country 1 (Sri Lanka) and country j (trading partner), respectively, at time t; GG_{1,t} and GG_{j,t} are the general government capital stocks of country 1 (Sri Lanka) and country j (trading partner), respectively, at time t; and D_{1j} is the distance between the capital cities of the two countries.

My problems:

1. How can I incorporate the distance data into my main data set? (I have already combined the GDP, capital stock, and export data in Stata format, run basic commands, and obtained summaries of everything except the distance data.)

2. What kind of variables should I create to get output for the above regression?

I would be grateful if you could tell me how to run this regression, including the distance data, and obtain the output.

Kind regards

Kuloja