Help with Regressions of Panel data

Tobias Berger

Join Date: Oct 2022
Posts: 9

07 Oct 2022, 06:14

Hello everyone,

I have an urgent question about a Stata coding problem for which I unfortunately could not find an answer in the forum.
I am referring to the data example (extracted with dataex) below.

I am trying to analyse the impact of the independent variables lProfMar (logProfitMargin), lICturn (logIntangibleCapitalturnover) and lICrat (logIntangibleCapitalratio) on lROA (logROA), which is the dependent variable.

The basic format would be y (lROA) = x1 + x2 + x3
But I have the data for all listed American companies over the years 1964 - 2020 with the respective industry codes (gsubind2).
This would mean I am dealing with panel data, right?

Therefore, I would like to do the following:

1. Run a regression to observe first the general impact of the independent variable lICrat (the log intangible capital ratio, which models the impact of intangible assets on the Return on Assets) across all industries and years.

2. Secondly, I would like to dig deeper and observe the effect of intangible assets (lICrat) on ROA for each industry (I have 9 in total, below is only a sample for the SIC code 10) by decade (e.g. 1964 - 1970, 1971 - 1980, 1981 - 1990, 1991 - 2000, 2001 - 2010, 2011 - 2020).
That is, for each industry I would like 6 regression results, so that I can observe the effect over time. For example: in the Industrials industry the effect for 1964 - 1970 is..., for 1971 - 1980 is..., etc.

For both cases, I would like to include:
- control variables for leverage, (company) size and profit margin
- control for year, industry and year/industry fixed effects
- include standard errors (robust / clustered)

I would also like to know which tests I need to run to check that the coefficients from these regressions are valid and significant.

Any help is highly appreciated and please comment if more details are needed.
I am really stuck with coding those regression outputs.

Many thanks in advance.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str6 gvkeystr2 double year str4 curcd float(lROA lProfMar lICturn lICrat lFL lROEmv gsubind2)
"008265" 1965 "USD"  .5196433  1.550513  7.581828   .5976427  6.245504  1.9713898 10
"009867" 1965 "USD" 1.6634578  2.469071 4.4409566   3.963771  5.385618  1.6788192 10
"004351" 1965 "USD"  1.349132 1.7737175  6.297383  2.4883714  5.533849  2.0117655 10
"010001" 1965 "USD"  2.084888 2.1455934  5.174583   3.975052  5.056631  1.9285623 10
"008151" 1965 "USD" 1.7967305   2.63645  4.606376   3.764245  5.403117  1.9227883 10
"001848" 1965 "USD"  1.895812 2.2640448  6.976745  1.8653626  4.849475  2.0608273 10
"011038" 1965 "USD" 1.9138404 2.2680407  6.095886  2.7602546   5.08366  2.1514924 10
"015077" 1965 "USD" 1.2764703 2.2174072   6.25346   2.015943  5.678953   2.115544 10
"004430" 1965 "USD" 1.6241326  2.472753  5.130704  3.2310154  5.449169   2.005333 10
"004073" 1965 "USD" 1.9588038 1.6547745  5.734749    3.77962   5.36468  2.0576415 10
"001678" 1965 "USD" 1.0763617 1.6607312   6.06867  2.5573006  5.844563  1.9356595 10
"002991" 1965 "USD" 2.2398005  2.773686  4.494319   4.182136  4.844166   1.873247 10
"010862" 1965 "USD" 1.3640597 2.3594964 4.6221743  3.5927296  5.521954  1.6343343 10
"006310" 1965 "USD" 1.5878657  2.528244  4.957952  3.3120096  5.612018  2.0345106 10
"003930" 1965 "USD" 1.9455434  2.059591  5.924173  3.1721196  5.113824  2.1200345 10
"007152" 1965 "USD"  2.441854 2.3710392 4.7220263  4.5591288  5.103181  1.9954125 10
"003067" 1965 "USD"  1.816471  2.121225  6.784823  2.1207635  5.107203  2.1831853 10
"003130" 1965 "USD" 1.7599165 3.0512965   3.77133    4.14763  5.965089   1.876967 10
"004503" 1965 "USD" 2.0696442  2.200358  4.941744  4.1378827   5.01432  1.7886598 10
"010482" 1965 "USD"  2.477945  2.824148  4.030914   4.833224  4.870504   1.768618 10
"003067" 1966 "USD"  2.108986 2.3925076  6.337811   2.589007  5.057682  2.3660953 10
"008151" 1966 "USD" 1.7150937  2.617751  5.189945   3.117738  5.419086  2.1169581 10
"001678" 1966 "USD"  .9942523 1.7147983  8.644677 -.10765802  5.927368   2.232343 10
"010482" 1966 "USD"  2.412013 2.7747076 4.4532466  4.3943987  4.959701  1.9925215 10
"002410" 1966 "USD" 2.0655658  2.623508  7.248991  1.4034072  5.082231  2.4626174 10
"004351" 1966 "USD"  1.394485 1.8065077  7.757982  1.0403355  5.538279  2.2581105 10
"010001" 1966 "USD" 2.1959083 2.1742039  5.300657   3.931388  4.994503   2.024241 10
"007475" 1966 "USD" 1.8658162 1.9136842  6.203505   2.958968  5.014277   2.016918 10
"006403" 1966 "USD"  1.942149  2.059661 4.5959444  4.4968843  5.291059  1.6014403 10
"005439" 1966 "USD" 2.2788556  1.644978   6.34365  3.5005686  5.126142  2.3565078 10
"004430" 1966 "USD"   1.59205  2.450299  6.249856   2.102235  5.433359  2.2485101 10
"004073" 1966 "USD" 2.1251755 1.7523392  6.071679   3.511498  5.297934  2.3052883 10
"008853" 1966 "USD"  2.488409 1.9708694  5.408367   4.319514  5.011383  2.1393812 10
"011506" 1966 "USD"  .8011896  2.430097  5.222341   2.359092  6.911114  2.3836055 10
"009772" 1966 "USD" 2.2473223 2.0658972  4.888558   4.503207  5.122468   1.842396 10
"009653" 1966 "USD"  2.129312 2.2134924  5.066084  4.0600758  5.074555   1.942688 10
"008549" 1966 "USD" 1.7356625 2.1964486  6.376786  2.3727682  5.264188  2.2062593 10
"007152" 1966 "USD"  2.450477  2.474588 4.2366166   4.949613  5.106915  1.7490262 10
"001609" 1966 "USD"  1.894249 2.2456503  6.196913   2.662026  4.916108  2.0157099 10
"006819" 1966 "USD"  3.483641  3.898861 2.1273975   6.667723  4.807897  1.3223288 10
"003130" 1966 "USD" 1.7418855 2.2968254  5.128397   3.527004  5.897388  2.2282472 10
"011038" 1966 "USD"   2.01334 2.3441427   6.49139  2.3881476  5.068228     2.3083 10
"009878" 1966 "USD"  2.830249 3.7882795  2.907467   5.344843  4.731468   1.739467 10
"010503" 1966 "USD" 2.0290208  3.329094  4.356546   3.553721   5.59444  2.3557496 10
"002991" 1966 "USD" 2.1458993  2.699328  5.300189   3.356723  4.895236  2.1113396 10
"007017" 1966 "USD" 2.0665941 2.4041915  5.160166  3.7125766  5.047699  2.0158994 10
"007882" 1966 "USD" 1.8856913  4.101265  1.636634   5.638435 4.7260547  .57834214 10
"008974" 1966 "USD"  1.637855 2.2603848   5.09086  3.4969506    6.2505   2.286101 10
"003930" 1966 "USD"  1.930132 2.0980442  6.318486   2.723943  5.190114  2.2732756 10
"007276" 1966 "USD" 2.4633584  2.947414 4.0511537  4.6751313   5.28034  2.0049617 10
"010156" 1966 "USD"  1.963616 2.2620535  6.026539   2.885364  5.018394  2.1319504 10
"009465" 1966 "USD" 2.0246305 2.1046956  7.012237  2.1180387  4.936938  2.2199886 10
"005012" 1966 "USD" 2.0104487 2.0794415  4.930026  4.2113214  5.358421  1.8745003 10
"009867" 1966 "USD" 1.6171207  2.430122  5.961566  2.4357734  5.378591   2.169327 10
"008068" 1966 "USD" 1.6301076 1.2388914  5.268762  4.3327947  5.828477  1.5562615 10
"002067" 1966 "USD"  .6539264 2.1202636 4.1696086   3.574395  5.019521   .6365784 10
"010862" 1966 "USD" 1.4268663 2.3556259 4.4903164   3.791265   5.55829   1.614766 10
"005187" 1966 "USD" 1.8813317  2.353684  4.494313   4.243675  5.583386  1.8112316 10
"004503" 1966 "USD" 2.0637228 2.1914792  5.658694    3.42389  5.038464   2.080169 10
"005667" 1966 "USD" 1.4726313  2.466436    5.8844  2.3321354  5.888969   2.400308 10
"006310" 1966 "USD" 1.7049178  2.591575  5.712029  2.6116536  5.606579   2.390936 10
"007017" 1967 "USD"  2.022548  2.362567  4.736841  4.1334805  5.094829  1.8100348 10
"010862" 1967 "USD" 1.5026466 2.3531435 4.5222497   3.837594  5.556502  1.6646914 10
"007882" 1967 "USD"  2.548498  3.327665  1.636634   6.868293  5.024827  .05845372 10
"008068" 1967 "USD" 1.7658633 1.7077713 4.2272806   5.041152  5.249301  1.0299217 10
"003930" 1967 "USD"  1.931445 2.1512759  6.004253   2.986257   5.23858  2.2476737 10
"007620" 1967 "USD" 1.1141868 1.4257585  7.106747  1.7920214  5.316026  1.5578226 10
"004351" 1967 "USD"  1.391462  1.823098  7.065873   1.712831  5.554659  2.2070367 10
"010482" 1967 "USD" 2.2599998  2.689897  4.408921  4.3715224  5.069791  1.9039836 10
"008974" 1967 "USD"  1.593837 2.1702335  4.922835   3.711109  5.771074  1.9205186 10
"003420" 1967 "USD"  2.061482 3.9446754  2.456654   4.870492   5.17167   1.434817 10
"005439" 1967 "USD" 2.2404225  1.688551  5.530979  4.2312326  5.133477  1.9954216 10
"002991" 1967 "USD"  2.072094  2.548395    5.6606   3.073439   4.90805  2.1182957 10
"004503" 1967 "USD"  2.092912 2.2288344  5.650302   3.424116  5.079903  2.1381311 10
"010565" 1967 "USD" 2.1497772  2.580431 4.7035666  4.0761204  5.272179  2.0522823 10
"003130" 1967 "USD" 1.7614495 2.2301075 4.5857816  4.1559005  5.738585  1.8023727 10
"009465" 1967 "USD"  2.103703 2.1450882  5.006168   4.162787  4.875771  1.7514894 10
"001678" 1967 "USD" 1.3887788  1.900434  4.555962  4.1427236  5.562011  1.3745133 10
"002067" 1967 "USD" -.1823216 1.2039728  1.914266    5.90978  4.929632 -1.6660073 10
"006403" 1967 "USD" 1.6607544 2.0139108   4.20309  4.6540937   5.37381  1.2434888 10
"007938" 1967 "USD"  1.641437 1.8242042  6.667015  2.3605583  5.150666    2.01901 10
"004430" 1967 "USD"  1.543364  2.443657  6.312604   1.997444  5.437443  2.2191405 10
"007276" 1967 "USD" 2.2603252 3.1099246   3.47086   4.889881  5.029291  1.5753503 10
"004073" 1967 "USD" 1.8998277  1.745585   7.10787  2.2567136  5.143138  2.2525432 10
"001537" 1967 "USD" 2.2350664  2.026553  6.000526   3.418327  5.172868  2.3720183 10
"008853" 1967 "USD" 2.3754075  1.954475  5.167621   4.463652  5.062393  1.9692234 10
"011506" 1967 "USD" 1.4458642 2.8143156  4.177734   3.664155  6.574507   2.081156 10
"009653" 1967 "USD" 2.1195774  2.226658  5.049951   4.053309  5.108707  1.9538376 10
"001976" 1967 "USD"  2.444437 2.1441348   5.14889  4.3617525  4.838223  1.9817382 10
"007008" 1967 "USD" 1.1143705  1.849222  5.852809    2.62268  5.870417    1.98212 10
"006310" 1967 "USD" 1.5941818  2.630798  4.789202   3.384522  5.653182  2.0316436 10
"005667" 1967 "USD"  1.704374  2.768089  4.695209   3.451417  5.570193  2.0415192 10
"006819" 1967 "USD"   3.48749 3.8873906 2.2540417   6.556398  4.785724  1.4241613 10
"010156" 1967 "USD" 1.9158365  2.243663  5.295091  3.5874224   5.04796  1.9082427 10
"001788" 1967 "USD" 2.3272777  2.495014  2.601183   6.441422  4.899799   .3788298 10
"010503" 1967 "USD"  1.866553  3.306182  3.443982    4.32673  5.691399  1.7763008 10
"009878" 1967 "USD" 2.7553964  3.634508 2.7834485    5.54778  4.689013  1.5065703 10
"015077" 1967 "USD"  1.350164 2.0528038  9.257823 -.10765802    5.6591    2.28169 10
"005581" 1967 "USD"  1.545457 2.0613651   5.32023   3.374202  5.558843   1.935049 10
"005187" 1967 "USD" 1.9996516  2.477883 3.2769146   5.455194  5.475481   .9849555 10
end


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17851

07 Oct 2022, 11:07

Tobias:
welcome to this forum.
1) You're probably after a fixed effect panel data regression:

Code:

.
. destring gvkeystr2, g( gvkeystr2_num)
. xtset gvkeystr2_num year

Panel variable: gvkeystr2_num (unbalanced)
 Time variable: year, 1965 to 1967, but with a gap
         Delta: 1 unit
. xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv, fe

Fixed-effects (within) regression               Number of obs     =        100
Group variable: gvkeystr2_~m                    Number of groups  =         53

R-squared:                                      Obs per group:
     Within  = 0.8899                                         min =          1
     Between = 0.9892                                         avg =        1.9
     Overall = 0.9861                                         max =          3

                                                F(7,40)           =      46.18
corr(u_i, Xb) = 0.7000                          Prob > F          =     0.0000

------------------------------------------------------------------------------
        lROA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |
       1966  |   .0085624   .0212587     0.40   0.689    -.0344029    .0515278
       1967  |   .0065978    .021212     0.31   0.757    -.0362733    .0494689
             |
    lProfMar |   .7293048   .0805789     9.05   0.000     .5664487    .8921609
     lICturn |   .7604428   .0611535    12.43   0.000     .6368468    .8840387
      lICrat |   .8519224   .0566816    15.03   0.000     .7373647    .9664801
         lFL |  -.0162381    .087619    -0.19   0.854    -.1933227    .1608465
      lROEmv |   .1704402   .0541808     3.15   0.003     .0609367    .2799438
       _cons |  -7.055757   .8612136    -8.19   0.000    -8.796334   -5.315179
-------------+----------------------------------------------------------------
     sigma_u |  .07312191
     sigma_e |  .05620967
         rho |  .62856803   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(52, 40) = 0.89                      Prob > F = 0.6564

The F-test reported in the footer of the output tells you that you should go pooled OLS instead of -xtreg,fe- (but this may well be an artifact of the subsample of your dataset that you shared via -dataex-).
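For comparison, a minimal sketch of the pooled OLS benchmark next to the fixed-effects fit. It assumes that gvkeystr2_num was created with -destring- as above; the stored-estimate names pooled_ols and firm_fe are illustrative only.

```stata
* Pooled OLS with year dummies; standard errors clustered on the firm identifier
regress lROA i.year lProfMar lICturn lICrat lFL lROEmv, vce(cluster gvkeystr2_num)
estimates store pooled_ols

* Fixed-effects (within) estimator with the same regressors
xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv, fe
estimates store firm_fe

* Coefficients and standard errors of the two models side by side
estimates table pooled_ols firm_fe, keep(lProfMar lICturn lICrat lFL lROEmv) se
```

If the two sets of coefficients are close, the choice between the estimators matters little for the substantive conclusions.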

2) Something like:

Code:

bysort gsubind2: xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv, fe

should do the trick.
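If the results are wanted per industry and per decade, as in the original question, one possible sketch is the following. The variable decade and its value label decade_lbl are hypothetical names, and the bins follow the decades listed in the question; years outside 1964 - 2020 are left unassigned and excluded.

```stata
* Run from a do-file: /// continues a command on the next line
capture drop decade
generate byte decade = 1 + (year >= 1971) + (year >= 1981) + (year >= 1991) ///
    + (year >= 2001) + (year >= 2011) if inrange(year, 1964, 2020)

capture label drop decade_lbl
label define decade_lbl 1 "1964-1970" 2 "1971-1980" 3 "1981-1990" ///
    4 "1991-2000" 5 "2001-2010" 6 "2011-2020"
label values decade decade_lbl

* One fixed-effects regression per industry-decade cell (up to 9 x 6 = 54 outputs)
bysort gsubind2 decade: xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv ///
    if !missing(decade), fe
```

Cells with few firms or few years will give imprecise estimates, so it is worth checking the number of observations and groups reported for each cell.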

3) One of the most relevant tests aims at testing the correct specification of the functional form of the regressand (other tests, which you can easily find among canned Stata commands and community-contributed Stata modules, investigate heteroskedasticity and serial correlation of the epsilon error):

Code:

. quietly xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv, fe

. predict fitted, xb

. g sq_fitted=fitted^2

. xtreg lROA fitted sq_fitted, fe

Fixed-effects (within) regression               Number of obs     =        100
Group variable: gvkeystr2_~m                    Number of groups  =         53

R-squared:                                      Obs per group:
     Within  = 0.8903                                         min =          1
     Between = 0.9898                                         avg =        1.9
     Overall = 0.9869                                         max =          3

                                                F(2,45)           =     182.63
corr(u_i, Xb) = 0.6594                          Prob > F          =     0.0000

------------------------------------------------------------------------------
        lROA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |   .9593316   .1095214     8.76   0.000     .7387443    1.179919
   sq_fitted |     .01615   .0382053     0.42   0.675    -.0607995    .0930995
       _cons |   .0160262   .1051282     0.15   0.880    -.1957128    .2277653
-------------+----------------------------------------------------------------
     sigma_u |  .06746723
     sigma_e |  .05289007
         rho |  .61936473   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(52, 45) = 1.53                      Prob > F = 0.0746

As sq_fitted does not reach statistical significance, no misspecification issue would be detected if -xtreg,fe- were the way to go.

Kind regards,
Carlo
(Stata 19.0)


Tobias Berger

Join Date: Oct 2022

Posts: 9
#3

08 Oct 2022, 05:25

Hello Carlo,

Many thanks for your swift and comprehensive response, that is highly appreciated.

When performing the commands, a few questions arise, which it would be great to have your opinion on.

I am referring to my regression outputs below (1. and 2.), which result from my whole dataset.

1) When running the 1. Regression without industry focus below, the F-test indicates that -xtreg,fe- is suitable, correct?

2) When I now want to state the impact of lICrat (meaning the coefficient) on lROA for specific decades (e.g. 1970 - 1980), do I need to add the coefficient lICrat | .8968841 below (in the 1. regression) to the coefficients for each year? Meaning e.g. lICrat | .8968841 + (for 1970 | -.00981) + (for 1971 | -.0068977) etc.?
If this is correct, what role do the p-values of the individual years play, given that several years are insignificant? Or does that not matter?
The same question arises for the regressions per industry (one example below: 2. Regression for one industry (code 10)).

3) The year, industry and year/industry fixed effects are absorbed by using -i.year- and -fe-, right?

4) I assume the control variables are "just" the variables I add within the regression (such as FL = Financial Leverage) which absorb some of the effect?

5) When searching for tests, I found that for -xtreg,fe- some people recommend simply using robust and clustered standard errors. I use Stata 17.0; is only the -cluster- option needed here?
I also found -xttest2-, but it gives me the following error: xttest2 --> gvkeystr2_num takes on too many values r(134);
Which commands / approaches would you suggest for testing the model / coefficients?

When running your suggested commands, I get the following result. Is misspecification detected here, judging by the statistical significance of sq_fitted?

. quietly xtreg lROA i.year lProfMar lICturn lICrat lFL, fe

.
. predict fitted, xb
(420 missing values generated)

.
. g sq_fitted=fitted^2
(420 missing values generated)

. xtreg lROA fitted sq_fitted, fe

Fixed-effects (within) regression               Number of obs     =    116,162
Group variable: gvkeystr2_~m                    Number of groups  =     13,566

R-squared:                                      Obs per group:
     Within  = 0.9752                                         min =          1
     Between = 0.9368                                         avg =        8.6
     Overall = 0.9710                                         max =         59

                                                F(2,102594)       =   2.01e+06
corr(u_i, Xb) = -0.0447                         Prob > F          =     0.0000

------------------------------------------------------------------------------
        lROA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |   .9928209   .0008538  1162.76   0.000     .9911474    .9944945
   sq_fitted |    .003045   .0002941    10.35   0.000     .0024686    .0036214
       _cons |   .0006377   .0009535     0.67   0.504    -.0012311    .0025065
-------------+----------------------------------------------------------------
     sigma_u |  .20370774
     sigma_e |  .11938185
         rho |  .74435309   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(13565, 102594) = 6.95               Prob > F = 0.0000

1. Regression without industry focus:

xtreg lROA i.year lProfMar lICturn lICrat lFL, fe

Fixed-effects (within) regression Number of obs = 116,162
Group variable: gvkeystr2_~m Number of groups = 13,566

R-squared: Obs per group:
Within = 0.9751 min = 1
Between = 0.9370 avg = 8.6
Overall = 0.9711 max = 59

F(63,102533) = 63818.67
corr(u_i, Xb) = -0.0363 Prob > F = 0.0000

------------------------------------------------------------------------------
lROA | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
year |
1964 | .0069237 .0127331 0.54 0.587 -.018033 .0318804
1965 | .002069 .0118185 0.18 0.861 -.0210952 .0252331
1966 | .0087763 .0107554 0.82 0.415 -.012304 .0298567
1967 | -.0055056 .0104756 -0.53 0.599 -.0260377 .0150264
1968 | -.0087312 .0103038 -0.85 0.397 -.0289266 .0114642
1969 | -.0019528 .0103815 -0.19 0.851 -.0223004 .0183949
1970 | -.00981 .0104444 -0.94 0.348 -.0302808 .0106608
1971 | -.0068977 .0103242 -0.67 0.504 -.027133 .0133377
1972 | -.0101976 .0102889 -0.99 0.322 -.0303638 .0099685
1973 | .0047715 .0106656 0.45 0.655 -.016133 .0256759
1974 | .0202451 .0112444 1.80 0.072 -.0017937 .042284
1975 | .0154346 .0108176 1.43 0.154 -.0057676 .0366369
1976 | .0144639 .01055 1.37 0.170 -.006214 .0351417
1977 | .0187628 .010508 1.79 0.074 -.0018328 .0393583
1978 | .0247072 .0103998 2.38 0.018 .0043237 .0450906
1979 | .0211847 .0103137 2.05 0.040 .00097 .0413994
1980 | .0050589 .0102035 0.50 0.620 -.0149399 .0250576
1981 | .0043823 .0101897 0.43 0.667 -.0155895 .024354
1982 | .0025303 .010152 0.25 0.803 -.0173675 .0224281
1983 | -.011525 .0100277 -1.15 0.250 -.0311792 .0081293
1984 | -.0046522 .0100421 -0.46 0.643 -.0243344 .0150301
1985 | -.0098299 .0100263 -0.98 0.327 -.0294812 .0098214
1986 | -.0180935 .0100206 -1.81 0.071 -.0377339 .0015468
1987 | -.0100018 .0100364 -1.00 0.319 -.029673 .0096694
1988 | -.008635 .0100358 -0.86 0.390 -.0283052 .0110351
1989 | -.0044743 .0100572 -0.44 0.656 -.0241863 .0152377
1990 | -.0064634 .010105 -0.64 0.522 -.0262692 .0133423
1991 | -.0115139 .0100637 -1.14 0.253 -.0312387 .0082109
1992 | -.0127256 .0100301 -1.27 0.205 -.0323844 .0069333
1993 | -.0134617 .0099836 -1.35 0.178 -.0330294 .006106
1994 | -.0088791 .0099676 -0.89 0.373 -.0284154 .0106572
1995 | -.0077428 .0099559 -0.78 0.437 -.0272563 .0117707
1996 | -.0120508 .0099427 -1.21 0.226 -.0315384 .0074368
1997 | -.0145307 .0099412 -1.46 0.144 -.0340153 .0049539
1998 | -.0174536 .0099887 -1.75 0.081 -.0370314 .0021241
1999 | -.0154207 .0100158 -1.54 0.124 -.0350516 .0042102
2000 | -.016575 .0100572 -1.65 0.099 -.036287 .0031369
2001 | -.0228049 .0100875 -2.26 0.024 -.0425762 -.0030335
2002 | -.022633 .0101118 -2.24 0.025 -.042452 -.002814
2003 | -.0257311 .0100246 -2.57 0.010 -.0453792 -.006083
2004 | -.0220746 .0099905 -2.21 0.027 -.0416558 -.0024933
2005 | -.0184037 .0099946 -1.84 0.066 -.0379929 .0011856
2006 | -.0197889 .0099948 -1.98 0.048 -.0393786 -.0001993
2007 | -.0196562 .0100278 -1.96 0.050 -.0393106 -1.77e-06
2008 | -.015646 .0101792 -1.54 0.124 -.035597 .0043051
2009 | -.0219553 .0101266 -2.17 0.030 -.0418033 -.0021073
2010 | -.02419 .0100678 -2.40 0.016 -.0439228 -.0044571
2011 | -.02093 .0100956 -2.07 0.038 -.0407172 -.0011427
2012 | -.0264883 .0101017 -2.62 0.009 -.0462875 -.0066891
2013 | -.032125 .0100845 -3.19 0.001 -.0518904 -.0123595
2014 | -.03579 .0100978 -3.54 0.000 -.0555815 -.0159984
2015 | -.0358052 .0101457 -3.53 0.000 -.0556907 -.0159196
2016 | -.0325113 .0101493 -3.20 0.001 -.0524037 -.0126188
2017 | -.0343636 .0101453 -3.39 0.001 -.0542482 -.0144791
2018 | -.0326884 .0101876 -3.21 0.001 -.0526559 -.0127208
2019 | -.0444578 .0102167 -4.35 0.000 -.0644824 -.0244333
2020 | -.0470176 .010289 -4.57 0.000 -.0671839 -.0268512
2021 | -.0353755 .0102203 -3.46 0.001 -.0554072 -.0153438
2022 | -.0434875 .0213021 -2.04 0.041 -.0852394 -.0017356
|
lProfMar | .9891585 .0005563 1778.15 0.000 .9880682 .9902488
lICturn | .8929487 .0012276 727.37 0.000 .8905426 .8953549
lICrat | .8968841 .0012234 733.12 0.000 .8944863 .8992819
lFL | -.0091148 .0012126 -7.52 0.000 -.0114915 -.0067382
_cons | -8.159318 .0163737 -498.32 0.000 -8.19141 -8.127226
-------------+----------------------------------------------------------------
sigma_u | .20330132
sigma_e | .11947974
rho | .74327961 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(13565, 102533) = 5.35 Prob > F = 0.0000

2. Regression for one industry (code 10):

bysort gsubind2: xtreg lROA i.year lProfMar lICturn lICrat lFL, fe

----------------------------------------------------------------------------------------------------------------------------
-> gsubind2 = 10

Fixed-effects (within) regression Number of obs = 8,596
Group variable: gvkeystr2_~m Number of groups = 1,069

R-squared: Obs per group:
Within = 0.9498 min = 1
Between = 0.8860 avg = 8.0
Overall = 0.9460 max = 53

F(60,7467) = 2353.56
corr(u_i, Xb) = -0.2313 Prob > F = 0.0000

------------------------------------------------------------------------------
lROA | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
year |
1966 | .0056595 .051136 0.11 0.912 -.0945814 .1059005
1967 | .0013645 .0491675 0.03 0.978 -.0950176 .0977465
1968 | .0026137 .0476723 0.05 0.956 -.0908373 .0960648
1969 | .0129865 .0487454 0.27 0.790 -.0825683 .1085412
1970 | .0162369 .0487671 0.33 0.739 -.0793604 .1118343
1971 | .000353 .0481193 0.01 0.994 -.0939743 .0946804
1972 | -.016471 .0467775 -0.35 0.725 -.108168 .0752261
1973 | .0160064 .0472965 0.34 0.735 -.0767081 .108721
1974 | .0893322 .0487886 1.83 0.067 -.0063073 .1849717
1975 | .0644704 .0480349 1.34 0.180 -.0296916 .1586324
1976 | .0572181 .046489 1.23 0.218 -.0339135 .1483496
1977 | .0502612 .0467333 1.08 0.282 -.0413492 .1418716
1978 | .0603402 .0465344 1.30 0.195 -.0308802 .1515607
1979 | .055055 .0453888 1.21 0.225 -.0339199 .1440298
1980 | .0341805 .0448958 0.76 0.446 -.0538279 .1221889
1981 | .062656 .0449156 1.39 0.163 -.0253912 .1507033
1982 | .0506718 .0463555 1.09 0.274 -.040198 .1415416
1983 | .0328803 .0460859 0.71 0.476 -.057461 .1232216
1984 | .0172229 .0466827 0.37 0.712 -.0742884 .1087341
1985 | .052386 .0472938 1.11 0.268 -.0403232 .1450952
1986 | -.0241576 .0480769 -0.50 0.615 -.1184018 .0700866
1987 | .0228294 .0470354 0.49 0.627 -.0693734 .1150321
1988 | .0254858 .0464024 0.55 0.583 -.0654761 .1164476
1989 | .0110575 .0453529 0.24 0.807 -.0778468 .0999619
1990 | .0426788 .0451751 0.94 0.345 -.0458772 .1312347
1991 | .0145708 .0458403 0.32 0.751 -.075289 .1044307
1992 | .017971 .0456141 0.39 0.694 -.0714455 .1073875
1993 | .0169826 .044938 0.38 0.706 -.0711085 .1050736
1994 | .0166574 .0450162 0.37 0.711 -.071587 .1049019
1995 | .0328391 .0448633 0.73 0.464 -.0551056 .1207838
1996 | .0380357 .0444041 0.86 0.392 -.0490089 .1250804
1997 | .0294296 .0444018 0.66 0.507 -.0576105 .1164696
1998 | .0273788 .0457754 0.60 0.550 -.0623538 .1171114
1999 | -.0046768 .0453581 -0.10 0.918 -.0935915 .0842378
2000 | .0352222 .0446429 0.79 0.430 -.0522905 .1227349
2001 | .0048705 .0449404 0.11 0.914 -.0832253 .0929664
2002 | .0093083 .0453393 0.21 0.837 -.0795696 .0981862
2003 | .0097455 .0447806 0.22 0.828 -.0780371 .097528
2004 | .0440651 .0445012 0.99 0.322 -.0431699 .1313001
2005 | .0600117 .0443842 1.35 0.176 -.0269939 .1470172
2006 | .055208 .0443398 1.25 0.213 -.0317105 .1421264
2007 | .0455613 .0443865 1.03 0.305 -.0414488 .1325714
2008 | .0273521 .0457267 0.60 0.550 -.0622852 .1169894
2009 | .0543114 .0453021 1.20 0.231 -.0344934 .1431163
2010 | .0324615 .0447871 0.72 0.469 -.0553338 .1202568
2011 | .0332788 .0447314 0.74 0.457 -.0544074 .120965
2012 | .0123594 .0449021 0.28 0.783 -.0756613 .1003801
2013 | .010551 .0447639 0.24 0.814 -.0771988 .0983009
2014 | -.0111098 .0451852 -0.25 0.806 -.0996856 .077466
2015 | -.0334939 .0472266 -0.71 0.478 -.1260714 .0590835
2016 | .0327695 .0470966 0.70 0.487 -.0595531 .1250921
2017 | -.0135446 .0459106 -0.30 0.768 -.1035423 .076453
2018 | .0090506 .046747 0.19 0.846 -.0825868 .1006879
2019 | -.0119334 .0474316 -0.25 0.801 -.1049127 .0810458
2020 | .0365973 .05269 0.69 0.487 -.06669 .1398846
2021 | .0137158 .0473417 0.29 0.772 -.0790873 .1065189
|
lProfMar | .9502819 .0028979 327.92 0.000 .9446012 .9559626
lICturn | .8063445 .0052876 152.50 0.000 .7959793 .8167097
lICrat | .8137707 .0053706 151.52 0.000 .8032428 .8242986
lFL | -.0414511 .0067961 -6.10 0.000 -.0547734 -.0281288
_cons | -7.276817 .071626 -101.59 0.000 -7.417224 -7.13641
-------------+----------------------------------------------------------------
sigma_u | .3062449
sigma_e | .18510303
rho | .73242185 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(1068, 7467) = 5.53 Prob > F = 0.0000
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#4

08 Oct 2022, 08:49

Tobias:
1) Correct;
2) Not quite. You should test the joint statistical significance of -year- via:

Code:

testparm i.year

3) Not quite. -i.industry- is omitted because it is time-invariant;
4) Correct, if you mean that they are not the predictors you're really interested in;
5) Cluster-robust standard errors can be invoked via the -robust- or -vce(cluster clusterid)- options of -xtreg-. Since you have a huge number of panels (1 panel = 1 cluster), cluster-robust standard errors are recommended;
6) Yes, your model suffers from misspecification.
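As a minimal sketch of point 5), assuming the panel identifier is gvkeystr2_num as created earlier in the thread, the clustered version of the regression and the joint test of the year effects could look like this:

```stata
* Standard errors clustered on the firm (panel) identifier.
* With -xtreg, fe-, vce(robust) is equivalent to clustering on the panel variable.
xtreg lROA i.year lProfMar lICturn lICrat lFL, fe vce(cluster gvkeystr2_num)

* Joint test of the year effects, now based on the cluster-robust VCE
testparm i.year
```

The point estimates are the same as without the option; only the standard errors, t statistics and confidence intervals change.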

Last edited by Carlo Lazzaro; 08 Oct 2022, 08:51.

Kind regards,
Carlo
(Stata 19.0)
Tobias Berger

Join Date: Oct 2022

Posts: 9
#5

08 Oct 2022, 12:49

Hello Carlo,

Thanks a lot for your input, that is highly appreciated.
It seems that my econometrics knowledge is limited here, so sorry for some further questions:

2) When running the command testparm i.year, I get the result below. Does that mean I can sum up the coefficients per year to reach the above-mentioned goal?

3) But we should account for industry fixed effects as we do have them, right? Why can we then omit i.industry?

5) Incorporating cluster robust standard errors via -vce(cluster clusterid (e.g. year)) means that we do not have to test for heteroskedasticity or serial correlation?

6) Is there any quick hint on how to solve this misspecification? Adding more variables? My regression equation basically defines ROA (y) = Profit Margin (x1) + Intangible Capital turnover (x2) + Intangible Capital ratio (x3)
This formula results from applying the DuPont idea to the initial ROA equation, ROA = Net Income / Total Assets --> ROA = Net Income / Sales (Profit Margin) * Sales / Intangible Capital (Intangible Capital turnover) * Intangible Capital / Total Assets (Intangible Capital ratio) --> and then taking logs.

Could misspecification result from the case that I regress variables which are too dependent on each other as per the formula above?
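One way to look into this is to check how close lROA is to an exact linear function of the three logged components. The sketch below is only a diagnostic; dupont_sum, dupont_gap and med_gap are hypothetical names. In the first three 1965 rows of the -dataex- listing above, lROA minus the sum of the three components is about -9.21 in each row, which is roughly -2*ln(100) and would be consistent with variables stored on a percent scale (an assumption, not something stated in the post). The two rows with lICrat = -.10765802 do not follow that pattern.

```stata
* Sum of the three logged DuPont components
generate double dupont_sum = lProfMar + lICturn + lICrat

* Gap between the dependent variable and that sum
generate double dupont_gap = lROA - dupont_sum
summarize dupont_gap, detail
scalar med_gap = r(p50)

* How many rows deviate from the median gap by more than rounding error?
count if abs(dupont_gap - scalar(med_gap)) > 1e-4 & !missing(dupont_gap)
```

If the gap is constant for nearly all observations, the regression largely reproduces an accounting identity: coefficients close to 1 and an R-squared close to 1 would then be mechanical rather than evidence of an economic effect.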

Stata output

. testparm i.year

( 1) 1964.year = 0
( 2) 1965.year = 0
( 3) 1966.year = 0
( 4) 1967.year = 0
( 5) 1968.year = 0
( 6) 1969.year = 0
( 7) 1970.year = 0
( 8) 1971.year = 0
( 9) 1972.year = 0
(10) 1973.year = 0
(11) 1974.year = 0
(12) 1975.year = 0
(13) 1976.year = 0
(14) 1977.year = 0
(15) 1978.year = 0
(16) 1979.year = 0
(17) 1980.year = 0
(18) 1981.year = 0
(19) 1982.year = 0
(20) 1983.year = 0
(21) 1984.year = 0
(22) 1985.year = 0
(23) 1986.year = 0
(24) 1987.year = 0
(25) 1988.year = 0
(26) 1989.year = 0
(27) 1990.year = 0
(28) 1991.year = 0
(29) 1992.year = 0
(30) 1993.year = 0
(31) 1994.year = 0
(32) 1995.year = 0
(33) 1996.year = 0
(34) 1997.year = 0
(35) 1998.year = 0
(36) 1999.year = 0
(37) 2000.year = 0
(38) 2001.year = 0
(39) 2002.year = 0
(40) 2003.year = 0
(41) 2004.year = 0
(42) 2005.year = 0
(43) 2006.year = 0
(44) 2007.year = 0
(45) 2008.year = 0
(46) 2009.year = 0
(47) 2010.year = 0
(48) 2011.year = 0
(49) 2012.year = 0
(50) 2013.year = 0
(51) 2014.year = 0
(52) 2015.year = 0
(53) 2016.year = 0
(54) 2017.year = 0
(55) 2018.year = 0
(56) 2019.year = 0
(57) 2020.year = 0
(58) 2021.year = 0
(59) 2022.year = 0

F( 59,102533) = 11.24
Prob > F = 0.0000
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#6

08 Oct 2022, 17:19

Tobias:
1) reading any decent textbook on panel data regression is recommended;
2) not quite. The -testparm- output simply tells you that, as is often the case with the -fe- estimator, the T dimension contributes to explaining variation in the conditional mean of the regressand. To obtain what you're seemingly after, you should use -predict-;
3) if -i.industry- is a time-invariant predictor, the -fe- machinery will wipe it out. Try it yourself and see that -i.industry- will be omitted;
5) in a short panel the standard errors should be clustered on -panelid- or another higher-level variable, if feasible, but not on -timevar-. That said, there's no gain in testing for heteroskedasticity and autocorrelation after imposing cluster-robust standard errors;
6) including more predictors (and/or their interactions) is a possible fix.
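Point 3) can be checked directly. The sketch below assumes the industry code is gsubind2 and the panel identifier is gvkeystr2_num, as elsewhere in the thread; whether gsubind2 is really constant within firm is something -xtsum- will show.

```stata
* A within standard deviation of 0 means the industry code never changes within a firm
xtsum gsubind2

* Time-invariant industry dummies are collinear with the firm effects and get omitted
xtreg lROA i.year i.gsubind2 lProfMar lICturn lICrat lFL, fe vce(cluster gvkeystr2_num)

* Industry-by-year effects do vary over time within a firm, so they can be estimated
xtreg lROA i.gsubind2#i.year lProfMar lICturn lICrat lFL, fe vce(cluster gvkeystr2_num)
```

In the last command the industry-by-year dummies also span the plain year effects, so -i.year- is not listed separately.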

Last edited by Carlo Lazzaro; 08 Oct 2022, 17:21.

Kind regards,
Carlo
(Stata 19.0)
Tobias Berger

Join Date: Oct 2022

Posts: 9
#7

09 Oct 2022, 04:47

Hello Carlo,

Thanks a lot for your input, that is really helpful.
One last question regarding the estimation of coefficient for multiple years, meaning for example per decade.
What should -predict- show me? I mean, do I have to add the coefficient for each year I am interested in to the "basic" coefficient of the respective variable at the bottom of the table? Or do I have to calculate an average?
In the literature there is no such example.

xtreg lROA i.year lProfMar lICturn lICrat lFL, fe cluster (gsubind2)

Fixed-effects (within) regression Number of obs = 77706
Group variable: gvkeystr2_~m Number of groups = 10433

R-sq: Within = 0.9811 Obs per group: min = 1
Between = 0.9555 avg = 7.4
Overall = 0.9793 max = 41

F(7,8) = .
corr(u_i, Xb) = 0.0029 Prob > F = .

(Std. err. adjusted for 9 clusters in gsubind2)
------------------------------------------------------------------------------
| Robust
lROA | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
year |
1971 | .0037102 .0037344 0.99 0.350 -.0049013 .0123217
1972 | -.0004549 .0032825 -0.14 0.893 -.0080243 .0071146
1973 | .009482 .0040338 2.35 0.047 .0001801 .018784
1974 | .0218267 .0064618 3.38 0.010 .0069257 .0367276
1975 | .0160525 .0055546 2.89 0.020 .0032436 .0288614
1976 | .0180549 .004753 3.80 0.005 .0070945 .0290152
1977 | .0186226 .0019586 9.51 0.000 .014106 .0231391
1978 | .0246361 .006883 3.58 0.007 .0087639 .0405082
1979 | .0217661 .0058023 3.75 0.006 .008386 .0351461
1980 | .0146647 .0052949 2.77 0.024 .0024547 .0268747
1981 | .0127611 .0033166 3.85 0.005 .005113 .0204091
1982 | .0104817 .004135 2.53 0.035 .0009464 .0200169
1983 | -.0033174 .0018555 -1.79 0.112 -.0075961 .0009613
1984 | .0016403 .0036649 0.45 0.666 -.006811 .0100916
1985 | -.0027814 .0040268 -0.69 0.509 -.0120671 .0065044
1986 | -.0102921 .0044771 -2.30 0.051 -.0206162 .000032
1987 | -.0031316 .0041812 -0.75 0.475 -.0127734 .0065103
1988 | -.0025461 .0050648 -0.50 0.629 -.0142255 .0091334
1989 | .0008585 .0036165 0.24 0.818 -.0074811 .0091982
1990 | .0011939 .0049889 0.24 0.817 -.0103104 .0126982
1991 | -.0035383 .0054198 -0.65 0.532 -.0160364 .0089598
1992 | -.003129 .0048451 -0.65 0.536 -.0143018 .0080439
1993 | -.0038914 .0032702 -1.19 0.268 -.0114325 .0036496
1994 | -.0005659 .0044461 -0.13 0.902 -.0108187 .0096869
1995 | .0006614 .0061458 0.11 0.917 -.0135108 .0148336
1996 | -.0027771 .0057062 -0.49 0.640 -.0159356 .0103815
1997 | -.0054698 .0048149 -1.14 0.289 -.016573 .0056334
1998 | -.0083757 .0049414 -1.70 0.129 -.0197705 .0030192
1999 | -.0046518 .004551 -1.02 0.337 -.0151464 .0058428
2000 | -.0031321 .0031807 -0.98 0.354 -.0104668 .0042026
2001 | -.0084371 .0039738 -2.12 0.066 -.0176007 .0007264
2002 | -.014386 .003155 -4.56 0.002 -.0216616 -.0071105
2003 | -.0163173 .0025905 -6.30 0.000 -.022291 -.0103436
2004 | -.0123025 .0041768 -2.95 0.019 -.0219341 -.0026709
2005 | -.0090141 .0043404 -2.08 0.071 -.019023 .0009948
2006 | -.0128986 .0050258 -2.57 0.033 -.0244882 -.001309
2007 | -.0117261 .0053007 -2.21 0.058 -.0239496 .0004974
2008 | -.0058919 .0063128 -0.93 0.378 -.0204491 .0086654
2009 | -.0169482 .0076545 -2.21 0.058 -.0345996 .0007032
2010 | -.0146992 .0060312 -2.44 0.041 -.0286071 -.0007912
|
lProfMar | .9964079 .0048482 205.52 0.000 .985228 1.007588
lICturn | .9111991 .0111505 81.72 0.000 .885486 .9369121
lICrat | .9109616 .0106905 85.21 0.000 .8863092 .9356141
lFL | -.0093427 .0042198 -2.21 0.058 -.0190734 .0003881
_cons | -8.321487 .1243381 -66.93 0.000 -8.608211 -8.034763
-------------+----------------------------------------------------------------
sigma_u | .16190806
sigma_e | .10390372
rho | .7082971 (fraction of variance due to u_i)
---------------------------------------------------------------
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17851

09 Oct 2022, 05:58

Tobias:
1) you have to consider that the -i.year- contribution to variation in the conditional mean of the regressand is adjusted for the remaining predictors;
2) via -predict- you obtain the linear prediction for a given observation by multiplying the variables by their coefficients, as you can see in the following (too basic to be true) toy example:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage age i.year if year<=72, fe

Fixed-effects (within) regression               Number of obs     =      7,828
Group variable: idcode                          Number of groups  =      2,943

R-squared:                                      Obs per group:
     Within  = 0.1055                                         min =          1
     Between = 0.0469                                         avg =        2.7
     Overall = 0.0105                                         max =          5

                                                F(5,4880)         =     115.10
corr(u_i, Xb) = -0.4195                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |  -.0304386   .0243563    -1.25   0.211    -.0781879    .0173107
             |
        year |
         69  |   .1241075   .0251265     4.94   0.000     .0748482    .1733668
         70  |   .1647007   .0494931     3.33   0.001      .067672    .2617295
         71  |   .2681541   .0735818     3.64   0.000     .1239006    .4124077
         72  |   .3516117   .0976534     3.60   0.000     .1601671    .5430562
             |
       _cons |   1.999258   .4852245     4.12   0.000     1.047999    2.950516
-------------+----------------------------------------------------------------
     sigma_u |  .43375559
     sigma_e |  .23170746
         rho |  .77799345   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2942, 4880) = 5.79                  Prob > F = 0.0000

. predict fitted, xb


. list idcode year age fitted if idcode==1 & year<=72

       +--------------------------------+
       | idcode   year   age     fitted |
       |--------------------------------|
    1. |      1     70    18   1.616064 |
    2. |      1     71    19   1.689079 |
    3. |      1     72    20   1.742098 |
       +--------------------------------+

. di  1.999258 + (.1647007)+(18* -.0304386)
1.6160639

. di  1.999258 + (.2681541 )+(19* -.0304386)
1.6890787

. di  1.999258 + (.3516117 )+(20* -.0304386)
1.7420977

.
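
The hand calculations above can also be reproduced from the stored coefficients, which avoids retyping rounded values. A minimal sketch, assuming the same toy model is re-estimated first:

```stata
* Re-estimate the toy model so that the coefficients are in memory
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtreg ln_wage age i.year if year<=72, fe

* Linear prediction for idcode 1 in 1970 (age 18), built from stored coefficients
display _b[_cons] + _b[70.year] + 18*_b[age]

* Same linear combination, now with a standard error and confidence interval
lincom _cons + 70.year + 18*age
```

-lincom- is convenient here because it reports the uncertainty of the combined quantity, which the -display- arithmetic does not.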

Kind regards,
Carlo
(Stata 19.0)

Comment

Tobias Berger

Join Date: Oct 2022

Posts: 9
#9

14 Oct 2022, 07:59

Hello Carlo,
Many thanks for your response. Sorry for coming back so late; I was working on different projects.

I am currently not really sure when to take the log / ln in my panel data regressions.
May I ask for your advice / ideas when considering the following regressions:

First, I am analysing the impact of intangible capital on Return on Assets (as the dependent variable) and in another approach I am analysing the impact on Return on Equity (as another dependent variable).

As my independent variable of interest, I have chosen the Intangible Capital Ratio (Intangible Capital / Total Assets)
Furthermore, I have chosen the following control variables:
Financial Leverage (Total Assets / Shareholders Equity book value)

Firm Size (measured by total assets)

Efficiency (sales / total assets)

Sales growth (annual growth rate of sales in %)

Asset growth (annual growth rate of assets in %)

Profit Margin (net income/sales)

Intangible Capital turnover (sales / intangible capital)

Which of the variables (ROA, ROE, Intangible Capital Ratio and my 7 control variables) would you transform in a log format in order to run panel data regressions?

Many thanks in advance for your answer.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#10

14 Oct 2022, 08:10

Tobias:
the issue is larger.
1) you may want to log the regressand and keep the predictors in their linear metric (log-linear regression, often used in econometrics because it allows you to express in % terms the contribution of each regressor to variation in the conditional mean of the regressand);
2) you may want to log all the terms of your regression (log-log regression), which allows you to express the above as the elasticity of Y with respect to each regressor.

In addition, you may want to go with 1) to fix some forms of heteroskedasticity and/or misspecification of the functional form of the regressand.

That said, I would take a look at the literature in your research field and see what is the most frequently used approach/regression specification (-fe- or -re- included).
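
The two specifications can be compared side by side in a minimal sketch. It assumes the -auto- example dataset as a stand-in, and -ln_price- and -ln_weight- are hypothetical names created only for this illustration.

```stata
* Illustration only: the auto dataset stands in for the firm data
sysuse auto, clear
generate ln_price  = ln(price)
generate ln_weight = ln(weight)

* 1) log-linear: 100*_b[weight] is approximately the % change in price
*    associated with a one-unit increase in weight
regress ln_price weight
display 100*_b[weight]

* 2) log-log: _b[ln_weight] is the elasticity of price with respect to weight
regress ln_price ln_weight
```

Note that -ln()- returns missing for zero or negative arguments, so observations with a non-positive value (for instance, a negative return) drop out of any regression that uses the logged variable.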

Kind regards,
Carlo
(Stata 19.0)
Comment
Tobias Berger

Join Date: Oct 2022

Posts: 9
#11

15 Oct 2022, 12:52

Hello Carlo,

Great, many thanks for your input.

In literature I find both approaches, so I was curious if there is a kind of limitation / guidance when to use which transformation.
Speaking of the log-log regression, could one possible finding (for the coefficient of the Intangible Capital Ratio) be, for example, that the ratio lIARg exhibits an influence of 30% on the ROE, i.e. ROE changes by almost 0.30% when the ratio changes by 1%?
-> this is one example from the literature where all variables (dependent and independent) were log transformed.

Fixed effects makes the most sense as I am dealing with firm and year effects, which is also confirmed by the Hausman test.

Many thanks for your input.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#12

15 Oct 2022, 18:00

Tobias:
1) yes, the interpretation of the log-log regression results that you mention makes sense;
2) fixed effects is the first choice in this kind of research;
3) -hausman- test works with default standard errors only. If you impose non-default standard errors you should consider the community-contributed module -xtoverid- or the Mundlak approach to compare -fe- vs. -re- specification.
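
A minimal sketch of the Mundlak approach follows. It assumes the -nlswork- example dataset as a stand-in, and -mean_age- and -mean_tenure- are hypothetical names for the panel-level means.

```stata
* Illustration only: nlswork stands in for the firm panel
webuse nlswork, clear
xtset idcode year

* Keep the estimation sample so the means match the observations used
keep if !missing(ln_wage, age, tenure)

* Panel-level means of the time-varying regressors
bysort idcode: egen mean_age    = mean(age)
bysort idcode: egen mean_tenure = mean(tenure)

* -re- model augmented with the means, with cluster-robust standard errors
xtreg ln_wage age tenure mean_age mean_tenure, re vce(cluster idcode)

* Joint test of the means: rejection points towards the -fe- specification
test mean_age mean_tenure
```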

Kind regards,
Carlo
(Stata 19.0)
Comment
Tobias Berger

Join Date: Oct 2022

Posts: 9
#13

16 Oct 2022, 08:24

Thanks a lot Carlo.

When testing as you suggested, should I also test the year effects with -testparm- when using cluster-robust standard errors?
Cluster-robust standard errors also account for heteroskedasticity, right?

My regression code would look like:

xtreg lROA i.year lICrat lFL at lEff g_sales_w g_at_w lProfMar lICturn, fe cluster(gsubind2)

testparm i.year
Comment
Tobias Berger

Join Date: Oct 2022

Posts: 9
#14

16 Oct 2022, 08:30

Sorry for another note, but the command -xtoverid- does not seem to work (Stata 17.0 installed).
How does this command work?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#15

16 Oct 2022, 11:04

Tobias:
try -search xtoverid- and follow the instructions reported in its help file to install it.
Please note that you also have to install some ancillary community-contributed modules that support -xtoverid-.
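
A minimal sketch of the installation and a first run follows. It assumes that the modules are fetched from SSC and that the ancillary modules are -ivreg2- and -ranktest-; please check the -xtoverid- help file for the current requirements. The -nlswork- example dataset is used as a stand-in.

```stata
* Install xtoverid and the modules it is assumed to rely on (see its help file)
ssc install xtoverid, replace
ssc install ivreg2, replace
ssc install ranktest, replace

* Illustration only: nlswork example data
webuse nlswork, clear
xtset idcode year

* -re- model with cluster-robust standard errors, then the -fe- vs. -re- test
xtreg ln_wage age tenure, re vce(cluster idcode)
xtoverid
```

The sketch deliberately avoids factor-variable notation such as -i.year-, which -xtoverid- may not accept; creating the indicators beforehand or using the -xi:- prefix is the usual workaround.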

Kind regards,
Carlo
(Stata 19.0)
Comment
