  • Help with Regressions of Panel data

    Hello everyone,

    I have an urgent question regarding a Stata coding case which I unfortunately could not find in the forum.
    I am referring to the data example (extracted with dataex) below.

    I am trying to analyse the impact of the below independent variables lProfMar (logProfitMargin) lICturn (logIntangibleCapitalturnover) and lICrat (logIntangibleCapitalratio) on lROA (logROA), which is the dependent variable.

    The basic format would be y (lROA) = x1 + x2 + x3
    But I have the data for all stock listed American companies over the years 1964 - 2020 with the respective industry codes (gsubind2).
    This would mean I am dealing with Panel data, right?

    Therefore, I would like to do the following:

    1. Run a regression in order to observe first the general impact of the independent variable logICratio (which models the impact of intangible assets on the Return on Assets) across all the industries and years.

    2. Secondly, I would like to dive deeper, in order to observe the effect of intangible assets (logICratio) on ROA, for each industry (I have 9 in total, below is only a sample for the SIC code 10) over the decades (e.g. 1964 - 1970, 1971 - 1980, 1981 - 1990, 1991 - 2000, 2001 - 2010, 2011 - 2020).
    Meaning, I would like to have for each of the industries 6 regression results, which would enable me to observe the effect over time. For example: In the Industrials industry the effect from 1964 - 1970 is... from 1971 - 1980 is... etc.

    For both cases, I would like to include:
    - control variables for leverage, (company) size and profit margin
    - control for year, industry and year/industry fixed effects
    - include standard errors (robust / clustered)

    I would also be interested in which tests I need to run to check that the coefficients I obtain from these regressions are valid and significant.

    Any help is highly appreciated and please comment if more details are needed.
    I am really stuck with coding those regression outputs.

    Many thanks in advance.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 gvkeystr2 double year str4 curcd float(lROA lProfMar lICturn lICrat lFL lROEmv gsubind2)
    "008265" 1965 "USD"  .5196433  1.550513  7.581828   .5976427  6.245504  1.9713898 10
    "009867" 1965 "USD" 1.6634578  2.469071 4.4409566   3.963771  5.385618  1.6788192 10
    "004351" 1965 "USD"  1.349132 1.7737175  6.297383  2.4883714  5.533849  2.0117655 10
    "010001" 1965 "USD"  2.084888 2.1455934  5.174583   3.975052  5.056631  1.9285623 10
    "008151" 1965 "USD" 1.7967305   2.63645  4.606376   3.764245  5.403117  1.9227883 10
    "001848" 1965 "USD"  1.895812 2.2640448  6.976745  1.8653626  4.849475  2.0608273 10
    "011038" 1965 "USD" 1.9138404 2.2680407  6.095886  2.7602546   5.08366  2.1514924 10
    "015077" 1965 "USD" 1.2764703 2.2174072   6.25346   2.015943  5.678953   2.115544 10
    "004430" 1965 "USD" 1.6241326  2.472753  5.130704  3.2310154  5.449169   2.005333 10
    "004073" 1965 "USD" 1.9588038 1.6547745  5.734749    3.77962   5.36468  2.0576415 10
    "001678" 1965 "USD" 1.0763617 1.6607312   6.06867  2.5573006  5.844563  1.9356595 10
    "002991" 1965 "USD" 2.2398005  2.773686  4.494319   4.182136  4.844166   1.873247 10
    "010862" 1965 "USD" 1.3640597 2.3594964 4.6221743  3.5927296  5.521954  1.6343343 10
    "006310" 1965 "USD" 1.5878657  2.528244  4.957952  3.3120096  5.612018  2.0345106 10
    "003930" 1965 "USD" 1.9455434  2.059591  5.924173  3.1721196  5.113824  2.1200345 10
    "007152" 1965 "USD"  2.441854 2.3710392 4.7220263  4.5591288  5.103181  1.9954125 10
    "003067" 1965 "USD"  1.816471  2.121225  6.784823  2.1207635  5.107203  2.1831853 10
    "003130" 1965 "USD" 1.7599165 3.0512965   3.77133    4.14763  5.965089   1.876967 10
    "004503" 1965 "USD" 2.0696442  2.200358  4.941744  4.1378827   5.01432  1.7886598 10
    "010482" 1965 "USD"  2.477945  2.824148  4.030914   4.833224  4.870504   1.768618 10
    "003067" 1966 "USD"  2.108986 2.3925076  6.337811   2.589007  5.057682  2.3660953 10
    "008151" 1966 "USD" 1.7150937  2.617751  5.189945   3.117738  5.419086  2.1169581 10
    "001678" 1966 "USD"  .9942523 1.7147983  8.644677 -.10765802  5.927368   2.232343 10
    "010482" 1966 "USD"  2.412013 2.7747076 4.4532466  4.3943987  4.959701  1.9925215 10
    "002410" 1966 "USD" 2.0655658  2.623508  7.248991  1.4034072  5.082231  2.4626174 10
    "004351" 1966 "USD"  1.394485 1.8065077  7.757982  1.0403355  5.538279  2.2581105 10
    "010001" 1966 "USD" 2.1959083 2.1742039  5.300657   3.931388  4.994503   2.024241 10
    "007475" 1966 "USD" 1.8658162 1.9136842  6.203505   2.958968  5.014277   2.016918 10
    "006403" 1966 "USD"  1.942149  2.059661 4.5959444  4.4968843  5.291059  1.6014403 10
    "005439" 1966 "USD" 2.2788556  1.644978   6.34365  3.5005686  5.126142  2.3565078 10
    "004430" 1966 "USD"   1.59205  2.450299  6.249856   2.102235  5.433359  2.2485101 10
    "004073" 1966 "USD" 2.1251755 1.7523392  6.071679   3.511498  5.297934  2.3052883 10
    "008853" 1966 "USD"  2.488409 1.9708694  5.408367   4.319514  5.011383  2.1393812 10
    "011506" 1966 "USD"  .8011896  2.430097  5.222341   2.359092  6.911114  2.3836055 10
    "009772" 1966 "USD" 2.2473223 2.0658972  4.888558   4.503207  5.122468   1.842396 10
    "009653" 1966 "USD"  2.129312 2.2134924  5.066084  4.0600758  5.074555   1.942688 10
    "008549" 1966 "USD" 1.7356625 2.1964486  6.376786  2.3727682  5.264188  2.2062593 10
    "007152" 1966 "USD"  2.450477  2.474588 4.2366166   4.949613  5.106915  1.7490262 10
    "001609" 1966 "USD"  1.894249 2.2456503  6.196913   2.662026  4.916108  2.0157099 10
    "006819" 1966 "USD"  3.483641  3.898861 2.1273975   6.667723  4.807897  1.3223288 10
    "003130" 1966 "USD" 1.7418855 2.2968254  5.128397   3.527004  5.897388  2.2282472 10
    "011038" 1966 "USD"   2.01334 2.3441427   6.49139  2.3881476  5.068228     2.3083 10
    "009878" 1966 "USD"  2.830249 3.7882795  2.907467   5.344843  4.731468   1.739467 10
    "010503" 1966 "USD" 2.0290208  3.329094  4.356546   3.553721   5.59444  2.3557496 10
    "002991" 1966 "USD" 2.1458993  2.699328  5.300189   3.356723  4.895236  2.1113396 10
    "007017" 1966 "USD" 2.0665941 2.4041915  5.160166  3.7125766  5.047699  2.0158994 10
    "007882" 1966 "USD" 1.8856913  4.101265  1.636634   5.638435 4.7260547  .57834214 10
    "008974" 1966 "USD"  1.637855 2.2603848   5.09086  3.4969506    6.2505   2.286101 10
    "003930" 1966 "USD"  1.930132 2.0980442  6.318486   2.723943  5.190114  2.2732756 10
    "007276" 1966 "USD" 2.4633584  2.947414 4.0511537  4.6751313   5.28034  2.0049617 10
    "010156" 1966 "USD"  1.963616 2.2620535  6.026539   2.885364  5.018394  2.1319504 10
    "009465" 1966 "USD" 2.0246305 2.1046956  7.012237  2.1180387  4.936938  2.2199886 10
    "005012" 1966 "USD" 2.0104487 2.0794415  4.930026  4.2113214  5.358421  1.8745003 10
    "009867" 1966 "USD" 1.6171207  2.430122  5.961566  2.4357734  5.378591   2.169327 10
    "008068" 1966 "USD" 1.6301076 1.2388914  5.268762  4.3327947  5.828477  1.5562615 10
    "002067" 1966 "USD"  .6539264 2.1202636 4.1696086   3.574395  5.019521   .6365784 10
    "010862" 1966 "USD" 1.4268663 2.3556259 4.4903164   3.791265   5.55829   1.614766 10
    "005187" 1966 "USD" 1.8813317  2.353684  4.494313   4.243675  5.583386  1.8112316 10
    "004503" 1966 "USD" 2.0637228 2.1914792  5.658694    3.42389  5.038464   2.080169 10
    "005667" 1966 "USD" 1.4726313  2.466436    5.8844  2.3321354  5.888969   2.400308 10
    "006310" 1966 "USD" 1.7049178  2.591575  5.712029  2.6116536  5.606579   2.390936 10
    "007017" 1967 "USD"  2.022548  2.362567  4.736841  4.1334805  5.094829  1.8100348 10
    "010862" 1967 "USD" 1.5026466 2.3531435 4.5222497   3.837594  5.556502  1.6646914 10
    "007882" 1967 "USD"  2.548498  3.327665  1.636634   6.868293  5.024827  .05845372 10
    "008068" 1967 "USD" 1.7658633 1.7077713 4.2272806   5.041152  5.249301  1.0299217 10
    "003930" 1967 "USD"  1.931445 2.1512759  6.004253   2.986257   5.23858  2.2476737 10
    "007620" 1967 "USD" 1.1141868 1.4257585  7.106747  1.7920214  5.316026  1.5578226 10
    "004351" 1967 "USD"  1.391462  1.823098  7.065873   1.712831  5.554659  2.2070367 10
    "010482" 1967 "USD" 2.2599998  2.689897  4.408921  4.3715224  5.069791  1.9039836 10
    "008974" 1967 "USD"  1.593837 2.1702335  4.922835   3.711109  5.771074  1.9205186 10
    "003420" 1967 "USD"  2.061482 3.9446754  2.456654   4.870492   5.17167   1.434817 10
    "005439" 1967 "USD" 2.2404225  1.688551  5.530979  4.2312326  5.133477  1.9954216 10
    "002991" 1967 "USD"  2.072094  2.548395    5.6606   3.073439   4.90805  2.1182957 10
    "004503" 1967 "USD"  2.092912 2.2288344  5.650302   3.424116  5.079903  2.1381311 10
    "010565" 1967 "USD" 2.1497772  2.580431 4.7035666  4.0761204  5.272179  2.0522823 10
    "003130" 1967 "USD" 1.7614495 2.2301075 4.5857816  4.1559005  5.738585  1.8023727 10
    "009465" 1967 "USD"  2.103703 2.1450882  5.006168   4.162787  4.875771  1.7514894 10
    "001678" 1967 "USD" 1.3887788  1.900434  4.555962  4.1427236  5.562011  1.3745133 10
    "002067" 1967 "USD" -.1823216 1.2039728  1.914266    5.90978  4.929632 -1.6660073 10
    "006403" 1967 "USD" 1.6607544 2.0139108   4.20309  4.6540937   5.37381  1.2434888 10
    "007938" 1967 "USD"  1.641437 1.8242042  6.667015  2.3605583  5.150666    2.01901 10
    "004430" 1967 "USD"  1.543364  2.443657  6.312604   1.997444  5.437443  2.2191405 10
    "007276" 1967 "USD" 2.2603252 3.1099246   3.47086   4.889881  5.029291  1.5753503 10
    "004073" 1967 "USD" 1.8998277  1.745585   7.10787  2.2567136  5.143138  2.2525432 10
    "001537" 1967 "USD" 2.2350664  2.026553  6.000526   3.418327  5.172868  2.3720183 10
    "008853" 1967 "USD" 2.3754075  1.954475  5.167621   4.463652  5.062393  1.9692234 10
    "011506" 1967 "USD" 1.4458642 2.8143156  4.177734   3.664155  6.574507   2.081156 10
    "009653" 1967 "USD" 2.1195774  2.226658  5.049951   4.053309  5.108707  1.9538376 10
    "001976" 1967 "USD"  2.444437 2.1441348   5.14889  4.3617525  4.838223  1.9817382 10
    "007008" 1967 "USD" 1.1143705  1.849222  5.852809    2.62268  5.870417    1.98212 10
    "006310" 1967 "USD" 1.5941818  2.630798  4.789202   3.384522  5.653182  2.0316436 10
    "005667" 1967 "USD"  1.704374  2.768089  4.695209   3.451417  5.570193  2.0415192 10
    "006819" 1967 "USD"   3.48749 3.8873906 2.2540417   6.556398  4.785724  1.4241613 10
    "010156" 1967 "USD" 1.9158365  2.243663  5.295091  3.5874224   5.04796  1.9082427 10
    "001788" 1967 "USD" 2.3272777  2.495014  2.601183   6.441422  4.899799   .3788298 10
    "010503" 1967 "USD"  1.866553  3.306182  3.443982    4.32673  5.691399  1.7763008 10
    "009878" 1967 "USD" 2.7553964  3.634508 2.7834485    5.54778  4.689013  1.5065703 10
    "015077" 1967 "USD"  1.350164 2.0528038  9.257823 -.10765802    5.6591    2.28169 10
    "005581" 1967 "USD"  1.545457 2.0613651   5.32023   3.374202  5.558843   1.935049 10
    "005187" 1967 "USD" 1.9996516  2.477883 3.2769146   5.455194  5.475481   .9849555 10
    end

  • #2
    Tobias:
    welcome to this forum.
    1) You're probably after a fixed effect panel data regression:
    Code:
    . destring gvkeystr2, g(gvkeystr2_num)
    . xtset gvkeystr2_num year
    
    Panel variable: gvkeystr2_num (unbalanced)
     Time variable: year, 1965 to 1967, but with a gap
             Delta: 1 unit
    
    . xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv, fe
    
    Fixed-effects (within) regression               Number of obs     =        100
    Group variable: gvkeystr2_~m                    Number of groups  =         53
    
    R-squared:                                      Obs per group:
         Within  = 0.8899                                         min =          1
         Between = 0.9892                                         avg =        1.9
         Overall = 0.9861                                         max =          3
    
                                                    F(7,40)           =      46.18
    corr(u_i, Xb) = 0.7000                          Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
            lROA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            year |
           1966  |   .0085624   .0212587     0.40   0.689    -.0344029    .0515278
           1967  |   .0065978    .021212     0.31   0.757    -.0362733    .0494689
                 |
        lProfMar |   .7293048   .0805789     9.05   0.000     .5664487    .8921609
         lICturn |   .7604428   .0611535    12.43   0.000     .6368468    .8840387
          lICrat |   .8519224   .0566816    15.03   0.000     .7373647    .9664801
             lFL |  -.0162381    .087619    -0.19   0.854    -.1933227    .1608465
          lROEmv |   .1704402   .0541808     3.15   0.003     .0609367    .2799438
           _cons |  -7.055757   .8612136    -8.19   0.000    -8.796334   -5.315179
    -------------+----------------------------------------------------------------
         sigma_u |  .07312191
         sigma_e |  .05620967
             rho |  .62856803   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(52, 40) = 0.89                      Prob > F = 0.6564
    The F-test in the footnote (Prob > F = 0.6564) fails to reject the null that all u_i = 0, which tells you to go pooled OLS instead of -xtreg,fe- (but this may well be caused by the subsample of your dataset that you shared via -dataex-).

    2) Something like:
    Code:
    bysort gsubind2: xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv, fe
    should do the trick.
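
    To get one set of estimates per industry and per decade, as asked in #1, the -bysort- idea can be extended with a decade indicator. What follows is only a minimal sketch: it assumes the panel has already been -xtset- as above, the variable name decade is made up for illustration, and clustering on the firm identifier is an assumption rather than part of the answer above. The /// line continuation needs a do-file.

    ```stata
    * Decade bins matching the question: 1964-1970, 1971-1980, ..., 2011-2020.
    * decade (hypothetical name) holds the first year of each bin: 1961, 1971, ...
    generate int decade = 10*floor((year - 1)/10) + 1

    * One fixed-effects regression per industry-decade cell.
    levelsof gsubind2, local(industries)
    levelsof decade, local(decades)
    foreach ind of local industries {
        foreach d of local decades {
            display as text _newline "Industry `ind', decade bin starting `d'"
            * -capture noisily- lets the loop continue if a cell is too small to fit.
            capture noisily xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv ///
                if gsubind2 == `ind' & decade == `d', fe vce(cluster gvkeystr2_num)
        }
    }
    ```

    The coefficient on lICrat in each cell is then the within-firm association for that industry and decade; cells with few firms will give noisy estimates.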

    3) One of the most relevant tests checks whether the functional form of the regressand is correctly specified (other tests, which you can easily find among canned Stata commands and community-contributed Stata modules, investigate heteroskedasticity and serial correlation of the epsilon error):
    Code:
    . quietly xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv, fe
    
    . predict fitted, xb
    
    . g sq_fitted=fitted^2
    
    . xtreg lROA fitted sq_fitted, fe
    
    Fixed-effects (within) regression               Number of obs     =        100
    Group variable: gvkeystr2_~m                    Number of groups  =         53
    
    R-squared:                                      Obs per group:
         Within  = 0.8903                                         min =          1
         Between = 0.9898                                         avg =        1.9
         Overall = 0.9869                                         max =          3
    
                                                    F(2,45)           =     182.63
    corr(u_i, Xb) = 0.6594                          Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
            lROA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          fitted |   .9593316   .1095214     8.76   0.000     .7387443    1.179919
       sq_fitted |     .01615   .0382053     0.42   0.675    -.0607995    .0930995
           _cons |   .0160262   .1051282     0.15   0.880    -.1957128    .2277653
    -------------+----------------------------------------------------------------
         sigma_u |  .06746723
         sigma_e |  .05289007
             rho |  .61936473   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(52, 45) = 1.53                      Prob > F = 0.0746
    As sq_fitted does not reach statistical significance, no misspecification issue is detected (assuming -xtreg,fe- were the way to go).
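
    For the heteroskedasticity and serial correlation checks mentioned in 3), two community-contributed commands are commonly used after -xtreg,fe-. The lines below are a sketch under the assumption that both packages are installed; neither is part of official Stata, and their suitability for a panel this unbalanced should be checked against their help files.

    ```stata
    * Refit the fixed-effects model so the postestimation test has results to use.
    quietly xtreg lROA i.year lProfMar lICturn lICrat lFL lROEmv, fe

    * Modified Wald test for groupwise heteroskedasticity (community-contributed).
    * If it is not installed: ssc install xttest3
    xttest3

    * Wooldridge test for first-order serial correlation (community-contributed).
    * If it is not installed, -search xtserial- locates the package.
    * Year dummies are left out here on the assumption that -xtserial- does not
    * accept factor-variable notation.
    xtserial lROA lProfMar lICturn lICrat lFL lROEmv
    ```

    A rejection in either test is an argument for cluster-robust standard errors rather than for a different point estimator.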
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hello Carlo,

      Many thanks for your swift and comprehensive response, that is highly appreciated.

      When performing the commands, a few questions arise, which it would be great to have your opinion on.

      I am referring to my regression outputs below (1. and 2.), which result from my whole dataset.

      1) When running the 1. Regression without industry focus below, the F-test indicates that -xtreg,fe- is suitable, correct?

      2) When I now want to state the impact of lICrat (meaning the coefficient) on lROA for specific decades (e.g. 1970 - 1980), do I need to add the coefficient of lICrat (.8968841 in the 1. regression below) to the coefficients for each year? Meaning e.g. lICrat | .8968841 + (for 1970 | -.00981) + (for 1971 | -.0068977) etc.?
      In case this is correct, which role does the p value for each year play as they show some insignificance for each year? Or does that not matter?
      The same question would also arise when looking at the regressions per industry (also one for example below (2. Regression for one industry (code 10)).

      3) The year, industry and year/industry fixed effects are absorbed by using -i.year- and -fe-, right?

      4) I assume the control variables are "just" the variables I add within the regression (such as FL = Financial Leverage) which absorb some of the effect?

      5) When searching for tests, I find that for -xtreg,fe- some people mention just using robust and clustered standard errors. I use Stata 17.0; is only the -cluster- option needed here?
      I also found -xttest2- but it gives me the following error xttest2 --> gvkeystr2_num takes on too many values r(134);
      Which commands / ways would you suggest when it comes to test the model / coefficients?

      When I run your suggested test, I get the result below. Does the statistical significance of sq_fitted mean that misspecification is detected?

      . quietly xtreg lROA i.year lProfMar lICturn lICrat lFL, fe

      .
      . predict fitted, xb
      (420 missing values generated)

      .
      . g sq_fitted=fitted^2
      (420 missing values generated)

      . xtreg lROA fitted sq_fitted, fe

      Fixed-effects (within) regression               Number of obs     =    116,162
      Group variable: gvkeystr2_~m                    Number of groups  =     13,566

      R-squared:                                      Obs per group:
           Within  = 0.9752                                         min =          1
           Between = 0.9368                                         avg =        8.6
           Overall = 0.9710                                         max =         59

                                                      F(2,102594)       =   2.01e+06
      corr(u_i, Xb) = -0.0447                         Prob > F          =     0.0000

      ------------------------------------------------------------------------------
              lROA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            fitted |   .9928209   .0008538  1162.76   0.000     .9911474    .9944945
         sq_fitted |    .003045   .0002941    10.35   0.000     .0024686    .0036214
             _cons |   .0006377   .0009535     0.67   0.504    -.0012311    .0025065
      -------------+----------------------------------------------------------------
           sigma_u |  .20370774
           sigma_e |  .11938185
               rho |  .74435309   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(13565, 102594) = 6.95               Prob > F = 0.0000





      1. Regression without industry focus:

      xtreg lROA i.year lProfMar lICturn lICrat lFL, fe

      Fixed-effects (within) regression               Number of obs     =    116,162
      Group variable: gvkeystr2_~m                    Number of groups  =     13,566

      R-squared:                                      Obs per group:
           Within  = 0.9751                                         min =          1
           Between = 0.9370                                         avg =        8.6
           Overall = 0.9711                                         max =         59

                                                      F(63,102533)      =   63818.67
      corr(u_i, Xb) = -0.0363                         Prob > F          =     0.0000

      ------------------------------------------------------------------------------
              lROA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
              year |
             1964  |   .0069237   .0127331     0.54   0.587     -.018033    .0318804
             1965  |    .002069   .0118185     0.18   0.861    -.0210952    .0252331
             1966  |   .0087763   .0107554     0.82   0.415     -.012304    .0298567
             1967  |  -.0055056   .0104756    -0.53   0.599    -.0260377    .0150264
             1968  |  -.0087312   .0103038    -0.85   0.397    -.0289266    .0114642
             1969  |  -.0019528   .0103815    -0.19   0.851    -.0223004    .0183949
             1970  |    -.00981   .0104444    -0.94   0.348    -.0302808    .0106608
             1971  |  -.0068977   .0103242    -0.67   0.504     -.027133    .0133377
             1972  |  -.0101976   .0102889    -0.99   0.322    -.0303638    .0099685
             1973  |   .0047715   .0106656     0.45   0.655     -.016133    .0256759
             1974  |   .0202451   .0112444     1.80   0.072    -.0017937     .042284
             1975  |   .0154346   .0108176     1.43   0.154    -.0057676    .0366369
             1976  |   .0144639     .01055     1.37   0.170     -.006214    .0351417
             1977  |   .0187628    .010508     1.79   0.074    -.0018328    .0393583
             1978  |   .0247072   .0103998     2.38   0.018     .0043237    .0450906
             1979  |   .0211847   .0103137     2.05   0.040       .00097    .0413994
             1980  |   .0050589   .0102035     0.50   0.620    -.0149399    .0250576
             1981  |   .0043823   .0101897     0.43   0.667    -.0155895     .024354
             1982  |   .0025303    .010152     0.25   0.803    -.0173675    .0224281
             1983  |   -.011525   .0100277    -1.15   0.250    -.0311792    .0081293
             1984  |  -.0046522   .0100421    -0.46   0.643    -.0243344    .0150301
             1985  |  -.0098299   .0100263    -0.98   0.327    -.0294812    .0098214
             1986  |  -.0180935   .0100206    -1.81   0.071    -.0377339    .0015468
             1987  |  -.0100018   .0100364    -1.00   0.319     -.029673    .0096694
             1988  |   -.008635   .0100358    -0.86   0.390    -.0283052    .0110351
             1989  |  -.0044743   .0100572    -0.44   0.656    -.0241863    .0152377
             1990  |  -.0064634    .010105    -0.64   0.522    -.0262692    .0133423
             1991  |  -.0115139   .0100637    -1.14   0.253    -.0312387    .0082109
             1992  |  -.0127256   .0100301    -1.27   0.205    -.0323844    .0069333
             1993  |  -.0134617   .0099836    -1.35   0.178    -.0330294     .006106
             1994  |  -.0088791   .0099676    -0.89   0.373    -.0284154    .0106572
             1995  |  -.0077428   .0099559    -0.78   0.437    -.0272563    .0117707
             1996  |  -.0120508   .0099427    -1.21   0.226    -.0315384    .0074368
             1997  |  -.0145307   .0099412    -1.46   0.144    -.0340153    .0049539
             1998  |  -.0174536   .0099887    -1.75   0.081    -.0370314    .0021241
             1999  |  -.0154207   .0100158    -1.54   0.124    -.0350516    .0042102
             2000  |   -.016575   .0100572    -1.65   0.099     -.036287    .0031369
             2001  |  -.0228049   .0100875    -2.26   0.024    -.0425762   -.0030335
             2002  |   -.022633   .0101118    -2.24   0.025     -.042452    -.002814
             2003  |  -.0257311   .0100246    -2.57   0.010    -.0453792    -.006083
             2004  |  -.0220746   .0099905    -2.21   0.027    -.0416558   -.0024933
             2005  |  -.0184037   .0099946    -1.84   0.066    -.0379929    .0011856
             2006  |  -.0197889   .0099948    -1.98   0.048    -.0393786   -.0001993
             2007  |  -.0196562   .0100278    -1.96   0.050    -.0393106   -1.77e-06
             2008  |   -.015646   .0101792    -1.54   0.124     -.035597    .0043051
             2009  |  -.0219553   .0101266    -2.17   0.030    -.0418033   -.0021073
             2010  |    -.02419   .0100678    -2.40   0.016    -.0439228   -.0044571
             2011  |    -.02093   .0100956    -2.07   0.038    -.0407172   -.0011427
             2012  |  -.0264883   .0101017    -2.62   0.009    -.0462875   -.0066891
             2013  |   -.032125   .0100845    -3.19   0.001    -.0518904   -.0123595
             2014  |    -.03579   .0100978    -3.54   0.000    -.0555815   -.0159984
             2015  |  -.0358052   .0101457    -3.53   0.000    -.0556907   -.0159196
             2016  |  -.0325113   .0101493    -3.20   0.001    -.0524037   -.0126188
             2017  |  -.0343636   .0101453    -3.39   0.001    -.0542482   -.0144791
             2018  |  -.0326884   .0101876    -3.21   0.001    -.0526559   -.0127208
             2019  |  -.0444578   .0102167    -4.35   0.000    -.0644824   -.0244333
             2020  |  -.0470176    .010289    -4.57   0.000    -.0671839   -.0268512
             2021  |  -.0353755   .0102203    -3.46   0.001    -.0554072   -.0153438
             2022  |  -.0434875   .0213021    -2.04   0.041    -.0852394   -.0017356
                   |
          lProfMar |   .9891585   .0005563  1778.15   0.000     .9880682    .9902488
           lICturn |   .8929487   .0012276   727.37   0.000     .8905426    .8953549
            lICrat |   .8968841   .0012234   733.12   0.000     .8944863    .8992819
               lFL |  -.0091148   .0012126    -7.52   0.000    -.0114915   -.0067382
             _cons |  -8.159318   .0163737  -498.32   0.000     -8.19141   -8.127226
      -------------+----------------------------------------------------------------
           sigma_u |  .20330132
           sigma_e |  .11947974
               rho |  .74327961   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(13565, 102533) = 5.35               Prob > F = 0.0000

      2. Regression for one industry (code 10):

      bysort gsubind2: xtreg lROA i.year lProfMar lICturn lICrat lFL, fe

      ----------------------------------------------------------------------------------------------------------------------------
      -> gsubind2 = 10

      Fixed-effects (within) regression               Number of obs     =      8,596
      Group variable: gvkeystr2_~m                    Number of groups  =      1,069

      R-squared:                                      Obs per group:
           Within  = 0.9498                                         min =          1
           Between = 0.8860                                         avg =        8.0
           Overall = 0.9460                                         max =         53

                                                      F(60,7467)        =    2353.56
      corr(u_i, Xb) = -0.2313                         Prob > F          =     0.0000

      ------------------------------------------------------------------------------
              lROA | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
              year |
             1966  |   .0056595    .051136     0.11   0.912    -.0945814    .1059005
             1967  |   .0013645   .0491675     0.03   0.978    -.0950176    .0977465
             1968  |   .0026137   .0476723     0.05   0.956    -.0908373    .0960648
             1969  |   .0129865   .0487454     0.27   0.790    -.0825683    .1085412
             1970  |   .0162369   .0487671     0.33   0.739    -.0793604    .1118343
             1971  |    .000353   .0481193     0.01   0.994    -.0939743    .0946804
             1972  |   -.016471   .0467775    -0.35   0.725     -.108168    .0752261
             1973  |   .0160064   .0472965     0.34   0.735    -.0767081     .108721
             1974  |   .0893322   .0487886     1.83   0.067    -.0063073    .1849717
             1975  |   .0644704   .0480349     1.34   0.180    -.0296916    .1586324
             1976  |   .0572181    .046489     1.23   0.218    -.0339135    .1483496
             1977  |   .0502612   .0467333     1.08   0.282    -.0413492    .1418716
             1978  |   .0603402   .0465344     1.30   0.195    -.0308802    .1515607
             1979  |    .055055   .0453888     1.21   0.225    -.0339199    .1440298
             1980  |   .0341805   .0448958     0.76   0.446    -.0538279    .1221889
             1981  |    .062656   .0449156     1.39   0.163    -.0253912    .1507033
             1982  |   .0506718   .0463555     1.09   0.274     -.040198    .1415416
             1983  |   .0328803   .0460859     0.71   0.476     -.057461    .1232216
             1984  |   .0172229   .0466827     0.37   0.712    -.0742884    .1087341
             1985  |    .052386   .0472938     1.11   0.268    -.0403232    .1450952
             1986  |  -.0241576   .0480769    -0.50   0.615    -.1184018    .0700866
             1987  |   .0228294   .0470354     0.49   0.627    -.0693734    .1150321
             1988  |   .0254858   .0464024     0.55   0.583    -.0654761    .1164476
             1989  |   .0110575   .0453529     0.24   0.807    -.0778468    .0999619
             1990  |   .0426788   .0451751     0.94   0.345    -.0458772    .1312347
             1991  |   .0145708   .0458403     0.32   0.751     -.075289    .1044307
             1992  |    .017971   .0456141     0.39   0.694    -.0714455    .1073875
             1993  |   .0169826    .044938     0.38   0.706    -.0711085    .1050736
             1994  |   .0166574   .0450162     0.37   0.711     -.071587    .1049019
             1995  |   .0328391   .0448633     0.73   0.464    -.0551056    .1207838
             1996  |   .0380357   .0444041     0.86   0.392    -.0490089    .1250804
             1997  |   .0294296   .0444018     0.66   0.507    -.0576105    .1164696
             1998  |   .0273788   .0457754     0.60   0.550    -.0623538    .1171114
             1999  |  -.0046768   .0453581    -0.10   0.918    -.0935915    .0842378
             2000  |   .0352222   .0446429     0.79   0.430    -.0522905    .1227349
             2001  |   .0048705   .0449404     0.11   0.914    -.0832253    .0929664
             2002  |   .0093083   .0453393     0.21   0.837    -.0795696    .0981862
             2003  |   .0097455   .0447806     0.22   0.828    -.0780371     .097528
             2004  |   .0440651   .0445012     0.99   0.322    -.0431699    .1313001
             2005  |   .0600117   .0443842     1.35   0.176    -.0269939    .1470172
             2006  |    .055208   .0443398     1.25   0.213    -.0317105    .1421264
             2007  |   .0455613   .0443865     1.03   0.305    -.0414488    .1325714
             2008  |   .0273521   .0457267     0.60   0.550    -.0622852    .1169894
             2009  |   .0543114   .0453021     1.20   0.231    -.0344934    .1431163
             2010  |   .0324615   .0447871     0.72   0.469    -.0553338    .1202568
             2011  |   .0332788   .0447314     0.74   0.457    -.0544074     .120965
             2012  |   .0123594   .0449021     0.28   0.783    -.0756613    .1003801
             2013  |    .010551   .0447639     0.24   0.814    -.0771988    .0983009
             2014  |  -.0111098   .0451852    -0.25   0.806    -.0996856     .077466
             2015  |  -.0334939   .0472266    -0.71   0.478    -.1260714    .0590835
             2016  |   .0327695   .0470966     0.70   0.487    -.0595531    .1250921
             2017  |  -.0135446   .0459106    -0.30   0.768    -.1035423     .076453
             2018  |   .0090506    .046747     0.19   0.846    -.0825868    .1006879
             2019  |  -.0119334   .0474316    -0.25   0.801    -.1049127    .0810458
             2020  |   .0365973     .05269     0.69   0.487      -.06669    .1398846
             2021  |   .0137158   .0473417     0.29   0.772    -.0790873    .1065189
                   |
          lProfMar |   .9502819   .0028979   327.92   0.000     .9446012    .9559626
           lICturn |   .8063445   .0052876   152.50   0.000     .7959793    .8167097
            lICrat |   .8137707   .0053706   151.52   0.000     .8032428    .8242986
               lFL |  -.0414511   .0067961    -6.10   0.000    -.0547734   -.0281288
             _cons |  -7.276817    .071626  -101.59   0.000    -7.417224    -7.13641
      -------------+----------------------------------------------------------------
           sigma_u |   .3062449
           sigma_e |  .18510303
               rho |  .73242185   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(1068, 7467) = 5.53                  Prob > F = 0.0000



      • #4
        Tobias:
        1) Correct;
        2) Not quite. You should test the joint statistical significance of -year- via:
        Code:
        testparm i.year
        3) Not quite. -i.industry- is omitted because it is time-invariant: the -fe- estimator already absorbs everything that is constant within a panel;
        4) Correct, if you mean that they are not the predictors you're really interested in.
        5) Cluster-robust standard errors can be invoked via -robust- or -vce(cluster clusterid)- options from -xtreg-. Since you have a huge number of panels (1 panel=1 cluster), cluster-robust standard errors are recommended.
        6) Yes, your model suffers from misspecification.
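
        Points 2), 3) and 5) can be put together in a few lines. This is a sketch rather than a recommended final specification: it assumes gsubind2 is constant within each firm, uses the firm identifier created in #2 as the cluster variable, and treats industry-by-year dummies as one possible reading of the "year/industry fixed effects" asked for in #1.

        ```stata
        * Firm fixed effects with standard errors clustered on the panel identifier.
        xtreg lROA i.year lProfMar lICturn lICrat lFL, fe vce(cluster gvkeystr2_num)

        * Joint test of the year effects after the clustered fit.
        testparm i.year

        * Industry-specific year effects: gsubind2 alone is time-invariant and would
        * be dropped, but its interaction with year varies within firm.
        xtreg lROA i.gsubind2#i.year lProfMar lICturn lICrat lFL, fe vce(cluster gvkeystr2_num)
        ```

        Stata will flag any industry-year cells it has to omit for collinearity; that is expected and not an error.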
        Last edited by Carlo Lazzaro; 08 Oct 2022, 08:51.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hello Carlo,

          Thanks a lot for your input, that is highly appreciated.
          It seems that my econometrics knowledge is limited here, so sorry for some further questions:

          2) When running the command testparm i.year, the result below will be calculated. Does that mean I can sum up the coefficient per year, to reach the above mentioned goal?

          3) But we should account for industry fixed effects as we do have them, right? Why can we then omit i.industry?

          5) Incorporating cluster robust standard errors via -vce(cluster clusterid (e.g. year)) means that we do not have to test for heteroskedasticity or serial correlation?

          6) Is there any quick hint how to solve this misspecification? Adding more variables? My regression equation is basically defining ROA (y) = Profit Margin (x1) + Intangible Capital turnover (x2) + Intangible Capital (x3) ratio
          This formula results from applying the DuPont idea to the initial ROA equation: ROA = Net Income / Total Assets --> ROA = Net Income / Sales (Profit Margin) * Sales / Intangible Capital (Intangible Capital turnover) * Intangible Capital / Total Assets (Intangible Capital ratio) --> and then taking the log

          Could misspecification result from the case that I regress variables which are too dependent on each other as per the formula above?
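
          Written out, the decomposition described above looks as follows. The symbols NI (net income), S (sales), IC (intangible capital) and TA (total assets) are shorthand introduced here, and the log step assumes every ratio is strictly positive:

          ```latex
          \begin{align*}
          \mathrm{ROA} &= \frac{NI}{TA}
                        = \frac{NI}{S}\cdot\frac{S}{IC}\cdot\frac{IC}{TA} \\
          \log \mathrm{ROA} &= \log\frac{NI}{S} + \log\frac{S}{IC} + \log\frac{IC}{TA}
          \end{align*}
          ```

          If lROA, lProfMar, lICturn and lICrat were built exactly this way, the second line is an accounting identity with all three slopes equal to 1, so the regression would largely recover the identity rather than estimate an effect. That would be consistent with the coefficients close to 1 and the within R-squared of about 0.975 reported in #3.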


          Stata output

          . testparm i.year


          ( 1) 1964.year = 0
          ( 2) 1965.year = 0
          ( 3) 1966.year = 0
          ( 4) 1967.year = 0
          ( 5) 1968.year = 0
          ( 6) 1969.year = 0
          ( 7) 1970.year = 0
          ( 8) 1971.year = 0
          ( 9) 1972.year = 0
          (10) 1973.year = 0
          (11) 1974.year = 0
          (12) 1975.year = 0
          (13) 1976.year = 0
          (14) 1977.year = 0
          (15) 1978.year = 0
          (16) 1979.year = 0
          (17) 1980.year = 0
          (18) 1981.year = 0
          (19) 1982.year = 0
          (20) 1983.year = 0
          (21) 1984.year = 0
          (22) 1985.year = 0
          (23) 1986.year = 0
          (24) 1987.year = 0
          (25) 1988.year = 0
          (26) 1989.year = 0
          (27) 1990.year = 0
          (28) 1991.year = 0
          (29) 1992.year = 0
          (30) 1993.year = 0
          (31) 1994.year = 0
          (32) 1995.year = 0
          (33) 1996.year = 0
          (34) 1997.year = 0
          (35) 1998.year = 0
          (36) 1999.year = 0
          (37) 2000.year = 0
          (38) 2001.year = 0
          (39) 2002.year = 0
          (40) 2003.year = 0
          (41) 2004.year = 0
          (42) 2005.year = 0
          (43) 2006.year = 0
          (44) 2007.year = 0
          (45) 2008.year = 0
          (46) 2009.year = 0
          (47) 2010.year = 0
          (48) 2011.year = 0
          (49) 2012.year = 0
          (50) 2013.year = 0
          (51) 2014.year = 0
          (52) 2015.year = 0
          (53) 2016.year = 0
          (54) 2017.year = 0
          (55) 2018.year = 0
          (56) 2019.year = 0
          (57) 2020.year = 0
          (58) 2021.year = 0
          (59) 2022.year = 0

          F( 59,102533) = 11.24
          Prob > F = 0.0000



          • #6
            Tobias:
            1) reading any decent textbook on panel data regression is recommended;
            2) not quite. The -testparm- output simply tells you that, as is often the case with the -fe- estimator, the T dimension contributes to explaining variation in the conditional mean of the regressand. To obtain what you're seemingly after, you should use -predict-;
            3) if -i.industry- is a time-invariant predictor, the -fe- machinery will wipe it out. Try it yourself and see that -i.industry- will be omitted;
            5) in a short panel the standard errors should be clustered on -panelid- or on another higher-level variable, if feasible, but not on -timevar-. That said, there's no gain in testing for heteroskedasticity and autocorrelation after imposing cluster-robust standard errors;
            6) including more predictors (and/or their interactions) is a possible fix.
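            Points 3) and 5) can be tried out on a toy dataset. What follows is only a sketch, assuming the -nlswork- data shipped with Stata rather than the data discussed in this thread: -race- stands in for a time-invariant predictor such as industry, and -idcode- is the panel identifier.

            ```stata
            * Toy data shipped with Stata; idcode is the panel identifier
            webuse nlswork, clear
            xtset idcode year

            * race never changes within idcode, so -fe- should report it as omitted;
            * standard errors are clustered on the panel identifier, not on year
            xtreg ln_wage tenure i.year i.race, fe vce(cluster idcode)

            * joint test of the year effects, based on the cluster-robust VCE
            testparm i.year
            ```

            With the real data, the same pattern would use the firm identifier in -vce(cluster ...)- and the industry code in place of -race-.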
            Last edited by Carlo Lazzaro; 08 Oct 2022, 17:21.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Hello Carlo,

              Thanks a lot for your input, that is really helpful.
              One last question regarding the estimation of coefficients for multiple years, for example per decade.
              What should -predict- deliver? I mean, do I have to add the coefficient for each year I am interested in to the "basic" coefficient of the respective variable at the bottom of the table? Or do I have to calculate an average?
              I could not find such an example in the literature.

              xtreg lROA i.year lProfMar lICturn lICrat lFL, fe cluster (gsubind2)

              Fixed-effects (within) regression Number of obs = 77706
              Group variable: gvkeystr2_~m Number of groups = 10433

              R-sq: Within = 0.9811 Obs per group: min = 1
              Between = 0.9555 avg = 7.4
              Overall = 0.9793 max = 41

              F(7,8) = .
              corr(u_i, Xb) = 0.0029 Prob > F = .

              (Std. err. adjusted for 9 clusters in gsubind2)
              ------------------------------------------------------------------------------
              | Robust
              lROA | Coefficient std. err. t P>|t| [95% conf. interval]
              -------------+----------------------------------------------------------------
              year |
              1971 | .0037102 .0037344 0.99 0.350 -.0049013 .0123217
              1972 | -.0004549 .0032825 -0.14 0.893 -.0080243 .0071146
              1973 | .009482 .0040338 2.35 0.047 .0001801 .018784
              1974 | .0218267 .0064618 3.38 0.010 .0069257 .0367276
              1975 | .0160525 .0055546 2.89 0.020 .0032436 .0288614
              1976 | .0180549 .004753 3.80 0.005 .0070945 .0290152
              1977 | .0186226 .0019586 9.51 0.000 .014106 .0231391
              1978 | .0246361 .006883 3.58 0.007 .0087639 .0405082
              1979 | .0217661 .0058023 3.75 0.006 .008386 .0351461
              1980 | .0146647 .0052949 2.77 0.024 .0024547 .0268747
              1981 | .0127611 .0033166 3.85 0.005 .005113 .0204091
              1982 | .0104817 .004135 2.53 0.035 .0009464 .0200169
              1983 | -.0033174 .0018555 -1.79 0.112 -.0075961 .0009613
              1984 | .0016403 .0036649 0.45 0.666 -.006811 .0100916
              1985 | -.0027814 .0040268 -0.69 0.509 -.0120671 .0065044
              1986 | -.0102921 .0044771 -2.30 0.051 -.0206162 .000032
              1987 | -.0031316 .0041812 -0.75 0.475 -.0127734 .0065103
              1988 | -.0025461 .0050648 -0.50 0.629 -.0142255 .0091334
              1989 | .0008585 .0036165 0.24 0.818 -.0074811 .0091982
              1990 | .0011939 .0049889 0.24 0.817 -.0103104 .0126982
              1991 | -.0035383 .0054198 -0.65 0.532 -.0160364 .0089598
              1992 | -.003129 .0048451 -0.65 0.536 -.0143018 .0080439
              1993 | -.0038914 .0032702 -1.19 0.268 -.0114325 .0036496
              1994 | -.0005659 .0044461 -0.13 0.902 -.0108187 .0096869
              1995 | .0006614 .0061458 0.11 0.917 -.0135108 .0148336
              1996 | -.0027771 .0057062 -0.49 0.640 -.0159356 .0103815
              1997 | -.0054698 .0048149 -1.14 0.289 -.016573 .0056334
              1998 | -.0083757 .0049414 -1.70 0.129 -.0197705 .0030192
              1999 | -.0046518 .004551 -1.02 0.337 -.0151464 .0058428
              2000 | -.0031321 .0031807 -0.98 0.354 -.0104668 .0042026
              2001 | -.0084371 .0039738 -2.12 0.066 -.0176007 .0007264
              2002 | -.014386 .003155 -4.56 0.002 -.0216616 -.0071105
              2003 | -.0163173 .0025905 -6.30 0.000 -.022291 -.0103436
              2004 | -.0123025 .0041768 -2.95 0.019 -.0219341 -.0026709
              2005 | -.0090141 .0043404 -2.08 0.071 -.019023 .0009948
              2006 | -.0128986 .0050258 -2.57 0.033 -.0244882 -.001309
              2007 | -.0117261 .0053007 -2.21 0.058 -.0239496 .0004974
              2008 | -.0058919 .0063128 -0.93 0.378 -.0204491 .0086654
              2009 | -.0169482 .0076545 -2.21 0.058 -.0345996 .0007032
              2010 | -.0146992 .0060312 -2.44 0.041 -.0286071 -.0007912
              |
              lProfMar | .9964079 .0048482 205.52 0.000 .985228 1.007588
              lICturn | .9111991 .0111505 81.72 0.000 .885486 .9369121
              lICrat | .9109616 .0106905 85.21 0.000 .8863092 .9356141
              lFL | -.0093427 .0042198 -2.21 0.058 -.0190734 .0003881
              _cons | -8.321487 .1243381 -66.93 0.000 -8.608211 -8.034763
              -------------+----------------------------------------------------------------
              sigma_u | .16190806
              sigma_e | .10390372
              rho | .7082971 (fraction of variance due to u_i)
              ---------------------------------------------------------------



              • #8
                Tobias:
                1) you have to consider that the -i.year- contribution to variation in the conditional mean of the regressand is adjusted for the remaining predictors;
                2) via -predict- you obtain the linear prediction for a given observation by multiplying the variables by their coefficients, as you can see in the following (too basic to be true) toy-example:
                Code:
                . use "https://www.stata-press.com/data/r17/nlswork.dta"
                (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                
                . xtreg ln_wage age i.year if year<=72, fe
                
                Fixed-effects (within) regression               Number of obs     =      7,828
                Group variable: idcode                          Number of groups  =      2,943
                
                R-squared:                                      Obs per group:
                     Within  = 0.1055                                         min =          1
                     Between = 0.0469                                         avg =        2.7
                     Overall = 0.0105                                         max =          5
                
                                                                F(5,4880)         =     115.10
                corr(u_i, Xb) = -0.4195                         Prob > F          =     0.0000
                
                ------------------------------------------------------------------------------
                     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         age |  -.0304386   .0243563    -1.25   0.211    -.0781879    .0173107
                             |
                        year |
                         69  |   .1241075   .0251265     4.94   0.000     .0748482    .1733668
                         70  |   .1647007   .0494931     3.33   0.001      .067672    .2617295
                         71  |   .2681541   .0735818     3.64   0.000     .1239006    .4124077
                         72  |   .3516117   .0976534     3.60   0.000     .1601671    .5430562
                             |
                       _cons |   1.999258   .4852245     4.12   0.000     1.047999    2.950516
                -------------+----------------------------------------------------------------
                     sigma_u |  .43375559
                     sigma_e |  .23170746
                         rho |  .77799345   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                F test that all u_i=0: F(2942, 4880) = 5.79                  Prob > F = 0.0000
                
                . predict fitted, xb
                
                
                . list idcode year age fitted if idcode==1 & year<=72
                
                       +--------------------------------+
                       | idcode   year   age     fitted |
                       |--------------------------------|
                    1. |      1     70    18   1.616064 |
                    2. |      1     71    19   1.689079 |
                    3. |      1     72    20   1.742098 |
                       +--------------------------------+
                
                . di  1.999258 + (.1647007)+(18* -.0304386)
                1.6160639
                
                . di  1.999258 + (.2681541 )+(19* -.0304386)
                1.6890787
                
                . di  1.999258 + (.3516117 )+(20* -.0304386)
                1.7420977
                
                .
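                For the per-decade question raised in #7, one possible route (not used elsewhere in this thread) is to interact the predictor of interest with a period indicator, so that each period gets its own slope. The sketch below assumes the same -nlswork- toy data; the variable -period- and its cut-off at 1978 are hypothetical and chosen for illustration only.

                ```stata
                webuse nlswork, clear

                * hypothetical two-period split (year is stored as 68-88 in this dataset)
                generate byte period = (year >= 78)

                * one tenure slope per period; year effects kept as before
                xtreg ln_wage i.period#c.tenure i.year, fe vce(cluster idcode)

                * difference between the later-period and the earlier-period slope
                lincom 1.period#c.tenure - 0.period#c.tenure
                ```

                With six decades, -period- would take six values and the output would report six slopes, one per decade.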
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Hello Carlo,
                  Many thanks for your response. Sorry for coming back so late; I was working on different projects.

                  I am currently not really sure when to take the log / ln in my panel data regressions.
                  May I ask for your advice / ideas when considering the following regressions:

                  First, I am analysing the impact of intangible capital on Return on Assets (as the dependent variable) and in another approach I am analysing the impact on Return on Equity (as another dependent variable).

                  As my independent variable of interest, I have chosen the Intangible Capital Ratio (Intangible Capital / Total Assets)
                  Furthermore, I have chosen the following control variables:
                  1. Financial Leverage (Total Assets / Shareholders Equity book value)
                  2. Firm Size (measured by total assets)
                  3. Efficiency (sales / total assets)
                  4. Sales growth (annual growth rate of sales in %)
                  5. Asset growth (annual growth rate of assets in %)
                  6. Profit Margin (net income/sales)
                  7. Intangible Capital turnover (sales / intangible capital)
                  Which of the variables (ROA, ROE, Intangible Capital Ratio and my 7 control variables) would you log-transform in order to run panel data regressions?

                  Many thanks in advance for your answer.



                  • #10
                    Tobias:
                    the issue is larger.
                    1) you may want to log the regressand and keep the predictors in their linear metric (log-linear regression, often used in econometrics because it allows you to express in % terms the contribution of each regressor to variation in the conditional mean of the regressand);
                    2) you may want to log all the terms of your regression (log-log regression), which allows you to express the above as the elasticity of Y with respect to each regressor.

                    In addition, you may want to go with 1) to fix some forms of heteroskedasticity and/or misspecification of the functional form of the regressand.

                    That said, I would take a look at the literature in your research field and see what is the most frequently used approach/regression specification (-fe- or -re- included).
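                    The two specifications can be compared side by side on a toy dataset. This is a minimal sketch, assuming the -nlswork- data shipped with Stata, in which -ln_wage- is already logged; -ln_hours- is a variable created here for illustration.

                    ```stata
                    webuse nlswork, clear
                    generate ln_hours = ln(hours)

                    * log-linear: 100*b approximates the % change in wage
                    * for a one-unit change in tenure
                    xtreg ln_wage tenure, fe vce(cluster idcode)

                    * log-log: b is the elasticity of wage with respect to hours
                    xtreg ln_wage ln_hours, fe vce(cluster idcode)
                    ```

                    Note that ratios that can be zero or negative (ROA in loss years, for instance) have no logarithm, so those observations drop out of a logged specification.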



                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Hello Carlo,

                      Great, many thanks for your input.

                      In literature I find both approaches, so I was curious if there is a kind of limitation / guidance when to use which transformation.
                      Speaking of the log-log regression, could one possible finding (the coefficient of the Intangible Capital Ratio) be, for example, that the Intangible Capital Ratio explains 30% of changes in the regressand (e.g. the ratio lIARg exhibits an influence of 30% on the ROE; it explains almost 0.30% of the growth in ROE when it changes by 1%)?
                      -> this is for example one example from literature where all variables (dependent and independent were log transformed).

                      Fixed effects makes the most sense as I am dealing with firm and year effects, which is also confirmed by the Hausman test.

                      Many thanks for your input.



                      • #12
                        Tobias:
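                        1) yes, the log-log regression result that you mention makes sense when read as an elasticity (a 1% change in the regressor goes with roughly a 0.30% change in the regressand);
                        2) fixed effects is the first choice in this kind of research;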
                        3) -hausman- test works with default standard errors only. If you impose non-default standard errors you should consider the community-contributed module -xtoverid- or the Mundlak approach to compare -fe- vs. -re- specification.
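                        The Mundlak approach mentioned in 3) can be sketched as follows. This is only an illustration on the -nlswork- toy data, with -tenure- and -hours- as arbitrary time-varying predictors; the names -insample-, -mean_tenure- and -mean_hours- are made up here.

                        ```stata
                        webuse nlswork, clear

                        * panel means computed on the estimation sample only
                        generate byte insample = !missing(ln_wage, tenure, hours)
                        bysort idcode: egen mean_tenure = mean(tenure) if insample
                        bysort idcode: egen mean_hours  = mean(hours)  if insample

                        * -re- model augmented with the panel means, cluster-robust VCE
                        xtreg ln_wage tenure hours mean_tenure mean_hours if insample, re vce(cluster idcode)

                        * H0: the panel means are jointly zero; rejection points toward -fe-
                        test mean_tenure mean_hours
                        ```

                        Because the test is computed from the cluster-robust VCE, it remains usable where -hausman- is not.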
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Thanks a lot Carlo.

                          When testing as you suggested, should I also test the year effects with -testparm- when using cluster-robust standard errors?
                          Cluster-robust standard errors also account for heteroskedasticity, right?

                          My regression code would look like:

                          xtreg lROA i.year lICrat lFL at lEff g_sales_w g_at_w lProfMar lICturn, fe cluster(gsubind2)

                          testparm i.year



                          • #14
                            Sorry for another note, but the command -xtoverid- does not seem to work (Stata 17.0 installed).
                            How does this command work?



                            • #15
                              Tobias:
                              try -search xtoverid- and follow the instructions reported in its -helpfile- to install it.
                              Please note that you also have to install some ancillary community-contributed modules that support -xtoverid-.
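                              A possible installation sequence is sketched below. It assumes that the ancillary modules are -ivreg2- and -ranktest-, which is my recollection and should be checked against the -xtoverid- help file; the regression on the -nlswork- toy data is for illustration only.

                              ```stata
                              * community-contributed modules from SSC
                              ssc install xtoverid
                              ssc install ivreg2
                              ssc install ranktest

                              * -xtoverid- is run after an -re- model with cluster-robust standard errors
                              webuse nlswork, clear
                              xtreg ln_wage tenure hours, re vce(cluster idcode)
                              xtoverid
                              ```

                              As far as I recall, -xtoverid- may also reject factor-variable notation such as -i.year-; if that happens, creating the year dummies by hand (or via the -xi:- prefix) is a commonly suggested workaround.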
                              Kind regards,
                              Carlo
                              (Stata 19.0)
