Small Sample_Regression-Type

Liam Taylor

Join Date: Jul 2015

Posts: 4
#1

16 Jul 2015, 00:58

Dear all, I've got a more or less general question (regardless of the specific variables I use): I am working on a small panel data sample, N=7, T=9, and I'm wondering which regression specification I should use. I ran an OLS regression with time-fixed and entity-fixed effects, but I don't know whether it is appropriate given the small sample size. Hope you can help me. (I'm a new user in the forum and I didn't find a similar topic. So, if I am wrong, I apologise.)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#2

16 Jul 2015, 01:57

Liam:
welcome to the list.
Your chances of getting helpful replies are conditional on posting exactly what you typed and what Stata gave you back (as per FAQ). Thanks.
As you posted, the main concern relates to your limited sample size. That said, in addition to pooled OLS, you may want to take a look at the -xtpcse- and -xtgls- entries in the Stata .pdf manual.

Kind regards,
Carlo
(Stata 19.0)

Liam Taylor

Join Date: Jul 2015
Posts: 4

16 Jul 2015, 03:12

Thanks for your reply, Carlo.

Basically, I want to analyse the relationship between a company's marketing expenditures and a performance measure. I use reg with year-dummies (i.year) and entity-dummies (i.company_id) [xtreg with i.year should have the same effect, shouldn't it?]. Furthermore I use other independent variables to proxy for financial performance, dummies for regional characteristics and a dummy if a company was acquired by another one.

This is my command and the output:

Code:

reg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets region_1_dum region_2_dum acq_dum i.company_id i.year, vce(cluster company_id)

Code:

Linear regression                               Number of obs     =         62
                                                F(  5,     6)     =          .
                                                Prob > F          =          .
                                                R-squared         =     0.8139
                                                Root MSE          =     .16284

                                (Std. Err. adjusted for 7 clusters in company_id)
---------------------------------------------------------------------------------
                |               Robust
           ln_q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
    derivatives |   .1863721   .1833022     1.02   0.349    -.2621522    .6348964
marketingexptex |   1.426789   1.370361     1.04   0.338    -1.926363    4.779941
      cashratio |   .0168063   .0154145     1.09   0.317    -.0209116    .0545242
 capex_to_sales |   1.208879   .6356739     1.90   0.106    -.3465594    2.764317
      ln_assets |   .1951717   .3236715     0.60   0.569    -.5968239    .9871674
   region_1_dum |   .2871823   .0906221     3.17   0.019      .065438    .5089266
   region_2_dum |  -.3879954   .1957098    -1.98   0.095      -.86688    .0908891
        acq_dum |   -.073599   .0705417    -1.04   0.337    -.2462083    .0990103
                |
     company_id |
             2  |   .1973345   .4842171     0.41   0.698    -.9875019    1.382171
             3  |   .1509188   .5682934     0.27   0.799    -1.239645    1.541483
             4  |   .4792309   .5189663     0.92   0.391     -.790634    1.749096
             5  |   .0135209   .1403323     0.10   0.926    -.3298598    .3569015
             8  |   .2825196   .3950338     0.72   0.501    -.6840934    1.249133
             9  |   .2615732    .479411     0.55   0.605    -.9115034     1.43465
                |
           year |
          2007  |  -.1647865   .1058503    -1.56   0.171    -.4237929    .0942199
          2008  |    -.00062    .121679    -0.01   0.996    -.2983578    .2971178
          2009  |  -.0441678   .1168938    -0.38   0.719    -.3301966    .2418609
          2010  |   .0047404   .1421144     0.03   0.974     -.343001    .3524818
          2011  |  -.2700914   .2085211    -1.30   0.243    -.7803242    .2401413
          2012  |  -.2176848   .2255366    -0.97   0.372     -.769553    .3341834
          2013  |   .0865537   .2119863     0.41   0.697     -.432158    .6052655
          2014  |   .4043251   .1888094     2.14   0.076    -.0576749    .8663251
                |
          _cons |   -4.58519   4.830504    -0.95   0.379    -16.40501    7.234627
---------------------------------------------------------------------------------

I hope this isn't too confusing.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#4

16 Jul 2015, 03:50

Liam:
thanks for providing more details.
Some remarks about your model:
- you seem to have too many predictors for a very small sample size (this is probably the reason why the F-test was not reported). Hence, the first advice would be to be more parsimonious and reduce the number of predictors. Please consider that there should be 20 observations per predictor (Katz MH. Multivariable Analysis. Second Edition. NY: Cambridge University Press, 2006: 81), even though 10 obs per predictor may sound wise enough;
- most of your coefficients are not significant (this is probably due to the small sample size; by the way, can't you collect more observations?);
- you seem to have a quite high R-squared. However, I would take a look at -estat vif- and -estat vce, corr- to test whether there's some dangerous correlation issue;
- before -xtreg- (but I would not consider this kind of panel data regression feasible for your data, as you have a "small N, large T" dataset) you should -xtset- your data. In your case:

Code:

xtset company_id year

If you add i.year among the predictors after having -xtset- your data, you run the risk of seeing i.year (or, rather, some of those years) dropped due to collinearity.
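
As a rough sketch of the collinearity checks mentioned above (variable names are taken from the command in #3; this has not been run on the actual data, and plain -regress- without vce() is assumed here because the VIFs do not depend on how the standard errors are computed):

```stata
* Run from a do-file (the /// line continuation is not accepted
* in the interactive command window)

* Pooled OLS with company and year dummies, as in post #3
regress ln_q derivatives marketingexptex cashratio capex_to_sales ///
    ln_assets region_1_dum region_2_dum acq_dum i.company_id i.year

* Variance inflation factors for each predictor (available after -regress-)
estat vif

* Correlation matrix of the coefficient estimates
estat vce, correlation
```

Large VIFs or pairs of coefficients with correlations close to 1 in absolute value would point to predictors that could be dropped first.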

Kind regards,
Carlo
(Stata 19.0)

Liam Taylor

Join Date: Jul 2015
Posts: 4

16 Jul 2015, 06:20

Carlo, many thanks for the detailed answer. I worked with your recommendations and dropped two correlated variables (corr > 0.6) and furthermore ran a fixed-effects regression with entity and year effects.

Code:

 xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.year, fe vce(cluster company_id)

It seems to be okay -- regardless of the insignificance of the coefficients -- and no time variables were dropped. But I'm wondering about the F-test. It is not reported if I use vce(cluster ...) or robust standard errors, but it is reported with "normal" standard errors.

via:

Code:

xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.year, fe

Output:

Code:

Fixed-effects (within) regression               Number of obs      =        62
Group variable: company_id                      Number of groups   =         7

R-sq:  within  = 0.7447                         Obs per group: min =         8
       between = 0.7753                                        avg =       8.9
       overall = 0.7484                                        max =         9

                                                F(14,41)           =      8.54
corr(u_i, Xb)  = 0.0199                         Prob > F           =    0.0000

---------------------------------------------------------------------------------
           ln_q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
    derivatives |   .0001196   .1896967     0.00   1.000    -.3829808    .3832199
marketingexptex |   1.580859   1.269536     1.25   0.220    -.9830205    4.144738
      cashratio |   .0232494    .011081     2.10   0.042      .000871    .0456278
 capex_to_sales |   .9651949   .5684589     1.70   0.097    -.1828312    2.113221
      ln_assets |   .0601555   .1482983     0.41   0.687     -.239339    .3596501
        acq_dum |  -.0055041    .107766    -0.05   0.960     -.223142    .2121337
                |
           year |
          2007  |  -.1404156   .1015493    -1.38   0.174    -.3454986    .0646674
          2008  |   .0545582   .1480937     0.37   0.714     -.244523    .3536395
          2009  |  -.0085791   .1168169    -0.07   0.942    -.2444957    .2273374
          2010  |  -.0297073   .1288475    -0.23   0.819      -.28992    .2305054
          2011  |    -.27442   .1309857    -2.10   0.042     -.538951    -.009889
          2012  |  -.2165632   .1405829    -1.54   0.131    -.5004762    .0673498
          2013  |    .075454   .1395665     0.54   0.592    -.2064062    .3573142
          2014  |   .4408675   .1457988     3.02   0.004     .1464209    .7353141
                |
          _cons |  -2.183175   2.328384    -0.94   0.354    -6.885442    2.519091
----------------+----------------------------------------------------------------
        sigma_u |  .05728544
        sigma_e |  .17258376
            rho |  .09924226   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
F test that all u_i=0:     F(6, 41) =     0.84               Prob > F = 0.5469

What does it mean? Can I do the regression without clustered or robust se?


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#6

16 Jul 2015, 06:59

Liam:
If you "suspect that there is heteroskedasticity or within-panel serial correlation in the idiosyncratic error term", you "could specify the vce(robust) option" (source: -xt- entry in Stata 13.1 .pdf manual, page 372).
Besides:

"The F test of ν_i = 0 is suppressed because it is too difficult to compute the robust form of the statistic when there are more than a few panels."

(source: -xt- entry in Stata 13.1 .pdf manual, page 373).
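
On the overall model F that goes missing with clustered standard errors, a tentative sketch of how one might inspect it: with vce(cluster company_id) the test has only (number of clusters - 1) denominator degrees of freedom, 6 in the output in #3, which is fewer than the 14 coefficients in the model. The stored results named below are the ones -xtreg, fe- is expected to leave behind; -ereturn list- shows what is actually there.

```stata
* Run from a do-file (/// continues the command on the next line)
xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ///
    ln_assets acq_dum i.year, fe vce(cluster company_id)

* List everything the command stored, then show the pieces of interest
ereturn list
display "clusters       = " e(N_clust)
display "model df       = " e(df_m)
display "denominator df = " e(df_r)

* A joint test on a handful of coefficients, fewer than clusters - 1,
* can still be requested explicitly
test marketingexptex cashratio
```

This only illustrates where the missing statistic comes from; it does not make the cluster-robust results any more reliable with 7 clusters.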

Kind regards,
Carlo
(Stata 19.0)
Liam Taylor

Join Date: Jul 2015

Posts: 4
#7

16 Jul 2015, 07:25

Many thanks for your replies, Carlo
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#8

16 Jul 2015, 11:52

Something Carlo mentioned seems to have been lost. You're estimating 15 direct parameters plus the fixed effects with 60 observations. Instead of 1 to 10, you're looking at 1 to 3 parameters to observations. With this kind of sample to variable ratio, it is quite possible to get high R-square just due to the number of parameters.
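
To put a number on that ratio, a quick sketch using counts read off the -xtreg- output in #5 (14 slope coefficients, 1 constant, and 6 firm effects beyond the base firm, against the 62 observations actually used):

```stata
* Observations per estimated parameter, counts taken from the output in #5
display 62 / (14 + 1 + 6)
```

That comes to roughly 3 observations per parameter, and the 21 parameters match the 41 residual degrees of freedom reported in F(14,41).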
You need a lot more data. Data on most of these variables is readily available. Even if you don't have access to the online databases (Compustat or Bloomberg, for example), you can collect the data by hand. US public firm filings are available through the SEC's EDGAR system.

If you were going to really try to interpret the xtreg results, the first thing I would notice is that you cannot reject the hypothesis that the fixed effects are all zero. If this were an appropriately large sample estimation, then you could argue that you don't need to include firm effects at all (saving many degrees of freedom). If the F for all u_i=0 were significant, the low correlation between the fixed effects and the Xb suggests you should use a Hausman test to check whether you can get away with random effects.
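
A minimal sketch of that Hausman comparison, assuming the specification from #5 and conventional standard errors. The stored-estimate names fe and re are arbitrary labels, and with only 7 companies the random-effects step may not be estimable or reliable:

```stata
* Run from a do-file (/// continues the command on the next line)
xtset company_id year

* Fixed effects with conventional standard errors (-hausman- does not
* accept results estimated with vce(robust) or vce(cluster))
xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ///
    ln_assets acq_dum i.year, fe
estimates store fe

* Random effects on the same specification
xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ///
    ln_assets acq_dum i.year, re
estimates store re

* H0: the difference between the FE and RE coefficients is not systematic
hausman fe re
```

Failing to reject H0 would be the usual argument for random effects, though in a sample this small the test has little power either way.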