  • Small Sample_Regression-Type

    Dear all, I have a more or less general question (regardless of the specific variables I use): I am working with a small panel data sample, N=7, T=9, and I'm wondering which regression specification I should use. I ran an OLS regression with time fixed effects and entity fixed effects, but I don't know whether that is appropriate given the small sample size. I hope you can help me. (I'm a new user of the forum and didn't find a similar topic, so I apologise if I've missed one.)

  • #2
    Liam:
    welcome to the list.
    Your chances of getting helpful replies are conditional on posting exactly what you typed and what Stata gave you back (as per the FAQ). Thanks.
    As you posted, the main concern is your limited sample size. That said, in addition to pooled OLS, you may want to take a look at the -xtpcse- and -xtgls- entries in the Stata .pdf manual.
    Kind regards,
    Carlo
    (Stata 19.0)
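
    A minimal sketch of how the two commands mentioned above might be called, assuming the variable names that the original poster uses in this thread (ln_q, marketingexptex, cashratio, company_id, year). The short regressor list and the panels(heteroskedastic) option are illustrative assumptions, not a recommendation for this dataset; with an unbalanced panel, -xtpcse- may additionally need its pairwise option.

    Code:
    * declare the panel structure first
    xtset company_id year
    * OLS coefficients with panel-corrected standard errors
    xtpcse ln_q marketingexptex cashratio
    * feasible GLS allowing a different error variance per panel
    xtgls ln_q marketingexptex cashratio, panels(heteroskedastic)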



    • #3
      Thanks for your reply, Carlo.

      Basically, I want to analyse the relationship between a company's marketing expenditures and a performance measure. I use reg with year dummies (i.year) and entity dummies (i.company_id) [xtreg with i.year should have the same effect, shouldn't it?]. Furthermore, I use other independent variables to proxy for financial performance, dummies for regional characteristics, and a dummy indicating whether a company was acquired by another one.

      This is my command and the output:

      Code:
      reg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets region_1_dum region_2_dum acq_dum i.company_id i.year, vce(cluster company_id)
      Code:
      Linear regression                               Number of obs =      62
                                                      F(  5,     6) =       .
                                                      Prob > F      =       .
                                                      R-squared     =  0.8139
                                                      Root MSE      =  .16284

                                      (Std. Err. adjusted for 7 clusters in company_id)
      ---------------------------------------------------------------------------------
                      |               Robust
                 ln_q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ----------------+----------------------------------------------------------------
          derivatives |   .1863721   .1833022     1.02   0.349    -.2621522    .6348964
      marketingexptex |   1.426789   1.370361     1.04   0.338    -1.926363    4.779941
            cashratio |   .0168063   .0154145     1.09   0.317    -.0209116    .0545242
       capex_to_sales |   1.208879   .6356739     1.90   0.106    -.3465594    2.764317
            ln_assets |   .1951717   .3236715     0.60   0.569    -.5968239    .9871674
         region_1_dum |   .2871823   .0906221     3.17   0.019      .065438    .5089266
         region_2_dum |  -.3879954   .1957098    -1.98   0.095      -.86688    .0908891
              acq_dum |   -.073599   .0705417    -1.04   0.337    -.2462083    .0990103
                      |
           company_id |
                   2  |   .1973345   .4842171     0.41   0.698    -.9875019    1.382171
                   3  |   .1509188   .5682934     0.27   0.799    -1.239645    1.541483
                   4  |   .4792309   .5189663     0.92   0.391     -.790634    1.749096
                   5  |   .0135209   .1403323     0.10   0.926    -.3298598    .3569015
                   8  |   .2825196   .3950338     0.72   0.501    -.6840934    1.249133
                   9  |   .2615732    .479411     0.55   0.605    -.9115034     1.43465
                      |
                 year |
                2007  |  -.1647865   .1058503    -1.56   0.171    -.4237929    .0942199
                2008  |    -.00062    .121679    -0.01   0.996    -.2983578    .2971178
                2009  |  -.0441678   .1168938    -0.38   0.719    -.3301966    .2418609
                2010  |   .0047404   .1421144     0.03   0.974     -.343001    .3524818
                2011  |  -.2700914   .2085211    -1.30   0.243    -.7803242    .2401413
                2012  |  -.2176848   .2255366    -0.97   0.372     -.769553    .3341834
                2013  |   .0865537   .2119863     0.41   0.697     -.432158    .6052655
                2014  |   .4043251   .1888094     2.14   0.076    -.0576749    .8663251
                      |
                _cons |   -4.58519   4.830504    -0.95   0.379    -16.40501    7.234627
      ---------------------------------------------------------------------------------
      I hope this isn't too confusing.



      • #4
        Liam:
        thanks for providing more details.
        Some remarks about your model:
        - you seem to have too many predictors for a very small sample size (this is probably the reason why the F-test was not reported). Hence, the first piece of advice would be to be more parsimonious and reduce the number of predictors. Please consider the rule of thumb that there should be 20 observations per predictor (Katz MH. Multivariable Analysis. Second Edition. NY: Cambridge University Press, 2006: 81), even though 10 observations per predictor may be acceptable;
        - most of your coefficients are not significant (this is probably due to the small sample size; by the way, can't you collect more observations?);
        - you seem to have quite a high R-squared. However, I would take a look at -estat vif- and -estat vce, corr- to check whether there is some dangerous correlation issue;
        - before -xtreg- (but I would not consider this kind of panel data regression feasible for your data, as you have a "small N, large T" dataset) you should -xtset- your data. In your case:


        Code:
        xtset company_id year
        If you add i.year among the predictors after having -xtset- your data, you run the risk of seeing i.year (or, better, some of those years) dropped due to collinearity.
        Kind regards,
        Carlo
        (Stata 19.0)
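
        A minimal sketch of the two diagnostics mentioned in the remarks above, run after a plain OLS fit. The regressor list is an assumption taken from the commands posted in this thread; any cut-off for what counts as a worrying VIF or correlation is left to the reader.

        Code:
        * plain OLS fit with firm and year dummies
        regress ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.company_id i.year
        * variance inflation factors for the regressors
        estat vif
        * correlation matrix of the estimated coefficients
        estat vce, correlation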



        • #5
          Carlo, many thanks for the detailed answer. I followed your recommendations and dropped two correlated variables (corr > 0.6), and I also ran a fixed-effects regression with entity and year effects.

          Code:
           xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.year, fe vce(cluster company_id)
          It seems to be okay, regardless of the insignificance of the coefficients, and no time variables were dropped. But I'm wondering about the F-test. It is not reported if I use vce(cluster ...) or robust standard errors, but it is reported with "normal" standard errors.

          via:
          Code:
          xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.year, fe
          Output:
          Code:
          Fixed-effects (within) regression               Number of obs      =        62
          Group variable: company_id                      Number of groups   =         7
          
          R-sq:  within  = 0.7447                         Obs per group: min =         8
                 between = 0.7753                                        avg =       8.9
                 overall = 0.7484                                        max =         9
          
                                                          F(14,41)           =      8.54
          corr(u_i, Xb)  = 0.0199                         Prob > F           =    0.0000
          
          ---------------------------------------------------------------------------------
                     ln_q |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ----------------+----------------------------------------------------------------
              derivatives |   .0001196   .1896967     0.00   1.000    -.3829808    .3832199
          marketingexptex |   1.580859   1.269536     1.25   0.220    -.9830205    4.144738
                cashratio |   .0232494    .011081     2.10   0.042      .000871    .0456278
           capex_to_sales |   .9651949   .5684589     1.70   0.097    -.1828312    2.113221
                ln_assets |   .0601555   .1482983     0.41   0.687     -.239339    .3596501
                  acq_dum |  -.0055041    .107766    -0.05   0.960     -.223142    .2121337
                          |
                     year |
                    2007  |  -.1404156   .1015493    -1.38   0.174    -.3454986    .0646674
                    2008  |   .0545582   .1480937     0.37   0.714     -.244523    .3536395
                    2009  |  -.0085791   .1168169    -0.07   0.942    -.2444957    .2273374
                    2010  |  -.0297073   .1288475    -0.23   0.819      -.28992    .2305054
                    2011  |    -.27442   .1309857    -2.10   0.042     -.538951    -.009889
                    2012  |  -.2165632   .1405829    -1.54   0.131    -.5004762    .0673498
                    2013  |    .075454   .1395665     0.54   0.592    -.2064062    .3573142
                    2014  |   .4408675   .1457988     3.02   0.004     .1464209    .7353141
                          |
                    _cons |  -2.183175   2.328384    -0.94   0.354    -6.885442    2.519091
          ----------------+----------------------------------------------------------------
                  sigma_u |  .05728544
                  sigma_e |  .17258376
                      rho |  .09924226   (fraction of variance due to u_i)
          ---------------------------------------------------------------------------------
          F test that all u_i=0:     F(6, 41) =     0.84               Prob > F = 0.5469
          What does it mean? Can I do the regression without clustered or robust se?



          • #6
            Liam:
            If you "suspect that there is heteroskedasticity or within-panel serial correlation in the idiosyncratic error term", you "could specify the vce(robust) option" (source: -xt- entry in Stata 13.1 .pdf manual, page 372).
            Besides: "The F test of u_i = 0 is suppressed because it is too difficult to compute the robust form of the statistic when there are more than a few panels." (source: -xt- entry in Stata 13.1 .pdf manual, page 373).
            Kind regards,
            Carlo
            (Stata 19.0)
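
            The missing model F statistic is a separate matter from the F test of u_i = 0. With vce(cluster company_id) and only 7 clusters, the cluster-robust VCE has rank of at most 6, so a joint test of all 14 slopes cannot be computed and Stata reports it as missing. A joint test of a smaller set of coefficients can still be requested; the sketch below assumes the command posted earlier in this thread, and the two tested variables are picked purely for illustration.

            Code:
            xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.year, fe vce(cluster company_id)
            * joint Wald test of two slopes (fewer restrictions than clusters minus one)
            test marketingexptex cashratio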



            • #7
              Many thanks for your replies, Carlo



              • #8
                Something Carlo mentioned seems to have been lost. You're estimating 15 direct parameters plus the fixed effects with 62 observations. Instead of 1 to 10, you're looking at a ratio of roughly 1 to 3 parameters to observations. With this kind of sample-to-variable ratio, it is quite possible to get a high R-squared just due to the number of parameters.
                You need a lot more data. Data on most of these variables is readily available. Even if you don't have access to the online databases (Compustat or Bloomberg, for example), you can collect the data by hand. US public firm filings are available through the SEC's EDGAR system.
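
                The ratio can be checked directly from the stored results of the fixed-effects fit posted in #5. This is a minimal sketch assuming that command is rerun without robust standard errors, so that e(df_r) is the residual degrees of freedom (41 in the posted output, with 62 observations).

                Code:
                xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.year, fe
                * estimated parameters, firm effects included = N minus residual df
                display e(N) - e(df_r)
                * observations per estimated parameter
                display e(N) / (e(N) - e(df_r))

                With the posted numbers that is 62 - 41 = 21 parameters, or just under 3 observations per parameter.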

                If you were really going to interpret the xtreg results, the first thing I would notice is that you cannot reject the hypothesis that the fixed effects are all zero. If this were an appropriately large sample, you could argue that you don't need to include firm effects at all (saving many degrees of freedom). If the F test for all u_i=0 were significant, the low correlation between the fixed effects and Xb suggests you should use a Hausman test to check whether you can get away with random effects.
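
                A minimal sketch of the usual Hausman sequence, assuming the regressor list from #5. The names fe and re for the stored estimates are arbitrary, and with only 7 firms the random-effects step may fail or be unreliable, so treat this as the pattern for a larger sample.

                Code:
                * consistent estimator: fixed effects
                xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.year, fe
                estimates store fe
                * efficient estimator under the null: random effects
                xtreg ln_q derivatives marketingexptex cashratio capex_to_sales ln_assets acq_dum i.year, re
                estimates store re
                * compare the two sets of coefficients
                hausman fe re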
