  • Help with regression of (unbalanced) panel data - xtreg, statsby/regressby generating different results

    Hi Statalist!

    First of all, thank you for all the help you give on a daily basis! It's been very helpful to lurk here, but now I find myself in a situation where I simply need to ask.

    I'm trying to regress Return (depvar) on RMRF SMB HML LIQ (indepvars) using panel data. The panel is identified by companynum and date (monthly data, 2007-03 to 2016-12), but it is unbalanced.

    Now, when I use the different xtreg options (fe, re, et cetera) and xtgls, I get different results than when I use statsby or regressby. Furthermore, my regression results differ considerably from what I expected, though less so with statsby or regressby.

    I'm now asking: 1) which method to use (xt or statsby), and 2) whether I'm using them incorrectly.
    I'm posting a smaller sample of my data and my regression commands. I'm not sure whether the dataex output has been formatted correctly for your use, but the columns are date, companynum, RMRF, SMB, HML, LIQ. Date was reformatted by dataex, but is normally in a YYYYmm format (e.g. 2007m3).

    The different commands I've been using:
    xtreg Return RMRF SMB HML LIQ, re vce(robust)
    statsby _b, by(companynum): reg Return RMRF SMB HML LIQ, robust noconstant

    The means of my coefficients from statsby are larger than the coefficients from xtreg. Shouldn't the mean of the coefficients from statsby be equal to the coefficients in the other reg commands, or is it wrong to think of it that way?

    Please let me know if you find anything that is incorrectly posted and I will correct accordingly.
    Many thanks!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(date companynum) double(RMRF SMB HML LIQ)
    677 1  -.03567645   .031857625   .035097148 -.0038981533515944945
    678 1   .05532546  .0022137638  .0075254366  .0028282892888514353
    679 1  .024670409  .0083233304  .0063983239   .048738782080504486
    680 1  .017835448   .025180005  -.006078342    .02920958419799065
    681 1 -.009465334  .0084097693   .012648277  -.011058678221505731
    682 1  .014410426  .0064111673  -.012598168   -.01494227358556188
    683 1  .029204069  -.026093747   .018178428  -.014232915409939827
    566 2  .052961133   .021395301 -.0039642137  -.036748350946197265
    567 2  .064811036   -.02813188  -.015092619  -.017005912917779073
    568 2  .016662389  -.046779487   .004310017   .003553448955587013
    569 2 -.024995016   .011651467  -.011661232  -.023518709608667893
    570 2 -.017945765  .0046412922    .03236885  .0056249091016424176
    571 2 -.029406879 -.0016394301 -.0072570862   .025491101902319215
    end
    format %tm date



  • #2
    The different commands I've been using:
    xtreg Return RMRF SMB HML LIQ, re vce(robust)
    statsby _b, by(companynum): reg Return RMRF SMB HML LIQ, robust noconstant

    The means of my coefficients from statsby are larger than the coefficients from xtreg.
    Why would you expect regress and xtreg, re to yield the same results? Differing results would be surprising only if you had run the same command twice.
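
    To make the comparison concrete, here is a minimal sketch, assuming Return, companynum and a monthly date variable are in memory (Return is not part of the posted -dataex- excerpt, so this cannot be run on that excerpt as is). The two commands quoted above differ in more than the wrapper: the -statsby- call fits a separate regression for each company and suppresses the intercept with noconstant, while -xtreg, re- imposes one common set of slopes and keeps a constant. Even with the intercept restored, the unweighted mean of the company-by-company slopes need not equal the pooled random-effects slopes, particularly in an unbalanced panel.

    ```stata
    * Sketch only: assumes Return, companynum and date exist in memory
    xtset companynum date

    * Pooled random-effects fit: one slope per factor, shared by all companies
    xtreg Return RMRF SMB HML LIQ, re vce(robust)

    * Company-by-company OLS, this time keeping the intercept
    preserve
    statsby _b, by(companynum) clear: regress Return RMRF SMB HML LIQ
    * statsby stores the coefficients as _b_RMRF, ..., _b_cons (one row per company)
    summarize _b_RMRF _b_SMB _b_HML _b_LIQ _b_cons
    restore
    ```

    Comparing the means reported by -summarize- with the -xtreg- coefficients shows how much of the gap remains once both specifications include a constant; whatever is left reflects slopes that vary across companies rather than a coding error.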
    Last edited by Andrew Musau; 08 Apr 2019, 09:08.

    • #3
      Maximilian:
      welcome to this forum.
      I fail to see the point of -statsby- here: you have panel data with a continuous regressand, hence your first choice should be -xtreg-:
      Code:
      * -robust- alone suffices; spelling out -vce()- is redundant
      xtreg Return RMRF SMB HML LIQ, re robust
      * Then test whether the -re- specification is OK for your data with the community-contributed command -xtoverid- (-search xtoverid- and install it before using it)
      xtoverid
      Kind regards,
      Carlo
      (Stata 19.0)
