
  • Interpretation of Data

    Dear all,

    I have several financial market data sets and I am testing whether herding is present in different markets. As a precaution, I have run the regressions with year indicators (-i.year-) to detrend my data and with the -robust- option to make the standard errors robust to heteroskedasticity. The commands run without problems, but I am having trouble interpreting the results. Here is my first example:

    [Screenshot: STATALIST1.png]

    Here's my second example:
    [Screenshot: STATALIST2.png]


    I would like to draw your attention to the 'improved' R-squared. Interpreting an increase in R-squared as an 'improvement' is, of course, shallow; however, the increase is large and persistent, not only in the two examples above but in most of my data sets. Have I done the right thing by controlling for the yearly trend? Another point worth noting is that the 2007 coefficient is not only large but also significant in both examples. What might that imply?


  • #2
    Guest:
    - usually, the adjusted R-squared should be used to compare different regression models;
    - "the best" model (whatever that means) is the one that gives the truest and fairest view of the data-generating process underlying your data (the literature in your research field can help you in this respect);
    - you can test the joint significance of -i.year- as follows:
    Code:
    testparm i.year
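A minimal sketch of both suggestions, assuming simulated data that only borrows the thread's variable names (csad, abs_r_mt, rmt2, year); the seed, the coefficients and the helper variable insample are made up for illustration. Both models are fitted on the same estimation sample so that their adjusted R-squared values are comparable, and -testparm- then tests the year indicators jointly.

```stata
* Hypothetical simulated data; only the variable names come from the thread
clear
set seed 2018
set obs 2609
* Ten hypothetical "years", 2001-2010, about 261 observations each
generate year = 2001 + floor((_n - 1) / 261)
generate r_mt = rnormal(0, 1.5)
generate abs_r_mt = abs(r_mt)
generate rmt2 = r_mt^2
generate csad = 1 + 0.6*abs_r_mt - 0.05*rmt2 + 0.1*(year - 2001) + rnormal(0, 0.8)

* Model without year indicators
regress csad abs_r_mt rmt2
scalar r2a_noyear = e(r2_a)
* Remember the estimation sample so both models use the same observations
generate byte insample = e(sample)

* Same model plus year indicators, on the same sample
regress csad abs_r_mt rmt2 i.year if insample, vce(robust)
scalar r2a_year = e(r2_a)
display "Adjusted R-squared without i.year: " %6.4f r2a_noyear
display "Adjusted R-squared with i.year:    " %6.4f r2a_year

* Joint test that all year coefficients are zero
testparm i.year
scalar p_year = r(p)
```

Comparing on a common sample matters in practice: later in this thread, logging a variable silently drops observations, which alone can change R-squared.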
    Last edited by sladmin; 09 Apr 2018, 08:57. Reason: anonymize poster
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Dear Carlo,

      I am also unsure whether heteroskedasticity is actually present in my data.

      [Screenshot: HETERO1.png]


      As you can see, I first ran my regression with robust standard errors and then without them. The conventional and robust standard errors do not differ much; does that not mean there is no heteroskedasticity? My uncertainty arises because a Breusch-Pagan test and the graphs I plotted both indicate heteroskedasticity, as you will see below.
      [Screenshot: hetero2.png, rvfplot]
      Above is my -rvfplot-, in which I do see a pattern. My BP test also points to heteroskedasticity:



      [Screenshot: hetero3.png, BP test output]

      I then plotted my residuals against each of my two x variables, one at a time. Here are the results:
      scatter uhat r_mt:
      [Screenshot: hetero4.png]

      scatter uhat abs_r_mt:

      [Screenshot: hetero5.png]

      Only in the second graph do I detect heteroskedasticity. What might have gone wrong?
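The diagnostic steps described above can be sketched as follows. This is a hedged illustration on simulated data, assuming (by construction) that the error spread grows with abs_r_mt; none of the numbers come from the poster's data. In such a setup the residuals plotted against r_mt fan out symmetrically on both sides of zero, whereas the plot against abs_r_mt shows a one-sided funnel, so the same heteroskedasticity can look quite different in the two graphs.

```stata
* Hypothetical simulated data: error spread grows with abs_r_mt (assumption)
clear
set seed 2018
set obs 2609
generate r_mt = rnormal(0, 1.5)
generate abs_r_mt = abs(r_mt)
generate rmt2 = r_mt^2
generate csad = 1 + 0.6*abs_r_mt - 0.05*rmt2 + rnormal(0, 0.5 + 0.3*abs_r_mt)

regress csad abs_r_mt rmt2

* Residuals versus fitted values
rvfplot, yline(0)

* Store the residuals, then plot them against each regressor-related variable
predict double uhat, residuals
* Signed market return: spread widens on both sides of zero
scatter uhat r_mt, yline(0)
* Absolute market return: spread widens in one direction only
scatter uhat abs_r_mt, yline(0)
```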



      • #4
        Guest:
        please use CODE delimiters to post what you typed and what Stata gave you back; screenshots are hard to read and difficult to comment on.
        If you suspect heteroskedasticity in your data (with many observations, visual inspection can easily outperform the BP test), you can simply use -robust- standard errors.
        Last edited by sladmin; 09 Apr 2018, 08:57. Reason: anonymize poster
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Dear Carlo,

          I apologise for the screenshots. Following are my results:

          Code:
          . . reg csad abs_r_mt r_mt_2
          
                Source |       SS           df       MS      Number of obs   =     2,609
          -------------+----------------------------------   F(2, 2606)      =    369.70
                 Model |  491.324057         2  245.662029   Prob > F        =    0.0000
              Residual |  1731.65225     2,606  .664486665   R-squared       =    0.2210
          -------------+----------------------------------   Adj R-squared   =    0.2204
                 Total |  2222.97631     2,608  .852368215   Root MSE        =    .81516
          
          ------------------------------------------------------------------------------
                  csad |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
              abs_r_mt |   .6120725    .028192    21.71   0.000     .5567915    .6673534
                r_mt_2 |  -.0587181   .0049878   -11.77   0.000    -.0684984   -.0489377
                 _cons |   1.051306   .0260192    40.41   0.000     1.000286    1.102327
          ------------------------------------------------------------------------------
          
          
          . reg csad abs_r_mt r_mt_2, robust
          
          Linear regression                               Number of obs     =      2,609
                                                          F(2, 2606)        =     309.63
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.2210
                                                          Root MSE          =     .81516
          
          ------------------------------------------------------------------------------
                       |               Robust
                  csad |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
              abs_r_mt |   .6120725   .0313965    19.49   0.000     .5505078    .6736372
                r_mt_2 |  -.0587181   .0058612   -10.02   0.000    -.0702111    -.047225
                 _cons |   1.051306   .0267547    39.29   0.000     .9988437    1.103769
          ------------------------------------------------------------------------------

          Code:
          . . hettest abs_r_mt r_mt_2
          
          Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
                   Ho: Constant variance
                   Variables: abs_r_mt r_mt_2
          
                   chi2(2)      =   108.45
                   Prob > chi2  =   0.0000
          
          . hettest abs_r_mt
          
          Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
                   Ho: Constant variance
                   Variables: abs_r_mt
          
                   chi2(1)      =    19.32
                   Prob > chi2  =   0.0000
          
          . hettest r_mt_2
          
          Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
                   Ho: Constant variance
                   Variables: r_mt_2
          
                   chi2(1)      =    67.88
                   Prob > chi2  =   0.0000
          Thanks for your advice, Carlo. It is common practice in my workplace to use robust standard errors by default; the difficulty is that I have to justify using them, which is why I ran heteroskedasticity tests. In short, I have to show that there is heteroskedasticity before I can use -robust-.

          As you can see above, the robust and conventional standard errors do not differ much, which suggests there might be no heteroskedasticity. However, the BP test clearly indicates that the data are heteroskedastic, and that is where my confusion comes from.
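To put a number on "do not differ much", one can divide each robust standard error by its conventional counterpart. From the output in this post, the ratio is roughly 1.11 for abs_r_mt (.0313965 / .028192) and roughly 1.18 for r_mt_2 (.0058612 / .0049878). The sketch below automates that comparison on simulated data (an assumption; only the variable names come from the thread, and the scalar names are hypothetical).

```stata
* Hypothetical simulated data; variable names borrowed from the thread
clear
set seed 2018
set obs 2609
generate r_mt = rnormal(0, 1.5)
generate abs_r_mt = abs(r_mt)
generate rmt2 = r_mt^2
generate csad = 1 + 0.6*abs_r_mt - 0.05*rmt2 + rnormal(0, 0.5 + 0.3*abs_r_mt)

* Conventional (OLS) standard errors
quietly regress csad abs_r_mt rmt2
foreach v in abs_r_mt rmt2 _cons {
    scalar se_ols_`v' = _se[`v']
}

* Heteroskedasticity-robust standard errors for the same model
quietly regress csad abs_r_mt rmt2, vce(robust)
foreach v in abs_r_mt rmt2 _cons {
    scalar ratio_`v' = _se[`v'] / se_ols_`v'
    display "`v': robust/OLS SE ratio = " %5.3f ratio_`v'
}
```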
          Last edited by sladmin; 09 Apr 2018, 08:58. Reason: anonymize poster



          • #6
            I think the large sample size may lead the BP test to return a significant p-value. Maybe the "proof of the pudding" lies in the (lack of a substantial change in the) robust SEs, as you noticed.

            Additionally, this thread might interest you.
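Marcos's point about sample size can be checked with a small simulation. The sketch assumes a hypothetical program, bp_sim, that generates data with the same mild heteroskedasticity at a given sample size and returns the -estat hettest- p-value; the error structure is an assumption chosen for illustration. With a large n, even a small departure from constant variance tends to produce a very small p-value, while the same departure often goes undetected at n = 200.

```stata
* Hypothetical helper: simulate mildly heteroskedastic data, return BP p-value
capture program drop bp_sim
program define bp_sim, rclass
    args n
    clear
    quietly set obs `n'
    generate r_mt = rnormal(0, 1.5)
    generate abs_r_mt = abs(r_mt)
    generate rmt2 = r_mt^2
    * Error sd rises only slightly with abs_r_mt
    generate csad = 1 + 0.6*abs_r_mt - 0.05*rmt2 + rnormal(0, 0.8 + 0.05*abs_r_mt)
    quietly regress csad abs_r_mt rmt2
    quietly estat hettest
    return scalar p = r(p)
end

set seed 2018
bp_sim 200
scalar p_small = r(p)
bp_sim 20000
scalar p_large = r(p)
display "BP p-value, n = 200:    " %6.4f p_small
display "BP p-value, n = 20,000: " %6.4f p_large
```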
            Best regards,

            Marcos



            • #7
              Guest:
              as an aside to Marcos' helpful insight, what if you type:
              Code:
              estat hettest
              after regression with default standard errors?
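For reference, -estat hettest- has a few variants worth comparing after the default-SE regression. A sketch on simulated data (an assumption; only the variable names are borrowed): the default uses the fitted values and assumes normal errors, -rhs- uses the regressors instead, -iid- computes the N*R-squared version of the test, which drops the normality assumption, and -estat imtest, white- runs White's general test.

```stata
* Hypothetical simulated data; variable names borrowed from the thread
clear
set seed 2018
set obs 2609
generate r_mt = rnormal(0, 1.5)
generate abs_r_mt = abs(r_mt)
generate rmt2 = r_mt^2
generate csad = 1 + 0.6*abs_r_mt - 0.05*rmt2 + rnormal(0, 0.5 + 0.3*abs_r_mt)

* Default standard errors
regress csad abs_r_mt rmt2

* BP/Cook-Weisberg on the fitted values (normal-errors version)
estat hettest
scalar p_fitted = r(p)
* Same test, using all right-hand-side variables
estat hettest, rhs
scalar p_rhs = r(p)
* N*R-squared version, no normality assumption
estat hettest, rhs iid
scalar p_rhs_iid = r(p)
* White's general test
estat imtest, white
```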
              Last edited by sladmin; 09 Apr 2018, 08:57. Reason: anonymize poster
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Thanks, Carlo and Marcos. Below are my -estat hettest- results.

                Code:
                . reg csad abs_r_mt rmt2
                
                      Source |       SS           df       MS      Number of obs   =     2,609
                -------------+----------------------------------   F(2, 2606)      =    369.70
                       Model |  491.324057         2  245.662029   Prob > F        =    0.0000
                    Residual |  1731.65225     2,606  .664486665   R-squared       =    0.2210
                -------------+----------------------------------   Adj R-squared   =    0.2204
                       Total |  2222.97631     2,608  .852368215   Root MSE        =    .81516
                
                ------------------------------------------------------------------------------
                        csad |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                    abs_r_mt |   .6120725    .028192    21.71   0.000     .5567915    .6673534
                        rmt2 |  -.0587181   .0049878   -11.77   0.000    -.0684984   -.0489377
                       _cons |   1.051306   .0260192    40.41   0.000     1.000286    1.102327
                ------------------------------------------------------------------------------
                
                . hettest
                
                Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
                         Ho: Constant variance
                         Variables: fitted values of csad
                
                         chi2(1)      =    10.41
                         Prob > chi2  =   0.0013
                
                . hettest r_mt abs_r_mt
                
                Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
                         Ho: Constant variance
                         Variables: r_mt abs_r_mt
                
                         chi2(2)      =    16.56
                         Prob > chi2  =   0.0003
                
                . hettest abs_r_mt
                
                Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
                         Ho: Constant variance
                         Variables: abs_r_mt
                
                         chi2(1)      =    16.06
                         Prob > chi2  =   0.0001
                
                . hettest rmt2
                
                Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
                         Ho: Constant variance
                         Variables: rmt2
                
                         chi2(1)      =    15.75
                         Prob > chi2  =   0.0001
                
                .



                • #9
                  Guest:
                  - what if you log your dependent variable?
                  - I would also perform -estat ovtest- as a postestimation test after the regression;
                  - by the way: where did your interactions go?
                  Last edited by sladmin; 09 Apr 2018, 08:58. Reason: anonymize poster
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)



                  • #10
                    Dear Carlo,

                    Did you mean that I should log my dependent variable when running the regression? Below is my Ramsey RESET test result:
                    Code:
                    . estat ovtest
                    
                    Ramsey RESET test using powers of the fitted values of csad
                           Ho:  model has no omitted variables
                                    F(3, 2594) =      1.52
                                      Prob > F =      0.2084
                    I am sorry, but I am not sure which interactions you were referring to.

                    Thanks for your inputs.



                    • #11
                      Guest:
                      1) yes, I meant to log the dv;
                      2) the -ovtest- outcome does not show any evidence of non-linear relationships between your dv and predictors;
                      3) I refer to the interactions that you reported in your first post (screenshots).
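Point 2) refers to what -estat ovtest- does by default: it adds the second, third and fourth powers of the fitted values to the model and F-tests them jointly, hence the three numerator degrees of freedom in #10. A sketch that reproduces the statistic by hand on simulated data (an assumption; only the variable names come from the thread, and yhat, yhat2 to yhat4 are hypothetical helper variables):

```stata
* Hypothetical simulated data; variable names borrowed from the thread
clear
set seed 2018
set obs 2609
generate r_mt = rnormal(0, 1.5)
generate abs_r_mt = abs(r_mt)
generate rmt2 = r_mt^2
generate csad = 1 + 0.6*abs_r_mt - 0.05*rmt2 + rnormal(0, 0.8)

* Built-in RESET test
quietly regress csad abs_r_mt rmt2
estat ovtest
scalar F_stata = r(F)

* Manual version: add powers 2-4 of the fitted values and test them jointly
predict double yhat, xb
generate double yhat2 = yhat^2
generate double yhat3 = yhat^3
generate double yhat4 = yhat^4
quietly regress csad abs_r_mt rmt2 yhat2 yhat3 yhat4
test yhat2 yhat3 yhat4
scalar F_manual = r(F)
display "estat ovtest F = " %8.4f F_stata "   manual F = " %8.4f F_manual
```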
                      Last edited by sladmin; 09 Apr 2018, 08:59. Reason: anonymize poster
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)



                      • #12
                        Dear Carlo,

                        I think I understand what you meant now. Below is my regression result after logging the DVs:

                        Code:
                        . gen log1= log( abs_r_mt)
                        (192 missing values generated)
                        
                        . gen log2=log( r_mt# r_mt)
                        r_mt#r_mt invalid name
                        r(198);
                        
                        . gen rmt2= r_mt^2
                        (1 missing value generated)
                        
                        . gen log2=log( rmt2)
                        (192 missing values generated)
                        
                        . reg csad log1 log2
                        note: log1 omitted because of collinearity
                        
                              Source |       SS           df       MS      Number of obs   =     2,418
                        -------------+----------------------------------   F(1, 2416)      =    223.02
                               Model |  144.370867         1  144.370867   Prob > F        =    0.0000
                            Residual |  1563.95462     2,416   .64733221   R-squared       =    0.0845
                        -------------+----------------------------------   Adj R-squared   =    0.0841
                               Total |  1708.32549     2,417  .706795816   Root MSE        =    .80457
                        
                        ------------------------------------------------------------------------------
                                csad |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                log1 |          0  (omitted)
                                log2 |   .0970239   .0064968    14.93   0.000      .084284    .1097639
                               _cons |   1.774776   .0170143   104.31   0.000     1.741412     1.80814
                        ------------------------------------------------------------------------------
                        
                        .
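The note "log1 omitted because of collinearity" has a simple algebraic cause: for every nonzero return, log2 is exactly twice log1, so the two regressors carry the same information and Stata can keep only one of them.

$$\texttt{log2} = \log\left(r_{mt}^2\right) = \log\left(|r_{mt}|^2\right) = 2\log|r_{mt}| = 2\,\texttt{log1}, \qquad r_{mt} \neq 0$$

Because Stata's log() returns missing at zero, both variables are also missing in exactly the same rows, which fits the identical "(192 missing values generated)" messages.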
                        I think the confusion arises because I changed the names of my variables. I can assure you that these are innocuous changes, as you will see:

                        Code:
                        . gen abs_r_mt = abs(r_mt)
                        (1 missing value generated)
                        
                        . regress csad c.abs_r_mt##c.abs_r_mt i.year, robust
                        
                        Linear regression                               Number of obs     =      2,609
                                                                        F(11, 2597)       =     141.73
                                                                        Prob > F          =     0.0000
                                                                        R-squared         =     0.3903
                                                                        Root MSE          =     .72244
                        
                        ------------------------------------------------------------------------------
                                     |               Robust
                                csad |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                            abs_r_mt |   .4994904   .0292174    17.10   0.000     .4421987    .5567822
                                     |
                          c.abs_r_mt#|
                          c.abs_r_mt |   -.052799   .0051352   -10.28   0.000    -.0628686   -.0427295
                                     |
                                year |
                               2002  |  -.2258721   .0459779    -4.91   0.000    -.3160291   -.1357151
                               2003  |    -.04762   .0484289    -0.98   0.326    -.1425832    .0473433
                               2004  |   .0641004   .0472513     1.36   0.175    -.0285536    .1567545
                               2005  |   .4017124   .0588411     6.83   0.000     .2863322    .5170927
                               2006  |   .5451839   .0563295     9.68   0.000     .4347286    .6556392
                               2007  |   1.051536   .0768241    13.69   0.000     .9008934    1.202179
                               2008  |   .8957598   .0699791    12.80   0.000     .7585394     1.03298
                               2009  |   .4675094   .0568641     8.22   0.000     .3560059    .5790129
                               2010  |   .2445753   .0514202     4.76   0.000     .1437465    .3454041
                                     |
                               _cons |   .8224699   .0380373    21.62   0.000     .7478834    .8970565
                        ------------------------------------------------------------------------------
                        
                        . 
                        end of do-file
                        
                        . gen rmt2 = r_mt^2
                        (1 missing value generated)
                        
                        . reg csad abs_r_mt rmt2 i.year, robust
                        
                        Linear regression                               Number of obs     =      2,609
                                                                        F(11, 2597)       =     141.73
                                                                        Prob > F          =     0.0000
                                                                        R-squared         =     0.3903
                                                                        Root MSE          =     .72244
                        
                        ------------------------------------------------------------------------------
                                     |               Robust
                                csad |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                            abs_r_mt |   .4994904   .0292174    17.10   0.000     .4421987    .5567822
                                rmt2 |   -.052799   .0051352   -10.28   0.000    -.0628686   -.0427295
                                     |
                                year |
                               2002  |  -.2258721   .0459779    -4.91   0.000    -.3160291   -.1357151
                               2003  |    -.04762   .0484289    -0.98   0.326    -.1425832    .0473433
                               2004  |   .0641004   .0472513     1.36   0.175    -.0285536    .1567545
                               2005  |   .4017124   .0588411     6.83   0.000     .2863322    .5170927
                               2006  |   .5451839   .0563295     9.68   0.000     .4347286    .6556392
                               2007  |   1.051536   .0768241    13.69   0.000     .9008934    1.202179
                               2008  |   .8957598   .0699791    12.80   0.000     .7585394     1.03298
                               2009  |   .4675094   .0568641     8.22   0.000     .3560059    .5790129
                               2010  |   .2445753   .0514202     4.76   0.000     .1437465    .3454041
                                     |
                               _cons |   .8224699   .0380373    21.62   0.000     .7478834    .8970565
                        ------------------------------------------------------------------------------
                        
                        .
                        As you can see, the results are identical; the only difference is the variable names.

                        Thank you.




                        • #13
                          Guest:
                          1) you seemingly logged the independent variables (instead of the dv, that is -csad-);
                          2) when creating categorical variables and interactions, it is a safe (and rewarding) habit to always rely on -fvvarlist- notation.
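A minimal sketch of the factor-variable route on simulated data (an assumption; only the variable names come from the thread). It shows that c.abs_r_mt##c.abs_r_mt reproduces the hand-made rmt2 term, since abs_r_mt^2 equals r_mt^2, and one payoff of the notation: -margins- knows that the linear and squared terms belong to the same variable, so it can report the average marginal effect of abs_r_mt.

```stata
* Hypothetical simulated data; variable names borrowed from the thread
clear
set seed 2018
set obs 2609
generate year = 2001 + floor((_n - 1) / 261)
generate r_mt = rnormal(0, 1.5)
generate abs_r_mt = abs(r_mt)
* Stored as double so the hand-made square matches the factor-variable one exactly
generate double rmt2 = r_mt^2
generate csad = 1 + 0.6*abs_r_mt - 0.05*rmt2 + 0.1*(year - 2001) + rnormal(0, 0.8)

* Factor-variable notation: Stata builds the square and the year indicators
regress csad c.abs_r_mt##c.abs_r_mt i.year, vce(robust)
scalar b_sq_fv = _b[c.abs_r_mt#c.abs_r_mt]

* Hand-made square: same fit
regress csad abs_r_mt rmt2 i.year, vce(robust)
scalar b_sq_manual = _b[rmt2]

* With factor variables, margins accounts for both the linear and squared term
quietly regress csad c.abs_r_mt##c.abs_r_mt i.year, vce(robust)
margins, dydx(abs_r_mt)
```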
                          Last edited by sladmin; 09 Apr 2018, 08:59. Reason: anonymize poster
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)



                          • #14
                            Carlo:

                            Code:
                            . gen log1=log(csad)
                            (192 missing values generated)
                            
                            . reg log1 abs_r_mt rmt2
                            
                                  Source |       SS           df       MS      Number of obs   =     2,418
                            -------------+----------------------------------   F(2, 2415)      =    200.50
                                   Model |  73.7438233         2  36.8719117   Prob > F        =    0.0000
                                Residual |  444.111582     2,415  .183897135   R-squared       =    0.1424
                            -------------+----------------------------------   Adj R-squared   =    0.1417
                                   Total |  517.855405     2,417  .214255443   Root MSE        =    .42883
                            
                            ------------------------------------------------------------------------------
                                    log1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                abs_r_mt |   .2437952   .0158073    15.42   0.000     .2127979    .2747925
                                    rmt2 |  -.0220551   .0027192    -8.11   0.000    -.0273873   -.0167229
                                   _cons |   .1940486   .0152521    12.72   0.000       .16414    .2239573
                            ------------------------------------------------------------------------------
                            
                            .



                            • #15
                              Guest:
                              now perform -estat ovtest- and -estat hettest- after the log-linear regression.
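A sketch of that sequence on simulated data, assuming a positive csad with multiplicative errors (so that the log model is the right one by construction); the coefficients, the seed and the variable name log_csad are made up. Note that log() returns missing for non-positive values, which is consistent with the log-linear regression in #14 using 2,418 rather than 2,609 observations.

```stata
* Hypothetical simulated data; variable names borrowed from the thread
clear
set seed 2018
set obs 2609
generate r_mt = rnormal(0, 1.5)
generate abs_r_mt = abs(r_mt)
generate rmt2 = r_mt^2
* Positive dependent variable with multiplicative errors (assumption)
generate csad = exp(0.2 + 0.25*abs_r_mt - 0.02*rmt2 + rnormal(0, 0.4))

* log() would return missing wherever csad <= 0
generate log_csad = log(csad)
regress log_csad abs_r_mt rmt2

* Functional-form and heteroskedasticity checks after the log-linear fit
estat ovtest
scalar p_reset = r(p)
estat hettest
scalar p_bp = r(p)
```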
                              Last edited by sladmin; 09 Apr 2018, 08:59. Reason: anonymize poster
                              Kind regards,
                              Carlo
                              (Stata 18.0 SE)
