
  • Estimating Standard Errors in Finance Panel Data Sets

    Dear Stata users,
    I'm dealing with an unbalanced panel dataset of many individuals observed over two years; my aim is to study the determinants of Italian households' financial planning.
    I found a paper by Mitchell A. Petersen, "Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches", which has radically changed my point of view on the method of study.
    Previously I ran a Breusch-Pagan LM test of random effects versus pooled OLS, choosing the latter; I then ran a Hausman test of fixed versus random effects, which this time favoured a fixed-effects model for my data.
    Petersen (2009) changed my mind: he does not take this (very simple) approach, but instead focuses on the unbiasedness of the standard errors when choosing the best method for research with panel data. He distinguishes between two general forms of dependence:
    1) "The residuals of a given firm may be correlated across years" (time-series dependence), i.e. an unobserved firm effect
    2) "The residuals of a given year may be correlated across different firms" (cross-sectional dependence), i.e. a time effect

    He tested the various models used in the literature through simulations (considering each effect alone and both together), varying the fractions of the variance of the independent variable and of the residuals attributable to these two effects.
    Finally he proposed this approach:
    1) Run OLS with White standard errors
    2) Run OLS with standard errors clustered by firm
    3) Run OLS with standard errors clustered by year
    4) Run OLS with standard errors clustered by both

    If the second model has much larger (3-4 times) standard errors than the first, there is a firm effect and standard errors clustered by firm are unbiased.
    If the third model has much larger (3-4 times) standard errors than the first, there is a time effect and the Fama-MacBeth approach is best.
    If the fourth model has larger (3-4 times) standard errors than the second and third, cluster by both firm and year.
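
    In Stata, the four comparison regressions above might be sketched roughly as follows (y, x, firm and year are placeholder names, not variables from my data; the two-way clustered version is not available in official -regress-, so that step is only indicated):

    Code:
    * Petersen-style comparison of standard errors (placeholder names)
    reg y x, vce(robust)          // 1) OLS with White (heteroskedasticity-robust) SEs
    reg y x, vce(cluster firm)    // 2) SEs clustered by firm
    reg y x, vce(cluster year)    // 3) SEs clustered by year
    * 4) two-way clustering needs a community-contributed command,
    *    e.g. Petersen's cluster2 routine or reghdfe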

    I understand the reasoning behind this approach, but when I try to replicate it in Stata I can't, because the command xtreg ..., vce(cluster year) fails in the third case with:
    "option cluster() not allowed". The weird thing is that with a routine provided by Professor Petersen I can cluster on both dimensions together, but not on time alone.

    Can anyone help me, please?

  • #2
    Yes, we can help if you show us exactly what you typed and exactly what Stata returned. And explain exactly what you want to achieve; e.g., it is not clear to me why you use xtreg when you want to run OLS.
    --
    PS. Petersen's procedure you describe sounds like Voodoo magic to me, but we are not here to judge ;-)



    • #3
      I would like to know the best model for my data; I used xtreg because Petersen explained the approach with OLS, but he argued that it can also be implemented with other models.
      Since my dependent variable is a dummy, maybe it would be better to use xtlogit instead of xtreg, but the error is still there.

      Code:
      xtlogit having_fin_plan know_01 male investors savers budget_always_respected edu_univ trust_fin_01 ansia01 risk_av01 ovc_dummy property_01 ///
      rent_to_buy high_financial_wealth Bank_mortgage_for_house Bank_mortgage_for_goods Debt_vs_Relatives_for_house Debt_vs_Relatives_for_goods ///
      living_with_young_children living_with_sons_over_15s, vce(cluster year)

      and it returns:

      panels are not nested within clusters

      Thank you very much, professor
      Last edited by Alberto Noe; 01 Oct 2020, 07:34.




        • #5
          There are several problems with the Petersen paper. -xtlogit- is inconsistent if there is serial correlation in the residuals, which the cluster option presupposes. You should simply use pooled logit and cluster the standard errors. This handles time dependence within the firm. If you also want to allow for dependence across firms, I cannot help you; if Professor Wooldridge sees this thread, he will propose something. You can also have recourse to what is called Correlated Random Effects (the Mundlak approach) with the pooled logit model. This is not difficult to do: there are posts here on Statalist explaining how to do it in the case of a linear regression model; just replace reg by logit.

          On edit: this is a thread which explains how to estimate a Correlated Random Effects Model: https://www.statalist.org/forums/for...an-test-result
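
          For what it's worth, the Mundlak device described above might be sketched like this (id, year, y, x1 and x2 are placeholder names; the idea is to add the within-panel means of the time-varying regressors to a pooled logit with panel-clustered standard errors):

          Code:
          * Correlated Random Effects (Mundlak) via pooled logit -- placeholder names
          xtset id year
          foreach v of varlist x1 x2 {
              egen `v'_bar = mean(`v'), by(id)    // panel mean of each regressor
          }
          logit y x1 x2 x1_bar x2_bar, vce(cluster id)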
          Last edited by Eric de Souza; 01 Oct 2020, 08:14.



          • #6
            Originally posted by Eric de Souza View Post
            There are several problems with the Petersen paper. -xtlogit- is inconsistent if there is serial correlation in the residuals, which the cluster option presupposes. You should simply use pooled logit and cluster the standard errors. This handles time dependence within the firm. If you also want to allow for dependence across firms, I cannot help you; if Professor Wooldridge sees this thread, he will propose something. You can also have recourse to what is called Correlated Random Effects (the Mundlak approach) with the pooled logit model. This is not difficult to do: there are posts here on Statalist explaining how to do it in the case of a linear regression model; just replace reg by logit.

            On edit: this is a thread which explains how to estimate a Correlated Random Effects Model: https://www.statalist.org/forums/for...an-test-result
            Thank you for the answer, Professor de Souza.
            So can I use the logit command instead of xtlogit even though the data are a panel? If there are no discrepancies between the standard errors (I tried it with the logit command, and this is the case), should I use the model without clustering in my final analysis? If there are no discrepancies, does that mean there is neither cross-sectional nor time-series dependence to account for?
            Last edited by Alberto Noe; 01 Oct 2020, 08:58.



            • #7
              Yes, you can use logit instead of xtlogit, just as you can use reg instead of xtreg, with panel data. Just make sure that you have xtset your data.
              I do not understand what you mean by "no discrepancies between the standard errors", or on what basis you affirm this.



              • #8
                If you have an "unbalanced panel dataset of many individuals over two years", that is T=2, the standard logit with individual dummies is inconsistent because of the incidental parameters problem.

                Similarly, it does not make sense to cluster standard errors by year, because you will end up with only 2 clusters.

                I am not familiar with the result that xtlogit is inconsistent if there is autocorrelation in the idiosyncratic error; I am hearing this for the first time.

                Otherwise, you are getting the error because of how the data are xtset. Here:

                Code:
                . webuse nlswork, clear
                (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
                
                . xtset idcode
                       panel variable:  idcode (unbalanced)
                
                . xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp not_smsa south, fe robust
                note: grade omitted because of collinearity
                
                Fixed-effects (within) regression               Number of obs     =     28,500
                Group variable: idcode                          Number of groups  =      4,708
                
                R-sq:                                           Obs per group:
                     within  = 0.1578                                         min =          1
                     between = 0.3242                                         avg =        6.1
                     overall = 0.2436                                         max =         15
                
                                                                F(6,4707)         =     295.84
                corr(u_i, Xb)  = 0.1660                         Prob > F          =     0.0000
                
                                                    (Std. Err. adjusted for 4,708 clusters in idcode)
                -------------------------------------------------------------------------------------
                                    |               Robust
                            ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                --------------------+----------------------------------------------------------------
                              grade |          0  (omitted)
                                age |   .0290209   .0052234     5.56   0.000     .0187806    .0392611
                                    |
                        c.age#c.age |  -.0006683   .0000848    -7.88   0.000    -.0008346    -.000502
                                    |
                            ttl_exp |   .0613071   .0034563    17.74   0.000     .0545311    .0680831
                                    |
                c.ttl_exp#c.ttl_exp |  -.0008888   .0001444    -6.15   0.000    -.0011719   -.0006057
                                    |
                           not_smsa |  -.0913552   .0137498    -6.64   0.000    -.1183113    -.064399
                              south |  -.0602612   .0166364    -3.62   0.000    -.0928764    -.027646
                              _cons |   1.148869   .0736599    15.60   0.000     1.004462    1.293277
                --------------------+----------------------------------------------------------------
                            sigma_u |  .35837059
                            sigma_e |  .29406991
                                rho |  .59760605   (fraction of variance due to u_i)
                -------------------------------------------------------------------------------------
                
                . xtset year
                       panel variable:  year (unbalanced)
                
                . xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp not_smsa south, fe robust
                
                Fixed-effects (within) regression               Number of obs     =     28,500
                Group variable: year                            Number of groups  =         15
                
                R-sq:                                           Obs per group:
                     within  = 0.3118                                         min =      1,223
                     between = 0.9713                                         avg =    1,900.0
                     overall = 0.3335                                         max =      2,256
                
                                                                F(7,14)           =    2207.84
                corr(u_i, Xb)  = -0.6556                        Prob > F          =     0.0000
                
                                                         (Std. Err. adjusted for 15 clusters in year)
                -------------------------------------------------------------------------------------
                                    |               Robust
                            ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                --------------------+----------------------------------------------------------------
                              grade |   .0645029   .0020186    31.95   0.000     .0601734    .0688324
                                age |   .0605298   .0069109     8.76   0.000     .0457075    .0753521
                                    |
                        c.age#c.age |  -.0009376   .0001114    -8.42   0.000    -.0011765   -.0006987
                                    |
                            ttl_exp |   .0619083   .0027342    22.64   0.000     .0560441    .0677725
                                    |
                c.ttl_exp#c.ttl_exp |  -.0008605   .0001624    -5.30   0.000    -.0012088   -.0005123
                                    |
                           not_smsa |  -.1585848   .0087112   -18.20   0.000    -.1772685   -.1399011
                              south |   -.117421   .0054771   -21.44   0.000    -.1291681   -.1056739
                              _cons |  -.2984413   .1111369    -2.69   0.018    -.5368062   -.0600764
                --------------------+----------------------------------------------------------------
                            sigma_u |  .11508054
                            sigma_e |  .38208003
                                rho |  .08317303   (fraction of variance due to u_i)
                -------------------------------------------------------------------------------------
                Even if you specify only robust, xtreg returns standard errors that are both robust and clustered by the id variable identifying the panels.
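
                You can check this equivalence directly on the same dataset as above (a quick sketch; the two commands should report identical standard errors):

                Code:
                * robust and cluster-by-panel are the same thing in -xtreg, fe-
                webuse nlswork, clear
                xtset idcode
                xtreg ln_wage ttl_exp, fe vce(robust)
                xtreg ln_wage ttl_exp, fe vce(cluster idcode)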



                • #9
                  Originally posted by Eric de Souza View Post
                  Yes, you can use logit instead of xtlogit, just as you can use reg instead of xtreg, with panel data. Just make sure that you have xtset your data.
                  I do not understand what you mean by "no discrepancies between the standard errors", or on what basis you affirm this.
                  Following the procedure in Petersen's paper, I have to compare the standard errors (in my case using logit rather than OLS, because my dependent variable is a dummy):
                  1) Run OLS with White standard errors
                  2) Run OLS with standard errors clustered by firm
                  3) Run OLS with standard errors clustered by year
                  4) Run OLS with standard errors clustered by both

                  If the second model has much larger (3-4 times) standard errors than the first, there is a firm effect and standard errors clustered by firm are unbiased.
                  If the third model has much larger (3-4 times) standard errors than the first, there is a time effect and the Fama-MacBeth approach is best.
                  If the fourth model has larger (3-4 times) standard errors than the second and third, cluster by both firm and year.

                  Considering that in my case there are no discrepancies between those standard errors, should I conclude that there is neither time-series nor cross-sectional dependence to account for?
                  I know you told me that there are several problems with this approach, but my professor wants me to follow what Petersen said for my master's thesis.



                  • #10
                    If your professor wants you to follow Petersen, it would be better for you to ignore the discussion above. To seriously answer your question I would have to read Petersen's article and see how he proceeds.
                    I see that he has a web page with instructions, but even staying within the logic of Petersen's approach, some of them are outdated. They can be used, but the output is messy. For example: if you have a recent version of Stata, no one uses the xi: prefix any more.
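
                    For instance (a hedged illustration; y, x and firm are placeholder names), what the xi: prefix used to do by expanding dummies is now handled directly by factor-variable notation:

                    Code:
                    * old style (still works, but generates _Iyear_* dummies and messy output):
                    xi: regress y x i.year, vce(cluster firm)
                    * current style with factor variables:
                    regress y x i.year, vce(cluster firm)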



                    • #11
                      Originally posted by Joro Kolev View Post
                      If you have an "unbalanced panel dataset of many individuals over two years", that is T=2, the standard logit with individual dummies is inconsistent because of the incidental parameters problem. [...]
                      Thank you, professor. I'm going to consider your method for dealing with my problem. I really appreciate your kindness.



                      • #12
                        Originally posted by Eric de Souza View Post
                        If your professor wants you to follow Petersen, it would be better for you to ignore the discussion above. To seriously answer your question I would have to read Petersen's article and see how he proceeds.
                        I see that he has a web page with instructions, but even staying within the logic of Petersen's approach, some of them are outdated. They can be used, but the output is messy. For example: if you have a recent version of Stata, no one uses the xi: prefix any more.
                        Yes, there is a lot of old stuff; I wrote to Professor Petersen to ask for some explanations. Do not worry if you can't read the paper now. I have four months to develop my thesis, so maybe later you will be able to help me. I'm grateful for your help. I will keep you and Professor Kolev updated!

