
  • Problem of Prob > F in various models

    Hi there,
    I need a little help with my regression model. I am very new to econometrics, so please excuse me if I say something wrong.

    I have balanced panel data and am running a regression model with 9 independent variables and 3 control variables for N = 9, T = 6, where N is the number of companies.

    Where:
    Dependent Variable: Performance Measure
    Indepvar1, Indepvar2 and Indepvar3 are various ownership structures
    Indepvar4 - Indepvar9 are my other board-related corporate governance variables
    Indepvar10 - Indepvar12 are my control variables

    The Lagrange multiplier test suggests pooled OLS, so I ran the following model with the -reg- command:

    Code:
    regress depvar indepvar1 indepvar2 indepvar3 indepvar4 indepvar5 indepvar6 ///
        indepvar7 indepvar8 indepvar9 indepvar10 indepvar11 indepvar12, ///
        cluster(companies)
    To control for autocorrelation and heteroscedasticity, I have used the -cluster(companies)- option.

    Once the overall model is run, I then have to run 2 further regression models as follows, also with -cluster(companies)-:

    Code:
    regress depvar indepvar1 indepvar2 indepvar3 indepvar10 indepvar11 indepvar12, ///
        cluster(companies)
    This model tests the impact of various ownership structures on firm performance.

    Code:
    regress depvar indepvar4 indepvar5 indepvar6 indepvar7 indepvar8 indepvar9 ///
        indepvar10 indepvar11 indepvar12, cluster(companies)
    This model tests the impact of other board related corporate governance variables on firm performance.

    My question is that the first and the third models did not report Prob > F, while the second model reported Prob > F = 0.0060.

    Why is it that the two models did not report the model significance? I have already made my submission, and I have a presentation on my topic in the coming week. What explanation can I give for the missing Prob > F in the two models?

    I hope my problem is clear to readers. Any help will be appreciated.

    Thanks in advance.

  • #2
    When you use the cluster-robust VCE, your degrees of freedom for estimation drop to the number of clusters minus 1. (Look at the degrees of freedom in the F-test of the second model and you will see it is very small.) In your case, with N = 9, this makes it impossible to estimate the overall model F-test when you have 12 or 9 variables in the model. But for 6 variables, it is possible.
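    Put loosely, the cluster-robust VCE with G clusters supports a joint test of at most G - 1 coefficients, so an overall F-test with more coefficients than that cannot be computed. A minimal sketch of this counting rule, in Python rather than Stata (the helper name is hypothetical, invented for illustration):

```python
def overall_f_feasible(n_clusters, n_coeffs_tested):
    """A joint Wald/F test cannot involve more restrictions than
    the cluster-robust VCE can support: at most G - 1."""
    return n_coeffs_tested <= n_clusters - 1

# The three models from the thread, with G = 9 company clusters:
print(overall_f_feasible(9, 12))  # model 1: 12 predictors -> False, no Prob > F
print(overall_f_feasible(9, 9))   # model 3: 9 predictors  -> False, no Prob > F
print(overall_f_feasible(9, 6))   # model 2: 6 predictors  -> True, F reported
```

    This is why only the second model, with 6 predictors against 8 available degrees of freedom, reports a Prob > F.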

    Now, quite independent of this problem, the use of the cluster robust VCE is only appropriate in large N analyses. While there is disagreement among experts as to just how large N needs to be, I think you would have a hard time finding anyone who would say that N = 9 is sufficient. So you shouldn't be using cluster robust VCE with this data structure. If it were my data, I would switch to the simple -robust- VCE here.

    All of that said, I'm also not sure how much sense it makes to try to estimate the effects of 12 variables on N = 9 clusters and T = 6 in any case: you're really stretching the data very thinly here.

    • #3
      Hi Clyde,
      Thank you for your response. A few more questions, if you could spare some time.

      Originally posted by Clyde Schechter View Post
      ... this makes it impossible to estimate the overall model F test when you have 12 or 9 variables in the model. But for 6 variables, it is possible.
      In layman's terms, can you explain a little why this is so? If I am asked why there is no F-test, what satisfactory answer can I give?

      Originally posted by Clyde Schechter View Post
      ... I think you would have a hard time finding anyone who would say that N = 9 is sufficient. So you shouldn't be using cluster robust VCE with this data structure. If it were my data, I would switch to the simple -robust- VCE here.
      Basically, I read some notes online which said that with -robust- the standard errors take into account issues concerning heteroscedasticity and lack of normality, while -cluster(varname)- controls for dependence of errors, and that there is no need to use -robust- with -cluster(varname)- as it is already implied by it.

      Originally posted by Clyde Schechter View Post
      All of that said, I'm also not sure how much sense it makes to try to estimate the effects of 12 variables on N = 9 clusters and T = 6 in any case: you're really stretching the data very thinly here.
      Actually, my dataset comes from 4 countries with varying N's. I have already run my regression model on the overall data of the four countries, with 1000+ firm-year observations. Later on, I have to test the impact for each country individually as well.

      • #4
        In layman's terms, can you explain a little why this is so? If I am asked why there is no F-test, what satisfactory answer can I give?
        It's a fairly technical issue, so I'm not sure it can be fully explained in layman's terms. But I'll give it a try, speaking fairly loosely. When you use the cluster-robust variance estimator, it treats the data, in a sense, as if each cluster were just a single observation. So with N = 9, you are, in effect, carrying out a regression on just 9 observations. It is well understood that in regression models you cannot estimate 12 effects, or 9 effects, with only 9 observations. So Stata does not calculate an F statistic there. By contrast, it is possible (if inadvisable) to test 6 effects on 9 observations, so your second model does give you an overall F-statistic.

        This is, as I say, a loose explanation, and the analogy breaks down if you push it too far. For example, if you truly had only 9 observations and ran a regression on them, you would not get estimates of any of the individual effects either, whereas with this panel data and 9 clusters you do get estimates for the individual predictor variables, and those estimates are valid (if imprecise due to the limited sample size).

        Basically, I read some notes online which said that with -robust- the standard errors take into account issues concerning heteroscedasticity and lack of normality, while -cluster(varname)- controls for dependence of errors, and that there is no need to use -robust- with -cluster(varname)- as it is already implied by it.
        That is correct. I guess my explanation was not clear. I'm not saying use the cluster-robust VCE. I'm saying use just -robust- with no mention of -cluster-. You simply can't get valid estimates with -cluster- on just 9 groups.


        • #5
          Thank you so much for your response. There is one more, final question I want to ask; however, it is not related to the F statistic.

          When I ran the regression model on the data mentioned above, in the very first model 2 out of 3 ownership structures (indepvar1 & indepvar2) were significant. Similarly, 2 of the other board-related variables (from indepvar4 - indepvar9) were also significant in the overall model. However, when I ran the 2nd model, which tested the impact of the various ownership structures on my performance measure, one of those independent variables became insignificant. Similarly, in the third model, which tested the impact of the other board-related variables on my performance measure, all of them were insignificant. Why is it that they were significant before but insignificant when broken down into two different models? The values and everything have remained the same; it is just that the first model was broken down into two different models.

          • #6
            There are two distinct phenomena that can cause this. The first thing is to be sure to remember that the difference between statistically significant and not statistically significant is not, itself, statistically significant! The .05 significance criterion commonly used is an arbitrary cutoff, and p-values that are close to the cutoff can easily cross that threshold with minor change to the model, even if the corresponding coefficient doesn't change materially. So if all you're seeing is changes in p-values, but the coefficients remain more or less the same, then the problem is simply that you are focusing on the wrong statistic. p-values are the most over-rated statistics around. See http://dx.doi.org/10.1080/00031305.2016.1154108 for an extensive set of papers published in The American Statistician, reflecting the American Statistical Association's position on the uses and misuses of p-values. For my part, I tell my students not to even look at the p-values until they have thoroughly understood all the other statistics in the output. Once they do, they usually find that the p-values have nothing to add anyway.
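            The threshold-crossing point can be seen with a toy calculation (the numbers here are made up for illustration, not taken from the thread): the very same coefficient, paired with a slightly different standard error in two models, can land on either side of the conventional large-sample 1.96 critical value.

```python
# Hypothetical coefficient that is identical in two models,
# with only a slightly different standard error in each.
beta = 0.50
se_model_a = 0.24
se_model_b = 0.26

crit = 1.96  # conventional 5% large-sample critical value

t_a = beta / se_model_a  # about 2.08 -> crosses the threshold
t_b = beta / se_model_b  # about 1.92 -> falls just short

print(t_a > crit, t_b > crit)  # verdicts differ; the coefficient does not
```

            Nothing substantive distinguishes the two models here, only a hair's-breadth difference in precision.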

            The other possibility, which is much more substantive, is if the coefficients are materially different across models. There is nothing particularly surprising about this. It happens often. And the changes can be very large--even switching to the opposite sign. This is sometimes referred to as Simpson's paradox, though in fact there isn't really a paradox--just a challenge to misguided expectations. The Wikipedia page on Simpson's paradox is quite good and I recommend you read it. It frames the discussion only in terms of categorical predictors, but the same principles apply to continuous variables as well. Simply put, when you move variables in and out of models, things can change in every imaginable way.

            Unfortunately, that leaves you with the dilemma of deciding which model(s) are the appropriate ones. There is a temptation some researchers feel to then select the model that gives "significant" p-values to the variables they prefer. That, of course, is not science. That is, if anything, scientific misconduct (unless presented to audiences explicitly as the result of cherry-picking the results). You should have selected your models on the basis of pre-existing theory and its predictions. Then you should stick with those models that theory supports.

            • #7
              Thank you so much Clyde. You have been a really great help to me. Thank you once again. Blessings.
