Estimated Coefficient (1st Stage) as a Dependent Variable in a Second Stage Regression

Hannes Boehm

Join Date: Jul 2017

Posts: 6
#1

14 Jul 2017, 04:13

Dear all,

I have a question that has been touched upon in other topics but so far hasn't been answered to my full satisfaction.

I have two regression stages where I use estimated coefficients from the first stage as dependent variables in the second. More precisely, I estimate the following time-series regression separately for each of N countries as the first stage:
Y_t = a_0 + b_1 X_t^{1} + \sum_{k=2}^{K} b_k X_t^{k} + u_t
where b₁ is the coefficient of (main) interest while the other Xs are the control variables (to give some background: X_t¹ is a proxy for bank risks while Y_t represents sovereign risks. I want to investigate the transmission of bank risks to sovereign risks). This first stage is estimated with daily data, separately for every country (no panel dimension yet) and for every quarter in the time span (which yields around 60 observations per quarter and is repeated quarter after quarter).

In a second stage, I collect the b₁s which vary over quarters and across countries and use them as dependent variables in a panel fixed-effects estimation with a quarterly frequency. Basically, I want to investigate why the transmission from bank to sovereign risks is stronger/weaker over various quarters and across countries while controlling for macro measures such as GDP growth that only have a quarterly frequency. The model looks like this:
b_{n,t} = c_0 + \sum_{l=1}^{L} d_l Z_{n,t}^{l} + Country_n + e_{n,t}
with Country_n being the fixed effects term (keep in mind that the first stage is daily (Xs are interest rates, stock prices and the like), while the second stage has a quarterly frequency).
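
To make the setup concrete, here is a minimal sketch of the first stage on simulated data. It is an illustration under simplifying assumptions, not my actual estimation: it is written in Python rather than Stata, it uses a single regressor and drops the control variables, and the names `ols_slope_with_se`, `first_stage` and the simulated values are hypothetical. It runs one OLS regression per (country, quarter) cell of daily observations and keeps both the estimated b₁ and its standard error, since the standard error is what any later correction needs.

```python
import math
import random
from collections import defaultdict


def ols_slope_with_se(x, y):
    """OLS of y on a constant and x; returns (slope, standard error of the slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    a0 = mean_y - b1 * mean_x
    rss = sum((yi - a0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)  # classical (homoskedastic) standard error
    return b1, se_b1


def first_stage(rows):
    """rows: iterable of (country, quarter, x, y) daily observations."""
    cells = defaultdict(lambda: ([], []))
    for country, quarter, x, y in rows:
        cells[(country, quarter)][0].append(x)
        cells[(country, quarter)][1].append(y)
    # One regression per country-quarter cell.
    return {key: ols_slope_with_se(xs, ys) for key, (xs, ys) in cells.items()}


# Simulated daily data: 60 observations per country-quarter (all values hypothetical).
random.seed(0)
true_b1 = {("A", 1): 0.5, ("A", 2): 1.0, ("B", 1): -0.2, ("B", 2): 0.3}
rows = []
for (country, quarter), slope in true_b1.items():
    for _ in range(60):
        x = random.gauss(0, 1)
        rows.append((country, quarter, x, 2.0 + slope * x + random.gauss(0, 0.5)))

estimates = first_stage(rows)
for key in sorted(estimates):
    b1_hat, se_b1 = estimates[key]
    print(key, round(b1_hat, 3), round(se_b1, 3))
```

The resulting dictionary has one (b₁ estimate, standard error) pair per country-quarter, which is the panel that enters the second stage.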

As you probably know, using estimated values from a previous stage as a dependent variable in a second stage leads to biased standard errors because the second step ignores the estimation error from the first stage. I would like to ask about the most effective way to remedy this problem. Some suggestions have been proposed:

- This Stata blog entry (http://blog.stata.com/2014/12/08/usi...tion-problems/) uses GMM to estimate both stages simultaneously, thus avoiding the two-stage-error problem. However, I don't believe I can apply this method because (aside from not using a GMM model) I have different time frequencies (daily and quarterly) and different models (time series and panel).

- Some authors suggest switching to an FGLS estimator in the second stage which assumes heteroskedastic and cross-correlated error terms (xtgls with the panels(correlated) option). This sounds reasonable to me as a first step because the errors of my second stage are definitely cross-correlated, as the Pesaran test statistic shows. But does it really fix the biased standard errors specific to the two-stage problem?

- Lewis and Linzer (2005) (http://www.sscnet.ucla.edu/polisci/f...ewisLinzer.pdf) argue for an FGLS estimator that uses the sampling variances of the first-stage estimates (from their standard errors) to build a weight, which is then used to adjust the second-stage regression (basically a weighted least squares regression with country-time specific weights). The weights (adjusted for a panel dimension) are calculated as follows:
w_i,t = 1 / sqrt(se_DependVariable_i,t^2 + SigmaHat_i,t)
where se_DependVariable_i,t is the first-stage standard error of the dependent variable (squared, so that both terms under the root are variances) and SigmaHat_i,t is an estimate of the variance of the second-stage error term that is not due to the sampling error of the dependent variable. This sounds reasonable to me and I have seen some papers claiming that they use this adjustment. However, is there a Stata command which does this adjustment automatically, or do the weights have to be calculated step by step? If there is no command, can I show you my code for how I did it? (I don't feel too confident when it comes to formulating and calculating matrices in Stata.)
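
To show what I mean by calculating the weights step by step, here is a minimal sketch of the Lewis and Linzer weighting as I read the paper. Everything in it is an assumption for illustration: it is in Python rather than Stata, it has one second-stage regressor and no country fixed effects, it estimates a single SigmaHat for the whole sample rather than one per country-quarter, it treats the squared first-stage standard errors as the sampling variances, and the names `weighted_fit` and `lewis_linzer_fgls` are hypothetical. The steps are: (1) OLS of the estimated coefficients on Z, (2) back out SigmaHat from the OLS residuals net of the known sampling variances, (3) re-estimate by weighted least squares.

```python
import math
import random


def weighted_fit(z, b, p):
    """Least squares of b on a constant and z, with precision weights p (p = 1 gives OLS)."""
    s0 = sum(p)
    sz = sum(pi * zi for pi, zi in zip(p, z))
    szz = sum(pi * zi * zi for pi, zi in zip(p, z))
    sb = sum(pi * bi for pi, bi in zip(p, b))
    szb = sum(pi * zi * bi for pi, zi, bi in zip(p, z, b))
    det = s0 * szz - sz * sz
    c0 = (szz * sb - sz * szb) / det
    d = (s0 * szb - sz * sb) / det
    return c0, d


def lewis_linzer_fgls(z, b, se):
    """FGLS for b = c0 + d*z + e + u, where Var(u_i) = se_i**2 is known."""
    n, k = len(z), 2
    omega2 = [s * s for s in se]  # sampling variances from the first stage

    # Step 1: unweighted OLS and its residual sum of squares.
    c0, d = weighted_fit(z, b, [1.0] * n)
    rss = sum((bi - c0 - d * zi) ** 2 for zi, bi in zip(z, b))

    # Step 2: sigma2 = (RSS - sum(omega2) + tr((X'X)^-1 X'GX)) / (n - k), floored at 0,
    # with X = [1, z] and G = diag(omega2). The 2x2 trace is written out by hand.
    sz = sum(z)
    szz = sum(zi * zi for zi in z)
    det = n * szz - sz * sz
    g0 = sum(omega2)
    gz = sum(g * zi for g, zi in zip(omega2, z))
    gzz = sum(g * zi * zi for g, zi in zip(omega2, z))
    trace = (szz * g0 - 2.0 * sz * gz + n * gzz) / det
    sigma2 = max(0.0, (rss - g0 + trace) / (n - k))

    # Step 3: weights w_i = 1 / sqrt(omega2_i + sigma2); WLS uses w_i**2 as precision.
    w = [1.0 / math.sqrt(o + sigma2) for o in omega2]
    c0_fgls, d_fgls = weighted_fit(z, b, [wi * wi for wi in w])
    return c0_fgls, d_fgls, sigma2, w


# Simulated second stage (all values hypothetical): true c0 = 1, d = 2, Var(e) = 0.09.
random.seed(1)
z = [random.gauss(0, 1) for _ in range(200)]
se = [random.uniform(0.05, 0.5) for _ in range(200)]
b = [1.0 + 2.0 * zi + random.gauss(0, 0.3) + random.gauss(0, si) for zi, si in zip(z, se)]

c0_hat, d_hat, sigma2_hat, weights = lewis_linzer_fgls(z, b, se)
print(round(c0_hat, 3), round(d_hat, 3), round(sigma2_hat, 3))
```

Observations whose first-stage coefficient is estimated imprecisely get a smaller weight, and when SigmaHat dominates the sampling variances the weights become nearly equal and the result approaches plain OLS.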

So maybe to sum up my questions:
1) Do you know of a method that fixes the two-stage-error most efficiently (irrespective of the suggestions I listed above)? If there is an established way to solve this problem (maybe with a ready Stata command) I would be most happy to know about it!

2) How do you judge the suggestions I posted? Do you think FGLS will suffice? And do you know of a Stata command that implements the Lewis and Linzer weights? If not, can I show you my code for how I did it?

Any help is greatly appreciated!
Hannes Boehm

Join Date: Jul 2017

Posts: 6
#2

17 Jul 2017, 05:41

Dear all,

So no answer because...
... the question is expressed in an unclear or too time-consuming way (too many equations; problem unclear)?
... there is no easy solution (which would at least answer question 1 with a "no") and really solving the problem would take some effort?
... this is my first entry and as an inexperienced user I overlooked a major guideline or rule of the forum?

I can gladly reformulate the problem if that makes it easier!

Keep in mind that an answer could generate huge positive externalities, as many researchers work with estimated dependent variables and might run into this problem. So far, some papers make no adjustment at all or just add "vce(r)" to their regression command, which is insufficient to remedy the problem. A proper guideline on how to handle this problem could therefore really lift the scientific quality of some papers!
Alex Stead

Join Date: Aug 2014

Posts: 42
#3

17 Jul 2017, 06:59

Surely a much better way of doing this is in one step. Why not just estimate:

Y_t = a_0 + b_{1,i,t} X_t^{1} + \sum_{k=2}^{K} b_k X_t^{k} + u_t

b_{1,i,t} = c_0 + \sum_{l=1}^{L} d_l Z_{n,t}^{l} + Country_n

Which is easy to do?
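
One way to read the one-step idea, written out as a sketch: substituting the second equation into the first turns the coefficient on X_t^{1} into a set of interaction terms. This is only an illustration of the algebra using the symbols above; it assumes the daily data are pooled across countries and quarters, and it keeps the second equation without an error term, as written.

```latex
\begin{aligned}
Y_t &= a_0 + \Big(c_0 + \sum_{l=1}^{L} d_l Z_{n,t}^{l} + Country_n\Big) X_t^{1}
       + \sum_{k=2}^{K} b_k X_t^{k} + u_t \\
    &= a_0 + c_0 X_t^{1} + \sum_{l=1}^{L} d_l \big(Z_{n,t}^{l} X_t^{1}\big)
       + Country_n X_t^{1} + \sum_{k=2}^{K} b_k X_t^{k} + u_t
\end{aligned}
```

In this form the d_l are the coefficients on the products of each quarterly Z with the daily X_t^{1}, and the country effects enter as country-specific slopes on X_t^{1}.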
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

17 Jul 2017, 12:25

Alex - may I suggest that you expand on your response to Hannes? I'm not sure whether "which is easy to do?" means that it is easy or expresses uncertainty about whether it is easy. It might also help Hannes and others if you mentioned which Stata routine you had in mind for estimating the model in one stage.

Most of the ways I could see to do this in one step would assume the same observational structure in both equations. I'd worry about more than biased standard errors in the second stage. In essence, you're making the dv in the second equation a function of whatever variables are in the first. This seems like it might result in endogeneity or similar problems.
Hannes Boehm

Join Date: Jul 2017

Posts: 6
#5

20 Jul 2017, 07:54

Hey Phil, Hey Alex,

thank you for your answers. @Alex: I second Phil's request. If you could elaborate on your response, I'd appreciate it!

If there aren't any more answers, I assume that there is no ready-made answer to this problem. I will proceed with the standard xtreg command and put some alternative estimators (xtscc, FGLS and the Lewis and Linzer weights) in a robustness section. Thanks nevertheless!