Using coefficients from the regress command to generate a new variable

gagandeep sharma

Join Date: Jul 2019

Posts: 16
#1


02 Nov 2021, 15:05

Good evening everyone,

I am using Stata 16.1. I have to run a pooled OLS on panel data and then use the estimated coefficients to generate a new variable:

1. The model I'm trying to implement is: $\Delta k_{i,t} = (\lambda_0 + \Lambda Z_{i,t-1})\, Gap_{i,t-1} + \eta_{i,t}$

2. I use the following regress command after setting the dataset as a panel using xtset:

regress actual_tier1_gap l.tier1_gap l.((c.tier1_gap)#(i.state1nonstate0 c.size c.return_on_equity_w ///
i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation)), noconst

(I apologise for the variable names being unwieldy.)

3. I have run into a few issues:

a. My first question is whether the command is appropriate for the model I am trying to implement.

b. The coefficient λ₀ on Gap_i,t-1 tends to remain low if I add only a few controls, but it increases considerably when I add all of the controls above.

c. Finally, in the second stage I have to construct a variable ΛZ_i,t-1 from the estimated coefficients of Eq. 1. I'm unable to grasp how to do it.
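Expanding the model in point 1 shows where each piece of Λ ends up in the regression. This sketch assumes Z has K elements z_1, ..., z_K. It also assumes each element enters only through its interaction with Gap_i,t-1, as in the regress command above, and writes Λ = (λ_1, ..., λ_K):

$$\Delta k_{i,t} = \lambda_0\, Gap_{i,t-1} + \sum_{j=1}^{K} \lambda_j \left( z_{j,i,t-1} \times Gap_{i,t-1} \right) + \eta_{i,t}$$

$$\widehat{\Lambda Z}_{i,t-1} = \sum_{j=1}^{K} \hat{\lambda}_j\, z_{j,i,t-1}$$

Read this way, the coefficient on each interaction term is itself an estimate of one λ_j. Λ̂Z_i,t-1 is therefore the sum of those coefficients times the corresponding lagged Z's, with no division by λ̂₀ and no error term. For a factor variable such as i.state1nonstate0, z_j would be the 0/1 indicator for the non-base level, assuming Stata omits the base level.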

Any help would be much appreciated.

regards,
Gagan
George Ford

Join Date: Aug 2014

Posts: 3337
#2

03 Nov 2021, 08:38

Looks right, I think.

The effect of Gap is not independent of the Z. When you change Z, you'll change λ₀.

Λ is a vector, so you have multiple ΛZ_i,t-1.

Wouldn't you just take each coefficient of Λi and multiply by Zi?
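George's suggestion can be checked on simulated data. This is a minimal sketch with some assumptions:

- The variables dk, gap, size and state (a 0/1 dummy) are hypothetical, and the "true" coefficients are made up.
- Lagged copies are created explicitly so that coefficient names are simple to type.
- The dummy enters as c.lstate rather than i.lstate, to sidestep base-level naming.

With time-series or factor notation the names differ (for example cL.tier1_gap#cL.size), and `matrix list e(b)` shows them.

```stata
* --- hypothetical simulated panel ---
clear
set seed 2021
set obs 1000
gen id = ceil(_n/10)                  // 100 panels, 10 periods each
bysort id: gen t = _n
xtset id t
gen gap  = rnormal()
gen size = rnormal()
gen byte state = runiform() < 0.5
* made-up true model: dk = (0.3 + 0.1*L.size - 0.2*L.state) * L.gap + noise
gen dk = (0.3 + 0.1*L.size - 0.2*L.state)*L.gap + rnormal(0, 0.1)

* lagged copies keep the coefficient names simple
gen lgap   = L.gap
gen lsize  = L.size
gen lstate = L.state

* lambda0 is the coefficient on lgap; each interaction coefficient is one lambda_j
regress dk c.lgap c.lgap#c.lsize c.lgap#c.lstate, noconstant
matrix list e(b)                      // shows the exact names Stata uses

* Lambda-hat * Z: interaction coefficients times the Z's, summed (no division)
gen LambdaZ = _b[c.lgap#c.lsize]*lsize + _b[c.lgap#c.lstate]*lstate
summarize LambdaZ
```

The coefficient on c.lgap is λ̂₀ and is not part of LambdaZ. Adding it back and multiplying by lgap reproduces the linear prediction from `predict, xb`.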
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#3

03 Nov 2021, 08:51

Gagandeep:
1) Why did you not cluster the standard errors on your pooled OLS, as the observations within a panel (at least) are not independent (and Stata ignores the panel structure of your data since you did not run an -xt- command)? Obviously, clustering has no bearing on the coefficients' sample estimates.
2) are you sure that the -noconst- option is what you need?

Kind regards,
Carlo
(Stata 19.0)
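Carlo's last point, that clustering leaves the point estimates unchanged, is easy to verify. This sketch runs the same regression twice on a hypothetical simulated panel, once with default and once with clustered standard errors. Here id plays the role of bankcode, and all names and values are made up. It then compares e(b) and e(V) with mreldif():

```stata
* --- hypothetical simulated panel; id stands in for bankcode ---
clear
set seed 2021
set obs 1000
gen id = ceil(_n/10)                  // 100 panels, 10 periods each
bysort id: gen t = _n
xtset id t
gen gap   = rnormal()
gen size  = rnormal()
gen dk    = (0.3 + 0.1*L.size)*L.gap + rnormal(0, 0.1)
gen lgap  = L.gap
gen lsize = L.size

* default (iid) standard errors
regress dk c.lgap c.lgap#c.lsize, noconstant
matrix b_iid = e(b)
matrix V_iid = e(V)

* standard errors clustered on the panel identifier
regress dk c.lgap c.lgap#c.lsize, noconstant vce(cluster id)
matrix b_cl = e(b)
matrix V_cl = e(V)

* mreldif() = largest relative difference between two matrices
display "coefficients: " mreldif(b_iid, b_cl)
display "variances:    " mreldif(V_iid, V_cl)
```

The first difference should be zero and the second should not: only the variance matrix, and hence the standard errors, changes.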
gagandeep sharma

Join Date: Jul 2019

Posts: 16
#4

03 Nov 2021, 11:58

Thanks Carlos,

1) I'm setting the dataset as a panel using xtset. I am running two versions of the command in my point 2. I use the cl(bankcode) option (bankcode is my panel identifier), but then I do not get the adjusted R-squared, which I need to report to my supervisor.

2) The model is specified without a constant, and the first term of the Gap interaction (λ₀) is treated as the constant in the subsequent analysis.

3) Do you have any pointers on my points 3b) and 3c)?
gagandeep sharma

Join Date: Jul 2019

Posts: 16
#5

03 Nov 2021, 12:04

Thanks for your reply.

So shouldn't the coefficient go down as I add more regressors as controls?

Also, my coefficients are all interaction terms, e.g. cL.tier1_gap#cL.size.

If I need to use just ΛZ_i,t-1, does that mean I need to divide all the coefficients by the standalone coefficient of Gap_i,t-1 and then multiply them by the observed values of Z_i,t-1? What should I do with the predicted error term of the regression model in that case?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#6

03 Nov 2021, 12:26

Gagandeep:
1) You do not need to -xtset- your data first if you go with pooled OLS (BTW: pooled OLS would not be my first choice for panel data regression). With a bit of guesswork, your supervisor may be interested in the within R-sq (if -fe-) or the between R-sq (if -re-); both are produced by -xtreg- (with a bit of guesswork again, I assume that your regressand is continuous);
2) OK. I assume that the literature in your research field supports your approach;
3) About your question 3b (and with no other information from your side): given that it is not clear what you're controlling for, if the coefficient you're concerned about shows wide variations, I would check your model specification, just to be sure that you're on the right track.

Kind regards,
Carlo
(Stata 19.0)
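If the within, between and overall R-squared Carlo mentions are of interest, -xtreg, fe- reports them and stores them in e(r2_w), e(r2_b) and e(r2_o). This is a minimal sketch on a hypothetical simulated panel with a made-up bank-level effect u; all names and values are illustrative. Note that this is a fixed-effects specification, not the pooled OLS without a constant discussed in this thread:

```stata
* --- hypothetical simulated panel with a bank-level effect ---
clear
set seed 2021
set obs 1000
gen id = ceil(_n/10)                  // 100 panels, 10 periods each
bysort id: gen t = _n
xtset id t
gen gap  = rnormal()
gen size = rnormal()
* made-up bank-level effect, constant within id
bysort id (t): gen u = rnormal() if _n == 1
by id: replace u = u[1]
gen dk    = u + (0.3 + 0.1*L.size)*L.gap + rnormal(0, 0.1)
gen lgap  = L.gap
gen lsize = L.size

* fixed effects with standard errors clustered on the panel identifier
xtreg dk c.lgap c.lgap#c.lsize, fe vce(cluster id)
display "within R-sq:  " e(r2_w)
display "between R-sq: " e(r2_b)
display "overall R-sq: " e(r2_o)
```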
George Ford

Join Date: Aug 2014

Posts: 3337
#7

03 Nov 2021, 12:37

dY/dGap depends on all the coefficients (it is often calculated at the means of the Z, but that's not required). As you interact with more Z, I'm not surprised λ₀ changes. It may be, however, that dY/dGap doesn't change all that much once you account for the interactions. Use margins to calculate it to see what's up.

As for passing those results on to a second stage, I'm not sure what you're up to. Are you passing through one variable or many? There are many ΛZ_i,t-1. Do you want to pass through the prediction? Consider whether you have a generated regressor problem, meaning you'll need to bootstrap both stages to do hypothesis tests.

This problem reminds me a bit of the Bresnahan/Reiss market power model, where a coefficient is passed through from one stage to the next. You might look at that literature.

And, as Carlo suggests, you may want to keep the constant, or at least determine that it is in fact zero (even if it should be, theoretically).

It might help us if you pointed to the literature you are basing your model on.
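This sketch shows the margins calculation George suggests, on a hypothetical simulated panel. The names, lagged copies and coefficients are made up, and the dummy enters as c.lstate for simplicity. Because the interactions use factor-variable notation, margins knows that dY/dGap = λ̂₀ + Σ λ̂_j z_j. It can therefore evaluate the effect on average, at the means of the Z's, or at chosen values:

```stata
* --- hypothetical simulated panel ---
clear
set seed 2021
set obs 1000
gen id = ceil(_n/10)                  // 100 panels, 10 periods each
bysort id: gen t = _n
xtset id t
gen gap  = rnormal()
gen size = rnormal()
gen byte state = runiform() < 0.5
gen dk = (0.3 + 0.1*L.size - 0.2*L.state)*L.gap + rnormal(0, 0.1)
gen lgap   = L.gap
gen lsize  = L.size
gen lstate = L.state

regress dk c.lgap c.lgap#c.lsize c.lgap#c.lstate, noconstant vce(cluster id)

* average dY/dGap over the estimation sample
margins, dydx(lgap)
* dY/dGap with every Z held at its sample mean
margins, dydx(lgap) atmeans
* dY/dGap at selected values of one Z
margins, dydx(lgap) at(lsize = (-1 0 1))
```

This model is linear in the Z's, so the average marginal effect and the effect at the means coincide. The at() version shows how dY/dGap moves with one Z.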
gagandeep sharma

Join Date: Jul 2019

Posts: 16
#8

03 Nov 2021, 14:48

Hi Carlo,

First of all, apologies for addressing you as Carlos.

I have a panel dataset, and the variable Gap in my equation above is actually a predicted variable from a system GMM equation, so the data are already set as a panel. Nevertheless, I take your point that pooled OLS does not require it.

The model that I am using is pretty standard and comes from Jiang, C., Liu, H., & Molyneux, P. (2019). Do different forms of government ownership matter for bank capital behavior? Evidence from China. Journal of Financial Stability, 40, 38–49. https://doi.org/10.1016/j.jfs.2018.11.005

I need to construct a new variable VarX = Λ̂Z_i,t-1, where the vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
But as I mentioned, all these variables are interacted with Gap_i,t-1, so how can I use the estimated coefficients?

Thanks and regards
gagandeep sharma

Join Date: Jul 2019

Posts: 16
#9

03 Nov 2021, 15:01

Thanks George,

I'm not sure how to use margins, but I'm looking into it and how it may help me. It's just that the coefficient of Gap is theoretically bounded between 0 and 1, and my first-stage GMM results give me an idea of the value (the upper bound, at least) it should ideally take. In fact, adding one of the controls accounts for the majority of the increase in the coefficient of Gap_i,t-1.

I have linked the paper explaining the model that I am using. Perhaps you could be kind enough just to look at the specification therein. Since this is a second stage in a series of regressions, the literature is quite clear on using pooled OLS with no constant term and no firm fixed effects, which have been accounted for in the first-stage system GMM specification.

I reiterate:
I need to construct a new variable VarX = Λ̂Z_i,t-1, where the vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
But as I mentioned, all these variables are interacted with Gap_i,t-1, so how can I use the estimated coefficients? For example, my results show a coefficient for L.state1nonstate0#cL.tier1_gap1. Should I divide this coefficient by gapl1 and then multiply it by the dummy values of state1nonstate0 to get the estimates (and so on for all the variables in Z)?
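Under the model as written, dividing an interaction coefficient by λ̂₀ would not give Λ̂Z_i,t-1, because each interaction coefficient is already an estimate of one λ_j. A division that does work is done observation by observation. The fitted value equals (λ̂₀ + Λ̂Z_i,t-1)·Gap_i,t-1, so Λ̂Z_i,t-1 = (fitted value - λ̂₀·Gap_i,t-1)/Gap_i,t-1 wherever the lagged gap is not near zero.

This sketch uses a hypothetical simulated panel with the same L.(Gap#Z) layout as the command in the question. The variables gap, size and roe are made-up stand-ins, and only continuous Z's are used:

```stata
* --- hypothetical simulated panel ---
clear
set seed 2021
set obs 1000
gen id = ceil(_n/10)                  // 100 panels, 10 periods each
bysort id: gen t = _n
xtset id t
gen gap  = rnormal()
gen size = rnormal()
gen roe  = rnormal()
gen dk = (0.3 + 0.1*L.size - 0.2*L.roe)*L.gap + rnormal(0, 0.1)

* same layout as the command in the question: L.Gap plus L.(Gap#Z)
regress dk L.gap L.(c.gap#c.size c.gap#c.roe), noconstant
matrix list e(b)

* fitted value = (lambda0-hat + LambdaZ-hat) * L.gap, so LambdaZ-hat can be
* recovered observation by observation where L.gap is not close to zero
predict double xb, xb
gen double LambdaZ = (xb - _b[L.gap]*L.gap)/L.gap if abs(L.gap) > 1e-6
summarize LambdaZ
```

This route avoids typing the interaction names, but it is numerically fragile when the lagged gap is close to zero. Summing coefficient × Z directly does not have that problem.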

thanks and regards.
George Ford

Join Date: Aug 2014

Posts: 3337
#10

03 Nov 2021, 15:27

no link
gagandeep sharma

Join Date: Jul 2019

Posts: 16
#11

03 Nov 2021, 15:56

Here's the link to the paper.

Jiang, C., Liu, H., & Molyneux, P. (2019). Do different forms of government ownership matter for bank capital behavior? Evidence from China. Journal of Financial Stability, 40, 38–49. https://doi.org/10.1016/j.jfs.2018.11.005

Thanks and regards.
George Ford

Join Date: Aug 2014

Posts: 3337
#12

03 Nov 2021, 18:06

Looks like voodoo, but what do I know?

Here's what a few minutes of review does for you (by someone unfamiliar with this literature).

This is a three-stage model with two generated regressors: S1 to S2, and S2 to S3. These are unaccounted for, so the hypothesis tests are invalid (an error of unknown magnitude, though bootstrapping usually increases the SEs).

In any case, in Step 2 they take a portion of the prediction from Step 1 to craft a new variable (eq. 3). The adjustment λ is assumed constant. This can be done by multiplying Beta*Z's. The goal is to get a mean prediction and then create a new variable equal to the difference between the mean and the actual value (the gap, i.e. the true value minus the predicted value). I'm not sure taking a portion of the regression makes sense due to scaling (the model includes year dummies and a lagged DV). They insert that generated regressor into Step 2 (which kinda looks like your equation), then take the prediction of S2 and insert it into Step 3: another generated regressor.

Your model does not match the ones in that paper. There is no λ₀, just ΛZ_i,t-1, and it appears λ is a predetermined constant in this paper. (That being so, the generated regressor is just the prediction from S2.) If otherwise, then you can just multiply coefficients*Z's to get the prediction. Another generated regressor.

You'll need to bootstrap all 3 stages simultaneously for hypothesis testing. You'll have to code it using bsample.

I wonder if they are using predictions of the DV for all the generated regressors? That seems to be what they are after, but it is unclear.

I'd ask the authors for their code, or find a better approach.
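George's point about generated regressors means the whole chain of stages should be re-estimated inside each bootstrap replication. This is a minimal two-stage sketch with some assumptions:

- x, w, y and the cluster variable id are hypothetical.
- It uses the -bootstrap- prefix around an rclass program, in place of the hand-coded bsample loop George mentions.
- A real application would put all three stages inside the program; only two are shown here.

```stata
* --- hypothetical data: y depends on the part of x not explained by w ---
clear
set seed 42
set obs 400
gen id = ceil(_n/10)                  // 40 hypothetical clusters
gen w = rnormal()
gen x = 0.5*w + rnormal()
gen y = 0.8*(x - 0.5*w) + rnormal()

capture program drop twostage
program define twostage, rclass
    * stage 1: the generated regressor is a stage-1 residual (a "gap")
    regress x w
    tempvar g
    predict double `g', residuals
    * stage 2: regression on the generated regressor
    regress y `g'
    return scalar b_gap = _b[`g']
end

* every replication re-runs BOTH stages on the same resampled clusters
bootstrap b_gap = r(b_gap), reps(200) cluster(id) nodots: twostage
```

Resampling whole clusters with cluster(id) keeps each bank's observations together. If the stages use time-series operators, the program would also need the idcluster() option and an xtset on the new identifier, which this sketch does not show.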
gagandeep sharma

Join Date: Jul 2019

Posts: 16
#13

04 Nov 2021, 07:53

George, thanks a ton for going through the model. My supervisor is also of the view that the process is not very sound econometrically, which I guess is your point too.

"(That being so, the generated regressor is just the prediction from S2). If otherwise, then you can just multiply coefficients*Z's to get the prediction. Another generated regressor."

Could you be a bit clearer and comment on the following:

I need to construct a new variable VarX = Λ̂Z_i,t-1, where the vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
But as I mentioned, all these variables are interacted with Gap_i,t-1, so how can I use the estimated coefficients? For example, my results show a coefficient for L.state1nonstate0#cL.tier1_gap1. Should I divide this coefficient by gapl1 and then multiply it by the dummy values of state1nonstate0 to get the estimates (and so on for all the variables in Z)?

I will email the authors, but honestly I have had very little success in getting researchers to part with their code.

Thanks and regards.
gagandeep sharma

Join Date: Jul 2019

Posts: 16
#14

04 Nov 2021, 08:02

Hi Carlo,

First of all, apologies for addressing you as Carlos.

I have a panel dataset, and the variable Gap in my equation above is actually a predicted variable from a system GMM equation, so the data are already set as a panel. Nevertheless, I take your point that pooled OLS does not require it.

The model that I am using is pretty standard and comes from Jiang, C., Liu, H., & Molyneux, P. (2019). Do different forms of government ownership matter for bank capital behavior? Evidence from China. Journal of Financial Stability, 40, 38–49. https://doi.org/10.1016/j.jfs.2018.11.005

P.S. Sorry for spamming. I thought you might have missed my post. I'm at my wits' end and would appreciate any help possible.

Thanks and regards.

I need to construct a new variable VarX = Λ̂Z_i,t-1, where the vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
But as I mentioned, all these variables are interacted with Gap_i,t-1, so how can I use the estimated coefficients?

Thanks and regards

George Ford

Join Date: Aug 2014
Posts: 3337

#15

04 Nov 2021, 12:35

Code:

sysuse auto, clear
reg price mpg weight length foreign
* use this if you want the prediction of the regression
predict pfit , xb
* use this if you want parts of the regression (2 ways to get to the same result)
gen newvar = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight +_b[length]*length 
gen newvaralt = pfit-_b[foreign]*foreign
* newvar = newvaralt

You've got ugly variable names. If you have trouble matching up the variable names with the coefficients (_b[x]), then

Code:

matrix list e(b)

to see what Stata sees.
