Using coefficients from regress command to generate a new variable

gagandeep sharma started a topic Using coefficients from regress command to generate a new variable

02 Nov 2021, 15:05

Good evening everyone,

I am using Stata 16.1. I have to run a pooled OLS on panel data and then use the estimated coefficients to generate a new variable:

1. The model I'm trying to implement is: ∆k_i,t = (λ₀ + ΛZ_i,t-1)(Gap_i,t-1) + η_i,t

2. I use the following regress command after setting the dataset as panel using xtset:

regress actual_tier1_gap l.tier1_gap l.((c.tier1_gap)#(i.state1nonstate0 c.size c.return_on_equity_w ///
i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation)), noconst

(I apologise for the variable names being unwieldy.)

3. I have run into a few issues:

a. My first question is whether the command is appropriate for the model I am trying to implement.

b. The coefficient of λ₀*Gap_i,t-1 tends to remain low if I add only a few controls, but increases considerably when I add all of the controls above.

c. Finally, in the second stage I have to estimate a variable ΛZ_i,t-1 from the estimated coefficients of Eq. 1. I'm unable to grasp how to do it.

Any help would be much appreciated.

regards,
Gagan
George Ford replied

22 Nov 2021, 14:30
I don't think you need to include a ratio, but I'm not sure exactly what you're after. You have a linear model with a lot of interactions. Not sure where the ratio comes into play if you want a prediction from that regression. If the theory calls for it, then sure.
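
A short derivation, using only the model as written in the opening post, may help show why no ratio is needed. Here Z_{j,i,t-1} denotes the j-th element of Z and λ_j the matching element of Λ; this indexing is notation introduced for the sketch, not taken from the paper.

```latex
\begin{aligned}
\Delta k_{i,t} &= \left(\lambda_0 + \Lambda Z_{i,t-1}\right)\mathrm{Gap}_{i,t-1} + \eta_{i,t} \\
               &= \lambda_0\,\mathrm{Gap}_{i,t-1}
                  + \sum_{j} \lambda_j \left(Z_{j,i,t-1}\,\mathrm{Gap}_{i,t-1}\right)
                  + \eta_{i,t}
\end{aligned}
```

Under this reading, the coefficient on the lagged gap estimates λ₀ and the coefficient on each interaction estimates λ_j directly, so Λ^hat Z_i,t-1 would be the sum of (interaction coefficient × lagged Z variable), without dividing by the coefficient on the lagged gap.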
gagandeep sharma replied

17 Nov 2021, 13:40
Originally posted by George Ford

Use all the variables you want to include in generate X to get a single X.

Thanks George,

Could you please comment on this:

Since each coefficient in the regression belongs to a product of two variables (variable*gap), is the true Λ^hat for each variable in the vector Z_i,t-1 given by [_b(l.variable*l.gap)/_b(l.gap)]?

If so, I can use:

generate X = ((_b[cL.tier1_gap1#cL.size] / (_b[L.tier1_gap1])) * L.size)

(This is for one of the variables in vector Z; the terms for the other variables would be added in the same way.)
George Ford replied

15 Nov 2021, 06:57
Use all the variables you want to include in generate X to get a single X.
gagandeep sharma replied

10 Nov 2021, 16:19
Originally posted by George Ford

Code:

sysuse auto, clear
reg price mpg weight length foreign
* use this if you want the prediction of the regression
predict pfit , xb
* use this if you want parts of the regression (2 ways to get to the same result)
gen newvar = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight +_b[length]*length
gen newvaralt = pfit-_b[foreign]*foreign
* newvar = newvaralt

You've got ugly variable names. If you have trouble matching up the variable names with the coefficients (_b[x]), then

Code:

matrix list e(b)

to see what Stata sees.

Hi George,

Terribly sorry to bother you again. If possible, could you address my original query? I have tried multiple iterations, but I keep getting results which aren't theoretically possible.

Here is the regression command:

regress actual_tier1_gap l.tier1_gap1 l.((c.tier1_gap1)#(i.state1nonstate0 c.size c.return_on_equity_w ///
c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation)), noconst

I need to construct a new variable VarX = Λ^hat Z_i,t-1, where vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
Since all these variables are interacted with Gap_i,t-1 in the regression, is the following approach correct for recovering the original coefficients in vector Z?

generate X = ((_b[cL.tier1_gap1#cL.size] / (_b[L.tier1_gap1])) * L.size)

(This is for one of the variables in vector Z; the terms for the other variables would be added in the same way.)

In short, is the true Λ^hat for each variable in the vector Z_i,t-1 given by [_b(l.variable*l.gap)/_b(l.gap)]?
Last edited by gagandeep sharma; 10 Nov 2021, 16:35.
Carlo Lazzaro replied

05 Nov 2021, 02:38
Gagandeep:
Another source for grasping the building blocks of George's helpful advice is Example 3 in the -bootstrap- entry of the Stata .pdf manual.
George Ford replied

04 Nov 2021, 14:45
I ripped the general setup from https://www.schmidheiny.name/teaching/bootstrap2up.pdf

This bootstraps the critical t-stats for each variable in the second stage equation (since the nominal ones are presumed incorrect). You'd compare the t-stats from the regression to these values rather than to the normal table. The first set of centiles gives asymmetric critical values, the second a symmetric one. I think the former is probably better, but others may disagree.

You can fancy it up where it makes a nice table if that's your desire.

The main thing is that both equations are part of the bootstrap procedure (each round generates a new generated regressor gr).
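
As a sketch of what the code computes: with B bootstrap replications, each replication b re-estimates both equations and stores a statistic recentred at the original estimate. The symbols below are notation chosen for this note, not taken from the linked slides.

```latex
t^{*}_{b} = \frac{\hat\beta^{*}_{b} - \hat\beta}{\widehat{\mathrm{se}}^{*}_{b}},
\qquad b = 1, \dots, B

\text{asymmetric: reject } H_0 \text{ if } t < q_{0.025}(t^{*}) \ \text{or}\ t > q_{0.975}(t^{*})

\text{symmetric: reject } H_0 \text{ if } |t| > q_{0.95}\!\left(|t^{*}|\right)
```

Here t is the usual t-stat from the second-stage regression on the original sample, and q_p(.) is the p-th quantile of the bootstrap draws, which is what the centile commands report.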

gagandeep sharma replied

04 Nov 2021, 14:36

Originally posted by George Ford

A start (thoughts welcome).

Code:

sysuse auto, clear

capture program drop myprog
program define myprog, rclass
reg mpg weight c.weight#(c.length c.foreign) , noconstant
capture drop gr
gen gr = _b[c.weight#c.length]*weight*length+_b[c.weight#c.foreign]*weight*foreign
reg price trunk gr
return scalar t_trunk = (_b[trunk]-b_trunk)/_se[trunk]
return scalar t_gr = (_b[gr]-b_gr)/_se[gr]
return scalar t_cons = (_b[_cons]-b_cons)/_se[_cons]
end

reg mpg weight c.weight#(c.length c.foreign) , noconstant
capture drop gr
gen gr = _b[c.weight#c.length]*weight*length+_b[c.weight#c.foreign]*weight*foreign
reg price trunk gr
scalar b_trunk = _b[trunk]
scalar b_gr = _b[gr]
scalar b_cons = _b[_cons]

bootstrap t_trunk=r(t_trunk) t_gr=r(t_gr) t_cons=r(t_cons), reps(100) seed(12345) saving(bs_t, replace): myprog
preserve
use bs_t, clear
centile t_trunk, centile(2.5, 97.5)
centile t_gr, centile(2.5, 97.5)
centile t_cons, centile(2.5, 97.5)
gen t_abs_trunk = abs(t_trunk)
gen t_abs_gr = abs(t_gr)
gen t_abs_cons = abs(t_cons)
centile t_abs_trunk t_abs_gr t_abs_cons, centile(95)
restore

Thanks George,

With my limited coding abilities I think it will take me a day or two to understand this. I'm grateful that you spent so much time to help me out.


George Ford replied

04 Nov 2021, 13:35

A start (thoughts welcome).

Code:

sysuse auto, clear

capture program drop myprog
program define myprog, rclass
    reg mpg weight c.weight#(c.length c.foreign) , noconstant
    capture drop gr
    gen gr = _b[c.weight#c.length]*weight*length+_b[c.weight#c.foreign]*weight*foreign  
    reg price trunk gr
    return scalar t_trunk = (_b[trunk]-b_trunk)/_se[trunk]
    return scalar t_gr = (_b[gr]-b_gr)/_se[gr]
    return scalar t_cons = (_b[_cons]-b_cons)/_se[_cons]
end

reg mpg weight c.weight#(c.length c.foreign) , noconstant
capture drop gr
gen gr = _b[c.weight#c.length]*weight*length+_b[c.weight#c.foreign]*weight*foreign
reg price trunk gr
scalar b_trunk = _b[trunk]
scalar b_gr = _b[gr]
scalar b_cons = _b[_cons]

bootstrap t_trunk=r(t_trunk) t_gr=r(t_gr) t_cons=r(t_cons), reps(100) seed(12345) saving(bs_t, replace): myprog
preserve
use bs_t, clear
centile t_trunk, centile(2.5, 97.5)
centile t_gr, centile(2.5, 97.5)
centile t_cons, centile(2.5, 97.5)
gen t_abs_trunk = abs(t_trunk)
gen t_abs_gr = abs(t_gr)
gen t_abs_cons = abs(t_cons)
centile t_abs_trunk t_abs_gr t_abs_cons, centile(95)
restore


George Ford replied

04 Nov 2021, 12:59
cancel that. error.

Last edited by George Ford; 04 Nov 2021, 13:04. Reason: error/deleted

George Ford replied

04 Nov 2021, 12:41

Code:

sysuse auto, clear
reg price mpg c.mpg#(c.weight c.length c.foreign) , noconstant
predict pfit , xb
gen pfitalt = _b[mpg]*mpg + _b[c.mpg#c.weight]*mpg*weight+_b[c.mpg#c.length]*mpg*length 
gen newvar = _b[c.mpg#c.weight]*mpg*weight+_b[c.mpg#c.length]*mpg*length 
gen newvaralt = pfit-_b[mpg]*mpg
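
The same idea can be sketched for a panel with lagged regressors. Everything in the block below is hypothetical: the data are simulated, and the names dk, gap, size and roe only stand in for the variables in the original question. The lags are created as ordinary variables first so that the coefficient names are easy to read, and the data are xtset because the L. operator requires it.

```stata
* Simulated panel: 20 units, 10 periods each (illustrative data only)
clear
set seed 12345
set obs 200
generate id = ceil(_n/10)
bysort id: generate t = _n
xtset id t

generate double gap  = rnormal()
generate double size = rnormal(10, 1)
generate double roe  = rnormal()
generate double dk   = (0.3 + 0.02*L.size + 0.1*L.roe)*L.gap + rnormal(0, 0.1)

* Explicit lags keep the _b[] names simple
generate double gap_l1  = L.gap
generate double size_l1 = L.size
generate double roe_l1  = L.roe

regress dk gap_l1 c.gap_l1#c.size_l1 c.gap_l1#c.roe_l1, noconstant

* Lambda-hat * Z: interaction coefficients times the Z variables, no division
generate double LZ = _b[c.gap_l1#c.size_l1]*size_l1 + _b[c.gap_l1#c.roe_l1]*roe_l1

* Full adjustment speed: lambda0-hat + Lambda-hat * Z
generate double speed = _b[gap_l1] + LZ

* Check: speed * gap reproduces the linear prediction
predict double xb, xb
assert reldif(xb, speed*gap_l1) < 1e-8 if e(sample)
```

The final assert is only a consistency check: if the pieces are assembled correctly, (λ₀^hat + Λ^hat Z) multiplied by the lagged gap equals the fitted value from predict.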


George Ford replied

04 Nov 2021, 12:35

Code:

sysuse auto, clear
reg price mpg weight length foreign
* use this if you want the prediction of the regression
predict pfit , xb
* use this if you want parts of the regression (2 ways to get to the same result)
gen newvar = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight +_b[length]*length 
gen newvaralt = pfit-_b[foreign]*foreign
* newvar = newvaralt

You've got ugly variable names. If you have trouble matching up the variable names with the coefficients (_b[x]), then

Code:

matrix list e(b)

to see what Stata sees.
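
Another way to look up the names, shown here as a small sketch on the auto data, is to replay the regression with the coeflegend option, which prints the _b[] expression next to each coefficient:

```stata
sysuse auto, clear
regress price mpg c.mpg#(c.weight c.length), noconstant

* Replay the results with the _b[] name of every coefficient
regress, coeflegend

* A name from the legend can be used directly in an expression
display _b[c.mpg#c.weight]
```

This is mostly useful with interactions and time-series operators, where the stored names are not always what one would type from memory.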


gagandeep sharma replied

04 Nov 2021, 08:02
Originally posted by Carlo Lazzaro

Gagandeep:
1) you do not need to -xtset- your data first if you go pooled OLS (BTW: pooled OLS would not be my first choice for panel data regression). With a bit of guess-work, your supervisor may be interested in the within R-sq (if -fe-) or the between R-sq (if -re-); both are produced by -xtreg- (with a bit of guess-work again, I assume that your regressand is continuous);
2) OK. I assume that the literature in your research field supports your approach;
3) about your question 3b (and with no other pieces of information from your side): since it is not clear what you're controlling for, if the coefficient you're concerned about varies widely, I would check your model specification, just to be sure that you're on the right track.
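
For readers who want to see where those R-sq figures come from, here is a minimal sketch on simulated data (the variables y and x and the panel layout are made up for illustration). After -xtreg-, the within, between and overall R-sq are stored in e(r2_w), e(r2_b) and e(r2_o).

```stata
* Simulated panel: 20 units, 10 periods each (illustrative data only)
clear
set seed 2021
set obs 200
generate id = ceil(_n/10)
bysort id: generate t = _n
xtset id t

generate double x = rnormal()
generate double y = 1 + 0.5*x + rnormal()

* Pooled OLS for comparison
regress y x

* Fixed-effects estimator and its three R-sq measures
xtreg y x, fe
display "within R-sq  = " e(r2_w)
display "between R-sq = " e(r2_b)
display "overall R-sq = " e(r2_o)
```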

Hi Carlo,

first of all apologies for addressing you as Carlos.

I have a panel dataset and the variable Gap in my equation above is actually a predicted variable from a system GMM equation, so the data are already set as a panel. Nevertheless, I take your point about pooled OLS not requiring it.

The model that I am using is pretty standard and comes from Jiang, C., Liu, H., & Molyneux, P. (2019). Do different forms of government ownership matter for bank capital behavior? Evidence from China. Journal of Financial Stability, 40, 38–49. https://doi.org/10.1016/j.jfs.2018.11.005

P.S. sorry for spamming. I thought maybe you missed my post. I'm at my wits' end and would appreciate any help possible.

Thanks and regards.

I need to construct a new variable VarX = Λ^hat Z_i,t-1, where vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
But as I mentioned, all these variables are interacted with Gap_i,t-1, so how can I use the estimated coefficients?

Thanks and regards
gagandeep sharma replied

04 Nov 2021, 07:53
Originally posted by George Ford

Looks like voodoo, but what do I know?

Here's what a few minutes of review does for you (by someone unfamiliar with this literature).

This is a three-stage model with two generated regressors: S1 to S2, and S2 to S3. These are unaccounted for, so the hypothesis tests are invalid (an error of unknown magnitude, though bootstrapping usually increases SEs).

In any case, in Step 2 they take a portion of the prediction from Step 1 to craft a new variable (eq 3). The adjustment λ is assumed constant. This can be done by multiplying Beta*Z's. The goal is to get a mean prediction and then create a new variable from the difference between the actual value and that mean (the gap: true value minus predicted value). Not sure taking a portion of the regression makes sense due to scaling (the model includes year dummies and a lagged DV). Insert that generated regressor into Step 2 (which kinda looks like your equation), then take the prediction of S2 and insert it into Step 3, which makes another generated regressor.

Your model does not match the ones in that paper. There is no λ₀ just ΛZ_i,t-1 , and it appears λ is a predetermined constant in this paper. (That being so, the generated regressor is just the prediction from S2). If otherwise, then you can just multiply coefficients*Z's to get the prediction. Another generated regressor.

You'll need to bootstrap all 3 stages simultaneously for hypothesis testing. You'll have to code it using bsample.

I wonder if they are using predictions of the DV for all the generated regressors? That seems to be what they are after, but it is unclear.

I'd ask the authors for their code, or find a better approach.

George, thanks a ton for going through the model. My supervisor is also of the view that the process is not very sound econometrically, which I guess is your point too.

"(That being so, the generated regressor is just the prediction from S2). If otherwise, then you can just multiply coefficients*Z's to get the prediction. Another generated regressor."

Could you be a bit clearer and comment on the following:

I need to construct a new variable VarX = Λ^hat Z_i,t-1, where vector Z is (i.state1nonstate0 c.size c.return_on_equity_w i.below_tier1 c.provforNPA_to_net_advances_w i.Listeddummy1iflisted c.GdpGrowthRate c.inflation).
But as I mentioned, all these variables are interacted with Gap_i,t-1, so how can I use the estimated coefficients? For example, my results show a coefficient for L.state1nonstate0#cL.tier1_gap1. Should I divide this coefficient by gapl1 and then multiply by the dummy values of state1nonstate0 to get the estimates (and so on for all the variables in Z)?

I will email the authors, but honestly I have had very little success in getting researchers to part with their code.

Thanks and regards.
George Ford replied

03 Nov 2021, 18:06
Looks like voodoo, but what do I know?

Here's what a few minutes of review does for you (by someone unfamiliar with this literature).

This is a three-stage model with two generated regressors: S1 to S2, and S2 to S3. These are unaccounted for, so the hypothesis tests are invalid (an error of unknown magnitude, though bootstrapping usually increases SEs).

In any case, in Step 2 they take a portion of the prediction from Step 1 to craft a new variable (eq 3). The adjustment λ is assumed constant. This can be done by multiplying Beta*Z's. The goal is to get a mean prediction and then create a new variable from the difference between the actual value and that mean (the gap: true value minus predicted value). Not sure taking a portion of the regression makes sense due to scaling (the model includes year dummies and a lagged DV). Insert that generated regressor into Step 2 (which kinda looks like your equation), then take the prediction of S2 and insert it into Step 3, which makes another generated regressor.

Your model does not match the ones in that paper. There is no λ₀ just ΛZ_i,t-1 , and it appears λ is a predetermined constant in this paper. (That being so, the generated regressor is just the prediction from S2). If otherwise, then you can just multiply coefficients*Z's to get the prediction. Another generated regressor.

You'll need to bootstrap all 3 stages simultaneously for hypothesis testing. You'll have to code it using bsample.

I wonder if they are using predictions of the DV for all the generated regressors? That seems to be what they are after, but it is unclear.

I'd ask the authors for their code, or find a better approach.
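
A hand-rolled bootstrap with bsample could look roughly like the sketch below. It uses the auto data and only two stages as a stand-in; the variable gr and the posted name b_gr are made up for illustration, and 50 replications is far too few for real work. With panel data one would probably resample whole units, for example with bsample's cluster() and idcluster() options.

```stata
sysuse auto, clear
set seed 12345

tempname memhold
tempfile results
postfile `memhold' double b_gr using `results'

forvalues r = 1/50 {
    preserve
    * Draw a bootstrap sample of the same size, with replacement
    bsample
    * Stage 1: estimate and build the generated regressor
    quietly regress mpg weight c.weight#c.length, noconstant
    generate double gr = _b[c.weight#c.length]*weight*length
    * Stage 2: use the generated regressor
    quietly regress price trunk gr
    post `memhold' (_b[gr])
    restore
}
postclose `memhold'

* The posted draws give the bootstrap distribution of the stage-2 coefficient
use `results', clear
summarize b_gr
```

The point of the loop is that every stage is re-estimated inside each replication, so the sampling variation of the generated regressor carries through to the final coefficient.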
