
  • How to save coefficients in order to use them as dependent variables in other regressions

    Hello,

    I would like to apply the Shang and Lee (2007) model, derived from the Manski-Brock-Durlauf (2001) model. This model makes it possible to identify
    group effects in a binary choice model. The data set that I use contains both individual-level and country-level data.

    The idea of this model is to run a first regression, which is a probit. The latent-variable equation can be written:

    Y_ri = X_ri d + K_r + V_ri    (1)

    with X_ri the individual characteristics of individual i in group r, K_r the group dummy variable, and V_ri the error term.

    A second regression (OLS) is then run on the following equation:

    K_r = Z_r b + e_r    (2)

    with e_r the error term.

    My problem is how to keep the estimates of K_r from equation (1) as variables before estimating equation (2).

    I have tried the user-written package parmest, but it does not achieve what I want: it saves the coefficients of (1) by overwriting the original dataset, which prevents me from running the second regression.

    Could you please help me?

    Best regards,

    Maïva Ropaul




  • #2
    Maiva,

    Most Stata estimation commands (including probit) return their coefficients in the matrix e(b). You can use a loop to populate your data set with the coefficients as appropriate. I don't have time to elaborate at the moment, but perhaps more later. If you have more details on how you would like to incorporate the coefficients into your data set, that might be helpful.
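
    For example, the basic mechanics look something like the sketch below (illustrative only, using the variable names from #1):

    Code:
    probit Y c.Xri i.Kr           // first regression, with group indicators
    matrix b = e(b)               // the full 1 x k coefficient row vector
    local names : colnames b      // its column names, e.g. Xri, 1b.Kr, 2.Kr, ..., _cons
    display _b[Xri]               // individual coefficients can also be read off with _b[]
    display _b[2.Kr]              // e.g. the group-2 indicator (assuming level 2 is not the base)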

    Regards,
    Joe



    • #3
      As Joe mentioned, you can loop through the column names of e(b) to pick out the vector of group coefficients. The example below uses your names for the variables. It's a little verbose, but that is just to make the flow of operations fairly explicit. You can do something analogous in Mata (a short sketch follows the code), and you can automate the set of operations in a program if you've got a lot of them to do.

      Code:
      version 13.1
      
      clear *
      set more off
      
      set seed `=date("2014-04-19", "YMD")'
      quietly set obs 200
      generate int Kr = _n
      generate double Zr = runiform()
      quietly expand = 50
      generate double Xri = runiform()
      generate byte Y = runiform() > 0.5
      
      *
      * Begin here
      *
      
      // Split out Zr into a separate dataset for later
      preserve
      contract Kr Zr, freq(discard)
      tempfile tmpfil0
      quietly save `tmpfil0'
      restore
      
      // First regression
      probit Y c.Xri i.Kr, nolog
      
      // Get group coefficients
      tempname B
      matrix define `B' = e(b)
      local column_names : colnames `B'
      
      // Get first and last column indexes of group coefficients
      local first 0
      forvalues column_index = 1/`=colsof(`B')' {
          local column_name : word `column_index' of `column_names'
          if strmatch("`column_name'", "*.Kr") {
              if !`first' local first `column_index'
              else local last `column_index'
          }
      }
      
      // Subset the regression coefficient matrix
      tempname Kr
      matrix define `Kr' = `B'[1, `first'..`last']
      
      // Retrieve group numbers into matrix
      local column_names : colnames `Kr'
      local vector_length = colsof(`Kr')
      tempname A
      matrix define `A' = J(1, `vector_length', .)
      forvalues column_index = 1/`vector_length' {
          local column_name : word `column_index' of `column_names'
          gettoken group_number group_varname : column_name, parse(".")
          local group_number : subinstr local group_number "b" "", all
          local group_number : subinstr local group_number "o" "", all
          matrix define `A'[1, `column_index'] = `group_number'
      }
      matrix define `Kr' = (`A' \ `Kr')
      
      // Create outcome variable
      drop _all
      matrix define `Kr' = `Kr''
      matrix colnames `Kr' = Kr coefficients
      svmat double `Kr', names(col)
      
      // Combine and do second regression
      merge 1:1 Kr using `tmpfil0', assert(match) noreport nogenerate
      regress coefficients Zr
      
      exit
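
      For the Mata route mentioned above, the analogous starting point is just to pull e(b) and its column stripe into Mata; the selection and parsing would then use Mata's string functions. This is only a sketch of that alternative, not a full translation:

      Code:
      mata:
      b = st_matrix("e(b)")                    // 1 x k row vector of coefficients (run after the probit above)
      stripe = st_matrixcolstripe("e(b)")      // k x 2 string matrix of equation and column names
      // keep the rows of stripe whose second column ends in ".Kr", strip any "b"/"o"
      // flags from the level, and push the result back to Stata with st_matrix()
      end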



      • #4
        I would like to apply the Shang and Lee (2007) model, derived from the Manski-Brock-Durlauf (2001) model.
        Please note the request in the Advice Guide not to give minimal author (date) references.



        • #5
          Thank you very much for your help and comments!



          • #6
            Maiva,

            Nick asked you to give full references to authors in your post. The Statalist FAQ says:
            Please give precise literature references. The literature familiar to you will not be familiar to all members of Statalist. Do not refer to publications with just author and date, as in Sue, Grabbit, and Runne (1989). References should be in a form that you would expect in an academic publication or technical document. Good practice is to give a web link accessible to all or, alternatively, full author name(s), date, paper title, journal title, and volume and page numbers in the case of a journal article.
            There is good reason for this request (apart from helping others to help you): not only you but also others reading this "thread" (= topic) should profit from the advice or solution presented. Thus, you should also put some effort into helping others understand what is going on.

            I searched for Shang and Lee (2007) and Manski-Brock-Durlauf (2001), but my results were not conclusive. Please confirm that these are the references you had in mind (or correct them):




            • #7
              You might consider using the multilevel mixed-effects (me) commands or the user-written gllamm to estimate this multilevel model. It is my understanding that -meprobit- would allow you to do what it seems you want to do, but, if not, I believe gllamm would work. If I understand your aim correctly, I believe there are statistical problems (bias, poor standard errors, and so forth) with estimating the model sequentially.
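
              For what it's worth, a gllamm version of such a random-intercept probit might look like the sketch below (variable names follow #1; this treats the model as a conventional multilevel probit rather than as the two-step estimator):

              Code:
              * ssc install gllamm                   // user-written; install once from SSC
              gllamm Y Xri Zr, i(Kr) family(binomial) link(probit) adapt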



              • #8
                The OP might want to steer toward gsem before meprobit for this kind of problem. Just to be clear, I haven't had experience with this particular model and so can't speak with authority, but it would seem that gsem would be more suitable.
                Last edited by Joseph Coveney; 19 Apr 2014, 18:52.



                • #9
                  Perhaps this is for my own edification, but the OP wrote:

                  The idea of this model is to run a first regression, which is a probit. The latent-variable equation can be written:

                  Y_ri = X_ri d + K_r + V_ri    (1)

                  with X_ri the individual characteristics of individual i in group r, K_r the group dummy variable, and V_ri the error term.

                  A second regression (OLS) is then run on the following equation:

                  K_r = Z_r b + e_r    (2)

                  with e_r the error term.
                  which seems to be the same form as the 2-equation means-as-outcomes multilevel model:

                  1) Y = f(B0_j + B_1 X_ri + V_ri)
                  2) B0_j = G_0 + G_1 Z_r + e_r

                  Equation 1 is a probit (or logit); equation 2 is estimated by OLS; the B0_j are the adjusted means of the groups (i.e., the coefficients on the OP's group-level dummy variables).

                  So, my question: what does -gsem- offer for estimating this model that -meprobit- does not? Or, are they equivalent?

                  Thanks a bunch!
                  Sam



                  • #10
                    After all this discussion I am still confused. The topic of discussion is
                    How to save coefficients in order to use them as dependent variables in other regressions
                    And the estimation he wants to do is
                    The idea of this model is to run a first regression, which is a probit. The latent-variable equation can be written:

                    Y_ri = X_ri d + K_r + V_ri    (1)

                    with X_ri the individual characteristics of individual i in group r, K_r the group dummy variable, and V_ri the error term.

                    A second regression (OLS) is then run on the following equation:

                    K_r = Z_r b + e_r    (2)

                    with e_r the error term.
                    Kr are not coefficients but a set of dummy variables, aren't they? Is your question how to set up a variable with the coefficients on the different dummies in (1) as the values for the dependent variable in (2)? If so, that could be done with
                    Code:
                    gen coefdep = _b[Kr1] * Kr1 + _b[Kr2] * Kr2 + _b[Kr3] * Kr3 + ...
                    right after the probit estimation. The dots just indicate that you do this for all the dummy variables, and the numbers after Kr identify the different groups, so that each dummy is multiplied by its corresponding coefficient. I am assuming you would want each group's coefficient as the value of the variable for that group's observations.
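
                    If the probit is instead fit with factor-variable notation, as in #3 (probit Y c.Xri i.Kr), the same idea can be written as a loop over the group levels. This is just a sketch, and coefdep is an illustrative name:

                    Code:
                    * assumes a prior fixed-effects probit such as:  probit Y c.Xri i.Kr
                    generate double coefdep = .
                    levelsof Kr, local(groups)                       // the distinct group identifiers
                    foreach g of local groups {
                        replace coefdep = _b[`g'.Kr] if Kr == `g'    // base-level coefficient resolves to 0
                    }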

                    Also, for the three references posted by Dirk Enzmann, I have found links to pdf versions of the different works:
                    Last edited by Alfonso Sánchez-Peñalver; 20 Apr 2014, 06:13.
                    Alfonso Sanchez-Penalver



                    • #11
                      Dear Dirk,

                      You are right. The references I gave were not precise enough.
                      The econometric model that I would like to apply to my data is fully described in

                      Shang, Qingyan, and Lung-fei Lee. "Two-step estimation of endogenous and exogenous group effects." Econometric Reviews 30.2 (2011): 173-207.

                      and their model relies on

                      Brock, William A., and Steven N. Durlauf. "Discrete choice with social interactions." The Review of Economic Studies 68.2 (2001): 235-260.


                      Dear Alfonso,

                      Indeed, you have described exactly what I intended to do.

                      I am confused by the different answers that were given. It seemed to me that the probit command could help me estimate the coefficients properly. I saw nothing specific in the paper by Shang and Lee about particular bias or standard-error problems.



                      • #12
                        Hello,

                        Here are some further details on the model to be estimated.

                        This is a binary choice model, with Y_ri = 1 or 0 for individual i in group r.

                        The individual behavior model is written as

                        \begin{equation}
                        Y_{ri}^{*} = x_{ri} \delta + I_{r} \alpha_{r} + \epsilon_{ri}
                        \end{equation}

                        with

                        \begin{equation}
                        \alpha_{r} = s_{r} \beta_{1} + E_{r}(x) \beta_{2} + E_{r}(Y) \beta_{3} + u_{r}
                        \end{equation}

                        with \alpha_{r} the coefficient for group r, which measures the group fixed effect, and u_{r} the unobserved group effect.
                        A proxy is used to measure E_{r}(Y), the expected average choice in group r.
                        Thus, this second equation is estimated using an IV approach, with E_{r}(Y) treated as an endogenous variable.
                        The second equation to estimate becomes:

                        \begin{equation}
                        \hat{\alpha}_{rm} = s_{r} \beta_{1} + \hat{E}_{rm}(x) \beta_{2} + \hat{E}_{rm}(Y) \beta_{3} + u_{r} + v_{rm}
                        \end{equation}

                        Consequently, I am confused about the respective advantages and drawbacks of probit, meprobit, gsem, and gllamm here.
                        Last edited by Maiva Ropaul; 20 Apr 2014, 07:26.



                        • #13
                          So, my question: what does -gsem- offer for estimating this model that -meprobit- does not? Or, are they equivalent?
                          The OP's original model involves a fixed-effects probit as its first equation and a linear model as its second. meprobit is not really set up to do that, as far as I know. gsem allows a generalized linear model as one equation and a linear model as another. As I mentioned, I've never actually had any experience with this kind of model, but it might be possible to put indicator variables for the groups (c.Xri 2.Kr 3.Kr . . . j.Kr) into the first equation to get the fixed-effect levels, and then reference them (that is, their parameters) as endogenous variables in a succeeding equation. Or perhaps reference each one separately in a series of succeeding (linear) equations, constraining their regression coefficients (c.Zr) to be equal. You should be able to do the first (fixed-effects probit) part in gsem, which even allows factor variables, but I'm frankly not sure whether or how to do the second part.

                          Again, I haven't thought about it much, never having had to deal with anything like this before, but given that the OP is dealing with a fixed-effects probit and multiple equations, of the two, gsem just struck me as a better place to start looking than meprobit.

                          As to your point, I agree, that if we’re dealing with a conventional multilevel / hierarchical probit, then
                          Code:
                          gsem (Y <- c.Xri M1[Kr], probit) (M1[Kr] <- c.Zr), latent(M1) nolog
                          doesn’t buy you much over
                          Code:
                          meprobit Y c.Xri c.Zr || Kr: , nolog
                          . . . although I’m beginning to warm to the former’s syntax: with the latter’s syntax, I keep wanting to do something like
                          Code:
                           meprobit Y c.Xri || Kr: Zr, nolog
                          at first, and then waste time trying to untangle it.

                          Given that the OP now says that the real model involves instrumental variables in the second part, I don't have much to offer; I don't encounter instrumental-variable models in my line of work. Perhaps it's best just to follow the literature and do the analysis in two stages, if that's what it says: fit the fixed-effects probit model separately, collect the indicator variables' regression coefficients as shown above, and then use them in a Stata command that is explicitly designed for instrumental-variable regression in the second stage. Regardless, I recommend waiting for the experts to chime in.
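
                          For instance, the second stage might use ivregress. In the sketch below, alpha_hat (the collected coefficients), sr, Erx, the endogenous proxy ErY, and the instruments z1 and z2 are all hypothetical names for the group-level variables in #12's second equation:

                          Code:
                          * group-level data: one observation per group, with the collected coefficients
                          ivregress 2sls alpha_hat sr Erx (ErY = z1 z2), first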



                          • #14
                            Kr are not coefficients but a set of dummy variables, aren't they? Is your question how to set up a variable with the coefficients on the different dummies in (1) as the values for the dependent variable in (2)? If so, that could be done with
                            Code:
                            gen coefdep = _b[Kr1] * Kr1 + _b[Kr2] * Kr2 + _b[Kr3] * Kr3 + ...
                            right after the probit estimation. The dots just indicate that you do this for all the dummy variables, and the numbers after Kr identify the different groups, so that each dummy is multiplied by its corresponding coefficient.
                            You certainly could do that, especially if there are only a few groups. I was imagining that the OP has dozens or hundreds of groups.



                            • #15
                              Hello Maiva

                              The parmest package does not have to overwrite the original dataset. You can make it save the output dataset (or resultsset) in a disk file, using the -saving- option.
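
                              For instance, a sketch of that route for the model in #1 (the file name Kr_parms.dta is illustrative, and the probit uses group indicators as in #3):

                              Code:
                              probit Y c.Xri i.Kr
                              parmest, saving("Kr_parms.dta", replace)    // resultsset with one observation per parameter
                              use "Kr_parms.dta", clear
                              keep if strmatch(parm, "*.Kr")              // keep only the group coefficients (factor-variable names)
                              * parse the group number out of parm, merge Zr back in, and run the second regression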

                              If you have multiple by-groups, and you want to save the regression coefficients for each by-group and use them as Y-variables in a regression on something else (e.g. one of the by-variables), then you can use the parmby module of the parmest package to run the regression for each by-group and create an output dataset (or resultsset) with one observation per parameter per by-group, containing the parameter names, parameter estimates, confidence intervals, and P-values. This resultsset can be saved to a disk file or written to memory, overwriting the original dataset. Whichever you do, you can then drop the parameters you do not want to use as the Y-variable in a regression, and run the regression on the parameters you do want.

                              I hope this helps.

                              Best wishes

                              Roger
