
  • Estimating the Firm Fixed Effect

    Dear all,

    This is a new post that better explains my previous post and shows how the fixed-effect component is estimated in a paper by Billings and Morton (2000). The entire paper is attached to the older post, but the relevant parts are attached to this post.

    The older post is http://www.statalist.org/forums/foru...-fixed-effects

    I created a new post as I believe that the document here shows, in much better detail, the proper way to estimate the firm fixed effect in a panel.


    My question:
    How can I estimate the LAG and FIRM components following the authors' approach (attached)?





    Thanks a lot
    Attached Files

  • #2
    There is a post by Steve Johnson that seems somewhat relevant, but I cannot adapt it to my case.

    http://www.statalist.org/forums/foru...eraction-terms



    • #3
      I think you want to do this:

      Code:
      xtset firm year
      
      gen LAG = .
      gen FIRM = .
      forvalues y = 1972/2015 {
          xtreg BTM R L1.R L2.R L3.R L4.R i.year if inrange(year, `y', `y'-4), fe
          predict lag, xb
          predict firm_effect, u
          replace LAG = lag if year == `y'
          replace FIRM = firm_effect if year == `y'
          drop lag firm_effect
      }
      See -help xtreg- and -help xtreg postestimation-.

      For future reference, posting Microsoft Office files as attachments is discouraged. Some frequent responders don't use Office at all; others do, but will not download those files from a stranger because they can contain malware. Your original post with a pdf raises fewer problems.
      Last edited by Clyde Schechter; 30 Jun 2016, 13:24. Reason: Correct error in code



      • #4
        I think the correct approach could be something like:

        1- estimate the mean of BTM and changeMV by firm:
        egen BTMmeanF = mean(BTM), by(firm)
        egen changeMVmeanF = mean(changeMV), by(firm)

        2- estimate the mean of BTM and changeMV by year:
        egen BTMmeanY = mean(BTM), by(yr)
        egen changeMVmeanY = mean(changeMV), by(yr)

        3- estimate the grand mean for BTM and changeMV:

        gen BTMgrandmean= ?
        gen changeMVgrandmean= ?


        4- generate the variables:

        gen btm=BTM-BTMmeanF-BTMmeanY + BTMgrandmean
        gen mv=changeMV-changeMVmeanF-changeMVmeanY + changeMVgrandmean

        5- run a loop to estimate the betas:

        gen beta1=.
        gen beta2=.
        gen beta3=.
        gen beta4=.
        gen beta5=.
        gen beta6=.
        gen beta7=.

        forvalues y = 1972/2015 {
        local low = `y' - 4
        regress btm mv l.mv l2.mv l3.mv l4.mv l5.mv l6.mv if inrange(yr, `low', `y'), noconstant
        estimating betas???
        }


        6- estimate the LAG component:
        gen LAG= beta1*mv+beta2*l.mv+beta3*l2.mv+beta4*l3.mv+beta5*l4.mv+beta6*l5.mv+beta7*l6.mv


        7- estimate the FIRM component:

        gen FIRM=(BTMmeanF-BTMgrandmean)- (beta1*(changeMVmeanF-changeMVgrandmean)+beta2* ???



        This is the best I can do. I will be so thankful if you can help, please!
        Thanks



        • #5
          If you read the help for -xtreg-, and in particular the method by which -xtreg, fe- works, you will see that it accomplishes all of that for you in just a few lines, as I suggested in #3. The only part that I left out of that code was saving the coefficients of the lags, but your original question suggests that you don't really want those coefficients except as a step toward estimating the FIRM and LAG effects; and using the -predict- command, you don't need the coefficients.
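          To see the equivalence concretely, here is a quick check on Stata's bundled Grunfeld panel (a sketch unrelated to this thread's data): the slopes from -xtreg, fe- match those from a manual within-transformation.

          Code:
          webuse grunfeld, clear
          xtset company year
          xtreg invest mvalue kstock, fe
          * manual within-transformation: demean by firm, add back the grand mean
          foreach v in invest mvalue kstock {
              egen `v'_m = mean(`v'), by(company)
              summarize `v', meanonly
              gen `v'_w = `v' - `v'_m + r(mean)
          }
          regress invest_w mvalue_w kstock_w

          The slope coefficients agree; only the degrees of freedom (and hence the standard errors) differ, because -regress- does not know the firm means were estimated.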



          • #6
            Thanks
            I reattach the Office document after converting it to a pdf file.

            Thanks Clyde.
            I think the code is not doing what I am aiming for:
            1- in your code, LAG will include the year fixed effects
            2- I actually have quarterly data for more than 30 years, so amending your code to include quarters in this massive panel will be very difficult to implement.

            Since I have quarterly data, I want to run the regression for each window of 20 quarters. I attach a sample of my data.

            Do you think it is possible to follow the approach using the grand means, etc., as per the one-page pdf file attached here?

            Attached Files



            • #7
              Clyde, I wrote an additional post before I saw your reply in #5.
              A few points:
              1- I see that -predict, u- will get the firm fixed-effect component. Will this be equivalent to equation (5) in my attached file?
              2- Also, the LAG component in your code will include the year fixed effects. According to equation (4) in my attached file, they should not be included.
              3- Can the code be amended to run over the quarters (see my attached sample of data, please) and to incorporate the points raised here?
              4- Would it be possible to follow the alternative approach in the one-page pdf file as well? As Stata will not be able to handle the massive number of quarter dummies, can this be fixed?

              Thanks a lot



              • #8
                1- I see that -predict, u- will get the firm fixed-effect component. Will this be equivalent to equation (5) in my attached file?
                Yes.
                2- Also, the LAG component in your code will include the year fixed effects. According to equation (4) in my attached file, they should not be included.
                You are correct. Sorry for the oversight on my part. Corrected code is:
                Code:
                xtset firm year
                
                gen LAG = .
                gen FIRM = .
                forvalues y = 1972/2015 {
                    xtreg BTM R L1.R L2.R L3.R L4.R i.year if inrange(year, `y', `y'-4), fe
                    predict firm_effect, u
                    replace LAG = _b[R]*R + _b[L.R]*L1.R + _b[L2.R]*L2.R + _b[L3.R]*L3.R + _b[L4.R]*L4.R ///
                        if year == `y'
                    replace FIRM = firm_effect if year == `y'
                    drop firm_effect
                }
                3- Can the code be amended to run over the quarters (see my attached sample of data, please) and to incorporate the points raised here?
                Yes. But first you have to convert your datadate from a daily variable to a quarterly variable:
                Code:
                gen quarter = qofd(datadate)
                format quarter %tq
                Then just replace year by quarter everywhere in the code, and change the 1972/2015 loop boundaries to the appropriate numeric values of quarter.
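                For example, the numeric quarter values can be obtained with the tq() function; the boundaries below are illustrative, not taken from your data:

                Code:
                local first = tq(1972q1)
                local last = tq(2015q4)
                forvalues p = `first'/`last' {
                    display %tq `p'
                }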
                4- Would it be possible to follow the alternative approach in the one-page pdf file as well? As Stata will not be able to handle the massive number of quarter dummies, can this be fixed?
                I doubt this will be a problem. You have only 176 quarters between 1972 and 2015 inclusive, which comes nowhere near the limits on data set size. Moreover, any given regression involves only 5 quarters, so only 4 quarter indicator ("dummy") variables. So you won't come anywhere near the limits on matrix sizes for the regression, and with only 4 quarter indicators it won't even be particularly slow.

                Added: And you don't have to worry about a large number of indicator variables for the firms: in -xtreg, fe- there are no actual firm indicator variables created--the calculations are done by differencing from the panel mean, just as in the formulas in your article.

                I think Stata will handle this just fine, even if you are only running the IC flavor.
                Last edited by Clyde Schechter; 30 Jun 2016, 14:28.



                • #9
                  Thanks a lot

                  I have amended the code to run the regression over each successive 5 quarters as follows:

                  gen fqdate = qofd(datadate)
                  format fqdate %tq

                  destring gvkey,replace
                  duplicates drop gvkey fqdate,force
                  xtset gvkey fqdate, quarterly

                  levelsof fqdate, local(levels)
                  foreach p of local levels {
                  xtreg btm dMV L1.dMV L2.dMV L3.dMV L4.dMV i.fqdate if inrange(fqdate, `p', `p'-4), fe
                  predict firm_effect, u
                  replace LAG = _b[dMV]*dMV + _b[L.dMV]*L1.dMV + _b[L2.dMV]*L2.dMV + _b[L3.dMV]*L3.dMV + _b[L4.dMV]*L4.dMV ///
                  if fqdate == `p'
                  replace FIRM = firm_effect if fqdate == `p'
                  drop firm_effect
                  }


                  I got an error message:
                  insufficient observations
                  r(2001);




                  1- Is the error message related to my modification of the code? Is my modified code correct?

                  Can you please help with this? Thanks







                  Last edited by Mike Kraft; 30 Jun 2016, 16:36.



                  • #10
                    So, in some quarters, the total number of observations available is too small to carry out a regression analysis. Remember that to be included in a regression, an observation must have non-missing values for all of the regression variables. So, in your case, dMV and all of its first four lags must be non-missing, as well as btm. The obvious suspects here are the first four quarters in your data set: there, some or all of the lags are guaranteed to be missing, even if your data set contained no missing values at all. So the first thing is, you need to start your loop from the fifth quarter in your data set, not the first. If your file has no missing data, that will completely solve the problem. In addition, the -if inrange(fqdate, `p', `p'-4)- condition is always false: since `p' > `p'-4, no value can be both at least `p' and at most `p'-4. (This is a mistake you initially made, which I copied without noticing, and which you then copied back from me.)
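                    The inrange() point is easy to verify interactively (a minimal check, not part of the analysis code):

                    Code:
                    display inrange(5, 1, 5)    // 1: bounds in the right order
                    display inrange(5, 5, 1)    // 0: reversed bounds match nothing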

                    However, if, as is usually the case in real-world data, there are missing values scattered about, you may find other quarters where the data remain too sparse just through the coincidence of missing values of different lags of dMV or of btm in different observations. The question then becomes whether this type of occurrence is expected and acceptable, or whether it signals a problem with your data file (i.e., perhaps there shouldn't be missing data, or at least not that much).

                    The following code should solve these problems for you:

                    Code:
                    gen fqdate = qofd(datadate)
                    format fqdate %tq
                    
                    destring gvkey,replace
                    duplicates drop gvkey fqdate,force
                    xtset gvkey fqdate, quarterly
                    
                    summ fqdate
                    local first = r(min) + 4
                    local last = r(max)
                    forvalues p = `first'/`last' {
                        capture noisily xtreg btm L(0/4).dMV i.fqdate if inrange(fqdate, `p'-4, `p'), fe
                       if c(rc) == 0 {
                            predict firm_effect, u
                            replace LAG = _b[dMV]*dMV + _b[L.dMV]*L1.dMV + _b[L2.dMV]*L2.dMV + _b[L3.dMV]*L3.dMV + _b[L4.dMV]*L4.dMV ///
                                if fqdate == `p'
                            replace FIRM = firm_effect if fqdate == `p'
                            drop firm_effect
                       }
                        else if c(rc) == 2001 // INSUFFICIENT OBSERVATIONS
                            display as error "Insufficient Observations for Analysis in " %tq =`p'
                        }
                        else {    // OTHER UNANTICIPATED ERROR
                            display as error "Unexpected Error Encountered Analyzing" %tq =`p'
                            exit `c(rc)'
                        }
                    }
                    This code will start at the fifth quarter in your data set and will attempt each regression in turn. If the regression runs, it will update the values of LAG and FIRM as before. If it fails due to insufficient observations, you will get an error message telling you which quarter(s) gave rise to the problem, but it will continue with the remaining regressions. You can then separately investigate whether the insufficient data reflects errors in your dataset or not. If a regression fails for any other reason (whatever that might be) you will also get an error message telling you where things went astray, but since that is not anticipated, execution will halt there.

                    I have also shortened the regression command, combining all the dMV terms using the L(0/4). operator notation. That does not change the behavior of the program; it just makes the code easier to read and understand.

                    Note that even if you don't get an "insufficient observations" error, that does not necessarily mean you have enough observations in the estimation sample for a meaningful regression. Some of your regressions could still have ridiculously small samples, leading to several variables being dropped due to collinearity, or to coefficient estimates with no standard errors. You can review the output and decide for yourself whether this is a problem given your purposes.

                    Finally, let me point out a potential problem earlier in your code. -duplicates drop gvkey fqdate, force- is an invitation to trouble. If gvkey and fqdate uniquely identify the observations in your data set (as they normally would in panel data), then you don't need the -force- option. The fact that you chose to use that option suggests to me that they don't. But then consider that if you have multiple observations for some gvkey and fqdate, they may have different values of btm or dMV. If that is true, by using -force- you are picking one of those sets of values at random, and in an irreproducible way. That is, if you run the code a second time you will probably get different results. If the observations having the same values for gvkey and fqdate also have the same values for the other variables, then that isn't a problem. But, in that case, the way to go is -duplicates drop-. This will only remove observations that agree on every single variable.

                    If after -duplicates drop- you still have multiple observations for the same gvkey and fqdate, then your data set is incorrect and it is not ready for this analysis. You need to find those observations and determine which among the sets of duplicates is correct (or perhaps they need to be combined in some way such as averaging, etc.) The proper disposition of the duplicates depends on the meaning and sources of your data and your analytic goals, so I can't advise. What I can say is that simply discarding some of them in an arbitrary and irreproducible way is never the correct solution.
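                    A sketch of how one might inspect such duplicates before deciding how to handle them (variable names as in your posts):

                    Code:
                    duplicates report gvkey fqdate
                    duplicates tag gvkey fqdate, gen(dup)
                    sort gvkey fqdate
                    list gvkey fqdate btm dMV if dup > 0, sepby(gvkey)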

                    Added: Going forward, please post code between code delimiters (as I have done in all of these posts) so that it formats nicely (see FAQ #12).
                    Last edited by Clyde Schechter; 01 Jul 2016, 07:10.



                    • #11
                      Thanks a lot for working on this.
                      Well, I have seen some missing values in dMV, and although I dropped all missing values and applied the following code, I still got an error message. I noted that you did not generate LAG and FIRM, so I have done that as well.

                      The code is:

                      Code:
                      gen fqdate = qofd(datadate)
                      format fqdate %tq
                      
                      destring gvkey,replace
                      
                      xtset gvkey fqdate, quarterly
                      
                      drop if btm==.
                      drop if dMV==.
                      
                      gen LAG=.
                      gen FIRM=.
                      
                      summ fqdate
                      local first = r(min) + 4
                      local last = r(max)
                      forvalues p = `first'/`last' {
                          capture noisily xtreg btm L(0/4).dMV i.fqdate if inrange(fqdate, `p'-4, `p'), fe
                         if c(rc) == 0 {
                              predict firm_effect, u
                              replace LAG = _b[dMV]*dMV + _b[L.dMV]*L1.dMV + _b[L2.dMV]*L2.dMV + _b[L3.dMV]*L3.dMV + _b[L4.dMV]*L4.dMV ///
                                  if fqdate == `p'
                              replace FIRM = firm_effect if fqdate == `p'
                              drop firm_effect
                         }
                          else if c(rc) == 2001 // INSUFFICIENT OBSERVATIONS
                              display as error "Insufficient Observations for Analysis in " %tq =`p'
                          }
                          else {    // OTHER UNANTICIPATED ERROR
                              display as error "Unexpected Error Encountered Analyzing" %tq =`p'
                              exit `c(rc)'
                          }
                      }
                      The error message is:

                      insufficient observations
                      { required

                      r(100);

                      end of do-file

                      r(100);


                      Thanks for your note regarding the duplicates issue.

                      Unfortunately, the problem is still not solved! I will be thankful if you can advise, please!



                      • #12
                        Sorry, my coding has been rather sloppy throughout. Yes, you are right that the -gen LAG = .- and -gen FIRM = .- statements need to be there. You apparently removed them in your code in #9, and I copied that without noticing because I was focusing on the "insufficient observations" issue, which is not related to that.

                        As for the other, the line
                        Code:
                        else if c(rc) == 2001 // INSUFFICIENT OBSERVATIONS
                        should be
                        Code:
                        else if c(rc) == 2001 { // INSUFFICIENT OBSERVATIONS
                        I'm not quite sure how that happened: in the copy of the code I have in my do-file that { is there, but the comment is not. I think when I added the // INSUFFICIENT OBSERVATIONS comment here in the Forum editor I must have positioned the cursor incorrectly so that I overwrote the { instead of placing the comment after it. But this emphasizes, through my own delinquency and its consequences, the importance of the lesson I emphasize so often here on Statalist: copy and paste code and results directly into the forum and do not edit them in any way. I should practice what I preach.



                        • #13
                          Hi
                          I want to inform you that the code worked after I changed
                          Code:
                          local first = r(min) + 4
                          to
                          Code:
                          local first = r(min) + 8
                          and also
                          Code:
                          (fqdate, `p'-4, `p')
                          to
                          Code:
                          (fqdate, `p'-8, `p')
                          then it proceeded without producing error messages and estimated the components. I want to note that dMV is a change variable calculated as the difference between MV at q(t-1) and q(t), divided by MV at q(t-1). I expected that updating the code to 5 would work, but it appears it is only feasible with a larger window (8).


                          Does anyone also know how to calculate the adjusted R2 for the rolling fixed-effect regressions? I think we will need to generate a variable and then update it for the last quarter of each window as the rolling regression is estimated in the loop.

                          Any help?
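                          In case it helps: -xtreg, fe- stores the within R-squared in e(r2_w), the sample size in e(N), and the model degrees of freedom in e(df_m), so one possible sketch (untested on this data, assuming the within R-squared is the one you want to adjust, and reusing the `first'/`last' locals defined earlier) is:

                          Code:
                          gen adjR2 = .
                          forvalues p = `first'/`last' {
                              xtreg btm L(0/4).dMV i.fqdate if inrange(fqdate, `p'-8, `p'), fe
                              replace adjR2 = 1 - (1 - e(r2_w))*(e(N) - 1)/(e(N) - e(df_m) - 1) if fqdate == `p'
                          }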
