Fixed effect: i.fe or xtreg?

Ludovic Van Cau

Join Date: Jul 2019

Posts: 23
#1


18 Jul 2019, 16:45

Hi,

I'm analyzing patent data for my thesis. I have a dataset with unique patents from 1999-2004, so no duplicates. I'd like to run two different regressions with two fixed effects. The first fixed effect is a year fixed effect, from 1999 until 2004. The second is a regional fixed effect based on the CBSA location of the first inventor of the patent. But I have my doubts on the way to execute it.

1st regression: Poisson regression (because the dependent variable is a count)
Number of inventors in patent = indepvar + year fixed effect + regional fixed effect

2nd regression: linear regression

Depvar (i.e. a probability) = indepvar + year FE + regional FE

If my research is right there are 2 different ways of setting the fixed effect:

1) adding i. :
poisson depvar indepvar i.year i.cbsa
regress depvar indepvar i.year i.cbsa

2) via panel data:
xtset cbsa year
xtpoisson depvar indepvar year cbsa, fe
xtreg depvar indepvar year cbsa, fe

My questions:
- Is there a preference between the 2 possibilities? Should I expect a difference in the outcome between the 2, for example in R-squared or significance?
- Am I allowed to set it as panel data? The patent-id is only included in the data set once, not recurring throughout the years.

Thanks
Ludo
Tags: fixed effects, panel data
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

18 Jul 2019, 17:00

Second question first: as long as cbsa and year jointly identify unique observations in the data set, your -xtset- command is fine. The fact that patent_id only occurs once does not matter. It's panel data with cbsa as the panel, not patents.

First question: For the linear regression you can do either

Code:

xtset cbsa year
xtreg depvar indepvar i.year, fe

OR

Code:

regress depvar indepvar i.cbsa i.year

The results will be the same.
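
The equivalence can be checked on a small example. The sketch below is only an illustration: it assumes the Grunfeld investment data that ship with Stata's documentation (`webuse grunfeld`, with variables company, year, invest, mvalue and kstock) in place of the patent data, and the stored-estimate names fe_within and lsdv are arbitrary.

```stata
* Illustration on a stand-in data set, not the poster's patent data
webuse grunfeld, clear
xtset company year

* Within estimator: company effects absorbed, year dummies explicit
xtreg invest mvalue kstock i.year, fe
estimates store fe_within

* Same model with explicit company dummies
regress invest mvalue kstock i.company i.year
estimates store lsdv

* Compare the coefficients and standard errors of the regressors of interest
estimates table fe_within lsdv, keep(mvalue kstock) b(%9.4f) se(%9.4f)
```

The two columns for mvalue and kstock should coincide; only the reported constant and R-squared are defined differently across the two commands.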

For the Poisson regression, however, you have only one legitimate option:

Code:

xtset cbsa year
xtpoisson depvar indepvar i.year, fe

The -poisson depvar indepvar i.year i.cbsa- command is syntactically legal but is statistically invalid due to what is known as the "incidental parameters problem" (you can Google it). The use of i.panelvar instead of the -xt..., fe- analysis is only correct for linear regression.

If you include i.cbsa in either -xtreg, fe- or -xtpoisson, fe-, the i.cbsa variables will be omitted due to collinearity with the cbsa fixed effects already provided automatically by the -xtwhatever- command. No harm done, but conceptually an error.

Also, it makes a big difference whether you specify year or i.year. If you specify year, it is treated as a continuous variable and you are modeling a linear time trend. If you specify i.year, it is treated as a discrete variable and you are modeling yearly idiosyncratic shocks to the outcome variable. Either one might be correct, depending on circumstances, but you need to decide which it is.
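
A minimal sketch of the two specifications, again assuming the Grunfeld data from `webuse grunfeld` as a stand-in for the patent data:

```stata
* Illustration on a stand-in data set, not the poster's patent data
webuse grunfeld, clear
xtset company year

* year entered as continuous: a single slope, i.e. a linear time trend
xtreg invest mvalue kstock c.year, fe

* year entered as a factor: one dummy per year (yearly shocks)
xtreg invest mvalue kstock i.year, fe

* Joint test that all year dummies are zero
testparm i.year
```

With plain `year` the output shows one coefficient for year; with `i.year` it shows one coefficient per year except the base year.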

As an aside, it is the norm in this community to use our real given and surnames as our username. This practice promotes collegiality and professionalism. Although you cannot edit your user profile to change your user name, you can click on contact us in the lower right corner of this page and then send a message to the system administrator to make that change for you. Your adherence to this practice will be appreciated.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#3

18 Jul 2019, 18:48

Addition to above:

Although both

Code:

xtset cbsa
xtreg depvar indvar, fe
// AND
regress depvar indvar i.cbsa

will produce the same results, -xtreg, fe- will be much faster if the number of cbsa's is large. Also, the -regress- output will be littered with coefficient estimates for all of the cbsa indicators--which are usually meaningless and seldom of interest even when they are not meaningless. So for these reasons, the -xtreg, fe- approach is more practical.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#4

18 Jul 2019, 18:55

Other than the linear model, the Poisson is the only case where including the dummies in a pooled analysis and eliminating them using a conditioning argument give the same estimates of the parameters of interest, but I'd use xtpoisson. It’s faster and produces the correct standard errors. But you should use the vce(robust) option. In the linear case, use the vce(cluster cbsa) option.
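
The Poisson equivalence can be inspected on a small example. This is a sketch under assumptions: it uses the ship accident data from Stata's documentation (`webuse ships`, with panel variable ship and count outcome accident) rather than the patent data, and it omits the exposure term used in the manual's example to keep the comparison short.

```stata
* Illustration on a stand-in data set, not the poster's patent data
webuse ships, clear
xtset ship

* Conditional fixed-effects Poisson with robust standard errors
xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, fe vce(robust)

* Pooled Poisson with explicit ship dummies
poisson accident op_75_79 co_65_69 co_70_74 co_75_79 i.ship
```

The slope coefficients from the two commands should agree up to numerical tolerance. The standard errors differ here because only the first command requests a robust variance estimate.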
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

18 Jul 2019, 19:23

the Poisson is the only case where including the dummies in a pooled analysis and eliminating them using a conditioning argument give the same estimates of the parameters of interest

I did not know that. Thank you.
Ludovic Van Cau

Join Date: Jul 2019

Posts: 23
#6

19 Jul 2019, 05:58

Originally posted by Clyde Schechter

Second question first: as long as cbsa and year jointly identify unique observations in the data set, your -xtset- command is fine. The fact that patent_id only occurs once does not matter. It's panel data with cbsa as the panel, not patents.

There will probably be multiple observations with the same combination of cbsa and year, so not really unique... Does this interfere with panel data?
Also I'd like to include a third fixed effect on inventor id, is it still possible to use xtset and xt reg?
Ludovic Van Cau

Join Date: Jul 2019

Posts: 23
#7

19 Jul 2019, 06:01

Originally posted by Jeff Wooldridge

Other than the linear model, the Poisson is the only case where including the dummies in a pooled analysis and eliminating them using a conditioning argument give the same estimates of the parameters of interest, but I'd use xtpoisson. It’s faster and produces the correct standard errors. But you should use the vce(robust) option. In the linear case, use the vce(cluster cbsa) option.

What would the code look like when combining xtreg/xtpoisson, the vce_option and three fixed effects? I'm not familiar with the vce_option.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#8

19 Jul 2019, 08:01

Originally posted by Ludovic VC

What would the code look like when combining xtreg/xtpoisson, the vce_option and three fixed effects? I'm not familiar with the vce_option.

I'm still not clear on how the data are structured. What is the cross-sectional unit? The inventor? I'm picturing that in each year you know how many patents were awarded to each inventor. But I can't answer until I know more. Notice that if you show us a sample of data we could be more helpful.
Ludovic Van Cau

Join Date: Jul 2019

Posts: 23
#9

19 Jul 2019, 09:19

A snapshot of my data set

The data set contains patents from 1999 until 2004. For each patent a single inventor (invt_id) is picked, so the set contains only unique patent_id-invt_id pairs. The inventor's location is given by zip code and cbsacode. External indepvar data (here, providers) are linked via the zip code.

Two regressions:

1) Team size in the patent (depvar) = indepvar + year fixed effects + regional FE (by cbsa) + inventor FE (+ some control variables not included in the snapshot)

This is a count variable, so Poisson is used.
The indepvars are variables representing internet characteristics.

2) Co-inventor in the patent situated in the same state/county (depvar) = indepvar + year FE + region FE + inventor FE

For this I would use an ordinary linear regression.

Attached Files
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

19 Jul 2019, 10:10

There will probably be multiple observations with the same combination of cbsa and year, so not really unique... Does this interfere with panel data?

Yes and no. If you need to use lag or lead operators, or run models with autoregressive correlation structure, then this is a problem as there would be no unique definition of "previous" or "next." But if you don't need those things for your purposes, then just go ahead and -xtset cbsa- (leave out the time variable) and you're fine with other -xt- commands.
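
As a rough sketch of that workflow, using the placeholder names from this thread (cbsa, year, depvar, indepvar) and assuming the patent data are already in memory:

```stata
* Placeholder variable names from the thread; data assumed to be in memory

* Count how many cbsa-year combinations occur more than once
duplicates report cbsa year

* With repeated cbsa-year pairs this is expected to stop with a
* "repeated time values within panel" error; capture lets the do-file continue
capture noisily xtset cbsa year

* Declaring only the panel variable works with repeated years
xtset cbsa
xtreg depvar indepvar i.year, fe vce(cluster cbsa)
```

Time-series operators such as L. and F. are unavailable after `xtset cbsa`, which matches the caveat above.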

Also I'd like to include a third fixed effect on inventor id, is it still possible to use xtset and xt reg?

Just add i.inventor to the variable list of the -xtreg- command; leave -xtset- as it was. And it's -xtreg-, not -xt reg-.
Ludovic Van Cau

Join Date: Jul 2019

Posts: 23
#11

20 Jul 2019, 09:51

Taking into account both your posts I have the following in mind:

Code:

xtset cbsa
xtreg depvar indepvar i.year i.inventor, fe vce(cluster cbsa)

and

xtset cbsa
xtpoisson depvar indepvar i.year i.inventor, fe vce(robust)

Does this make more sense?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#12

20 Jul 2019, 10:59

I think so.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#13

21 Jul 2019, 07:11

Agree with Clyde. Your data set isn’t a traditional panel because the cross-sectional unit (the patent) appears only once. An inventor can have more than one patent, but the outcome variable is not measured at the inventor level. Your code appears to trick Stata into doing the right thing: cbsa effects, inventor effects, time effects, and clustering at the cbsa level.
Ludovic Van Cau

Join Date: Jul 2019

Posts: 23
#14

21 Jul 2019, 16:16

Originally posted by Jeff Wooldridge

Your data set isn’t a traditional panel because the cross-sectional unit (the patent) appears only once.

Indeed, that is what I was worried about. Anyway, I'll try this code and also check my supervisor's point of view.

Thanks to both of you for your help and feedback.

Ludovic Van Cau

Join Date: Jul 2019
Posts: 23

#15

03 Aug 2019, 11:00

Originally posted by Jeff Wooldridge

But you should use the vce(robust) option. In the linear case, use the vce(cluster cbsa) option.

I have a question regarding this vce option. Why shouldn't I use vce(cluster cbsa)? I ran both regressions and indeed found different outcomes, but I don't understand why.

1)

Code:

nbreg teamsize internetdummy invt_network_size i.cbsacode i.appyear, vce(robust)

Negative binomial regression                    Number of obs     =    462,187
                                                Wald chi2(497)    =          .
Dispersion           = mean                     Prob > chi2       =          .
Log pseudolikelihood = -851639.86               Pseudo R2         =     0.0225

-----------------------------------------------------------------------------------
                  |               Robust
         teamsize |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
    internetdummy |  -.0138526   .0022496    -6.16   0.000    -.0182618   -.0094434
invt_network_size |   .0094635   .0001072    88.32   0.000     .0092535    .0096735
                  |

and 2)

Code:

nbreg teamsize internetdummy invt_network_size i.cbsacode i.appyear, vce(cluster cbsacode)

Negative binomial regression                    Number of obs     =    462,187
                                                Wald chi2(6)      =          .
Dispersion           = mean                     Prob > chi2       =          .
Log pseudolikelihood = -851639.86               Pseudo R2         =     0.0225

                                  (Std. Err. adjusted for 495 clusters in cbsacode)
-----------------------------------------------------------------------------------
                  |               Robust
         teamsize |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
    internetdummy |  -.0138526   .0092945    -1.49   0.136    -.0320695    .0043643
invt_network_size |   .0094635   .0006227    15.20   0.000      .008243     .010684

There is a big difference in significance for the 'internetdummy' variable.
Which one should I go with?
