  • Fixed effect: i.fe or xtreg?

    Hi,

    I'm analyzing patent data for my thesis. I have a dataset with unique patents from 1999-2004, so no duplicates. I'd like to run two different regressions with two fixed effects. The first fixed effect is a year fixed effect, from 1999 until 2004. The second is a regional fixed effect based on the CBSA location of the first inventor of the patent. But I have my doubts on the way to execute it.

    1st regression: Poisson regression (because the outcome is a count variable)
    Number of inventors in patent = indepvar + year fixed effect + regional fixed effect

    2nd regression: linear regression

    Depvar (i.e. a probability) = indepvar + year FE + regional FE


    If my research is right there are 2 different ways of setting the fixed effect:

    1) adding i. :
    poisson depvar indepvar i.year i.cbsa
    regress depvar indepvar i.year i.cbsa

    2) via panel data:
    xtset cbsa year
    xtpoisson depvar indepvar year cbsa, fe
    xtreg depvar indepvar year cbsa, fe

    My questions:
    - Is there a preference between the 2 possibilities? Should I expect a difference in the outcome between the 2, for example in R-squared or significance?
    - Am I allowed to set it as panel data? Each patent-id is included in the data set only once; it does not recur across years.

    Thanks
    Ludo

  • #2
    Second question first: as long as cbsa and year jointly identify unique observations in the data set, your -xtset- command is fine. The fact that patent_id only occurs once does not matter. It's panel data with cbsa as the panel, not patents.

    First question: For the linear regression you can do either

    Code:
    xtset cbsa year
    xtreg depvar indepvar i.year, fe
    OR
    Code:
    regress depvar indepvar i.cbsa i.year
    The results will be the same.

    For the Poisson regression, however, you have only one legitimate option:

    Code:
    xtset cbsa year
    xtpoisson depvar indepvar i.year, fe
    The -poisson depvar indepvar i.year i.cbsa- command is syntactically legal but is statistically invalid due to what is known as the "incidental parameters problem" (you can Google it). The use of i.panelvar instead of the -xt..., fe- analysis is only correct for linear regression.

    If you include i.cbsa in either -xtreg, fe- or -xtpoisson, fe-, the i.cbsa variables will be omitted due to collinearity with the cbsa fixed effects already provided automatically by the -xtwhatever- command. No harm done, but conceptually an error.

    Also, it makes a big difference whether you specify year or i.year. If you specify year, it is treated as a continuous variable and you are modeling a linear time trend. If you specify i.year, it is treated as a discrete variable and you are modeling yearly idiosyncratic shocks to the outcome variable. Either one might be correct, depending on circumstances, but you need to decide which it is.
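
    A minimal sketch of that distinction. It uses Stata's grunfeld example data as a stand-in (an assumption for illustration; company, invest, mvalue and kstock are that data set's variables, not the patent data):

    Code:
    * Illustration only: grunfeld example data, not the poster's data
    webuse grunfeld, clear
    xtset company year

    * year as a continuous variable: a single slope, i.e. a linear time trend
    xtreg invest mvalue kstock year, fe

    * i.year: one indicator per year (the first year is the base category)
    xtreg invest mvalue kstock i.year, fe

    * joint test that all year indicators are zero
    testparm i.year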

    As an aside, it is the norm in this community to use our real given and surnames as our username. This practice promotes collegiality and professionalism. Although you cannot edit your user profile to change your user name, you can click on contact us in the lower right corner of this page and then send a message to the system administrator to make that change for you. Your adherence to this practice will be appreciated.



    • #3
      Addition to above:

      Although both
      Code:
      xtset cbsa
      xtreg depvar indvar, fe
      
      // AND
      regress depvar indvar i.cbsa
      will produce the same results, -xtreg, fe- will be much faster if the number of cbsa's is large. Also, the -regress- output will be littered with coefficient estimates for all of the cbsa indicators--which are usually meaningless and seldom of interest even when they are not meaningless. So for these reasons, the -xtreg, fe- approach is more practical.
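
      One way to check the equivalence on a small example is sketched below. It uses Stata's grunfeld example data (an assumption for illustration; the variable names are that data set's, not the poster's):

      Code:
      * Illustration only: grunfeld example data (10 companies, 20 years)
      webuse grunfeld, clear
      xtset company year

      * fixed effects via the within estimator
      xtreg invest mvalue kstock i.year, fe
      estimates store fe_xt

      * the same model with explicit company indicators
      regress invest mvalue kstock i.company i.year
      estimates store lsdv

      * side-by-side coefficients and standard errors for the regressors of interest
      estimates table fe_xt lsdv, keep(mvalue kstock) se

      With default standard errors, the two columns should agree. The reported R-squared values will not: -xtreg, fe- reports within, between and overall R-squared, while the -regress- R-squared also counts the variation explained by the indicators.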



      • #4
        Other than the linear model, the Poisson is the only case where including the dummies in a pooled analysis and eliminating them using a conditioning argument give the same estimates of the parameters of interest, but I'd use xtpoisson. It's faster and produces the correct standard errors. But you should use the vce(robust) option. In the linear case, use the vce(cluster cbsa) option.
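
        A minimal sketch of that comparison. It uses Stata's ships example data (an assumption for illustration; ship, accident, service and the period indicators are that data set's variables, not the patent data):

        Code:
        * Illustration only: ships example data, not the poster's data
        webuse ships, clear
        xtset ship

        * conditional fixed-effects Poisson with robust standard errors
        xtpoisson accident op_75_79 co_65_69 co_70_74 co_75_79, fe exposure(service) vce(robust)

        * pooled Poisson with explicit ship indicators
        poisson accident op_75_79 co_65_69 co_70_74 co_75_79 i.ship, exposure(service)

        The slope estimates on the four period indicators should match across the two commands. The standard errors differ here because only the first command requests a robust variance estimate.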



        • #5
          the Poisson is the only case where including the dummies in a pooled analysis and eliminating them using a conditioning argument give the same estimates of the parameters of interest
          I did not know that. Thank you.



          • #6
            Originally posted by Clyde Schechter
            Second question first: as long as cbsa and year jointly identify unique observations in the data set, your -xtset- command is fine. The fact that patent_id only occurs once does not matter. It's panel data with cbsa as the panel, not patents.
            There will probably be multiple observations with the same combination of cbsa and year, so not really unique... Does this interfere with panel data?
            Also I'd like to include a third fixed effect on inventor id, is it still possible to use xtset and xt reg?



            • #7
              Originally posted by Jeff Wooldridge
              Other than the linear model, the Poisson is the only case where including the dummies in a pooled analysis and eliminating them using a conditioning argument give the same estimates of the parameters of interest, but I'd use xtpoisson. It's faster and produces the correct standard errors. But you should use the vce(robust) option. In the linear case, use the vce(cluster cbsa) option.
              What would the code look like when combining xtreg/xtpoisson, the vce option and three fixed effects? I'm not familiar with the vce option.



              • #8
                Originally posted by Ludovic VC

                What would the code look like when combining xtreg/xtpoisson, the vce option and three fixed effects? I'm not familiar with the vce option.
                I'm still not clear on how the data are structured. What is the cross-sectional unit? The inventor? I'm picturing that in each year you know how many patents were awarded to each inventor. But I can't answer until I know more. Notice that if you show us a sample of data we could be more helpful.



                • #9
                  A snapshot of my data set
                  [Screenshot of the data set: Schermafbeelding 2019-07-19 om 17.11.52.png]

                  The data set contains patents from 1999 until 2004. For each patent a single inventor (invt_id) is picked, so the set contains only unique patent_id-invt_id pairs. The inventor's location is given by zip code and cbsacode. External indepvar data (here, providers) are linked via the zip code.

                  Two regressions:

                  1) Team size in the patent (depvar) = indepvar + year fixed effects + regional fe (by cbsa) + inventor fe (+some control variables not included in the snapshot)

                  This is a count variable, so Poisson is used.
                  The indepvars are variables representing internet characteristics.


                  2) Co-inventor in the patent situated in the same state/county (depvar) = indepvar + year fe + region fe + inventor fe

                  For this I would use an ordinary linear regression.




                  • #10
                    There will probably be multiple observations with the same combination of cbsa and year, so not really unique... Does this interfere with panel data?
                    Yes and no. If you need to use lag or lead operators, or run models with autoregressive correlation structure, then this is a problem as there would be no unique definition of "previous" or "next." But if you don't need those things for your purposes, then just go ahead and -xtset cbsa- (leave out the time variable) and you're fine with other -xt- commands.
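
                    A short sketch of how one might check this before choosing the -xtset- form. It uses the placeholder variable names cbsa and year from earlier in the thread; substitute the actual names in your data:

                    Code:
                    * how many observations share each cbsa-year combination?
                    duplicates report cbsa year

                    * reports an error (without stopping a do-file) if cbsa and year
                    * do not uniquely identify observations
                    capture noisily isid cbsa year

                    * declare the panel identifier only, with no time variable
                    xtset cbsa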

                    Also I'd like to include a third fixed effect on inventor id, is it still possible to use xtset and xt reg?
                    Just add i.inventor to the variable list of the -xtreg- command; leave -xtset- as it was. And it's -xtreg-, not -xt reg-.



                    • #11
                      Taking into account both your posts I have the following in mind:

                      Code:
                      xtset cbsa
                      xtreg depvar indepvar i.year i.inventor, fe vce(cluster cbsa)
                      
                      and
                      
                      xtset cbsa
                      xtpoisson depvar indepvar i.year i.inventor, fe vce(robust)
                      Does this make more sense?



                      • #12
                        I think so.



                        • #13
                          Agree with Clyde. Your data set isn't a traditional panel because the cross-sectional unit (the patent) appears only once. An inventor can have more than one patent, but the outcome variable is not defined for the inventor. Your code appears to trick Stata into doing the right thing: cbsa effects, inventor effects, time effects and clustering at the cbsa level.



                          • #14
                            Originally posted by Jeff Wooldridge
                            Your data set isn't a traditional panel because the cross-sectional unit (the patent) appears only once.
                            Indeed, that is what I was worried about. Anyway, I'll try this code and also check my supervisor's point of view.

                            Thanks to both of you for your help and feedback.




                            • #15
                              Originally posted by Jeff Wooldridge
                              But you should use the vce(robust) option. In the linear case, use the vce(cluster cbsa) option.
                              I have a question regarding this vce option. Why shouldn't I use vce(cluster cbsa)? I ran both regressions and indeed found different outcomes, but I don't understand why.

                              1)
                              Code:
                              nbreg teamsize internetdummy invt_network_size i.cbsacode i.appyear, vce(robust)
                              
                              Negative binomial regression                    Number of obs     =    462,187
                                                                              Wald chi2(497)    =          .
                              Dispersion           = mean                     Prob > chi2       =          .
                              Log pseudolikelihood = -851639.86               Pseudo R2         =     0.0225
                              
                              -----------------------------------------------------------------------------------
                                                |               Robust
                                       teamsize |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              ------------------+----------------------------------------------------------------
                                  internetdummy |  -.0138526   .0022496    -6.16   0.000    -.0182618   -.0094434
                              invt_network_size |   .0094635   .0001072    88.32   0.000     .0092535    .0096735
                                                |
                              and 2)
                              Code:
                              nbreg teamsize internetdummy invt_network_size i.cbsacode i.appyear, vce(cluster cbsacode)
                              
                              Negative binomial regression                    Number of obs     =    462,187
                                                                              Wald chi2(6)      =          .
                              Dispersion           = mean                     Prob > chi2       =          .
                              Log pseudolikelihood = -851639.86               Pseudo R2         =     0.0225
                              
                                                                (Std. Err. adjusted for 495 clusters in cbsacode)
                              -----------------------------------------------------------------------------------
                                                |               Robust
                                       teamsize |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              ------------------+----------------------------------------------------------------
                                  internetdummy |  -.0138526   .0092945    -1.49   0.136    -.0320695    .0043643
                              invt_network_size |   .0094635   .0006227    15.20   0.000      .008243     .010684
                              There is a big difference in significance for 'internetdummy' variable.
                              Which should be the one to go with?
