  • how to set up panel data xtset

    Hi, I am new to Stata and need some help. I have a large data set that looks like the following, with multiple firms per year. How can I xtset it? When I tried "xtset ID Year, yearly", I got the error "repeated time values within panel". So I went ahead with "xtset ID" instead. But I am not sure whether I am doing it right; I guess not, because when I run a regression with "xtreg, fe", 6000 observations are missing and, furthermore, two variables are omitted due to collinearity. Thanks in advance for your help.
    ID Year Name
    1 2000 ABC
    1 2001 ABC
    1 2002 ABC
    1 2003 ABC
    2 2002 CDE
    2 2003 CDE
    2 2004 CDE

  • #2
    You have multiple observations for some combinations of Year and company ID, and Stata wants at most one. See the further explanation and remedy here; note in particular the suggestions on 'duplicates'. Tag these, and figure out what you want to do with them.
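
    A minimal sketch of the tagging step, assuming variables named ID, Year and Name as in the question. The toy data and the variable name dup are made up for illustration; only duplicates tag, list and count are standard Stata commands.

    ```stata
    * Toy data (hypothetical): ID 1 has two rows for 2001
    clear
    input ID Year str3 Name
    1 2000 "ABC"
    1 2001 "ABC"
    1 2001 "ABC"
    2 2002 "CDE"
    2 2003 "CDE"
    end

    * dup = number of extra copies of each ID-Year combination (0 = unique)
    duplicates tag ID Year, generate(dup)

    * Inspect the rows that share an ID-Year combination
    list ID Year Name if dup > 0, sepby(ID)
    count if dup > 0
    ```

    Listing the tagged rows side by side is usually the quickest way to see why they exist before deciding what to do with them.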



    • #3
      Hi Jorrit,

      The problem is those are not duplicates, but they are observations that are required.



      • #4
        What Stata is telling you is that for some IDs, there are multiple rows with the same year, and Stata cannot decide which to use.
        These observations are duplicates, at least in the sense that ID and Year are the same in multiple rows. Perhaps you mean to say that they are 'not duplicates', because values in other variables are not the same? If that is the case, you are going to have to think of a way to resolve this, because Stata cannot decide which of the two rows to use.

        Did you try the code below, and did it report observations with 2 or more copies?
        Code:
        duplicates report ID Year
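
        One possible way to resolve the repeats, sketched under the assumption that averaging the repeated rows is meaningful for your data (it may well not be). The variable sales and all values are hypothetical.

        ```stata
        * Toy data (hypothetical): ID 1 has two rows for 2001
        clear
        input ID Year sales
        1 2000 10
        1 2001 20
        1 2001 30
        2 2002 5
        2 2003 7
        end

        * Shows how many ID-Year combinations occur more than once
        duplicates report ID Year

        * Reduce to one row per ID-Year by averaging (an assumption, not a rule)
        collapse (mean) sales, by(ID Year)

        * isid exits with an error if ID-Year still does not identify rows
        isid ID Year
        xtset ID Year, yearly
        ```

        If the repeated rows carry information that must be kept separately, collapsing is not appropriate and the time variable has to be handled differently.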
        Last edited by Jorrit Gosens; 08 Jul 2015, 08:24.



        • #5
          It depends a lot on what you want to do with the within-panel analysis. If you don't care about the order of the observations (i.e., the years), then you can xtset ID [without the time indicator] and then use xtreg. This gives you the panel effects but ignores time completely. If you xtset ID Year, Stata will not accept duplicate years within a panel, but I don't see that declaring the time variable actually modifies the estimates. I think the estimates from xtreg will be the same with and without the time variable, assuming you end up using the same observations and don't want to deal with more sophisticated issues (like serial correlation).

          For fixed effects, if you just want different intercepts for each panel, then you can also use regress with i.ID or areg.

          This does not preclude you putting in dummy variables for years in any of the analyses.
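
          The three fixed-effects routes mentioned above can be sketched as follows, each with year dummies added. The outcome y, the regressor x and all data values are hypothetical; the point is only that the coefficient on x should agree across the three commands.

          ```stata
          * Toy data (hypothetical): several firms, some with repeated years
          clear
          input ID Year y x
          1 2000 1.0 2
          1 2001 1.5 3
          1 2001 2.1 5
          1 2002 2.4 4
          2 2000 3.0 1
          2 2001 3.9 3
          2 2001 4.2 4
          2 2002 5.1 6
          3 2000 0.5 2
          3 2001 0.9 1
          3 2002 1.8 5
          3 2002 1.6 3
          end

          * (a) within estimator, panel variable only, year dummies via i.Year
          xtset ID
          xtreg y x i.Year, fe
          scalar b_xtreg = _b[x]

          * (b) ordinary regression with a dummy for each firm
          regress y x i.ID i.Year
          scalar b_lsdv = _b[x]

          * (c) areg absorbs the firm dummies instead of reporting them
          areg y x i.Year, absorb(ID)
          scalar b_areg = _b[x]

          display b_xtreg " " b_lsdv " " b_areg
          ```

          Standard errors and the reported constant can differ between these commands, so compare the slope coefficients rather than the full output.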

          Alternatively, you could generate an artificial time variable by enumerating the observations within panels and using that enumeration as the "time" variable in xtreg. It would look something like:

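          * number the observations within each ID, in Year order
          sort ID Year
          by ID: gen seq = _n
          * declare the panel with seq as the artificial time variable
          xtset ID seq
          xtreg ...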
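
          As a rough check of the claim that the time variable does not change the fixed-effects estimates, here is a sketch on made-up data. The variables y and x and every value are hypothetical. Note that when a firm has several rows in the same year, their order within that year (and hence seq) is arbitrary.

          ```stata
          * Toy data (hypothetical): two firms, each with a repeated year
          clear
          input ID Year y x
          1 2000 1.0 2
          1 2001 1.5 3
          1 2001 2.1 5
          1 2002 2.4 4
          2 2002 3.0 1
          2 2003 3.9 3
          2 2003 4.2 4
          2 2004 5.1 6
          end

          * Fixed effects with only the panel variable declared
          xtset ID
          xtreg y x, fe
          scalar b_notime = _b[x]

          * Same model after declaring the artificial time variable
          bysort ID (Year): gen seq = _n
          xtset ID seq
          xtreg y x, fe
          scalar b_seq = _b[x]

          display b_notime " " b_seq
          ```

          The comparison only covers the plain fixed-effects model; anything that uses lags, differences or serial correlation would depend on how seq is constructed.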

          Given more than one observation per year, I'm not sure how you should handle time. The discussion above assumes you ignore time or just have dummy variables for year.

          With more than one observation per year, I don't think the built-in procedures would properly estimate serial correlation. You may have some very curious error structures (since some observations differ by years and others are in the same year).

