  • Unrecognized panel data

    Hello, I have the following panel data, where pid identifies individuals and time is the date on which they were in the labour market. The data come from a Population Survey. There are approximately 40,000,000 observations and 400,000 individuals. The time variable runs from 1992/02 to 2016/12. I want to tell Stata that I have panel data in order to run a regression, but when I run >>xtset pid time<< it says
    "repeated time values within panel
    r(451);"
    I ran the command >>duplicates list pid time<< and it gives me a huge list of duplicates, but I don't understand why I have duplicate observations over time.
    Moreover, if I run >>xtset pid<< it says "panel variable: pid (unbalanced)", so it seems fine, but I am not controlling for time...

    So my questions are: why does Stata not recognize my panel data? What am I doing wrong? What is the proper command?
    The idea is to run the following linear regression: temporary = B0 + B1*temporary(previous period)*sex + ....

    Thank you in advance for your help,

    Cristina.





    [Attached screenshot: panel.png]
  • #2
    Your screenshot is hardly readable. Please visit FAQ Advice #12 which explains how to post example data. You need to show us examples in context of the duplicates for most of us to have any idea of why they arise. I say most of us because it's possible in principle that someone recognises which survey you're using.

    Note that exact duplicates on all variables are not a problem: they should just be dropped, but I'd be surprised if that was the issue here.
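
    As a minimal sketch of the "exact duplicates" case, assuming a toy dataset with made-up values (only pid, time and temporary are names from the question):

    ```stata
    * Toy data (hypothetical values): the second row is an exact copy of the first
    clear
    input pid time temporary
    1 1 0
    1 1 0
    1 2 1
    2 1 1
    end

    * With no varlist, duplicates drop removes observations identical on ALL variables
    duplicates drop

    * isid exits with an error if pid and time do not uniquely identify observations
    isid pid time
    xtset pid time
    ```

    If isid still fails after this, the remaining duplicates differ on some other variable and need a closer look.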



    • #3
      The "pid" variable doesn't change over the three successive years. For panel data, it should be different for each year.



      • #4
        You have already largely answered your question yourself:
        when I run >>xtset pid time<< it says "repeated time values within panel r(451);"
        I ran the command >>duplicates list pid time<< and it gives me a huge list of duplicates, but I don't understand why I have duplicate observations over time.
        So you have checked whether there are duplicates in terms of pid & time, and Stata reports back that there are. You will not be able to xtset on pid & time until you fix this.
        Your task now is to investigate and figure out how to fix this. Should duplicate values just be dropped, or should mean values be taken? There is not much Stata can do to advise you on this. You can make it easier to investigate by doing:
        Code:
        duplicates report pid time
        to get a general overview of the number of duplicates.

        Or:
        Code:
        duplicates tag pid time, gen(tag)
        sort pid time
        edit if tag!=0
        to have a better look at what your duplicates look like.
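
        Once you have decided, the two fixes mentioned above could look roughly like this. This is only a sketch on toy data: the variable wage and all values are hypothetical, and which option is appropriate depends on why the duplicates exist.

        ```stata
        * Toy data (hypothetical): pid 1 has two conflicting records at time 1
        clear
        input pid time wage
        1 1 10
        1 1 12
        1 2 11
        2 1 9
        end

        * Option A: keep only the first observation per pid-time.
        * force is required because the records differ on other variables,
        * so information is discarded here.
        preserve
        duplicates drop pid time, force
        xtset pid time
        restore

        * Option B: replace duplicates by their mean within pid-time.
        * collapse keeps only the listed variables plus the by() variables.
        collapse (mean) wage, by(pid time)
        xtset pid time
        ```

        Option A silently throws away one of the conflicting records, so it is only safe if you know the duplicates are redundant.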

        Last edited by Jorrit Gosens; 12 Jul 2018, 07:42.



        • #5
          Thank you so much, all of you. Of course I had duplicates; I do not know why, but there were. Now everything is fixed.

          Thanks again.
