
  • Using tsset for time-series - any alternative?

    Dear Statalist,

    I am using three years of household data and I want to declare my dataset to be a time series. As I have more than one observation for (almost) every household ID, I cannot apply tsset: Stata reports "repeated time values within panel", in other words duplicates. However, I cannot delete these duplicates because my ID identifies the whole household. Is there an alternative command I could use?

    Thanks

  • #2
    Well, what you are saying here is that you do not actually have panel data at the household level. The panel data is, it appears, at the person level. So you should find (or create) the person identifier and use that as the panel identifier in your -xtset- (or -tsset-) command.



    • #3
      That's got to be xtset; tsset allows you to specify just a time variable, but not just an identifier.



      • #4
        I see. Thank you, both! Does that mean that I can only specify both (ID and time) once I have created a unique ID for each individual in my cross-sectional data? Can you recommend a way to create unique IDs for 50,000 observations?

        Thank you in advance



        • #5
          If you're saying that

          Code:
          xtset id time
          or

          Code:
          tsset id time 
          fails, then it's hard to see that a different identifier would help.

          If still puzzled, then please read and act on FAQ Advice #12 and show us example data and code to make your question clearer.
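
          A minimal sketch of how one could check where the "repeated time values within panel" message comes from, assuming placeholder variables named id and time as in the commands above (substitute your own names):

          Code:
          * count how many observations share the same id-time combination
          duplicates report id time
          * flag the clashing observations (dup is 0 when a combination is unique)
          duplicates tag id time, generate(dup)
          list id time if dup > 0, sepby(id)

          If every combination of id and time turns out to be unique, xtset id time should go through; if not, the listing shows which observations clash.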



          • #6
            Here are some variables to show how my data is constructed:

            Code:
            clear
            ID ID_Per Gender Age B1 B2 B3 year
             471 1 1 39 3  9 4 2009
             471 2 2 39 3  6 3 2009
             471 3 2 19 1 13 5 2009
             471 4 1 16 3  9 4 2009
             471 5 1 14 1  8 3 2009
             471 6 2 12 1  5 1 2009
             472 1 1 40 2  9 4 2009
             472 2 2 38 3 12 5 2009
             472 3 1 17 1 11 4 2009
             472 4 1 16 1 10 4 2009
             472 5 2 11 1  5 1 2009
             473 1 1 61 3  6 3 2009
             473 2 2 48 2  7 3 2009
             473 3 2 18 3 12 5 2009
             473 4 1 16 2  8 3 2009
             473 5 2 15 1  9 4 2009
             473 6 1 14 1  8 3 2009
             473 7 2 12 1  6 3 2009
             473 8 1 10 1  4 1 2009
             511 1 2 72 4  0 1 2010
             511 2 2 41 2  7 3 2010
             517 1 1 43 3  9 4 2010
             517 2 2 33 2  8 3 2010
             517 3 2 16 1 10 4 2010
             517 4 2 15 1  9 4 2010
             517 5 1 13 1  7 3 2010
             518 1 1 38 3 12 5 2010
             518 2 2 36 2  8 3 2010
             518 3 1 16 1 10 4 2010
             518 4 1 15 1  9 4 2010
             519 1 1 35 2 11 4 2010
             519 2 2 34 2 11 4 2010
             519 3 2 10 1  4 1 2010
            1106 1 1 46 3 12 5 2011
            1106 3 1 19 1 13 5 2011
            1108 1 1 45 3 14 6 2011
            1108 2 2 42 2  6 3 2011
            1108 3 1 19 1 12 5 2011
            1108 4 2 16 1 10 4 2011
            1108 5 2 15 1  8 3 2011
            1110 1 1 74 3  9 4 2011
            1110 2 2 64 4  0 1 2011
            1110 3 2 43 3  9 4 2011
            1110 4 2 40 3 16 7 2011
            1110 5 2 36 2 11 4 2011
            1110 6 2 27 3 16 7 2011
            1110 7 1 24 1 14 5 2011
            1112 1 2 69 4  0 1 2011
            1114 1 1 28 2 11 4 2011
            1114 2 2 24 2  8 3 2011
            1116 2 2 41 2  7 3 2011
            end
            format %ty year

            (ID_Per = person number within household ID, B1 = school attendance, B2 = years of schooling, B3 = educational attainment)


            Though I do not have a real panel data set, I want to declare the data to be a time series.
            Last edited by Dani Vasquez; 23 Nov 2017, 02:29.



            • #7
              Thanks for the data example. You lost the input code in copying. That's easy brain surgery.

              Whatever this is, it's not a pure timeseries. You can force it to be panel data after e.g.


              Code:
              egen ID_Joint = group(ID ID_Per), label
              but that loses the main structure and seems artificial. I would

              Code:
              xtset ID
              and use year as a covariate.
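
              A rough sketch of both routes, using the variable names from the data example in #6. The choice of B2 as outcome and of a random-effects model is purely illustrative and an assumption, not something proposed in this thread:

              Code:
              * route 1: force a person-level panel (one panel per person within household)
              egen ID_Joint = group(ID ID_Per), label
              xtset ID_Joint year
              * route 2: declare households as panels and treat year as a covariate
              xtset ID
              xtreg B2 Age i.Gender i.year, re

              Note that in the example data each household appears in only one year, so under route 1 every panel would contain a single observation.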

