  • Preparing Panel Data with repeated time values within panel (xtset)

    Hey everyone,
    I am working with survey data (PSID) and am currently facing some difficulties regarding my data structure. My data looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float obs_no double year str8 state float(state_frauds_ratio_2y log_Age log_FamilyIncome)
     2 2001 "TN"          0   3.89182 10.837225
     2 2003 "TN"          0 3.9318256 10.389487
     2 2005 "TN"          0  3.970292   10.0622
     2 2007 "TN"          0 4.0073333 11.402664
     2 2009 "TN"  .05555556 4.0430512  10.12503
     2 2011 "TN"          0 4.0775375  10.60698
     3 2001 "TN"          0 3.8501475 10.819778
     3 2003 "TN"          0   3.89182  10.82973
     3 2005 "TN"          0 3.9318256 11.103453
     3 2007 "TN"          0  3.970292 12.527431
     3 2009 "TN"  .05555556 4.0073333 11.182252
     3 2011 "TN"          0  4.060443   11.0021
     3 2013 "TN"          0 4.0775375 12.043553
     4 2003 "KS"          0 3.8066626  11.45105
     4 2005 "KS"          0 3.8501475  9.913438
     4 2007 "TN"          0   3.89182  9.152076
     4 2009 "TN"  .05555556 3.9318256 10.871535
     4 2011 "TN"          0  3.970292  9.821627
     4 2013 "TN"          0 4.0073333 10.916542
     4 2015 "TN"          0 4.0430512 11.013765
     4 2017 "TN" .019607844 4.0775375  9.287301
     4 2019 "TN"          0 4.1108737  9.308193
     5 2001 "TN"          0  3.713572 10.926227
     5 2005 "TN"          0 3.8066626 11.112448
     5 2017 "TN" .019607844 4.0430512 10.933107
     5 2019 "TN"          0 4.0775375 11.018137
     6 2001 "TN"          0  3.583519 10.427032
     6 2003 "TN"          0  3.637586  9.929837
     6 2005 "TN"          0 3.6888795 10.529426
     6 2009 "TN"  .05555556   3.78419 10.998393
     6 2011 "TN"          0 3.8286414 10.657353
     7 2015 "TN"          0  3.713572 10.716283
     7 2017 "TN" .019607844    3.7612  8.217439
     7 2019 "TN"          0 3.8066626  9.146335
     8 2001 "TN"          0 3.2580965 10.203592
     8 2003 "TN"          0 3.3322046  10.23996
     8 2005 "TN"          0 3.4011974  10.25766
     8 2007 "TN"          0  3.465736 10.463103
     8 2009 "TN"  .05555556 3.5263605  10.37349
     9 2001 "KS"          0  3.218876  10.04325
     9 2003 "KS"          0  3.295837  10.12663
     9 2005 "KS"          0  3.367296   9.87817
     9 2009 "KS"          0  3.496508  9.305651
     9 2013 "KS"          0  3.610918  8.853665
     9 2015 "KS"          0 3.6635616    10.859
     9 2017 "KS"          0   3.73767 11.400283
     9 2019 "KS"          0    3.7612 9.2103405
    10 2001 "TN"          0  3.178054  9.903487
    10 2005 "TN"          0 3.3322046 10.493494
    10 2009 "TN"  .05555556  3.465736 11.014555
    11 2005 "TN"          0 3.2580965 11.758222
    12 2009 "TN"  .05555556 3.0910425 10.351374
    12 2011 "TN"          0  3.178054  9.506735
    12 2013 "TN"          0  3.218876   8.14613
    13 2017 "TN" .019607844  3.295837 10.064755
    13 2019 "TN"          0  3.367296 10.341743
    14 2017 "KS"          0  3.295837 10.165852
    14 2019 "KS"          0 3.3322046 10.395926
    15 2011 "TN"          0  2.890372 8.9981365
    15 2017 "TN" .019607844  3.218876 10.524252
    17 2001 "TN"          0  3.713572 10.596635
    17 2003 "TN"          0    3.7612 10.700994
    18 2007 "TN"          0 3.0910425  8.306225
    18 2009 "TN"  .05555556  3.178054  9.605755
    18 2013 "TN"          0 3.3322046 10.699642
    19 2001 "TN"          0 4.0430512  10.97164
    19 2003 "TN"          0 4.0775375 11.204633
    20 2001 "KS"          0    3.7612 11.429543
    21 2001 "TN"          0 3.2580965  10.04325
    21 2003 "TN"          0 3.3322046  9.729135
    22 2001 "TN"          0  3.433987  10.73642
    22 2003 "TN"          0  3.496508 11.194193
    23 2001 "TN"          0 3.3322046 10.539005
    23 2003 "TN"          0 3.4011974 10.610883
    23 2005 "TN"          0  3.465736 10.609945
    23 2007 "TN"          0 3.5263605 10.611376
    23 2009 "TN"  .05555556  3.610918 10.678168
    23 2011 "TN"          0  3.637586  10.74456
    23 2013 "TN"          0  3.713572 10.793188
    24 2003 "TN"          0 3.2580965 10.819778
    24 2005 "TN"          0 3.3322046  9.615806
    24 2007 "TN"          0 3.4011974  10.28875
    24 2009 "TN"  .05555556  3.465736 11.407565
    24 2011 "TN"          0 3.5263605  10.27505
    24 2013 "TN"          0  3.583519 11.289782
    25 2009 "TN"  .05555556  3.496508  11.74245
    25 2013 "TN"          0  3.610918 11.173248
    25 2015 "TN"          0  3.637586 10.463103
    26 2013 "TN"          0 3.0445225  10.08556
    27 2011 "TN"          0  3.367296 10.060492
    27 2013 "TN"          0  3.465736  10.25766
    27 2015 "TN"          0 3.5263605 10.359646
    28 2011 "TN"          0  3.367296 10.935354
    29 2015 "TN"          0  3.496508 11.272547
    29 2017 "TN" .019607844  3.555348  11.30392
    29 2019 "TN"          0  3.610918 11.187196
    30 2015 "TN"          0 4.1108737 10.267992
    31 2017 "TN" .019607844 3.5263605 11.082143
    31 2019 "TN"          0  3.555348  11.08368
    32 2019 "TN"          0 3.2580965 11.082143
    end
    format %ty year
    There are many more variables, but I don't think they matter for now. For a subsequent analysis I need to create first differences of some variables.

    I tried:
    gen D_log_age = D.log_Age

    Stata responds with "time variable not set".

    I then tried to xtset my panel:
    xtset obs_no year

    but this yields the error message "repeated time values within panel".

    I see why this is a problem, but I am struggling to adjust the dataset without losing any observations. Is this even possible, or does my data structure rule out the desired steps?

    I am sorry if this might be trivial to you, but I am kind of stuck and would appreciate help.

    Best regards



  • #2
    Ronny:
    time-series operators require -tsset- beforehand.
    If, as it seems, you're planning to use time-series operators, switching from -tsset- to -xtset- is not beneficial (as they do the very same job).
    The only issue, if feasible, is to obtain a more detailed -timevar- (say, month + year).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3


      See https://www.stata.com/support/faqs/d...d-time-values/ for a discussion of what to do here. In short,

      Code:
      xtset obs_no year 
      works fine for your data example, so the problem lies elsewhere. The FAQ linked above arose out of a thread in which the OP's initial reaction was that the report isn't true, or should not be, but it was correct.

      Code:
      duplicates report obs_no year
      is a tool of choice. In my experience a dataset can often be almost fine, except for a few observations incorrectly entered. Or there is a bundle of observations copied in error from a spreadsheet in which identifier and/or year are missing, which are just junk and should be dropped.
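
      A minimal sketch of that diagnosis, assuming the identifier and time variables are obs_no and year as in the data example. The variable name dup is only illustrative, and the variables in the list command are those shown in the example.

      Code:
      * count observations whose identifier or year is missing
      count if missing(obs_no, year)
      * report how many observations share the same obs_no-year pair
      duplicates report obs_no year
      * flag them: dup holds the number of surplus copies of each pair (0 if unique)
      duplicates tag obs_no year, generate(dup)
      * inspect the offending observations before deciding what to do with them
      sort obs_no year
      list obs_no year state log_Age log_FamilyIncome if dup > 0, sepby(obs_no year)
      Looking at the flagged observations first helps to tell exact copies apart from records that share an identifier and year but differ in content.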

      EDIT Another possibility is that the needed identifier is a composite of state and obs_no.
      Last edited by Nick Cox; 15 Feb 2023, 02:20.
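
      If the identifier really is a composite, one possible sketch is the following. The variable name panel_id is hypothetical, and this only makes sense if obs_no is numbered within state. In the data example obs_no 4 moves from KS to TN, so a composite identifier would split that person into two panels.

      Code:
      * build one numeric identifier from each distinct state-obs_no combination
      egen long panel_id = group(state obs_no)
      * check whether the composite identifier and year are now unique
      duplicates report panel_id year
      xtset panel_id year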



      • #4
        Thanks for your answers, I found the issue. Some duplicates existed as a result of a previous merge, but they didn't carry any relevant information, so I could drop them.
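
        The post does not say how the duplicates were dropped, so the following is only a sketch under assumptions. It assumes the surplus observations are exact copies on every variable, which is what -duplicates drop- without a variable list removes. The delta(2) option is also an assumption, based on the two-year spacing of the survey years in the data example. With the default period of one year, D.log_Age would be missing for every observation in that example, because the preceding calendar year is never observed.

        Code:
        * drop observations that are exact copies on all variables
        duplicates drop
        * confirm that obs_no and year now uniquely identify observations (errors out otherwise)
        isid obs_no year
        * declare the panel with a two-year period between waves
        xtset obs_no year, delta(2)
        * first difference relative to the previous wave
        generate D_log_Age = D.log_Age
        Where a person skips a wave (for example obs_no 5, observed in 2001 and then 2005), the first difference stays missing even with delta(2).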



        • #5
          Thanks for the closure!
