  • Panel Data

    Hello everyone.
    I am trying to run a panel regression and I am getting an error. To be honest, I'm new to all this. Here is what my data looks like (still incomplete):
    [Image: data.png]

    I have 3 regions, and the values of several variables for each region are recorded quarterly from 2020 to 2022. I followed this video and modified some values to ensure that my data is reflected properly. When I entered the command "tsset quarter", I got the message: repeated time values in sample.

    Does anyone have any suggestions? I saw this post, but it left me more confused.

    Thanks in advance.

    Update:
    I created separate columns for quarter and year.
    I used the command "tsset quarter year", but a similar error was displayed: repeated time values within panel

    Is it better to make separate tables for each region? The "Region" variable is the reason why the "quarter" variable has repeated values.
    Last edited by nester alcular; 07 Apr 2023, 13:38.

  • #2
    Stata is telling you that, notwithstanding what you think your data is, there is some region-quarter combination that appears more than once. Fiddling with the variables is not going to help. You need to find the surplus observation(s) and figure out, a) how they got there, and b) how to resolve them.

    To find them, run
    Code:
    duplicates tag region quarter, gen(flag)
    browse if flag
    and you will see them. What to do depends on what those observations are. If the surplus observations are exact duplicates of each other in all variables, then you could just eliminate all but one copy and have a valid panel data set. You shouldn't actually do just that, because the presence of the surplus observations usually indicates that something went wrong in the data management that built your data set, so that data management should be carefully reviewed. Where one error is found, others often lurk nearby. So you should really take advantage of the opportunity to really clean up the code that built the data set and eliminate all the bugs.

    If, however, the surplus observations disagree on some variables other than region and quarter, then you have a deeper problem: you have contradictory information and must, in addition to fixing the data management code that created this situation, figure out which, if any, of the conflicting observations is correct and how to resolve their contradictions, ending up with a single observation for each region-quarter combination.
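
    A minimal sketch of how the two cases above (exact copies versus conflicting observations) might be told apart. The tiny data set is made up purely for illustration, and the names dup_rq and dup_all are arbitrary:

    ```stata
    * Toy data (made up): one exact duplicate pair and one conflicting pair
    clear
    input float quarter str9 region long totalcovid
    240 "NCR" 8462
    240 "NCR" 8462
    241 "NCR" 795136
    241 "NCR" 795000
    242 "NCR" 8104122
    end
    format %tq quarter

    * Summarize how many region-quarter combinations occur more than once
    duplicates report region quarter

    * Number of other observations sharing the same region and quarter
    duplicates tag region quarter, gen(dup_rq)

    * Number of other observations identical on every variable
    duplicates tag, gen(dup_all)

    * Surplus observations that disagree on at least one other variable
    list if dup_rq > dup_all
    ```

    Any observation listed shares its region and quarter with another observation that differs on some other variable; exact copies are not listed.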

    It is also possible when you see the data that you will recognize that the surplus observations, even though perhaps they disagree in some respects, are all correct. This could arise, for example, if each observation describes a different sub-region within the region. In that case the problem is that you have incorrectly conceptualized region as the panel variable when it really should be sub-region or the combination of region and sub-region. A similar thing could happen if the surplus observations represent observations of the same region at different months or weeks within the quarter.
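
    As a rough illustration of the sub-region case, assuming a variable named subregion exists (it does not appear in the data shown in this thread, and the values below are invented), the panel identifier could be built from the combination of the two variables:

    ```stata
    * Toy data (made up); subregion is a hypothetical variable
    clear
    input float quarter str9 region str5 subregion long totalcovid
    240 "NCR" "North" 10
    240 "NCR" "South" 12
    241 "NCR" "North" 15
    241 "NCR" "South" 11
    end
    format %tq quarter

    * One numeric panel identifier per region-subregion combination
    egen panel_id = group(region subregion)

    * Each panel_id now has at most one observation per quarter
    xtset panel_id quarter
    ```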

    Added: In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
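
    For instance, a short -dataex- call might look like the following. It uses the auto data set that ships with Stata rather than the data from this thread:

    ```stata
    * Load a data set that ships with Stata
    sysuse auto, clear

    * Produce a copy-and-paste example of three variables, first five observations
    dataex make price mpg in 1/5
    ```

    The output can be pasted into a post between code delimiters.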
    Last edited by Clyde Schechter; 07 Apr 2023, 13:44.



    • #3
      Hi Clyde, thanks for helping out. It seems that when I copied the data from Excel, the regions got shuffled. I managed to correct it, ran the code that you provided, and finally got 0 flags.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float quarter str9 region long totalcovid float avg_covid byte(viirs_mean emp flag)
      240 "NCR"            8462    136.48 . . 0
      241 "NCR"          795136   8737.76 . . 0
      242 "NCR"         8104122  88088.28 . . 0
      243 "NCR"        17454659 189724.55 . . 0
      244 "NCR"        21283323 236481.38 . . 0
      245 "NCR"        42401382 465949.25 . . 0
      246 "NCR"        57326750  623116.9 . . 0
      247 "NCR"        77951719  847301.3 . . 0
      248 "NCR"       100497751 1116641.6 . . 0
      249 "NCR"       106684413 1172356.3 . . 0
      250 "NCR"       112403284 1221774.9 . . 0
      251 "NCR"       118817951 1291499.5 . . 0
      240 "Region 7"         31       .51 . . 0
      241 "Region 7"       8105      88.1 . . 0
      242 "Region 7"      13807    151.73 . . 0
      243 "Region 7"       4485     48.75 . . 0
      244 "Region 7"      25272     280.8 . . 0
      245 "Region 7"      24424     268.4 . . 0
      246 "Region 7"      64890    705.33 . . 0
      247 "Region 7"      13222    143.72 . . 0
      248 "Region 7"      38192    424.36 . . 0
      249 "Region 7"       1163     12.78 . . 0
      250 "Region 7"       9786    106.37 . . 0
      251 "Region 7"       5710     62.07 . . 0
      240 "Region 11"       143      2.31 . . 0
      241 "Region 11"     20557     20557 . . 0
      242 "Region 11"    158344   1721.13 . . 0
      243 "Region 11"    699402    7602.2 . . 0
      244 "Region 11"   1677451  18638.34 . . 0
      245 "Region 11"   2588376  28443.69 . . 0
      246 "Region 11"   5975431  64950.34 . . 0
      247 "Region 11"   9588206 104219.63 . . 0
      248 "Region 11"  11672371    129693 . . 0
      249 "Region 11"  12770744 140337.84 . . 0
      250 "Region 11"  13246544 143984.17 . . 0
      251 "Region 11"  14011138 152294.98 . . 0
      end
      format %tq quarter

      I apologize for the way the dataex output was posted. I entered the command again and got the same error:
      tsset quarter
      repeated time values in sample

      It's late now, I'll get back to this post after resting. I look forward to your response Clyde. Will also review your notes later, in case I missed something... or everything.



      • #4
        Before going to bed, I made some modifications. I replaced the region names with numeric values and used the following commands to label the variable:

        label variable regions "Metropolitan Areas"
        label define regions 1 NCR 2 Region_7 3 Region_11


        Then, I realized I should use xtset regions quarter instead of tsset quarter.

        Let me know if I did the right thing. Thanks so much for your help. I really appreciate it.



        • #5
          Then, I realized I should use xtset regions quarter instead of tsset quarter.

          Let me know if I did the right thing.
          Yes, that's exactly right.
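
          A minimal sketch of the whole setup, using the first two quarters of each region from the -dataex- output in #3. It uses -encode- in place of the manual recoding described in #4, and region_id is an arbitrary name:

          ```stata
          clear
          input float quarter str9 region long totalcovid
          240 "NCR"          8462
          241 "NCR"        795136
          240 "Region 7"       31
          241 "Region 7"     8105
          240 "Region 11"     143
          241 "Region 11"   20557
          end
          format %tq quarter

          * tsset quarter fails here because every quarter appears once per region

          * Turn the string region into a numeric variable with value labels attached
          encode region, gen(region_id)

          * Declare the panel variable first, then the time variable
          xtset region_id quarter
          ```

          -encode- assigns the codes in alphabetical order of the region names, so they may differ from codes assigned by hand. If the codes are assigned manually as in #4, note that -label define- only creates the value label; -label values regions regions- is what attaches it to the variable.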

