  • Drop observations in panel dataset

    Hello guys and girls!

    I have a panel dataset whose panelvar is id and timevar is ano. For each id and for each ano I observe a value in the variable pop.
    I want to drop all id's whose value of pop in ano 2012 is lower than the value of pop in ano 2012 of the id 23.
    How to do that, please?

    Thanks in advance.

  • #2
    Code:
    isid id ano, sort
    by id (ano): egen pop2012 = max(cond(ano == 2012, pop, .))
    summarize pop2012 if id == 23
    by id: drop if pop2012 < r(mean)
    You do not show example data and metadata, so this code is based on guesses about the organization and structure of your data. If these guesses are wrong, the code will not work. In that event, please post back and use the -dataex- command to provide this needed information. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
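
    To see what the code in this post does, here is a minimal sketch on made-up data. The three ids, their pop values, and the scalar name ref are illustrative assumptions, not taken from the original poster's data set. It copies the id 23 value into a scalar before dropping, and leaves off the by id: prefix on the drop, which is not needed.

    ```stata
    * Toy data (made up for illustration): three ids, two years each
    clear
    input int id int ano long pop
     1 2012  500
     1 2013  550
    23 2012 1000
    23 2013 1100
    30 2012 2000
    30 2013 2100
    end

    isid id ano, sort                        // stops with an error if id-ano pairs are not unique
    by id (ano): egen pop2012 = max(cond(ano == 2012, pop, .))
    summarize pop2012 if id == 23, meanonly  // r(mean) now holds the 2012 value for id 23
    scalar ref = r(mean)                     // hypothetical scalar name
    drop if pop2012 < ref                    // drops every observation of id 1
    list, sepby(id)
    ```

    In this toy example only id 1 is dropped, because its 2012 value (500) is below that of id 23 (1000). An id with no 2012 observation would have a missing pop2012, which Stata treats as larger than any number, so it would be kept.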


    • #3
      Except for the first row of it, your code worked perfectly. Thank you, Clyde.
      Next time I will present a sample of my dataset.


      • #4
        Red flag!!! If the first line didn't work, then you almost surely have wrong results. You said you have panel data. The failure of the first line proves that you don't.

        Somewhere in your data set you have, inappropriately for panel data, multiple observations with the same value of id and ano. That shouldn't happen. In particular, you may have some id's with two different observations for ano = 2012. And if those two observations have different values of pop, then the value of pop2012 for those id's is simply the largest of the conflicting values (that is what max() returns), which is an arbitrary choice. Since they can't all be right, there is a good chance the wrong one was chosen.

        So, you have to fix your data set. The first step is to identify the surplus observations.
        Code:
        duplicates tag id ano, gen(flag)
        browse if flag
        will show them to you.

        The next step is to figure out how those surplus observations got there. If they disagree on the values of any other variables, then you have an inconsistent data set. You have to figure out which of the observations, if any, is correct. If none are correct, you may need to figure out a way to piece them together into a single correct observation. In any case, you need to review the data management that created this data set and fix the errors that led to its creation. In the course of doing that, you may uncover other errors made along the way. Fix those, too, while you are at it.

        If the surplus observations are exact duplicates on all variables, then the situation is a bit less urgent. You can easily eliminate them by using the -duplicates drop- command. (N.B. Do NOT add the -force- option to that command.) That will allow you to safely proceed with the rest of the code. BUT, you should still check the data management that created this data set because the presence of purely duplicate observations often indicates errors in the code that created it. And where one error has been found, others may lurk as yet undetected. Better to find and fix them now, rather than later on when it may be much harder to do so.
        Last edited by Clyde Schechter; 28 Jan 2023, 13:24.
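
        The workflow described in this post can be sketched on made-up data as follows. The toy data set, in which id 1 has an exact duplicate for 2012, is an assumption for illustration only.

        ```stata
        * Toy data (made up): id 1 has two identical observations for 2012
        clear
        input int id int ano long pop
        1 2012 500
        1 2012 500
        1 2013 550
        2 2012 900
        2 2013 950
        end

        duplicates report id ano           // summarizes surplus observations per id-ano pair
        duplicates tag id ano, gen(flag)   // flag = number of other observations with the same id-ano
        list if flag, sepby(id)            // inspect the surplus observations
        drop flag

        * Only appropriate when the surplus observations agree on every variable:
        duplicates drop                    // no varlist and no force: removes exact copies only
        isid id ano, sort                  // now runs silently
        ```

        If the surplus observations differed on pop, -duplicates drop- with no varlist would leave them all in place and -isid- would still fail, which is the signal to go back and repair the data rather than force the drop.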


        • #5
          Well, I'm pretty sure I have panel data. Here is a sample of my data:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte id int ano long pop
          1 2012  758786
          1 2013  776463
          1 2014  790101
          1 2015  803513
          1 2016  816687
          1 2017  829619
          1 2018  869265
          1 2019  881935
          1 2020  894470
          1 2021  906876
          2 2012 3165472
          2 2013 3300935
          2 2014 3321730
          2 2015 3340932
          2 2016 3358963
          end
          By using "tab id", "tab ano" and "tab id ano", I can see that there is nothing duplicated. There are exactly 10 observations in ano for each of the 27 units in id. Therefore my dataset has 270 rows.

          When I use the command "isid id ano, sort", I receive the message "unexpected end of do-file / command isid not defined by isid.ado". Also, when I use the command "duplicates tag id ano, gen(flag)", I receive the same error message replacing isid with duplicates.
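
          With 270 observations, uniqueness is hard to confirm by eye from tabulations. A minimal sketch of a check that does not call -isid- or -duplicates- is shown here on four rows taken from the sample above; applying it to the full data set is left as an assumption.

          ```stata
          * Four rows copied from the -dataex- sample above
          clear
          input byte id int ano long pop
          1 2012  758786
          1 2013  776463
          2 2012 3165472
          2 2013 3300935
          end

          bysort id ano: assert _N == 1   // stops with an error if any id-ano pair repeats
          xtset id ano                    // complains about repeated time values within panel
          ```

          Both commands complete without error here, which is what should happen on a properly structured panel.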


          • #6
            I receive the message "unexpected end of do-file / command isid not defined by isid.ado".
            Oh! That's not the error message I assumed you were getting. The usual error message after -isid- is a statement that the variables in the command do not uniquely identify observations. So it does seem that you don't have that problem. Your problem, rather, is that your ado files are somehow corrupted. You will probably start encountering similar messages with other perfectly valid commands.

            You should run -update all, force- to replace your existing Stata installation with correct, up-to-date versions. Then verify that the -isid- and -duplicates- commands work. Based on what you say about your data in #5, the -isid id ano, sort- command should produce no output at all. And the -duplicates tag- command should produce a new variable, flag, with only zero values.

            If -update-ing your installation does not solve the problem, re-boot your computer, and then uninstall Stata and re-install it, and then run -update all- again. If that doesn't work, you may have a problem on your hard drive. You should run whatever diagnostics your OS provides for this. If all of this fails, contact Stata Technical Support.
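
            Before and after updating, it may help to look at which ado-files Stata is actually finding. This is only a diagnostic sketch: it shows where the files live and whether the installation is current, and it does not by itself prove that a file is intact.

            ```stata
            which isid          // path and version line of the isid.ado that Stata finds
            which duplicates    // same for duplicates.ado
            adopath             // directories Stata searches for ado-files, in order
            update query        // compares the installation with the latest official update (needs internet)
            ```

            If -which- reports a copy outside the official installation directories, a stray or damaged file earlier on the ado-path is one possible explanation for the error.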


            • #7
              The way to contact StataCorp Technical Support is detailed at https://www.stata.com/support/faqs/t...-tech-support/
