Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Removing specific observations based on other observations within a group

    Good afternoon,

    I'm currently working on my dissertation using the dataset from Brazil's municipal elections from 2000 to 2016. The data is like this:

    Code:
     . list codmun_TSE  nome_candidato NUM_TURNO votos in 305/340, table
    
         +---------------------------------------------------------------------+
         | codmun~E                          nome_candidato   NUM_TU~O   votos |
         |---------------------------------------------------------------------|
    305. |    58335       JOSE CAMILO ZITO DOS SANTOS FILHO          1    2742 |
    306. |    58335             AUREO LIDIO MOREIRA RIBEIRO          1    8927 |
    307. |    58335                  JORGE MOREIRA THEODORO          1   14137 |
    308. |    58335            GUTEMBERG CARDOSO DOS SANTOS          1     313 |
    309. |    58335               FLORINDA MOREIRA LOMBARDI          1     123 |
         |---------------------------------------------------------------------|
    310. |    58335                  JORGE MOREIRA THEODORO          1    9379 |
    311. |    58335             WASHINGTON REIS DE OLIVEIRA          2   13435 |
    312. |    58335             WASHINGTON REIS DE OLIVEIRA          2   30794 |
    313. |    58335             WASHINGTON REIS DE OLIVEIRA          2    9437 |
    314. |    58335             WASHINGTON REIS DE OLIVEIRA          2   35184 |
         |---------------------------------------------------------------------|
    315. |    58335                  JORGE MOREIRA THEODORO          2   10240 |
    316. |    58335                  JORGE MOREIRA THEODORO          2   21744 |
    317. |    58335                  JORGE MOREIRA THEODORO          2   30629 |
    318. |    58335                  JORGE MOREIRA THEODORO          2    9059 |
    319. |    58335             WASHINGTON REIS DE OLIVEIRA          2   25881 |
         |---------------------------------------------------------------------|
    320. |    58335             WASHINGTON REIS DE OLIVEIRA          2   37061 |
    321. |    58335             WASHINGTON REIS DE OLIVEIRA          2   27949 |
    322. |    58335                  JORGE MOREIRA THEODORO          2   12535 |
    323. |    58335                  JORGE MOREIRA THEODORO          2   13461 |
    324. |    58335             WASHINGTON REIS DE OLIVEIRA          2   12403 |
         |---------------------------------------------------------------------|
    325. |    58335             WASHINGTON REIS DE OLIVEIRA          2   14570 |
    326. |    58335                  JORGE MOREIRA THEODORO          2   27187 |
    327. |    58335                  JORGE MOREIRA THEODORO          2   14277 |
    328. |    58335                  JORGE MOREIRA THEODORO          2   19770 |
    329. |    58335             WASHINGTON REIS DE OLIVEIRA          2   11086 |
         |---------------------------------------------------------------------|
    330. |    58335                  JORGE MOREIRA THEODORO          2   25328 |
    331. |    58343             MARCIONILIO BOTELHO MOREIRA          1    3155 |
    332. |    58343            GEAN MARCOS PEREIRA DA SILVA          1    2999 |
    333. |    58351           JOELMA ISABEL SANTANA ROMEIRO          1      81 |
    334. |    58351          MARCO AURELIO SA PINTO SALGADO          1       0 |
         |---------------------------------------------------------------------|
    335. |    58351               RICARDO DE LIMA BALTHAZAR          1     406 |
    336. |    58351      JAULDO DE SOUZA BALTHAZAR FERREIRA          1    4874 |
    337. |    58351   CASSIA APARECIDA DIAS REZENDE PEREIRA          1     445 |
    338. |    58360                   EDSON ALVES DE MARINS          1     221 |
    339. |    58360         VALBER LUIZ MARCELO DE CARVALHO          1   11230 |
         |---------------------------------------------------------------------|
    340. |    58360               RODRIGO DA COSTA MEDEIROS          1    8264 |
         +---------------------------------------------------------------------+
    
    .

    where codmun_TSE = each city's individual ID number, nome_candidato = name of the mayor candidate, NUM_TURNO = number of the specific round of the observation, so that if it's the first round it's 1, and if it's the second round it's 2 (maximum of 2 rounds), votos = number of votes in the voting zone.

    The names are being repeated because there are multiple voting zones per city, I'll gather them all in one afterwards.

    I want to get rid of the first round observations when there's a second round, I'll explain: Here in Brazil, there are 3 possible scenarios in a mayor election: If a city has less than 250000 inhabitants, the candidate with most votes win (no second round can happen), if a city has more than 250000 inhabitants then there can be a second round or not (the specifics are irrelevant to the question i'm posing). (I don't have the population of each city available on my dataset). As you can see, some of the examples above are cities that didn't have second rounds.

    I want to do this: When a codmun_TSE (city) has a second round, I want to eliminate all of the first round observations within that specific codmun_TSE, which means that, for the example listed above, I would like to delete all the observations matching NUM_TURNO = 1 and codmun_TSE = 58335 , since we can already see that at some point there is a second round in that city.

    I've been trying to do some loops for specific observations, but to no avail, I also tried using bysort but I can't seem to find out how to do what I want.

    edit: As suggested below me, here's a sample of the data using dataex:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 codmun_TSE str56 nome_candidato byte NUM_TURNO long votos
    "58335" "JOSE CAMILO ZITO DOS SANTOS FILHO"     1  2742
    "58335" "AUREO LIDIO MOREIRA RIBEIRO"           1  8927
    "58335" "JORGE MOREIRA THEODORO"                1 14137
    "58335" "GUTEMBERG CARDOSO DOS SANTOS"          1   313
    "58335" "FLORINDA MOREIRA LOMBARDI"             1   123
    "58335" "JORGE MOREIRA THEODORO"                1  9379
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 13435
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 30794
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2  9437
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 35184
    "58335" "JORGE MOREIRA THEODORO"                2 10240
    "58335" "JORGE MOREIRA THEODORO"                2 21744
    "58335" "JORGE MOREIRA THEODORO"                2 30629
    "58335" "JORGE MOREIRA THEODORO"                2  9059
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 25881
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 37061
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 27949
    "58335" "JORGE MOREIRA THEODORO"                2 12535
    "58335" "JORGE MOREIRA THEODORO"                2 13461
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 12403
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 14570
    "58335" "JORGE MOREIRA THEODORO"                2 27187
    "58335" "JORGE MOREIRA THEODORO"                2 14277
    "58335" "JORGE MOREIRA THEODORO"                2 19770
    "58335" "WASHINGTON REIS DE OLIVEIRA"           2 11086
    "58335" "JORGE MOREIRA THEODORO"                2 25328
    "58343" "MARCIONILIO BOTELHO MOREIRA"           1  3155
    "58343" "GEAN MARCOS PEREIRA DA SILVA"          1  2999
    "58351" "JOELMA ISABEL SANTANA ROMEIRO"         1    81
    "58351" "MARCO AURELIO SA PINTO SALGADO"        1     0
    "58351" "RICARDO DE LIMA BALTHAZAR"             1   406
    "58351" "JAULDO DE SOUZA BALTHAZAR FERREIRA"    1  4874
    "58351" "CASSIA APARECIDA DIAS REZENDE PEREIRA" 1   445
    "58360" "EDSON ALVES DE MARINS"                 1   221
    "58360" "VALBER LUIZ MARCELO DE CARVALHO"       1 11230
    "58360" "RODRIGO DA COSTA MEDEIROS"             1  8264
    end


    Many thanks!
    Last edited by Daniel Earp; 07 Jun 2018, 13:04.

  • #2
    You may get an answer more quickly, or a more complete answer, if you post your sample data using the dataex command, as you did in this earlier post.

    Comment


    • #3
      Originally posted by William Lisowski View Post
      You may get an answer more quickly, or a more complete answer, if you post your sample data using the dataex command, as you did in this earlier post.
      Indeed, my friend! Thank you for reminding me of that, I had actually forgotten about dataex, sorry about that, I edited my post.

      Comment


      • #4
        Thank you for the sample data. Starting with that data in memory the following demonstrates using bysort to solve your problem as I understand it. In this example, you should understand that for each group of observations with the same codmun_TSE, NUM_TURNO[_N] will be the final observation of NUM_TURNO within that group, and if there is a 2 to be found in the group, the final observation of the group (which is sorted further by NUM_TURNO) will contain a 2.
        Code:
        . tab codmun_TSE NUM_TURNO
        
                   |       NUM_TURNO
        codmun_TSE |         1          2 |     Total
        -----------+----------------------+----------
             58335 |         6         20 |        26 
             58343 |         2          0 |         2 
             58351 |         5          0 |         5 
             58360 |         3          0 |         3 
        -----------+----------------------+----------
             Total |        16         20 |        36 
        
        
        . bysort codmun_TSE (NUM_TURNO): drop if NUM_TURNO==1 & NUM_TURNO[_N]==2
        (6 observations deleted)
        
        . tab codmun_TSE NUM_TURNO
        
                   |       NUM_TURNO
        codmun_TSE |         1          2 |     Total
        -----------+----------------------+----------
             58335 |         0         20 |        20 
             58343 |         2          0 |         2 
             58351 |         5          0 |         5 
             58360 |         3          0 |         3 
        -----------+----------------------+----------
             Total |        10         20 |        30

        Comment


        • #5
          Originally posted by William Lisowski View Post
          Thank you for the sample data. Starting with that data in memory the following demonstrates using bysort to solve your problem as I understand it. In this example, you should understand that for each group of observations with the same codmun_TSE, NUM_TURNO[_N] will be the final observation of NUM_TURNO within that group, and if there is a 2 to be found in the group, the final observation of the group (which is sorted further by NUM_TURNO) will contain a 2.
          Code:
          . tab codmun_TSE NUM_TURNO
          
          | NUM_TURNO
          codmun_TSE | 1 2 | Total
          -----------+----------------------+----------
          58335 | 6 20 | 26
          58343 | 2 0 | 2
          58351 | 5 0 | 5
          58360 | 3 0 | 3
          -----------+----------------------+----------
          Total | 16 20 | 36
          
          
          . bysort codmun_TSE (NUM_TURNO): drop if NUM_TURNO==1 & NUM_TURNO[_N]==2
          (6 observations deleted)
          
          . tab codmun_TSE NUM_TURNO
          
          | NUM_TURNO
          codmun_TSE | 1 2 | Total
          -----------+----------------------+----------
          58335 | 0 20 | 20
          58343 | 2 0 | 2
          58351 | 5 0 | 5
          58360 | 3 0 | 3
          -----------+----------------------+----------
          Total | 10 20 | 30
          My friend, thank you so much, I'm both impressed and embarassed, I've been trying to figure this out since yesterday, spent hours on it, and you did it with a single line of command, I do have much to learn, thank you!

          Comment


          • #6
            No need to be embarrassed. Sometimes the purpose of Statalist is to provide a fresh set of eyes to see an overlooked - or not so obvious - exit to the maze your mind is trapped in. And certainly no need to be impressed - I learned that approach from other Statalist members when I was new to Stata.

            Comment


            • #7
              Indeed! Thanks again!

              Comment

              Working...
              X