  • Problems with collapse command

    Hello everyone,
    I have a dataset with Italian data on electoral outcomes. My dataset contains information about all the election years (from 1987 to 2013), provinces, municipalities, eligible voters, actual voters, parties, and votes per party. My goal is to have the dataset at the province level with the votes for the political parties in the election years, together with all the eligible and actual voters.
    I therefore used this command: collapse (sum) votes eligibles voters, by(province party year)
    It runs, but I obtain different numbers of voters for the same province in the same year, which is wrong.
    Where do you think I am going wrong?

    Thank you in advance for your help!!
    Last edited by Margherita Lazzeri; 18 Mar 2023, 17:19.

  • #2
    It runs, but I obtain different numbers of voters for the same province in the same year, which is wrong.
    The sums of voters are computed within each unique province-party-year combination, not within each province-year, so they will not be the same across parties.

    Try this:

    Code:
    bysort province year: egen total_voter = sum(voters)
    after the collapse command and see if that's what you expect.
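
    To illustrate the partitioning described above, here is a minimal sketch with made-up data. The values and the variable comune are assumptions for illustration only; the sketch also assumes that eligibles and voters are municipality-level figures repeated on every party row, which the thread does not confirm at this point.

    ```stata
    * Hypothetical toy data: one province, two municipalities, two parties
    clear
    input str2 province str2 comune year str1 party eligibles voters votes
    "AA" "c1" 2013 "X" 1000  800  500
    "AA" "c1" 2013 "Y" 1000  800  300
    "AA" "c2" 2013 "X" 2000 1500 1500
    end

    * Same grouping as in the original post
    collapse (sum) votes eligibles voters, by(province party year)

    * Party X ran in both municipalities, party Y only in c1,
    * so the summed voters differ by party within the same province-year
    list, sepby(province year)
    ```

    In this toy case party X ends up with 2300 voters and party Y with 800. Adding those up by province and year would give 3100, which is larger than the 2300 people who actually voted, because c1 is counted once per party.
    
    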


    • #3
      Thank you very much for your answer. Unfortunately, though, it does not help me overcome the problem: I now get larger numbers that still differ within the same province and year.


      • #4
        Originally posted by Margherita Lazzeri
        Thank you very much for your answer. Unfortunately, though, it does not help me overcome the problem: I now get larger numbers that still differ within the same province and year.
        In that case, it would be helpful to post some sample data (a couple of years, a couple of parties, and a couple of provinces) so that we can replicate the problem and then start solving it. Please see section 12 of the FAQ (https://www.statalist.org/forums/help) for how to ask a question with a code-based data example using the dataex command. The data don't have to be real; you can obfuscate the numbers. But we need a common data set to work on so that we clearly understand the format and what you mean by "wrong".

        In addition, check how missing values are recorded in your data. If some missing values are not coded as Stata-recognizable missing, they will enter the calculation as ordinary numbers.
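
        As a rough sketch of the missing-value check suggested above: the toy data and the sentinel code -9 are hypothetical, and the variable names are the ones that appear later in this thread. It assumes the variables are numeric.

        ```stata
        * Hypothetical toy data where -9 stands in for "missing"
        clear
        input elettori votanti voti_lista
        1000 800 500
        2000  -9 300
        end

        * Stata sees no missing values here, so -9 would be summed as a number
        count if missing(elettori, votanti, voti_lista)

        * Minima and maxima help spot implausible sentinel codes
        summarize elettori votanti voti_lista

        * Convert the assumed sentinel code to Stata's system missing
        mvdecode elettori votanti voti_lista, mv(-9)
        count if missing(votanti)
        ```

        Whether any such code exists in the real data can only be judged from the summarize output; -9 is just a placeholder.
        
        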
        Last edited by Ken Chui; 18 Mar 2023, 19:34.


        • #5
          Screenshot 2023-03-19 at 11.56.12.png
          Thank you for helping me.
          I don't know if you can see it from here, but for example I have province, municipality, year, eligible voters, male eligible voters, actual voters, male actual voters, invalid votes, party name, and votes for every party in every municipality.
          Then I created the last three variables:
          bysort provincia lista year : egen var1_sum = sum(elettori)
          bysort provincia lista year : egen var2_sum = sum(votanti)
          bysort provincia lista year : egen var3_sum = sum(voti_lista)

          As you can see, the sum works for some rows but not for others (for example, the municipalities Terranova di Sibari and Castelnovo Bariano).

          Why do you think this is the case?

          Thanks!


          • #6
            With the dataex command, this is what I get:
            Screenshot 2023-03-19 at 11.58.33.png


            • #7
              Please read the FAQ on how to post a data example using -dataex-. Screenshots cannot be imported into Stata. You need to copy what Stata returns after -dataex- and paste it as text into your post. -help dataex- also explains this.
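
              For reference, a call along these lines would produce the text to paste. The variable list is taken from post #5 and the range of 20 observations is an arbitrary assumption; adjust both to your data.

              ```stata
              * dataex is included in recent Stata releases; if the command is
              * not found, it can be installed with: ssc install dataex
              dataex provincia comune year lista elettori votanti voti_lista in 1/20
              ```

              Copy everything Stata prints in the Results window, from the opening [CODE] tag to the closing [/CODE] tag, into the post.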


              • #8
                You need to copy and paste the dataex output directly into the forum. Screenshots are not as helpful as you think (see FAQ Advice #12). Your voter counts are at the municipality-year level, so you should tag one observation per municipality-year and include only the tagged observations when calculating your first two wanted variables. The political party can be included in the grouping for the third wanted variable. Also, use the -total()- function of egen rather than -sum()-, even though both are equivalent. This avoids confusion with the -sum()- function of generate, which produces a running sum.

                Code:
                egen tag=tag(provincia comune year)
                bys provincia year: egen wanted1= total(cond(tag, elettori, .))
                bys provincia year: egen wanted2= total(cond(tag, votanti, .))
                bys provincia lista year: egen wanted3= total(voti_lista)
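
                To get from these variables to the province-level dataset asked for in the first post, one possible continuation is sketched below on made-up data. The toy values and the names elettori_prov and votanti_prov are assumptions; the sketch also assumes that elettori and votanti are identical on every party row within a municipality-year.

                ```stata
                * Hypothetical toy data: one province, two municipalities, two parties
                clear
                input str2 provincia str2 comune year str1 lista elettori votanti voti_lista
                "AA" "c1" 2013 "X" 1000  800  500
                "AA" "c1" 2013 "Y" 1000  800  300
                "AA" "c2" 2013 "X" 2000 1500 1500
                end

                * Flag one row per municipality-year so each municipality counts once
                egen tag = tag(provincia comune year)
                bysort provincia year: egen elettori_prov = total(cond(tag, elettori, .))
                bysort provincia year: egen votanti_prov  = total(cond(tag, votanti, .))

                * Sum party votes; the province totals are constant within each
                * group, so taking the first value keeps them unchanged
                collapse (sum) voti_lista (first) elettori_prov votanti_prov, ///
                    by(provincia lista year)
                list, sepby(provincia year)
                ```

                Both parties now carry the same province totals (3000 eligible voters and 2300 actual voters in this toy case), while voti_lista is summed per party.
                
                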
