
  • Help! Recoding a variable of aggregate percentages

    Hi everyone!
    I am trying to work with Eurobarometer data on the last European elections, but there is something odd in the way the original CSV data is imported into Stata. As in the attached picture, I have the variables party_id and votes_percent, so that, e.g., Pd obtained 22.74% of the total vote. However, if I try to tabulate votes_percent, the vote share (which should be the frequency attached to each party) appears as the tabulated value instead. I have tried recoding the data in many different ways, but none seems to work! I need a single variable with the party names as value labels and the party vote percent as the frequency of that variable.
    Would anyone be able to help?
    [Image: Data browse.png]
    [Image: tabs.png]

    I attach the original csv in case you need it (dataex is not working, I am very sorry). Let me know if you need any more information on my side!
    results-parties-it.csv




  • #2
    I would need a single variable with the names of the party as value label and the party vote percent as frequency of that variable.
    For that, -tab- is the wrong command. Try -tabstat votes_percent, by(party_id)-. Read -help tabstat- for options that can help you customize the appearance of the table.
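A minimal sketch of the -tabstat- suggestion. It uses three vote shares quoted in this thread (Pd 22.74, +E 3.11, Other 3.01) as stand-in data, so the toy dataset is an assumption and not the full CSV:

```stata
* Toy data: three parties with the shares quoted in this thread (assumed subset)
clear
input str5 party_id float(votes_percent)
"Pd"    22.74
"+E"    3.11
"Other" 3.01
end

* One row per party; with a single observation per party the mean is just its share
tabstat votes_percent, by(party_id) statistics(mean) format(%5.2f) nototal
```

The nototal option suppresses the overall row, which would otherwise show the mean across parties and not a sum.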

    dataex is not working
    What does this mean? In what way is it not working? What goes wrong when you try to use it? -dataex- is an indispensable tool for Statalist members. If it is somehow "broken" you need to get it fixed to get the most out of Statalist.



    • #3
      I will gently disagree with Clyde and suggest that you may use tab, including the noninteger vote shares as -iweights-.

      Code:
      tab party_id [iw=votes_percent]
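As a rough illustration of what the weighted tabulation does, here is the same command on a three-party toy dataset built from shares quoted in this thread. The subset is an assumption, so the percentages that -tab- reports are relative to these three parties only:

```stata
* Toy data: three parties with the shares quoted in this thread (assumed subset)
clear
input str5 party_id float(votes_percent)
"Pd"    22.74
"+E"    3.11
"Other" 3.01
end

* iweights let the "Freq." column carry the noninteger vote shares
tab party_id [iw=votes_percent]
```

With the full data, where the shares sum to about 100, the Freq. and Percent columns should nearly coincide.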



      • #4
        Ah, I see what Andrew Musau's code does. That is not what my -tabstat- would have done, and it is not what I understood the question to be. But his is a more sensible interpretation of the question than mine, so probably #3 is what O.P. wants.



        • #5
          Originally posted by Andrew Musau
          I will gently disagree with Clyde and suggest that you may use tab, including the noninteger vote shares as -iweights-.

          Code:
          tab party_id [iw=votes_percent]
          Thank you so much! Both commands worked. Is there a way I can save this as a new variable? generate does not allow weights, if I am not mistaken. Any suggestions?



          • #6
            Originally posted by Clyde Schechter
            For that, -tab- is the wrong command. Try -tabstat votes_percent, by(party_id)-. Read -help tabstat- for options that can help you customize the appearance of the table.


            What does this mean? In what way is it not working? What goes wrong when you try to use it? -dataex- is an indispensable tool for Statalist members. If it is somehow "broken" you need to get it fixed to get the most out of Statalist.
            Hi Clyde! Thank you so much for the command, it worked (and thanks for your many contributions to this forum: they've helped me time and again!). As for dataex, I meant that I installed it in Stata and ran the command, but when I tried to copy and paste the output to make the post, my computer just froze. I tried many times and it always did the same, so I don't know what the cause could be. If you have ever heard of similar problems, please let me know how they were solved!



            • #7
              For dataex I meant that I installed in stata, ran the command, but when I tried to copy and paste everything to make the post my computer just froze. I tried many times and it always did the same...
              I have not heard of that happening to anyone before. I would reboot my computer and see if this behavior persists. If so, I think you should contact Stata Technical Support about this.



              • #8
                Originally posted by Federica Di Chiara
                I would need a single variable with the names of the party as value label and the party vote percent as frequency of that variable.
                If this is your goal, install labmask from the Stata Journal. However, note that you can not label noninteger values. So you need to round to the nearest whole number if you want to do this. This may inadvertently lead to totals exceeding 100%.

                Code:
                net install gr0034, from(http://www.stata-journal.com/software/sj8-2)
                Then simply:

                Code:
                clonevar wanted=votes_percent
                replace wanted= round(wanted, 1)
                labmask wanted, values(party_id)
                Last edited by Andrew Musau; 11 Jul 2023, 18:45.



                • #9
                  Originally posted by Andrew Musau

                  If this is your goal, install labmask from the Stata Journal. However, note that you can not label noninteger values. So you need to round to the nearest whole number if you want to do this. This may inadvertently lead to totals exceeding 100%.

                  Code:
                  net install gr0034, from(http://www.stata-journal.com/software/sj8-2)
                  Then simply:

                  Code:
                  clonevar wanted=votes_percent
                  replace wanted= round(wanted, 1)
                  labmask wanted, values(party_id)
                  Thank you so much for this! I really appreciate it. However, the command is not working for me. Stata reports "party_id not constant within groups of wanted". I have destringed the variable so that it now takes values between 1 and 10 (each of them labeled with the name of a party). The storage type is byte. Can you help me understand what is happening? Thank you so much!



                  • #10
                    That indicates that the same value is mapped to different labels. From #1, rounding to the nearest whole number means that "+E" with 3.11 and "Other" with 3.01 both end up with a vote share of 3 percent. I think you can use vote_percent*100 as the value instead of vote_percent. However, if two parties had exactly the same vote share, you would not be able to label the vote share with the party name, because the vote share would not uniquely identify a party.

                    Code:
                    clear
                    input str5 party_id float(vote_percent)
                    "+E" 3.11
                    "Other" 3.01
                    end
                    
                    gen wanted= round(vote_percent, 1)
                    *REPLICATES THE ERROR
                    labmask wanted, values(party_id)
                    *VOTE PERCENT*100
                    gen vote_pct100= round(vote_percent*100, 1)
                    gen wanted2= vote_pct100
                    labmask wanted2, values(party_id)
                    lab list
                    Res.:

                    Code:
                    . gen wanted= round(vote_percent, 1)
                    
                    .
                    . *REPLICATES THE ERROR
                    
                    .
                    . labmask wanted, values(party_id)
                    party_id not constant within groups of wanted
                    r(198);
                    
                    .
                    . *VOTE PERCENT*100
                    
                    .
                    . gen vote_pct100= round(vote_percent*100, 1)
                    
                    .
                    . gen wanted2= vote_pct100
                    
                    .
                    . labmask wanted2, values(party_id)
                    
                    .
                    . lab list
                    wanted2:
                             301 Other
                             311 +E
                    So above, 301 is 3.01% and 311 is 3.11%.
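The tie caveat can be checked before labeling. The sketch below is one possible pre-check and not part of the original answer: it reuses the two-party toy data from this post and tests with -isid- whether the scaled share identifies each party uniquely.

```stata
* Toy data from the example above
clear
input str5 party_id float(vote_percent)
"+E"    3.11
"Other" 3.01
end

* Scale to hundredths of a percent and round to an integer
gen long vote_pct100 = round(vote_percent*100)

* isid exits with an error if the variable does not uniquely identify observations
capture isid vote_pct100
if _rc {
    display as error "tied vote shares: the share cannot carry party labels"
    duplicates list vote_pct100
}
else {
    display "vote_pct100 is unique: safe to attach party names as value labels"
}
```

If the check fails on the real data, the weighted -tab- from #3 avoids the problem, since it never needs the share to identify the party.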

