Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge and Replace Observations

    Hello,

    Im coming on here for some support on merging observations from one row into two columns, specifically moving OTHER to the column othervote, and REPUBLICAN to the column rep as shown in the attachment. I have tried using find and replace code and filter but couldn't get the result I wanted. I'd appreciate any input on this. Thanks.
    Attached Files

  • #2
    Perhaps somebody else understands what you are asking and will respond first. But I have to say I have no idea what you want here. Where do OTHER and REPUBLICAN come from. There is nothing in your post showing the existence of anything by those names. And I can't even begin to imagine what "merging one row into two columns" means. Please clarify your question.

    Also, while I'm sure you intended to be helpful showing the screenshot of your data, if you read the Forum FAQ you will learn that this is really the least helpful way to show example data. When you post back, and in the future, please instead use the -dataex- command to show example data. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      You'll need to use egen/cond to do so.

      Code:
      egen rep = mean(cond(party=="REPUBLICAN",candidatevotes,.)), by(county_fips)
      egen other = mean(cond(party=="OTHER",candidatevotes,.)), by(county_fips)
      then

      Code:
      keep if party=="DEMOCRAT"
      drop party
      Last edited by George Ford; 14 Dec 2023, 11:11. Reason: short some )

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        Perhaps somebody else understands what you are asking and will respond first. But I have to say I have no idea what you want here. Where do OTHER and REPUBLICAN come from. There is nothing in your post showing the existence of anything by those names. And I can't even begin to imagine what "merging one row into two columns" means. Please clarify your question.

        Also, while I'm sure you intended to be helpful showing the screenshot of your data, if you read the Forum FAQ you will learn that this is really the least helpful way to show example data. When you post back, and in the future, please instead use the -dataex- command to show example data. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
        Thanks for letting me know. By other and republican, I'm referring to the cells under the variable Party. what I'm intended to do is that have each county as a row with three column for types of votes, instead of having each county repeated three times.

        input int year str20 state str2 state_po str21 county_name long county_fips str11 party long(candidatevotes totalvotes) float(dem othervote rep)
        2020 "ALABAMA" "AL" "AUTAUGA" 1001 "DEMOCRAT" 7503 27770 7503 7503 7503
        2020 "ALABAMA" "AL" "AUTAUGA" 1001 "OTHER" 429 27770 429 429 429
        2020 "ALABAMA" "AL" "AUTAUGA" 1001 "REPUBLICAN" 19838 27770 19838 19838 19838
        2020 "ALABAMA" "AL" "BALDWIN" 1003 "DEMOCRAT" 24578 109679 24578 24578 24578
        2020 "ALABAMA" "AL" "BALDWIN" 1003 "OTHER" 1557 109679 1557 1557 1557
        2020 "ALABAMA" "AL" "BALDWIN" 1003 "REPUBLICAN" 83544 109679 83544 83544 83544
        2020 "ALABAMA" "AL" "BARBOUR" 1005 "DEMOCRAT" 4816 10518 4816 4816 4816
        2020 "ALABAMA" "AL" "BARBOUR" 1005 "OTHER" 80 10518 80 80 80
        2020 "ALABAMA" "AL" "BARBOUR" 1005 "REPUBLICAN" 5622 10518 5622 5622 5622
        2020 "ALABAMA" "AL" "BIBB" 1007 "DEMOCRAT" 1986 9595 1986 1986 1986
        2020 "ALABAMA" "AL" "BIBB" 1007 "OTHER" 84 9595 84 84 84
        2020 "ALABAMA" "AL" "BIBB" 1007 "REPUBLICAN" 7525 9595 7525 7525 7525
        2020 "ALABAMA" "AL" "BLOUNT" 1009 "DEMOCRAT" 2640 27588 2640 2640 2640
        2020 "ALABAMA" "AL" "BLOUNT" 1009 "OTHER" 237 27588 237 237 237
        2020 "ALABAMA" "AL" "BLOUNT" 1009 "REPUBLICAN" 24711 27588 24711 24711 24711

        Comment


        • #5
          OK, now I get it. And thanks for the data example.

          Code:
          drop dem othervote rep
          rename candidatevotes _
          reshape wide _, i(county_fips year) j(party) string
          rename _* *
          
          
          assert totalvotes == DEMOCRAT + OTHER + REPUBLICAN
          Let me just point out that for most purposes in Stata, the long layout (each county in three observations) works better than the wide (one observation per county) one. I don't know how you are going to be using these variables DEMOCRAT, OTHER, and REPUBLICAN, so I can't say which way will make your life easier. But give it some thought before proceeding with this.

          If you do decide to stay with the long layout, you should definitely remove the original variables dem othervote and rep as they are largely useless as currently set up: two of the three values are always wrong, and the correct one is redundant with the candidatevotes variable. Assuming you also need the total votes if you keep a long layout, you can get that with:
          Code:
          by county_fips year (party), sort: egen total_votes = total(candidatevotes)

          Comment

          Working...
          X