Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping Data and Changing observations to Variables

    Hi all,

    I am relatively new to using Stata and I am struggling with reshaping some data.
    I have a .csv file which comprises of election results. The data is not properly arranged in columns and therefore looks odd (data sample below). I wish to make District, Block, Seat Name, Total Elector, Votes Polled, Votes Rejected, OSN, Candidate Name, Party Name and Vote Secured into variables. Currently they are observations. I have multiple .csv sheets like this and I wish to repeat the same steps for the others preferably using a loop.

    Thanks!
    Noyonika
    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str51 v1 str13 v2 str12 v3 str14 v4 str3 v5 str21 v6 str10 v7 str12 v8
    "Panchayat General Election 2013 — Detailed Result" "" "" "" "" "" "" ""
    "District : Burdwan" "" "" "" "" "" "" ""
    "Block : BHATAR" "" "" "" "" "" "" ""
    "" "" "" "" "" "" "" ""
    "Polling Date : 15-07-2013" "" "" "" "" "" "" ""
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "MAHATA /PS-1" "5668" "4896" "218" "1" "TUMPA NASKAR" "CPIM" "1871"
    "SCW" "" "" "" "" "" "" ""
    "" "" "" "" "2" "SUSHANTI MISTRI" "AITC" "2807"
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "MAHATA /PS-2" "4877" "4213" "99" "1" "DILIP SOREN" "AITC" "2463"
    "ST" "" "" "" "" "" "" ""
    "" "" "" "" "2" "SUNIL MURMU" "CPIM" "1651"
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "MAHATA /PS-3" "5585" "" "" "1" "BIBI MURMU" "AITC" ""
    "STW" "" "" "" "" "" "" ""
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "ERUAR /PS-4" "5078" "" "" "1" "BHAGIRATHI HALDER" "AITC" ""
    "W" "" "" "" "" "" "" ""
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "ERUAR /PS-5" "4826" "" "" "1" "ABIR DAS" "AITC" ""
    "SC" "" "" "" "" "" "" ""
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "ERUAR /PS-6" "5342" "" "" "1" "RUPALI SAHA" "AITC" ""
    "W" "" "" "" "" "" "" ""
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "NITYANANDAPUR /PS-7" "5247" "4228" "68" "1" "ANITA SANTRA" "BJP" "126"
    "SCW" "" "" "" "" "" "" ""
    "" "" "" "" "2" "APARNA SAHA" "AITC" "2421"
    "" "" "" "" "3" "SUMITRA KUSHMETE" "CPIM" "1613"
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "NITYANANDAPUR /PS-8" "5387" "4286" "125" "1" "GITA RANI KARFA" "CPIM" "1496"
    "W" "" "" "" "" "" "" ""
    "" "" "" "" "2" "PIU SAMANTA" "AITC" "2665"
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "NITYANANDAPUR /PS-9" "5404" "4600" "191" "1" "GOUTAM SAMANTA" "BJP" "122"
    "General" "" "" "" "" "" "" ""
    "" "" "" "" "2" "JAMAL HOSSAIN SEKH" "CPIM" "1772"
    "" "" "" "" "3" "MOYNUL HAQUE" "AIUDF" "291"
    "" "" "" "" "4" "SAMIR HAZRA" "IND" "18"
    "" "" "" "" "5" "SK NURALAM" "AITC" "2206"
    "Seat Name" "Total Elector" "Votes Polled" "Votes Rejected" "OSN" "Candidate Name" "Party Name" "Vote Secured"
    "BOLGONA /PS-10" "5040" "4146" "203" "1" "TAPAS MAJI" "IND" "463"
    "SC" "" "" "" "" "" "" ""
    "" "" "" "" "2" "DILIP KUMAR MALIK" "AIFB" "1443"
    "" "" "" "" "3" "MANOJ KUMAR MA

    Attached Files
    Last edited by Noyonika Das; 01 Sep 2022, 04:09.

  • #2
    Noyonika Das screenshots are not very useful. Please provide a data extract using -dataex- so that others can use it to help create a solution for you (see also the Statalist FAQ, esp. #12).

    Comment


    • #3
      Does this work?
      Code:
      gen district = subinstr(v1[2],"District : ","",1)
      gen block = subinstr(v1[3],"Block : ","",1)
      gen polling_date = daily(subinstr(v1[5],"Polling Date : ","",1),"DMY")
      format polling_date %td
      
      rename v1 seat_name
      rename v2 total_electors
      rename v3 votes_polled
      rename v4 votes_rejected
      rename v5 osn
      rename v6 candidate_name
      rename v7 party_name
      rename v8 vote_secured
      
      drop in 1/5
      replace seat_name = trim(seat_name)
      drop if seat_name == "Seat Name"
      
      gen seat_type = seat_name[_n+1] if inlist(seat_name[_n+1],"General","W","SCW","SC","STW","ST")
      drop if inlist(seat_name,"General","W","SCW","SC","STW","ST")
      
      destring osn total_electors votes_polled votes_rejected vote_secured, replace
      
      order district block polling_date seat_name seat_type osn
      
      foreach var of varlist seat_name seat_type total_electors-vote_secured {
          replace `var' = `var'[_n-1] if missing(`var') & osn != 1
      }
      
      compress
      Last edited by Hemanshu Kumar; 01 Sep 2022, 04:54.

      Comment


      • #4
        Hemanshu Kumar Yes! That works. Thanks a lot!
        Another question: I want the seat name to appear in every row. For eg; seat_name MAHATA/PS-1 has two candidates but the seat name appears for only the first candidate. How do I ensure that the seat name is present beside every candidate name?

        Thanks again!

        [CODE]

        clear
        input str7 district str6 block float polling_date str51 seat_name str7 seat_type byte osn int(total_electors votes_polled votes_rejected) str21 candidate_name str10 party_name int vote_secured
        "Burdwan" "BHATAR" 19554 "MAHATA /PS-1" "SCW" 1 5668 4896 218 "TUMPA NASKAR" "CPIM" 1871
        "Burdwan" "BHATAR" 19554 "" "" 2 . . . "SUSHANTI MISTRI" "AITC" 2807
        "Burdwan" "BHATAR" 19554 "MAHATA /PS-2" "ST" 1 4877 4213 99 "DILIP SOREN" "AITC" 2463
        "Burdwan" "BHATAR" 19554 "" "" 2 . . . "SUNIL MURMU" "CPIM" 1651
        "Burdwan" "BHATAR" 19554 "MAHATA /PS-3" "STW" 1 5585 . . "BIBI MURMU" "AITC" .
        "Burdwan" "BHATAR" 19554 "ERUAR /PS-4" "W" 1 5078 . . "BHAGIRATHI HALDER" "AITC" .
        "Burdwan" "BHATAR" 19554 "ERUAR /PS-5" "SC" 1 4826 . . "ABIR DAS" "AITC" .
        "Burdwan" "BHATAR" 19554 "ERUAR /PS-6" "W" 1 5342 . . "RUPALI SAHA" "AITC" .

        Comment


        • #5
          This problem does not appear in the data extract you provided and the solution I created from it, at least when I run it on my machine. Could you copy my code from above afresh, run it again and double-check?

          I get:
          Code:
          . li seat_name-vote_secured, sepby(seat_name) noobs
          
            +------------------------------------------------------------------------------------------------------------------+
            |           seat_name   seat_t~e   osn   total_~s   vote~led   vote~ted       candidate_name   party_~e   vote_s~d |
            |------------------------------------------------------------------------------------------------------------------|
            |        MAHATA /PS-1        SCW     1       5668       4896        218         TUMPA NASKAR       CPIM       1871 |
            |        MAHATA /PS-1        SCW     2       5668       4896        218      SUSHANTI MISTRI       AITC       2807 |
            |------------------------------------------------------------------------------------------------------------------|
            |        MAHATA /PS-2         ST     1       4877       4213         99          DILIP SOREN       AITC       2463 |
            |        MAHATA /PS-2         ST     2       4877       4213         99          SUNIL MURMU       CPIM       1651 |
            |------------------------------------------------------------------------------------------------------------------|
            |        MAHATA /PS-3        STW     1       5585          .          .           BIBI MURMU       AITC          . |
            |------------------------------------------------------------------------------------------------------------------|
            |         ERUAR /PS-4          W     1       5078          .          .    BHAGIRATHI HALDER       AITC          . |
            |------------------------------------------------------------------------------------------------------------------|
            |         ERUAR /PS-5         SC     1       4826          .          .             ABIR DAS       AITC          . |
            |------------------------------------------------------------------------------------------------------------------|
            |         ERUAR /PS-6          W     1       5342          .          .          RUPALI SAHA       AITC          . |
            |------------------------------------------------------------------------------------------------------------------|
            | NITYANANDAPUR /PS-7        SCW     1       5247       4228         68         ANITA SANTRA        BJP        126 |
            | NITYANANDAPUR /PS-7        SCW     2       5247       4228         68          APARNA SAHA       AITC       2421 |
            | NITYANANDAPUR /PS-7        SCW     3       5247       4228         68     SUMITRA KUSHMETE       CPIM       1613 |
            |------------------------------------------------------------------------------------------------------------------|
            | NITYANANDAPUR /PS-8          W     1       5387       4286        125      GITA RANI KARFA       CPIM       1496 |
            | NITYANANDAPUR /PS-8          W     2       5387       4286        125          PIU SAMANTA       AITC       2665 |
            |------------------------------------------------------------------------------------------------------------------|
            | NITYANANDAPUR /PS-9    General     1       5404       4600        191       GOUTAM SAMANTA        BJP        122 |
            | NITYANANDAPUR /PS-9    General     2       5404       4600        191   JAMAL HOSSAIN SEKH       CPIM       1772 |
            | NITYANANDAPUR /PS-9    General     3       5404       4600        191         MOYNUL HAQUE      AIUDF        291 |
            | NITYANANDAPUR /PS-9    General     4       5404       4600        191          SAMIR HAZRA        IND         18 |
            | NITYANANDAPUR /PS-9    General     5       5404       4600        191           SK NURALAM       AITC       2206 |
            |------------------------------------------------------------------------------------------------------------------|
            |      BOLGONA /PS-10         SC     1       5040       4146        203           TAPAS MAJI        IND        463 |
            |      BOLGONA /PS-10         SC     2       5040       4146        203    DILIP KUMAR MALIK       AIFB       1443 |
            +------------------------------------------------------------------------------------------------------------------+
          Last edited by Hemanshu Kumar; 01 Sep 2022, 06:17.

          Comment


          • #6
            Hemanshu Kumar Sorry! My bad, I did not run the " foreach var of varlist..." part of the code. I thought it was a loop function for when there are multiple .csv files.
            It works when I run the whole thing! Thanks so much!

            Comment


            • #7
              Hello,
              I am trying to run the above command for multiple files. I wish to save the new .csv files in a different folder. This is my code but it does not seem to work. Where am I going wrong?

              local files : dir "C:\Users\Ndas1\OneDrive - UvA\Desktop\1st Chapter\Block_STATA" files "*.csv"
              cd "C:\Users\Ndas1\OneDrive - UvA\Desktop\1st Chapter\Block_STATA"
              foreach file of local files {
              import delimited "`file'", clear
              gen district = subinstr(v1[2],"District : ","",1)
              gen block = subinstr(v1[3],"Block : ","",1)
              gen polling_date = daily(subinstr(v1[5],"Polling Date : ","",1),"DMY")
              format polling_date %td

              rename v1 seat_name
              rename v2 total_electors
              rename v3 votes_polled
              rename v4 votes_rejected
              rename v5 osn
              rename v6 candidate_name
              rename v7 party_name
              rename v8 vote_secured

              drop in 1/5
              replace seat_name = trim(seat_name)
              drop if seat_name == "Seat Name"

              gen seat_type = seat_name[_n+1] if inlist(seat_name[_n+1],"General","W","SCW","SC","STW","ST", "BC", "BCW")
              drop if inlist(seat_name,"General","W","SCW","SC","STW","S T","BC", "BCW")

              destring osn total_electors votes_polled votes_rejected vote_secured, replace

              order district block polling_date seat_name seat_type osn

              foreach var of varlist seat_name seat_type total_electors-vote_secured {
              replace `var' = `var'[_n-1] if missing(`var') & osn != 1
              }
              local new : subinstr local file ".csv" "_new.csv", all
              export delimited using "`new'", replace
              export delimited using `""C:/Users/Ndas1/OneDrive - UvA/Desktop/1st Chapter/Block_STATA/Output/'newfilename'"', replace
              }

              Thanks!
              Noyonika

              Comment


              • #8
                Noyonika Das could you please enclose your code in CODE delimiters (use the # button on the formatting toolbar)?

                Also, could you tell us precisely what the error message is, and after which line you get the error?

                Comment


                • #9
                  You also definitely have multiple issues (wrong local name, problems with quotes) in your export delimited lines, of which in any case, you probably need just one. Try this code as a replacement:
                  Code:
                  local new : subinstr local file ".csv" "_new.csv", all 
                  export delimited using `"C:/Users/Ndas1/OneDrive - UvA/Desktop/1st Chapter/Block_STATA/Output/`new'"', replace

                  Comment

                  Working...
                  X