Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to reorganize my raw dataset into the following form: each row is an observation and each column is a variable? Thanks!

    Hi, guys, I'm a new stata user, and I got the following raw data, which documents the brand choice for different teams in different neighborhoods as well as the expenditure on environmental education in each neighborhood. In the following illustrative table, the brand column documents brand choice, team column documents team name, loc_id column documents neighborhood id, and loc_edu documents education expenditure:

    brand team loc_id loc_edu
    A team1 n1 13.1
    B team1 n1 13.1
    A team1 n2 9.7
    B team1 n2 9.7
    X team2 n1 13.1
    Y team2 n1 13.1
    X team2 n3 5
    B team1 n4 6.2
    Y team2 n4 6.2

    team1 could choose brand A and B, only brand A, only brand B or none. team 2 could choose brand X and Y, only brand X, only brand Y or none.
    I want to reorganize my data in the following way:

    loc_id choice_team1 choice_team2 loc_edu
    n1 1 1 13.1
    n2 1 0 9.7
    n3 0 2 5
    n4 3 3 6.2

    In the new table each row corresponds to a loc_id (an observation) and each column corresponds to a variable. team1's choice equals 1 if both A and B are purchased, and equals 2 if only A is purchased, and equals 3 if only B is purchased and equals 0 if none is purchased. team2' choice equals 1 if both X and Y are purchased, equals 2 if only X is purchased, equals 3 if only Y is purchased and equals 0 if none is purchased.

    What's the stata codes that could achieve this? Thanks!




  • #2
    Some people charge hundreds of USD, or the equivalent, for this kind of thing!


    Code:
     
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 brand str5 team str2 loc_id float loc_edu
    "A" "team1" "n1" 13.1
    "B" "team1" "n1" 13.1
    "A" "team1" "n2"  9.7
    "B" "team1" "n2"  9.7
    "X" "team2" "n1" 13.1
    "Y" "team2" "n1" 13.1
    "X" "team2" "n3"    5
    "B" "team1" "n4"  6.2
    "Y" "team2" "n4"  6.2
    end
    
    fillin loc_id team 
    bysort loc_id (loc_edu) : replace loc_edu = loc_edu[1]
    
    gen choice = 0 
    bysort loc_id team (brand): replace choice = 1 if inlist(brand[1], "A", "X") & inlist(brand[2], "B", "Y")
    by loc_id team: replace choice = 2 if _N == 1 & inlist(brand, "A", "X")
    by loc_id team: replace choice = 3 if _N == 1 & inlist(brand, "B", "Y")
    
    duplicates drop loc_id team, force 
    drop brand _fillin 
    
    reshape wide choice, i(loc_id) j(team) string  
    
    list 
    
        +----------------------------------------+
         | loc_id   choice~1   choice~2   loc_edu |
         |----------------------------------------|
      1. |     n1          1          1      13.1 |
      2. |     n2          1          0       9.7 |
      3. |     n3          0          2         5 |
      4. |     n4          3          3       6.2 |
         +----------------------------------------+

    Comment


    • #3
      Originally posted by Nick Cox View Post
      Some people charge hundreds of USD, or the equivalent, for this kind of thing!


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str1 brand str5 team str2 loc_id float loc_edu
      "A" "team1" "n1" 13.1
      "B" "team1" "n1" 13.1
      "A" "team1" "n2" 9.7
      "B" "team1" "n2" 9.7
      "X" "team2" "n1" 13.1
      "Y" "team2" "n1" 13.1
      "X" "team2" "n3" 5
      "B" "team1" "n4" 6.2
      "Y" "team2" "n4" 6.2
      end
      
      fillin loc_id team
      bysort loc_id (loc_edu) : replace loc_edu = loc_edu[1]
      
      gen choice = 0
      bysort loc_id team (brand): replace choice = 1 if inlist(brand[1], "A", "X") & inlist(brand[2], "B", "Y")
      by loc_id team: replace choice = 2 if _N == 1 & inlist(brand, "A", "X")
      by loc_id team: replace choice = 3 if _N == 1 & inlist(brand, "B", "Y")
      
      duplicates drop loc_id team, force
      drop brand _fillin
      
      reshape wide choice, i(loc_id) j(team) string
      
      list
      
      +----------------------------------------+
      | loc_id choice~1 choice~2 loc_edu |
      |----------------------------------------|
      1. | n1 1 1 13.1 |
      2. | n2 1 0 9.7 |
      3. | n3 0 2 5 |
      4. | n4 3 3 6.2 |
      +----------------------------------------+
      Thank you so much Nick! This works!

      Comment


      • #4
        Originally posted by Nick Cox View Post
        Some people charge hundreds of USD, or the equivalent, for this kind of thing!


        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str1 brand str5 team str2 loc_id float loc_edu
        "A" "team1" "n1" 13.1
        "B" "team1" "n1" 13.1
        "A" "team1" "n2" 9.7
        "B" "team1" "n2" 9.7
        "X" "team2" "n1" 13.1
        "Y" "team2" "n1" 13.1
        "X" "team2" "n3" 5
        "B" "team1" "n4" 6.2
        "Y" "team2" "n4" 6.2
        end
        
        fillin loc_id team
        bysort loc_id (loc_edu) : replace loc_edu = loc_edu[1]
        
        gen choice = 0
        bysort loc_id team (brand): replace choice = 1 if inlist(brand[1], "A", "X") & inlist(brand[2], "B", "Y")
        by loc_id team: replace choice = 2 if _N == 1 & inlist(brand, "A", "X")
        by loc_id team: replace choice = 3 if _N == 1 & inlist(brand, "B", "Y")
        
        duplicates drop loc_id team, force
        drop brand _fillin
        
        reshape wide choice, i(loc_id) j(team) string
        
        list
        
        +----------------------------------------+
        | loc_id choice~1 choice~2 loc_edu |
        |----------------------------------------|
        1. | n1 1 1 13.1 |
        2. | n2 1 0 9.7 |
        3. | n3 0 2 5 |
        4. | n4 3 3 6.2 |
        +----------------------------------------+
        Hi Nick, one last question. I just noticed that I have some neighborhoods in which teams purchase A or B multiple times (see n5 in the example below). For such observations, I did not get what I wanted.
        brand team loc_id loc_edu
        A team1 n1 13.1
        B team1 n1 13.1
        A team1 n2 9.7
        B team1 n2 9.7
        X team2 n1 13.1
        Y team2 n1 13.1
        X team2 n3 5
        B team1 n4 6.2
        Y team2 n4 6.2
        A team1 n5 24
        A team1 n5 24
        A team1 n5 24
        B team1 n5 24
        B team1 n5 24
        In this example, for loc_id=n5, the choice is set to 0(see the output below):

        loc_id choiceteam1 choiceteam2 loc_edu
        n1 1 1 13.1
        n2 1 0 9.7
        n3 0 2 5
        n4 3 3 6.2
        n5 0 0 24

        But I still want choiceteam1=1 for n5 as both A and B are purchased. How could I fix this? Thanks!

        Comment


        • #5
          As in #2, the duplicates command can be used to drop unwanted duplicates.

          Comment


          • #6
            Originally posted by Nick Cox View Post
            As in #2, the duplicates command can be used to drop unwanted duplicates.
            Thanks Nick! This is very helpful!

            Comment

            Working...
            X