Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Shift observations to new variables IF certain conditions are met

    Dear all,

    I don't know if reshape is the appropriate command to use for what I want but I don't think so. Suppose I have the following dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int position str12 name str5 type byte isnumeric str6(format vallab) str22 varlab float year
     1 "make"         "str18" 0 "%-18s"  ""       "Make and model"         2000
     2 "price"        "int"   1 "%8.0gc" ""       "Price"                  2000
     3 "mpg"          "int"   1 "%8.0g"  ""       "Mileage (mpg)"          2000
     4 "rep78"        "int"   1 "%8.0g"  ""       "Repair record 1978"     2000
     5 "headroom"     "float" 1 "%6.1f"  ""       "Headroom (in.)"         2000
     6 "trunk"        "int"   1 "%8.0g"  ""       "Trunk space (cu. ft.)"  2000
     7 "weight"       "int"   1 "%8.0gc" ""       "Weight (lbs.)"          2010
     8 "length"       "int"   1 "%8.0g"  ""       "Length (in.)"           2010
     9 "turn"         "int"   1 "%8.0g"  ""       "Turn circle (ft.)"      2010
    10 "displacement" "int"   1 "%8.0g"  ""       "Displacement (cu. in.)" 2010
    11 "gear_ratio"   "float" 1 "%6.2f"  ""       "Gear ratio"             2010
    12 "foreign"      "byte"  1 "%8.0g"  "origin" "Car origin"             2010
    end


    My real dataset is a list of variables from different survey waves. I wish to identify those who contain the same information - but may have different names, formats, categories, etc. - and to put them as the same observation. So for this toy example, suppose I say that the variable "turn" is the 2010 equivalent of the 2000 variable "make", and that the variable "weight" is the 2010 equivalent of the 2000 variable price.

    How can I code a do-file so that my final dataset looks lile:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 name str5 type byte isnumeric str6(format vallab) str22 varlab float year str12 name2 str5 type2 byte isnumeric2 str6(format2 vallab2) str22 varlab2 float year2
    "make"         "str18" 0 "%-18s"  ""       "Make and model"         2000 "turn"   "int" 1 "%8.0g"  "" "Turn circle (ft.)" 2010
    "price"        "int"   1 "%8.0gc" ""       "Price"                  2000 "weight" "int" 1 "%8.0gc" "" "Weight (lbs.)"     2010
    "mpg"          "int"   1 "%8.0g"  ""       "Mileage (mpg)"          2000 ""       ""    . ""       "" ""                     .
    "rep78"        "int"   1 "%8.0g"  ""       "Repair record 1978"     2000 ""       ""    . ""       "" ""                     .
    "headroom"     "float" 1 "%6.1f"  ""       "Headroom (in.)"         2000 ""       ""    . ""       "" ""                     .
    "trunk"        "int"   1 "%8.0g"  ""       "Trunk space (cu. ft.)"  2000 ""       ""    . ""       "" ""                     .
    "length"       "int"   1 "%8.0g"  ""       "Length (in.)"           2010 ""       ""    . ""       "" ""                     .
    "displacement" "int"   1 "%8.0g"  ""       "Displacement (cu. in.)" 2010 ""       ""    . ""       "" ""                     .
    "gear_ratio"   "float" 1 "%6.2f"  ""       "Gear ratio"             2010 ""       ""    . ""       "" ""                     .
    "foreign"      "byte"  1 "%8.0g"  "origin" "Car origin"             2010 ""       ""    . ""       "" ""                     .
    end

    The position of the observations do not matter. I assume I would identify matching variables by manually storing them in a local, e.g. :

    Code:
    local match_make make turn
    or by assigning them a common id, for example by generating a common id for matching variable, even if I would prefer the first solution. Any help is much appreciated !

    Edit : It would also be best if the names of the variables follow a common name, as I want to classify the variable by reverse chronological order.

    Edit 2 because I feel like i was unclear: I want to have a do-file containing only locals of the equivalences like

    Code:
    local match1 make price
    local match2 price weight
    * ...
    and then to have a loop doing like:

    Code:
    forvalues x = 1/9 { // Assuming I have 9 macros match
    foreach var of local match`x' {
    
    * The actual code to put the matching variable in the same observation
    
    }
    }
    Adam
    Last edited by Adam Sadi; 18 Sep 2023, 01:10.

  • #2
    The important thing is to generate a variable that has a common name for the matched variables (below named "Name"). However you do this is up to you. Then you can reshape wide.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int position str12 name str5 type byte isnumeric str6(format vallab) str22 varlab float year
     1 "make"         "str18" 0 "%-18s"  ""       "Make and model"         2000
     2 "price"        "int"   1 "%8.0gc" ""       "Price"                  2000
     3 "mpg"          "int"   1 "%8.0g"  ""       "Mileage (mpg)"          2000
     4 "rep78"        "int"   1 "%8.0g"  ""       "Repair record 1978"     2000
     5 "headroom"     "float" 1 "%6.1f"  ""       "Headroom (in.)"         2000
     6 "trunk"        "int"   1 "%8.0g"  ""       "Trunk space (cu. ft.)"  2000
     7 "weight"       "int"   1 "%8.0gc" ""       "Weight (lbs.)"          2010
     8 "length"       "int"   1 "%8.0g"  ""       "Length (in.)"           2010
     9 "turn"         "int"   1 "%8.0g"  ""       "Turn circle (ft.)"      2010
    10 "displacement" "int"   1 "%8.0g"  ""       "Displacement (cu. in.)" 2010
    11 "gear_ratio"   "float" 1 "%6.2f"  ""       "Gear ratio"             2010
    12 "foreign"      "byte"  1 "%8.0g"  "origin" "Car origin"             2010
    end
    
    
    gen Name= cond(name=="turn" & year==2010, "make",cond( name=="weight" & year==2010, "price", name))
    bys Name (year): gen which= _n
    drop pos
    reshape wide name type isnumeric format vallab varlab year , i(Name) j(which)
    Res.:

    Code:
    . l, sep(0)
    
         +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
         |         Name          name1   type1   isnume~1   format1   vallab1                  varlab1   year1    name2   type2   isnume~2   format2   vallab2             varlab2   year2 |
         |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
      1. | displacement   displacement     int          1     %8.0g             Displacement (cu. in.)    2010                           .                                               . |
      2. |      foreign        foreign    byte          1     %8.0g    origin               Car origin    2010                           .                                               . |
      3. |   gear_ratio     gear_ratio   float          1     %6.2f                         Gear ratio    2010                           .                                               . |
      4. |     headroom       headroom   float          1     %6.1f                     Headroom (in.)    2000                           .                                               . |
      5. |       length         length     int          1     %8.0g                       Length (in.)    2010                           .                                               . |
      6. |         make           make   str18          0     %-18s                     Make and model    2000     turn     int          1     %8.0g             Turn circle (ft.)    2010 |
      7. |          mpg            mpg     int          1     %8.0g                      Mileage (mpg)    2000                           .                                               . |
      8. |        price          price     int          1    %8.0gc                              Price    2000   weight     int          1    %8.0gc                 Weight (lbs.)    2010 |
      9. |        rep78          rep78     int          1     %8.0g                 Repair record 1978    2000                           .                                               . |
     10. |        trunk          trunk     int          1     %8.0g              Trunk space (cu. ft.)    2000                           .                                               . |
         +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

    Comment

    Working...
    X