Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace observation by all other values that are not in the group

    Hello everyone.

    Following is an example data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str14 state int region long pop65p float medage long death
    "Alabama"     3  440015 29.3      .
    "Alabama"     3  440015 29.3      .
    "Alabama"     3  440015 29.3      .
    "Alabama"     3  440015 29.3  35305
    "Alaska"      4   11547 26.1      .
    "Alaska"      4   11547 26.1      .
    "Alaska"      4   11547 26.1      .
    "Alaska"      4   11547 26.1   1604
    "Arizona"     4  307362 29.2      .
    "Arizona"     4  307362 29.2      .
    "Arizona"     4  307362 29.2      .
    "Arizona"     4  307362 29.2  21226
    "Arkansas"    3  312477 30.6      .
    "Arkansas"    3  312477 30.6      .
    "Arkansas"    3  312477 30.6      .
    "Arkansas"    3  312477 30.6  22676
    "California"  4 2414250 29.9      .
    "California"  4 2414250 29.9      .
    "California"  4 2414250 29.9      .
    "California"  4 2414250 29.9 186428
    "Colorado"    4  247325 28.6      .
    "Colorado"    4  247325 28.6      .
    "Colorado"    4  247325 28.6      .
    "Colorado"    4  247325 28.6  18925
    "Connecticut" 1  364864   32      .
    "Connecticut" 1  364864   32      .
    "Connecticut" 1  364864   32      .
    "Connecticut" 1  364864   32  26005
    "Delaware"    3   59179 29.8      .
    "Delaware"    3   59179 29.8      .
    "Delaware"    3   59179 29.8      .
    "Delaware"    3   59179 29.8   5123
    "Florida"     3 1687573 34.7      .
    "Florida"     3 1687573 34.7      .
    "Florida"     3 1687573 34.7      .
    "Florida"     3 1687573 34.7 104190
    "Georgia"     3  516731 28.7      .
    "Georgia"     3  516731 28.7      .
    "Georgia"     3  516731 28.7      .
    "Georgia"     3  516731 28.7  44230
    "Hawaii"      4   76150 28.4      .
    "Hawaii"      4   76150 28.4      .
    "Hawaii"      4   76150 28.4      .
    "Hawaii"      4   76150 28.4   4849
    "Idaho"       4   93680 27.6      .
    "Idaho"       4   93680 27.6      .
    "Idaho"       4   93680 27.6      .
    "Idaho"       4   93680 27.6   6753
    end
    label values region cenreg
    label def cenreg 1 "NE", modify
    label def cenreg 2 "N Cntrl", modify
    label def cenreg 3 "South", modify
    label def cenreg 4 "West", modify
    In the example data if we look at state of "Alabama", we see that region takes the value 3 where deaths are non-missing. I would like to replace region with all other values of region where the variable death is missing. Example of desired output:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str14 state int region long pop65p float medage long death
    "Alabama"     1  440015 29.3      .
    "Alabama"     2  440015 29.3      .
    "Alabama"     4  440015 29.3      .
    "Alabama"     3  440015 29.3  35305
    end
    Over here the example has sequenced numbers for region however, what if the sequence did not exist, how will the above problem be solved?

  • #2
    Code:
    by state (death), sort: gen new_region = _n
    by state: replace death = death[1] if new_region == region
    replace death = . if new_region != region
    gsort state new_region

    Comment


    • #3
      Thank you Clyde Schechter for sharing this solution.

      Just to build on this, i would like ask how this solution would be modified if the numbers were not in sequence. Will I then use -levelsof-, if so, then how will that assure we have unique observations filling in missing rows.

      Comment


      • #4
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str14 state int region long pop65p float medage long death
        "Alabama"     22  440015 29.3      .
        "Alabama"     22  440015 29.3      .
        "Alabama"     22  440015 29.3      .
        "Alabama"     22  440015 29.3  35305
        "Alaska"      71   11547 26.1      .
        "Alaska"      71  11547 26.1      .
        "Alaska"      71  11547 26.1      .
        "Alaska"      71   11547 26.1   1604
        "Arizona"     94  307362 29.2      .
        "Arizona"     94  307362 29.2      .
        "Arizona"     94  307362 29.2      .
        "Arizona"     94  307362 29.2  21226
        "California"  53  440015 29.3      .
        "California"  53  440015 29.3      .
        "California"  53  440015 29.3      .
        "California"  53  440015 29.3  35305
        end
        
        bys state (death): gen which=_n
        frame put region state if !missing(death), into(mylist)
        frame mylist{
            fillin region state
            bys state (_fillin): gen which=_n
        }
        frlink 1:1 state which, frame(mylist)
        frget region2=region, from(mylist)
        replace region= region2
        drop which-region2
        frame drop mylist
        Res.:

        Code:
        
        
        . l, sepby(state)
        
             +-----------------------------------------------+
             |      state   region   pop65p   medage   death |
             |-----------------------------------------------|
          1. |    Alabama       22   440015     29.3   35305 |
          2. |    Alabama       71   440015     29.3       . |
          3. |    Alabama       53   440015     29.3       . |
          4. |    Alabama       94   440015     29.3       . |
             |-----------------------------------------------|
          5. |     Alaska       71    11547     26.1    1604 |
          6. |     Alaska       22    11547     26.1       . |
          7. |     Alaska       53    11547     26.1       . |
          8. |     Alaska       94    11547     26.1       . |
             |-----------------------------------------------|
          9. |    Arizona       94   307362     29.2   21226 |
         10. |    Arizona       22   307362     29.2       . |
         11. |    Arizona       53   307362     29.2       . |
         12. |    Arizona       71   307362     29.2       . |
             |-----------------------------------------------|
         13. | California       53   440015     29.3   35305 |
         14. | California       22   440015     29.3       . |
         15. | California       94   440015     29.3       . |
         16. | California       71   440015     29.3       . |
             +-----------------------------------------------+
        Last edited by Andrew Musau; 17 Apr 2024, 01:25.

        Comment


        • #5
          Awesome, thanks Andrew Musau

          Comment

          Working...
          X