  • Help with coding

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 Name byte var2 byte var3
    "Sushila " . 2
    "Sushila " . 2
    "Sushila " 1 1
    "Sushila " 2 2
    "Sushila " 2 2
    "Sushila " 2 2
    "Sushila " 2 2
    "Sushila " 2 2
    "Sushila " 2 2
    "Sushila " 2 2
    "Sushila " 2 2
    "Sushila " 2 2
    "Ram "     1 1
    "Ram "     2 2
    "Ram "     1 1
    "Ram "     2 2
    "Ram "     1 1
    "Ram "     1 1
    "Ram "     1 1
    "Ram "     1 1
    "Ram "     1 1
    "Ram "     1 1
    "Ram "     . 1
    "Ram "     1 1
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    2 2
    "Vinay"    2 2
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    . 1
    "Vinay"    . 1
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    1 1
    "Vinay"    1 1
    end

    Hi, I have the above data. For every value of the variable Name, I want to replace missing values of var2 with the value that var2 takes at least 80% of the time within that name.

    For example, for the name Sushila, var2 takes the value 2 at least 80% of the time (9 of its 10 nonmissing values). So for the observations where Name is Sushila, I want to replace the two missing values of var2 with 2.


    After replacement, var2 should look like var3.

    How do I code this in Stata?

    Thanks
    Last edited by Akanksha Aggarwal; 16 Jun 2021, 09:21.
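
To make the 80% rule concrete, the share for one name can be worked out by hand. Below is a minimal sketch that uses only Sushila's 12 values of var2, retyped from the data above; the local macro names `nonmiss` and `twos` are made up for illustration. Note that it starts with `clear`, so it should not be run with unsaved data in memory.

```stata
* Sushila's 12 values of var2, retyped from the example data (assumption: copied correctly)
clear
input byte var2
.
.
1
2
2
2
2
2
2
2
2
2
end

quietly count if !missing(var2)
local nonmiss = r(N)               // number of nonmissing values
quietly count if var2 == 2
local twos = r(N)                  // number of times var2 equals 2

display "share of 2 among nonmissing values: " `twos'/`nonmiss'
assert `twos'/`nonmiss' >= 0.8     // the value 2 clears the 80% threshold
```

Missing values are left out of the denominator, which matches the "considering only nonmissing values" condition in the question.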

  • #2
    This isn't pretty, but it should do the job:


    Code:
    * First steps: compute each value's share of the nonmissing observations within Name
    bysort Name var2: gen subcounter = _N // number of observations with this var2 value within Name
    gen namecounter = 1 if var2 != . // flag nonmissing var2 so they can be counted for each Name
    bysort Name: replace namecounter = sum(namecounter) // running count of nonmissing var2 within Name
    bysort Name: replace namecounter = namecounter[_N] // total count of nonmissing var2 within Name
    gen double countfraction = subcounter/namecounter // share of nonmissing observations taking this var2 value


    * Next steps: create the replacement variable
    gen var2sub = var2 if countfraction >= 0.8 // "at least 80%", so >= rather than >
    sort Name var2sub // nonmissing var2sub sorts first within Name
    replace var2sub = var2sub[_n-1] if Name == Name[_n-1] & var2sub == . // carry the substitute value down to the remaining rows


    * Create new variable equal to your var3
    gen newvar2 = var2
    replace newvar2 = var2sub if newvar2 == .


    * Just some extra checks and housekeeping
    gen artificialvar2 = (var2 == .) // flags observations where var2 was missing (candidates for an "artificial" value)
    gen check = (var3 == newvar2) // 1 where my variable equals your manually constructed var3
    tab check // inspect before dropping
    drop subcounter namecounter countfraction var2sub check
    Last edited by John Kirk; 16 Jun 2021, 09:47.
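
The three `namecounter` lines above can probably be collapsed into a single line, since egen's `count()` function counts nonmissing values. Here is a minimal sketch on made-up data that builds the count both ways and checks that they agree; the names A and B and the variable `namecounter2` are hypothetical and only there for the comparison.

```stata
* Made-up toy data (not the data from the thread)
clear
input str1 Name byte var2
"A" 1
"A" 1
"A" .
"B" 2
"B" .
"B" .
end

* Three-step construction, as in the code above
gen namecounter = 1 if var2 != .
bysort Name: replace namecounter = sum(namecounter)
bysort Name: replace namecounter = namecounter[_N]

* One-line alternative: count() ignores missing values of var2
bysort Name: egen namecounter2 = count(var2)

assert namecounter == namecounter2   // stops with an error if the two ever differ
```

Either form gives the denominator for `countfraction`; the egen version is simply shorter.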


    • #3
      Here is another approach. This code collapses the data to count how many times each value of var2 occurs within each name, then keeps the value whose share of the nonmissing observations is at least 0.8 (80%). The result is then merged back into the main data set:

      Code:
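      preserve
      gen count = !missing(var2) // 1 for each nonmissing var2
      collapse (sum) count, by(Name var2) // occurrences of each var2 value within Name
      drop if var2 == .
      bysort Name: egen allcount = total(count) // nonmissing observations per Name
      keep if count / allcount >= 0.8 // keep the value taken at least 80% of the time
      rename var2 var2rep
      drop count allcount
      save temp01, replace
      restore

      merge m:1 Name using temp01 // attach var2rep to every observation with the same Name
      gen wanted = var2
      replace wanted = var2rep if var2 == .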
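
If leaving temp01.dta on disk is a concern, the same idea can be sketched with a temporary file. This is a variant under stated assumptions, not the code from the post: it swaps `collapse` for `contract`, uses `tempfile`, and adds merge's `nogenerate` option so no `_merge` variable is left behind. The toy data and the macro name `modal` are made up for illustration.

```stata
* Made-up toy data: A has one value at exactly 80%, B has no dominant value
clear
input str1 Name byte var2
"A" 1
"A" 1
"A" 1
"A" 1
"A" 2
"A" .
"B" 1
"B" 2
"B" .
end

tempfile modal                              // hypothetical name; Stata erases the file automatically
preserve
drop if missing(var2)
contract Name var2, freq(count)             // one row per Name-var2 pair with its frequency
bysort Name: egen allcount = total(count)   // nonmissing observations per Name
keep if count / allcount >= 0.8             // at most one value per Name can qualify
keep Name var2
rename var2 var2rep
save `modal'
restore

merge m:1 Name using `modal', nogenerate    // names without a qualifying value get var2rep == .
gen wanted = cond(missing(var2), var2rep, var2)
drop var2rep

assert wanted == 1 if Name == "A" & missing(var2)   // filled with the 80% value
assert missing(wanted) if Name == "B" & missing(var2) // no value reaches 80%, stays missing
```

Because two different values cannot both account for at least 80% of a name's observations, the saved file has at most one row per Name, which is what `merge m:1` requires.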


      • #4
        Thanks so much, John and Ken!


        • #5
          The original sort order might still be needed, so this version records it first and restores it at the end.
          Code:
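          gen long OriginalOrder = _n // remember the original sort order

          bys Name: egen CountNonMiss = count(var2) // nonmissing var2 per Name
          bys Name var2: gen PickValue = var2 if _N/CountNonMiss >= 0.8 // value taken at least 80% of the time, if any
          bys Name (PickValue): gen Newvar2 = cond(var2 != ., var2, PickValue[1]) // nonmissing PickValue sorts first within Name

          sort OriginalOrder // restore the original order
          drop OriginalOrder CountNonMiss PickValue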
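
To see how this behaves at the boundaries, here is a small sketch that runs the same steps on made-up data in which the names are interleaved. The names A and B and their values are hypothetical: A has one value at exactly 80% of its nonmissing observations, while B has no value that reaches 80%.

```stata
* Made-up toy data with interleaved names (not the data from the thread)
clear
input str1 Name byte var2
"A" 1
"B" 1
"A" .
"A" 1
"B" .
"A" 2
"A" 1
"B" 2
"A" 1
end

gen long OriginalOrder = _n

bys Name: egen CountNonMiss = count(var2)
bys Name var2: gen PickValue = var2 if _N/CountNonMiss >= 0.8
bys Name (PickValue): gen Newvar2 = cond(var2 != ., var2, PickValue[1])

sort OriginalOrder
assert OriginalOrder == _n        // rows are back in their original positions
assert Newvar2 == 1 in 3          // A: 4 of 5 nonmissing values are 1, so the gap is filled
assert missing(Newvar2) in 5      // B: 1 of 2 is only 50%, so the gap stays missing
drop OriginalOrder CountNonMiss PickValue

list, noobs
```

When no value reaches the threshold, `PickValue[1]` is itself missing, so the missing var2 is simply left as it is.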


          • #6
            This is great. Thanks a lot, Romalpa!
