Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace missing values with values from different rows

    Dear all,

    I am a Stata 14 user and I was wondering if you could help with the following.

    I have a dataset containing survey information from teachers about their school. Some of them have taken the survey multiple times, and I need to remove duplicates from the dataset. I would like to keep the oldest response, ie the row corresponding to the first time they took the survey. However, in some cases I have information for a variable missing in the oldest response but not missing in the following ones.

    Example:
    id school_type years_teaching subjects_taught n. students
    0001 . 5 .
    0001 1 5 1,2,5 .
    0002 2 . 6,7 24
    0002 . 2 6,7 .

    As you can see, school_type and subjects_taught, for example, are missing for id == 0001 in the first row but not in the second. So I wrote the following code to replace the information in row 1 with that in row 2:

    Code:
    //Deal with duplicates
    duplicates tag email survey_treated finished, g(dup)
    
    ta dup
    
    sort email teacher_startdate
    
    egen email_id = group(email) if dup>0 & dup<=2 // to identify with the same number those who have the same email
    
    fre email_id // shows it takes values from 1 to 172
    
    forval i = 1(1)172 {
    
    ds, has(type string) // I seprated string and numeric variables because the code was producing a "type mismatch" error
    
    unab str_vars: `r(varlist)'
    
    unab vars_to_exclude1: consent teacher_startdate teacher_enddate ipaddress progress durationinseconds finished teacher_recordeddate responseid locationlatitude locationlongitude randomid
    
        foreach var in `:list str_vars - vars_to_exclude1' {
    
            replace `var' = "`=`var'[_n + 2]'" if `var' == "" & _n == 1 ///
            & `var'[_n + 1] == "" & `var'[_n + 2] != "" & dup == 2 & email_id == `i' // if the same person took the survey 3 times, replace the first row with information from the third if the second is empty 
    
            replace `var' = "`=`var'[_n + 1]'" if `var' == "" & _n == 1 ///
            & `var'[_n + 1] != "" & `var'[_n + 2] == "" & dup == 2 & _n == `i' // if the same person took the survey 3 times, replace the first row with information from the second if the third is empty 
    
            replace `var' = "`=`var'[_n + 1]'" if `var' == "" & _n == 1 & `var'[_n + 1] != "" & dup == 1 & email_id == `i' // if the same person took the survey 2 times, replace the first row with information from the second row
    }
    
    ds, has(type numeric)
    
    unab num_vars: `r(varlist)'
    
    unab vars_to_exclude2: teacher_startdate teacher_enddate ipaddress progress durationinseconds finished teacher_recordeddate responseid locationlatitude locationlongitude randomid
        
        foreach var in `:list num_vars - vars_to_exclude2' {
    
            replace `var' = `=`var'[_n + 2]' if `var' == . & _n == 1 ///
            & `var'[_n + 1] == . & `var'[_n + 2] != . & dup == 2 & email_id == `i' // if the same person took the survey 3 times, replace the first row with information from the third if the second is empty
    
            replace `var' = `=`var'[_n + 1]' if `var' == . & _n == 1 ///
            & `var'[_n + 1] != . & `var'[_n + 2] == . & dup == 2 & email_id == `i' // if the same person took the survey 3 times, replace the first row with information from the second if the third is empty 
             
            replace `var' = `=`var'[_n + 1]' if `var' == . & _n == 1 & `var'[_n + 1] != . & dup == 1 & email_id == `i' // if the same person took the survey 2 times, replace the first row with information from the second row
    }
    
    }
    The code doesn't seem to be doing anything: when I look at the dataset before and after, the information hasn't been replaced. Could you please help?
    Thanks in advance!

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 id byte(school_type years_teaching) str5 subjects_taught byte nstudents
    "0001" . 5 ""       .
    "0001" 1 5 "1,2,5"  .
    "0002" 2 . "6,7"   24
    "0002" . 2 "6,7"    .
    end
    
    collapse (firstnm) school_type years_teaching subjects_taught nstudents, by(id)
    See

    Code:
    help collapse

    Res.:

    Code:
    . l
    
         +--------------------------------------------------+
         |   id   school~e   years_~g   subjec~t   nstude~s |
         |--------------------------------------------------|
      1. | 0001          1          5      1,2,5          . |
      2. | 0002          2          2        6,7         24 |
         +--------------------------------------------------+
    
    .

    Comment


    • #3
      Thank you, it worked perfectly! Didn't know about this command!

      Comment

      Working...
      X