
  • Merging and analysing double-coded data

    Dear all,

    I am conducting a meta-regression analysis. With my team, we have double-coded many studies. This leaves us with two datasets, each with many variables and many observations, containing the same studies.

    Every study has a unique ID. The goal is to compare the two independent codings (one per coder) of every study. If they coincide, I would drop the duplicate. If they differ, I want to know in which variables they differ, so that I can go back to the original study and find the actual value of each such variable.

    I have already appended the datasets from both coders into one dataset. Hence, every ID now appears twice in the dataset.
    What would be the most efficient way to find out which observations differ, and in which specific variables (since I do not want to check every variable one by one for every observation)?

    Many thanks.





  • #2
    It is also important to say that the unit of observation is a study.

    I can also put the problem a different way: I want to find the variables that differ between two observations that are supposed to be duplicates.
    Last edited by Barbora Sedova; 02 Apr 2019, 05:53.



    • #3
      I'm presuming that your observations have a pairid variable, indicating which observations are to be compared, and a "study" variable, indicating which study each observation represents. In that case, you could do this, which records the differences in two ways:

      Code:
      sort pairid study
      gen str diffvars = ""  // to hold variable names where a difference exists
      foreach v of varlist var1 var2... {
         by pairid: gen diff`v' = (`v'[1] != `v'[2])  // flag problem variables
         replace diffvars = diffvars + " `v'" if (diff`v' == 1)
      }
      gen anydiff = (diffvars != "")
      // Examine difference variables for observations in which at least one variable differs within pair
      browse pairid diff* if anydiff  // -list- would produce cumbersome output
      // Or, if you prefer a list of variable names.
      list pairid diffvars if anydiff
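
      As a rough, self-contained illustration of the loop above, here is a minimal sketch on made-up data. The dataset, the coder variable, and the names x1-x3 are assumptions for the example only (not the poster's actual variables), and pairid here is simply the study ID. On this toy data, pair 102 should be reported with x1 and x2, and pair 103 with x3.

      Code:
      // toy data (hypothetical): three studies, each coded by two coders
      clear
      input int(pairid coder x1 x2 x3)
      101 1 1 2 3
      101 2 1 2 3
      102 1 4 5 6
      102 2 5 4 6
      103 1 7 8 9
      103 2 7 8 8
      end

      sort pairid coder
      gen str1 diffvars = ""   // -replace- widens the string as names are added
      foreach v of varlist x1-x3 {
         // 1 on both rows of a pair if the two coders disagree on this variable
         by pairid: gen byte diff`v' = (`v'[1] != `v'[2])
         replace diffvars = diffvars + " `v'" if diff`v' == 1
      }
      gen byte anydiff = (diffvars != "")

      // show only the pairs in which at least one variable differs
      list pairid coder diffvars if anydiff, noobs sepby(pairid)

      Because each flag is generated under -by pairid:-, both rows of a pair carry the same diffvars string, so either row can be used to look up which variables to re-check in the original study.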



      • #4
        Mike posted while I was working on the following examples of two approaches to your problem; I'm not sure how they compare with his.
        Code:
        // read in pretend data
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(study coder x1 x2 x3)
        101 1 1 2 3
        101 2 1 2 3
        102 1 4 5 6
        102 2 5 4 6
        103 1 7 8 9
        103 2 7 8 8
        end
        tempfile studies
        save `studies'
        
        // ====================================
        
        // first approach
        use `studies', clear
        
        // confirm two observations per study
        sort study coder
        by study: assert _N==2
        
        // compare each pair of observations: in the first row of each study,
        // set each variable to 1 if the two coders disagree and 0 if they agree
        foreach var of varlist x1-x3 {
            by study: replace `var' = `var'[1]!=`var'[2]
            }
        
        // keep only the row of flags for each study and mark it as coder 0;
        // this must come after the loop, because dropping the second row
        // earlier leaves nothing to compare the remaining variables against
        by study: drop if _n==2
        replace coder = 0
        
        // append the original observations
        append using `studies'
        sort study coder
        list, noobs sepby(study)
        
        // ====================================
        
        // second approach
        use `studies', clear
        
        // confirm two observations per study
        sort study coder
        by study: assert _N==2
        
        // reshape to one observation per study/coder/variable
        rename (x1-x3) (v_=)
        reshape long v_, i(study coder) j(varname) string
        rename v_ value
        order study varname coder value
        sort study varname coder
        
        // drop identical copies
        duplicates tag study varname value, generate(copies)
        drop if copies!=0
        drop copies
        
        // reshape to one observation per study/variable
        reshape wide value, i(study varname) j(coder)
        list, noobs sepby(study)
        Code:
        . list, noobs sepby(study)
        
          +------------------------------+
          | study   coder   x1   x2   x3 |
          |------------------------------|
          |   101       0    0    0    0 |
          |   101       1    1    2    3 |
          |   101       2    1    2    3 |
          |------------------------------|
          |   102       0    1    1    0 |
          |   102       1    4    5    6 |
          |   102       2    5    4    6 |
          |------------------------------|
          |   103       0    0    0    1 |
          |   103       1    7    8    9 |
          |   103       2    7    8    8 |
          +------------------------------+
        Code:
        . list, noobs sepby(study)
        
          +-----------------------------------+
          | study   varname   value1   value2 |
          |-----------------------------------|
          |   102        x1        4        5 |
          |   102        x2        5        4 |
          |-----------------------------------|
          |   103        x3        9        8 |
          +-----------------------------------+



        • #5
          Thanks so much Will. I was trying to use the first approach but cannot really understand what the listed output says...
          Last edited by Barbora Sedova; 04 Apr 2019, 05:55.
