Merging and analysing double-coded data

Barbora Sedova

Join Date: Apr 2017

Posts: 63
#1

Merging and analysing double-coded data

02 Apr 2019, 04:02

Dear all,

I am conducting a meta-regression analysis. With my team, we have double-coded many studies. This leaves us with two datasets, each with many variables and many observations, containing the same studies.

Every study has a unique ID. The goal is to compare the two independent codings (one dataset per coder) for every study. If they coincide, I would drop the duplicate. If they differ, I want to know in which variables they differ, so that I can go back to the original study and find the actual value of each such variable.

I have already appended the datasets from both coders into one dataset, so every ID now appears twice.
What would be the most efficient way to find out which observations differ, and in which specific variables (since I do not want to check every variable one by one for every observation)?

Many thanks.
Barbora Sedova

Join Date: Apr 2017

Posts: 63
#2

02 Apr 2019, 05:34

It is also important to say that the unit of observation is a study.

To put the problem differently: I want to find the variables that differ between two observations that are supposed to be duplicates.

Last edited by Barbora Sedova; 02 Apr 2019, 05:53.

Mike Lacy

Join Date: Apr 2014
Posts: 2421

02 Apr 2019, 09:12

I'm presuming that your observations have a pairid variable, indicating which observations are to be compared, and a "study" variable, indicating which study each observation represents. In that case, you could do this, which records the differences in two ways:

Code:

sort pairid study
gen str diffvars = ""  // to hold variable names where a difference exists
foreach v of varlist var1 var2... {
   by pairid: gen diff`v' = (`v'[1] != `v'[2])  // flag problem variables
   replace diffvars = diffvars + " `v'" if (diff`v' == 1)
}
gen anydiff = (diffvars != "")
// Examine difference variables for observations in which at least one variable differs within pair
browse pairid diff* if anydiff  // -list- would produce cumbersome output
// Or, if you prefer a list of variable names.
list pairid diffvars if anydiff
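
For readers who want to try the loop above on concrete data, here is a minimal sketch. It assumes the layout described in the question, where the study ID itself identifies the pair, so study plays the role of pairid. The variables coder and x1-x3 and all the values are made up for illustration.

Code:

// toy data: two codings (coder 1 and coder 2) of three studies
clear
input int(study coder x1 x2 x3)
101 1 1 2 3
101 2 1 2 3
102 1 4 5 6
102 2 5 4 6
103 1 7 8 9
103 2 7 8 8
end

sort study coder
gen diffvars = ""  // collects names of variables that differ within a study
foreach v of varlist x1-x3 {
    // 1 if the two codings of this study disagree on `v', else 0
    by study: gen byte diff`v' = (`v'[1] != `v'[2])
    replace diffvars = diffvars + " `v'" if diff`v' == 1
}
gen byte anydiff = (diffvars != "")

// show only the studies where at least one variable differs
list study coder diffvars if anydiff, noobs sepby(study)

With these toy values, study 101 is coded identically by both coders, so only studies 102 and 103 should be listed.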


William Lisowski

Join Date: Dec 2014
Posts: 10150

02 Apr 2019, 09:24

Mike posted while I was working on the following examples of two approaches to your problem; I'm not sure how they compare with his.

Code:

// read in pretend data
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(study coder x1 x2 x3)
101 1 1 2 3
101 2 1 2 3
102 1 4 5 6
102 2 5 4 6
103 1 7 8 9
103 2 7 8 8
end
tempfile studies
save `studies'

// ====================================

// first approach
use `studies', clear

// confirm two observations per study
sort study coder
by study: assert _N==2

// compare each pair of observations; the first observation of each
// study becomes a 0/1 flag (1 = the two coders differ on that variable)
foreach var of varlist x1-x3 {
    by study: replace `var' = `var'[1]!=`var'[2]
    }

// keep one flag observation per study, marked as coder 0; this must follow
// the loop, since every comparison needs both observations to be present
by study: drop if _n==2
replace coder = 0

// append the original observations
append using `studies'
sort study coder
list, noobs sepby(study)

// ====================================

// second approach
use `studies', clear

// confirm two observations per study
sort study coder
by study: assert _N==2

// reshape to one observation per study/coder/variable
rename (x1-x3) (v_=)
reshape long v_, i(study coder) j(varname) string
rename v_ value
order study varname coder value
sort study varname coder

// drop identical copies
duplicates tag study varname value, generate(copies)
drop if copies!=0
drop copies

// reshape to one observation per study/variable
reshape wide value, i(study varname) j(coder)
list, noobs sepby(study)

Code:

. list, noobs sepby(study)

  +------------------------------+
  | study   coder   x1   x2   x3 |
  |------------------------------|
  |   101       0    0    0    0 |
  |   101       1    1    2    3 |
  |   101       2    1    2    3 |
  |------------------------------|
  |   102       0    1    1    0 |
  |   102       1    4    5    6 |
  |   102       2    5    4    6 |
  |------------------------------|
  |   103       0    0    0    1 |
  |   103       1    7    8    9 |
  |   103       2    7    8    8 |
  +------------------------------+

Code:

. list, noobs sepby(study)

  +-----------------------------------+
  | study   varname   value1   value2 |
  |-----------------------------------|
  |   102        x1        4        5 |
  |   102        x2        5        4 |
  |-----------------------------------|
  |   103        x3        9        8 |
  +-----------------------------------+
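
A different technique, not one of the approaches in this thread, is Stata's built-in cf command, which compares the dataset in memory against a dataset on disk, observation by observation. The sketch below assumes that the two codings can be split into two files with the same number of observations in the same study order. The toy data and the tempfile name coder2 are made up for illustration.

Code:

// toy data: two codings (coder 1 and coder 2) of three studies
clear
input int(study coder x1 x2 x3)
101 1 1 2 3
101 2 1 2 3
102 1 4 5 6
102 2 5 4 6
103 1 7 8 9
103 2 7 8 8
end
sort study coder

// save coder 2's observations to a temporary file
preserve
keep if coder == 2
drop coder
tempfile coder2
save `coder2'
restore

// keep coder 1's observations in memory, in the same study order
keep if coder == 1
drop coder

// compare all variables; -verbose- reports each mismatching observation
// -capture noisily- keeps a do-file running if cf exits with an error
capture noisily cf _all using `coder2', verbose

Because cf matches observations by position rather than by ID, both files must be sorted identically on the study ID before comparing.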


Barbora Sedova

Join Date: Apr 2017

Posts: 63
#5

04 Apr 2019, 05:50

Thanks so much Will. I was trying to use the first approach but cannot really understand what the listed output says...

Last edited by Barbora Sedova; 04 Apr 2019, 05:55.
