Hello everyone,
I am designing a quality control framework for a longitudinal cohort study that uses survey data. Every year, twenty new datasets must be checked for potential mistakes in data collection. For each dataset, the "diagnosis" is itself a dataset of the following form:
Code:
```
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str64 question float(wrong correct)
1 "A"  1  0
1 "G" -3 -1
2 "A"  8  9
5 "E"  4  5
5 "G"  2  6
end

     +---------------------------------+
     | id   question   wrong   correct |
     |---------------------------------|
  1. |  1          A       1         0 |
  2. |  1          G      -3        -1 |
  3. |  2          A       8         9 |
  4. |  5          E       4         5 |
  5. |  5          G       2         6 |
     +---------------------------------+
```
Here, "id" is the participant ID, "question" is the name of the variable that flagged an exception (the letters are only for illustration; the names could be any string), "wrong" is the erroneous value, and "correct" is the correct value.
From this, I want to produce a corrected version of the original dataset, so I want to transform the diagnosis into the following:
Code:
```
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id A G E)
1 0 -1 .
2 9  . .
5 .  6 5
end

     +-----------------+
     | id   A    G   E |
     |-----------------|
  1. |  1   0   -1   . |
  2. |  2   9    .   . |
  3. |  5   .    6   5 |
     +-----------------+
```
In this wide form, the variables are the distinct values of "question", there is one observation per "id", and the cells hold the "correct" values (missing where nothing was flagged). That way, the result can be merged with the original dataset using -merge- (I have already implemented this last part) to apply the corrections easily.
Question: Is there a way in Stata to obtain my second example from my first? For those familiar with Python data management, in pandas you would write:
Code:
```
pivoted_df = df.pivot(index='id', columns='question')['correct']
```
Thank you very much!
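For readers who think in terms of the pandas call, the transformation can also be spelled out without any library. The following is a hypothetical, dependency-free Python sketch, not a Stata solution: plain dicts stand in for datasets, `pivot_corrections` and `apply_corrections` are made-up names, and the unflagged values in `original` are invented for illustration. It pivots the diagnosis to one row per id, keeping only "correct", and then overwrites only the flagged cells, which is the role the -merge- step plays in the workflow described above.

```python
# Hypothetical, dependency-free sketch of the long -> wide pivot and the
# correction step. Plain dicts stand in for Stata datasets; the function
# names are illustrative, not part of Stata or pandas.


def pivot_corrections(diagnosis):
    """Long -> wide: one entry per id, one key per flagged question.

    `diagnosis` is an iterable of (id, question, wrong, correct) tuples,
    mirroring the -dataex- example. Only `correct` is kept, like
    df.pivot(index='id', columns='question')['correct'] in pandas.
    """
    wide = {}
    columns = []  # distinct questions, in order of first appearance
    for pid, question, _wrong, correct in diagnosis:
        row = wide.setdefault(pid, {})
        if question in row:
            # A pivot is ambiguous if one id flags the same question twice.
            raise ValueError(f"duplicate entry for id={pid}, question={question!r}")
        row[question] = correct
        if question not in columns:
            columns.append(question)
    return wide, columns


def apply_corrections(original, wide):
    """Return a corrected copy of `original` (a dict: id -> {variable: value}).

    Cells that appear in `wide` are overwritten; all other cells are kept.
    """
    corrected = {pid: dict(row) for pid, row in original.items()}
    for pid, fixes in wide.items():
        if pid not in corrected:
            # Assumption: a correction for an unknown id signals a bad diagnosis.
            raise KeyError(f"id {pid} is in the diagnosis but not in the data")
        corrected[pid].update(fixes)
    return corrected


# The diagnosis from the first -dataex- example.
diagnosis = [
    (1, "A", 1, 0),
    (1, "G", -3, -1),
    (2, "A", 8, 9),
    (5, "E", 4, 5),
    (5, "G", 2, 6),
]
wide, columns = pivot_corrections(diagnosis)
for pid in sorted(wide):
    # None plays the role of Stata's missing value "."
    print(pid, [wide[pid].get(q) for q in columns])
# 1 [0, -1, None]
# 2 [9, None, None]
# 5 [None, 6, 5]

# Hypothetical original data: flagged cells hold the "wrong" values,
# the other values are made up for illustration.
original = {
    1: {"A": 1, "E": 3, "G": -3},
    2: {"A": 8, "E": 4, "G": 0},
    5: {"A": 2, "E": 4, "G": 2},
}
corrected = apply_corrections(original, wide)
print(corrected[5])  # {'A': 2, 'E': 5, 'G': 6}
```

In Stata itself, this long-to-wide step is the job of -reshape wide-. Untested here, but with the variable names above, something like `drop wrong` followed by `reshape wide correct, i(id) j(question) string` should yield one variable per question (correctA, correctE, correctG) that can be renamed before the -merge-; the -string- option is needed because "question" is a string variable, and "wrong" has to be dropped because it varies within id. Like pandas' pivot, both the sketch and -reshape- refuse an id that flags the same question twice.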