
  • Listing inconsistent entries

    Hello, I have data across 5 surveys, an extract of which is below:

    ID GenderS1 GenderS2 GenderS3 GenderS4 GenderS5
    1 Male Male Male Male Male
    2 Male Female Male Male Male
    3 Female Female Female Female Male
    4 Female Female Female Female Female
    5 Male Male Male Male Male
    6 Male Male Male Male Male

    When checking the data for consistency, there is clearly an issue with persons 2 and 3, where the gender was perhaps entered erroneously.

    Is there a way of listing which IDs have inconsistent gender data, especially for thousands of lines of data?

    Thank you.

  • #2
    The easiest way is to reshape the dataset to a long layout.

    Code:
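    * stub is GenderS: the variables are GenderS1-GenderS5, so j() gets the numeric suffix
    reshape long GenderS, i(ID) j(survey)
    * within ID, sorted first and last values differ only if the ID has more than one gender value
    bys ID (GenderS): gen inconsistent = GenderS[1] != GenderS[_N]
    *BEST TO MAINTAIN A LONG LAYOUT. IF YOU MUST GO BACK TO WIDE:
    reshape wide GenderS, i(ID) j(survey)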
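    To get the actual list of IDs the question asks for from the long layout, one possibility is to tag one observation per ID with -egen, tag()- and list the flagged IDs. This is a minimal sketch assuming the gender variables are strings named GenderS1-GenderS5, as in the -dataex- example in #4; onePerID is a hypothetical variable name. The reshape stub here is GenderS, so that j() receives the numeric suffixes 1 to 5.

```stata
* Example data from the question (string gender variables assumed)
clear
input byte ID str6(GenderS1 GenderS2 GenderS3 GenderS4 GenderS5)
1 "Male"   "Male"   "Male"   "Male"   "Male"
2 "Male"   "Female" "Male"   "Male"   "Male"
3 "Female" "Female" "Female" "Female" "Male"
4 "Female" "Female" "Female" "Female" "Female"
5 "Male"   "Male"   "Male"   "Male"   "Male"
6 "Male"   "Male"   "Male"   "Male"   "Male"
end

* Stub GenderS so that j() picks up the numeric suffixes 1-5
reshape long GenderS, i(ID) j(survey)

* After sorting within ID, first and last values differ only if
* the ID has more than one distinct gender value
bysort ID (GenderS): gen byte inconsistent = GenderS[1] != GenderS[_N]

* onePerID (hypothetical name) tags one observation per ID,
* so each flagged ID is listed once
egen byte onePerID = tag(ID)
list ID if inconsistent & onePerID, noobs
```

    With the example data this lists IDs 2 and 3. Because an empty string sorts first, an ID with a blank entry in any survey would also be flagged, which may or may not be what you want.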



    • #3
      Gregory:
      another possible approach using -egen-:
      Code:
      egen wanted=group( GenderS*)
      label define wanted  1 "Female_correct" 4 "Male_correct" 2 "Double-check" 3 "Double-check"
      label val wanted wanted
      . list
      
           +----------------------------------------------------------------------------+
           | ID   GenderS1   GenderS2   GenderS3   GenderS4   GenderS5           wanted |
           |----------------------------------------------------------------------------|
        1. |  1       Male       Male       Male       Male       Male     Male_correct |
        2. |  2       Male     Female       Male       Male       Male     Double-check |
        3. |  3     Female     Female     Female     Female       Male     Double-check |
        4. |  4     Female     Female     Female     Female     Female   Female_correct |
        5. |  5       Male       Male       Male       Male       Male     Male_correct |
           |----------------------------------------------------------------------------|
        6. |  6       Male       Male       Male       Male       Male     Male_correct |
           +----------------------------------------------------------------------------+
      ETA: the above assumes that no ID changed her/his gender during the period the surveys cover.
      Last edited by Carlo Lazzaro; 13 May 2022, 11:36.
      Kind regards,
      Carlo
      (Stata 19.0)
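      If you would rather stay with the wide layout without relying on the numbering that -egen, group()- happens to produce (it numbers the distinct combinations present in the data in sort order, so the 1-4 labels above fit this extract but need not fit the full data), a loop comparing each survey with the first is another option. A minimal sketch, again assuming string variables GenderS1-GenderS5 as in the -dataex- example in #4:

```stata
* Example data from the question (string gender variables assumed)
clear
input byte ID str6(GenderS1 GenderS2 GenderS3 GenderS4 GenderS5)
1 "Male"   "Male"   "Male"   "Male"   "Male"
2 "Male"   "Female" "Male"   "Male"   "Male"
3 "Female" "Female" "Female" "Female" "Male"
4 "Female" "Female" "Female" "Female" "Female"
5 "Male"   "Male"   "Male"   "Male"   "Male"
6 "Male"   "Male"   "Male"   "Male"   "Male"
end

* Flag an ID if any later survey disagrees with survey 1
gen byte inconsistent = 0
forvalues s = 2/5 {
    replace inconsistent = 1 if GenderS`s' != GenderS1
}

* Show only the IDs that need double-checking
list ID GenderS* if inconsistent, noobs
```

      With the example data only IDs 2 and 3 are listed. A blank entry counts as a disagreement unless all five surveys are blank for that ID.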



      • #4
        Section 7 of https://www.stata-journal.com/articl...article=pr0046 documents the egen functions rownvals() and rowsvals(), which count the number of distinct numeric and string values, respectively, across a variable list. The code is included in egenmore from SSC.

        Here is the string version in action. As usual, the code thinks very literally and would regard "Male" "male" "Male " as three distinct values.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte ID str6(GenderS1 GenderS2 GenderS3 GenderS4 GenderS5)
        1 "Male"   "Male"   "Male"   "Male"   "Male"  
        2 "Male"   "Female" "Male"   "Male"   "Male"  
        3 "Female" "Female" "Female" "Female" "Male"  
        4 "Female" "Female" "Female" "Female" "Female"
        5 "Male"   "Male"   "Male"   "Male"   "Male"  
        6 "Male"   "Male"   "Male"   "Male"   "Male"  
        end
        
        egen nsvals = rowsvals(Gender??)
        
        list , sep(0)
        
             +--------------------------------------------------------------------+
             | ID   GenderS1   GenderS2   GenderS3   GenderS4   GenderS5   nsvals |
             |--------------------------------------------------------------------|
          1. |  1       Male       Male       Male       Male       Male        1 |
          2. |  2       Male     Female       Male       Male       Male        2 |
          3. |  3     Female     Female     Female     Female       Male        2 |
          4. |  4     Female     Female     Female     Female     Female        1 |
          5. |  5       Male       Male       Male       Male       Male        1 |
          6. |  6       Male       Male       Male       Male       Male        1 |
             +--------------------------------------------------------------------+
        I agree with Andrew Musau that you might well be better off after a reshape long, but this is for your present data layout.
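        Following up on the point about literal matching: a minimal sketch that standardizes case and surrounding blanks with strproper() and strtrim() before counting, so that "Male", "male" and "Male " end up as one value. It assumes egenmore is already installed (ssc install egenmore); the two observations are hypothetical, made up only to show the effect.

```stata
* Hypothetical messy data: ID 1 differs only in case and a trailing blank
clear
input byte ID str6(GenderS1 GenderS2 GenderS3 GenderS4 GenderS5)
1 "Male"   "male"   "Male "  "Male"   "Male"
2 "Male"   "Female" "Male"   "Male"   "Male"
end

* Trim leading/trailing blanks and use proper case ("male" -> "Male")
* Note: this overwrites the original strings
foreach v of varlist GenderS1-GenderS5 {
    replace `v' = strproper(strtrim(`v'))
}

* rowsvals() is from egenmore (assumed installed from SSC)
egen nsvals = rowsvals(GenderS1-GenderS5)
list ID if nsvals > 1, noobs
```

        Here ID 1 should get nsvals equal to 1 and only ID 2 is listed. Since -replace- overwrites the raw entries, you may want to work on copies if the original spellings matter.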



        • #5
          Thank you all for the assistance.
