  • How to count string variables if they have specific characteristics?

    Hi members,

    I'm trying to replicate the descriptive statistics of a dental study (a total of 4,406 children aged 1 to 19 years were included, of whom 651 reported having cavities and 3,755 were cavity-free; https://doi.org/10.1111/jphd.12345).

    I'm planning to create a binary variable 'cavity' that identifies whether each participant has any cavities. The dataset contains 30 coronal caries variables named 'ohx*ctc', where * is a two-digit number from 02 to 31. Each of these variables is coded with a single letter from D to Z. If 'ohx*ctc' is coded "D" or "S", the tooth is free of caries. Here is the codebook entry for this variable: https://wwwn.cdc.gov/Nchs/Nhanes/201...H.htm#OHX02CTC
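A minimal sketch of the binary 'cavity' variable described above, using a small hypothetical toy dataset (three teeth per child instead of the 30 real ohx02ctc-ohx31ctc variables; the codes "Z" and "K" are placeholders). It assumes that every code other than "D" or "S" counts as a cavity, which should be checked against the codebook, since not every other letter necessarily means caries. It also assumes that a child with no coded teeth at all (all empty strings) should be missing rather than cavity-free; the helper variable n_coded is hypothetical.

```stata
* Toy data for illustration only: three hypothetical teeth per child;
* seqn 4 mimics a child with no dental exam (all empty strings)
clear
input long seqn str1 ohx02ctc str1 ohx03ctc str1 ohx04ctc
1 "S" "S" "D"
2 "D" "Z" "S"
3 "Z" "Z" "K"
4 "" "" ""
end

* cavity = 1 if at least one coded tooth has a code other than "D" or "S"
gen byte cavity = 0
foreach v of varlist ohx*ctc {
    quietly replace cavity = 1 if !inlist(`v', "D", "S") & `v' != ""
}

* Children with no coded teeth at all are set to missing, not cavity-free
egen byte n_coded = rownonmiss(ohx*ctc), strok
replace cavity = . if n_coded == 0

list seqn cavity n_coded
```

With the real data, the same loop runs over all 30 variables because the wildcard ohx*ctc matches ohx02ctc through ohx31ctc.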

    My code is below, but the variable 'tooth_count' gives the wrong number of cavities for individual participants.
    Code:
    ** merging datasets **
    import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DEMO_H.XPT", clear
    save "DEMO_H.dta", replace
    
    import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/OHXDEN_H.XPT", clear
    save "OHXDEN_H.dta", replace
    
    
    ** Got merged data **
    
    use "DEMO_H.dta", clear
    merge 1:1 seqn using "OHXDEN_H.dta"
    drop _merge
    
    sort seqn
    
    save "D:\Hanes\2013-14\Hanes2013-14.dta", replace
    use "D:\Hanes\2013-14\Hanes2013-14.dta", clear
        
    ren (ridageyr riagendr dmdhredu ridreth1 dmdmartl indfmpir) (age sex education ethnicity marital PIR) // age, sex, education, ethnicity, marital status, family PIR
    
    keep if age<=19
    
    gen age_cat =.
    replace age_cat = 1 if age<=5
    replace age_cat = 2 if age>=6 & age<=11
    replace age_cat = 3 if age>=12 & age<=15
    replace age_cat = 4 if age>=16 & age<=19
    
    gen tooth_count = 0
    foreach v of varlist ohx*ctc {
        replace tooth_count = tooth_count + inlist(`v', "D", "S")
    }

  • #2
    If 'ohx*ctc' is coded as "D" or "S", it means that this tooth is free of caries.
    ...
    I found that the variable 'tooth_count' produce wrong number of cavities for individuals.
    From those two statements I conclude that your variable tooth_count counts the teeth that are free of caries (those coded "D" or "S"), not the teeth with cavities.



    • #3
      I ran your code (modifying the I/O commands to use tempfiles rather than permanent files). I spot-checked a few of the results by hand and found no errors. I then calculated the tooth counts with a different method and got the same results.
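The post does not say which alternative method was used; the sketch below shows one possible independent cross-check on a small hypothetical toy dataset. It pastes all tooth codes for a child into one string with egen concat(), deletes every "D" and "S" with subinstr(), and counts how many characters disappeared. This only works because every code is exactly one character long, as the question states.

```stata
* Toy data for illustration only (hypothetical codes, three teeth per child)
clear
input long seqn str1 ohx02ctc str1 ohx03ctc str1 ohx04ctc
1 "S" "S" "D"
2 "D" "Z" "S"
3 "Z" "Z" "K"
end

* Method 1: the loop from the original post (counts teeth coded "D" or "S")
gen byte tooth_count = 0
foreach v of varlist ohx*ctc {
    quietly replace tooth_count = tooth_count + inlist(`v', "D", "S")
}

* Method 2: concatenate all codes, remove every "D" and "S",
* and count how many characters were removed
egen allcodes = concat(ohx*ctc)
gen byte check = strlen(allcodes) - strlen(subinstr(subinstr(allcodes, "D", "", .), "S", "", .))

* The two methods should agree for every child
assert tooth_count == check
list seqn tooth_count check
```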

      I think the problem is simply a confusion about direction. According to the link you gave, "D" and "S" refer to non-carious teeth, so your variable counts the number of non-carious teeth, not the number with cavities. If you meant to count the teeth with cavities, just put ! in front of -inlist(`v', "D", "S")- and you will get that.
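A minimal sketch of the reversed count suggested above, run on a small hypothetical toy dataset. It adds one assumption not in the original code: empty strings (teeth that were not examined) are skipped. Without the `v' != "" guard, !inlist() is true for an empty string, so a child with no exam (seqn 4 here) would have every tooth counted as a cavity. Whether every non-"D"/"S" code should count as a cavity still needs to be checked against the codebook.

```stata
* Toy data for illustration only (hypothetical codes; seqn 4 has no exam)
clear
input long seqn str1 ohx02ctc str1 ohx03ctc str1 ohx04ctc
1 "S" "S" "D"
2 "D" "Z" "S"
3 "Z" "Z" "K"
4 "" "" ""
end

* Count teeth whose code is NOT "D" or "S";
* empty strings are skipped so unexamined teeth are not counted
gen byte tooth_count = 0
foreach v of varlist ohx*ctc {
    quietly replace tooth_count = tooth_count + (!inlist(`v', "D", "S") & `v' != "")
}

list seqn tooth_count
```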

      Added: Crossed with #2.
