How to count string variables if they have specific characteristics?

Geralt Ji

Join Date: May 2020

Posts: 27
#1

09 May 2022, 09:16

Hi members,

I'm currently trying to replicate the descriptive analysis of a dental study (a total of 4,406 children aged 1 to 19 years were included, of whom 651 were reported to have cavities and 3,755 were cavity-free; https://doi.org/10.1111/jphd.12345).

I'm planning to create a binary variable 'cavity' to indicate whether a participant has cavities. There are 30 dental coronal caries variables named 'ohx*ctc' in the dataset, where * represents a number from 02 to 31. All these variables are coded with a single letter [D-Z]. If 'ohx*ctc' is coded as "D" or "S", it means that the tooth is free of caries. Here is the dictionary for this variable: https://wwwn.cdc.gov/Nchs/Nhanes/201...H.htm#OHX02CTC

My code is as follows, but I found that the variable 'tooth_count' produces the wrong number of cavities for individuals.

Code:

** merging datasets **
import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DEMO_H.XPT", clear
save "DEMO_H.dta", replace
import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/OHXDEN_H.XPT", clear
save "OHXDEN_H.dta", replace

** Got merged data **
use "DEMO_H.dta", clear
merge 1:1 seqn using "OHXDEN_H.dta"
drop _merge
sort seqn
save "D:\Hanes\2013-14\Hanes2013-14.dta", replace

use "D:\Hanes\2013-14\Hanes2013-14.dta", clear
ren (ridageyr riagendr dmdhredu ridreth1 dmdmartl indfmpir) (age sex education ethnicity marital PIR) // age, sex, education, ethnicity, marital status, family PIR
keep if age<=19

gen age_cat = .
replace age_cat = 1 if age<=5
replace age_cat = 2 if age>=6 & age<=11
replace age_cat = 3 if age>=12 & age<=15
replace age_cat = 4 if age>=16 & age<=19

gen tooth_count = 0
foreach v of varlist ohx*ctc {
    replace tooth_count = tooth_count + inlist(`v', "D", "S")
}
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

09 May 2022, 10:20

If 'ohx*ctc' is coded as "D" or "S", it means that this tooth is free of caries.
...
I found that the variable 'tooth_count' produces the wrong number of cavities for individuals.

From those two statements I conclude that your variable tooth_count counts the number of teeth that are free of caries (those coded "D" or "S") rather than the number of cavities.
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#3

09 May 2022, 10:23

I ran your code (modifying the I/O commands to use tempfiles rather than permanent ones). I spot-checked a few of the results by hand and found no errors. I then used a different method of calculating the tooth counts and found that they came out the same.
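For anyone who wants to run a similar check, here is one possible cross-check. It is only a sketch and not necessarily the method used above; the variable names all_codes and alt_count are made up for illustration. It relies on the statement in #1 that every 'ohx*ctc' variable holds a single letter, so the number of "D" and "S" characters in the concatenated codes should equal tooth_count.

```stata
* Hypothetical cross-check; all_codes and alt_count are illustrative names.
* Assumes each ohx*ctc variable holds at most one letter, as stated in #1.
egen all_codes = concat(ohx*ctc)

* Count the "D" and "S" characters by measuring how much shorter the
* string becomes once both letters are stripped out.
gen alt_count = strlen(all_codes) ///
    - strlen(subinstr(subinstr(all_codes, "D", "", .), "S", "", .))

* Stops with an error if the two counts disagree for any observation.
assert alt_count == tooth_count
```

If the -assert- passes silently, the loop in #1 and this alternative agree for every observation.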

I think perhaps the problem is simply confusion about the direction of things. "D" and "S" refer to non-carious teeth, according to the link you showed. So your variable is counting up the number of non-carious teeth, not the number with cavities. If you meant to count the number with cavities, just put ! in front of -inlist(`v', "D", "S")- and you will get that.
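A minimal sketch of that change, assuming the data from #1 are in memory. The name cavity_count is made up for illustration; 'cavity' is the binary variable described in #1.

```stata
* Sketch only: cavity_count is an illustrative name, not from the original code.
* !inlist() is 1 for every code other than "D" or "S", so this also counts
* empty strings and any other non-D/S code in the codebook.
gen cavity_count = 0
foreach v of varlist ohx*ctc {
    replace cavity_count = cavity_count + !inlist(`v', "D", "S")
}

* Binary indicator: 1 if at least one tooth is not coded "D" or "S".
gen byte cavity = cavity_count > 0
```

Whether every non-D/S code should really count as a cavity depends on the codebook. If blank values should be skipped, one option would be to add -& !missing(`v')- to the expression inside the loop, since -missing()- returns 1 for an empty string.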

Added: Crossed with #2.