Using a Matrix to Identify Anaemia

Ryan Lucas

Join Date: Jun 2018
Posts: 14

16 Dec 2022, 19:39

This started as a question, but I came up with a solution that I couldn't find documented elsewhere. The method uses a matrix. I'm very new to matrices and had no confidence that the method would work - but it did! I'm sure this method is far from novel, but it was fun to try it out and discover that it works! I share it here for discussion - is this a good solution? Are there better solutions?

Here is the problem:
I have a large dataset of blood values for children. I want to create a binary variable anaemia that is 0 if the child is not anaemic and 1 if the child is anaemic. The normal values for haemoglobin vary by age and sex. In the past I have used published mean (SD) values to generate Z-scores, and identified anaemia as children with a haemoglobin Z-score less than -2. There are some issues with the parametric assumptions underlying that approach. Here I will be using lower bound cut-offs as published in: Staffa et al, Pediatric hematology normal ranges derived from pediatric primary care patients. Am J Hematol [Internet]. 2020 Oct [cited 2022 Dec 17];95(10).

The relevant variables in my dataset for achieving this are:

haemoglobin (path_hb)
sex (sex)
date of birth (dob)
date of blood test (path_date)

From these I generated a categorical variable for age at blood test (path_age), with each category corresponding to the age categories used in the reference.
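
A minimal sketch of how such a categorical variable could be built. It assumes that age in days is simply path_date minus dob, that a year can be approximated as 365.25 days, and that the codes 1 to 10 follow the age bands of the reference table in order. The helper names age_days and age_yrs are illustrative only, and the treatment of the day of birth (age 0 days is left uncategorised here) is an assumption.

```stata
* Age at blood test, in days and in approximate years
gen long   age_days = path_date - dob
gen double age_yrs  = age_days / 365.25     // approximation; ignores exact birthdays

* One code per reference age band; anything outside the bands stays missing
gen byte path_age = .
replace path_age = 1  if inrange(age_days, 1, 3)
replace path_age = 2  if inrange(age_days, 4, 7)
replace path_age = 3  if inrange(age_days, 8, 14)
replace path_age = 4  if inrange(age_days, 15, 30)
replace path_age = 5  if inrange(age_days, 31, 60)
replace path_age = 6  if inrange(age_days, 61, 180)
replace path_age = 7  if age_days >= 181 & age_yrs < 2
replace path_age = 8  if age_yrs >= 2  & age_yrs < 6
replace path_age = 9  if age_yrs >= 6  & age_yrs < 12
replace path_age = 10 if age_yrs >= 12 & age_yrs < 18
```

Because every year-based condition has an upper bound, observations with a missing dob or path_date fall through all the conditions and keep a missing path_age.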

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int path_hb byte sex int(dob path_date) float path_age
128 1 20853 21570  7
112 1 19185 21625  9
117 2 21397 22484  8
102 2 21642 22486  8
 92 1 22337 22465  6
123 1 20591 22451  8
108 2 22363 22468  6
126 1 19719 22486  9
 81 1 21210 22540  8
 99 2 21917 22563  7
111 2 20887 22564  8
133 1 20385 21624  8
 99 1 22084 22575  7
 76 1 22257 22584  7
 99 2 21312 22580  8
106 1 22279 22635  7
105 1 22325 22636  7
 91 2 22481 22644  6
120 1 19505 21627  8
103 2 20331 21629  8
130 2 19890 21631  8
114 1 20886 21633  8
119 1 20965 21644  7
 89 1 21510 21646  6
105 2 20648 21650  8
101 2 19456 21556  8
108 1 20751 21678  8
117 1 21525 21680  6
100 2 19458 21693  9
 89 1 21452 21715  7
139 1 18245 21720  9
113 1 21175 21726  7
121 1 19665 21735  8
119 1 19253 21755  9
 80 1 21279 21749  7
 93 1 20515 21756  8
113 2 21448 21557  6
126 1 20900 21760  8
132 1 20744 21764  8
 91 1 20860 21788  8
end
format %dM_d,_CY dob
format %dM_d,_CY path_date
label values sex sex_lbl
label def sex_lbl 1 "Male", modify
label def sex_lbl 2 "Female", modify
label values path_age path_age_lbl
label def path_age_lbl 5 "31d to 60d", modify
label def path_age_lbl 6 "61d to 180d", modify
label def path_age_lbl 7 "6m to <2y", modify
label def path_age_lbl 8 "2y to <6y", modify
label def path_age_lbl 9 "6y to <12y", modify
label def path_age_lbl 10 "12y to <18y", modify

The brief is to create a binary variable anaemia that is 0 if the haemoglobin is ≥ cut-off for age and sex, and 1 if the haemoglobin is < cut-off for age and sex. My first solution was simply many iterations of:

Code:

replace anaemia = 1 if sex == X & path_age == Y & path_hb < Z

I thought that looked messy, so I tried another solution. I created a matrix for the reference lower cut-offs as follows:

Code:

matrix input hbRef = ( 128, 128 \ /// 1: 1-3 days
133, 130 \ /// 2: 4-7 days
110, 120 \ /// 3: 8-14 days
98, 102 \ /// 4: 15-30 days
90, 89 \ /// 5: 31-60 days
94, 96 \ /// 6: 61-180 days
102, 103 \ /// 7: 181 days to <2 years
107, 107 \ /// 8: 2 years to <6 years
113, 112 \ /// 9: 6 years to <12 years
124, 114 ) // 10: 12 years to <18 years
matrix colnames hbRef = "Males" "Females"
matrix rownames hbRef = "1d to 3d" "4d to 7d" "8d to 14d" "15d to 30d" "31d to 60d" "61d to 180d" "6m to <2y" "2y to <6y" "6y to <12y" "12y to <18y"

Which gives the following:

Code:

. mat list hbRef

hbRef[10,2]
               Males  Females
   1d to 3d      128      128
   4d to 7d      133      130
  8d to 14d      110      120
 15d to 30d       98      102
 31d to 60d       90       89
61d to 180d       94       96
  6m to <2y      102      103
  2y to <6y      107      107
 6y to <12y      113      112
12y to <18y      124      114

Note that the columns correspond to my sex codes and the rows correspond to my path_age codes. Now I can create my anaemia variable as follows:

Code:

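gen byte anaemia = .
replace anaemia = 0 if !missing(path_hb, path_age, sex)
// Guard against missing age or sex: the lookup then returns missing, and path_hb < . evaluates as true.
replace anaemia = 1 if path_hb < hbRef[path_age, sex] & !missing(path_hb, path_age, sex)
label variable anaemia "Anaemic for age"
label values anaemia yesno_lbl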
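
One way to check that the two approaches agree is to generate the "many replace statements" version in a loop and compare it with the matrix look-up. This is only a sketch: it assumes hbRef and anaemia already exist as above, and anaemia_loop is a hypothetical name used for the comparison.

```stata
* Rebuild the classification cell by cell, one age band and sex at a time
gen byte anaemia_loop = 0 if !missing(path_hb, path_age, sex)
forvalues a = 1/10 {
    forvalues s = 1/2 {
        replace anaemia_loop = 1 if sex == `s' & path_age == `a' & path_hb < hbRef[`a', `s']
    }
}

* Stops with an error if the two versions disagree on any complete observation
assert anaemia_loop == anaemia if !missing(path_hb, path_age, sex)
```

The loop issues the same 20 replace statements that would otherwise be typed by hand, so the comparison is really between writing the cut-offs once in a matrix and writing them once per statement.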

How do these methods compare? Six of one, half a dozen of the other?

Tags: anaemia, haemoglobin, matrix

Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

16 Dec 2022, 19:47

I'm so sorry, I don't understand the problem or the solution at all. Could you please rephrase it? What's the problem in simple terms, and how did your approach overcome it?

Ryan Lucas

Join Date: Jun 2018
Posts: 14

16 Dec 2022, 20:33

Hi Jared,
I apologise. The 'problem' was probably just that I'm still a novice. I'll try to be clearer.

The referenced article gives normative ranges for haemoglobin for children, categorised by age and sex:

[Image: Table1_Staffa_2020.jpg]

I have haemoglobin values for lots of kids at different ages, and want to categorise them as anaemic or not. I am using the first column from the reference, taking just the lower bound. My first approach worked, but I thought it was messy. It was as follows:

Code:

gen anaemia = .
replace anaemia = 0 if !missing(path_hb) & !missing(path_age)    // I have some missing data in my dataset.
label variable anaemia "Anaemic for age"
label values anaemia yesno_lbl    // This is my yes/no label, 0 = no, 1 = yes

//    Make anaemia = 1 if the child was male, between 1 and 3 days old at the time of the blood test,
//    and the haemoglobin was less than the cut-off in the table (I use g/L rather than g/dL).
replace anaemia = 1 if sex == 1 & path_age == 1 & path_hb < 128

//    Make anaemia = 1 if the child was female, between 1 and 3 days old at the time of the blood test,
//    and the haemoglobin was less than the cut-off in the table (I use g/L rather than g/dL).
replace anaemia = 1 if sex == 2 & path_age == 1 & path_hb < 128
//    Etcetera, for the rest of the table.

This is a terrible solution: it is unsophisticated and it doesn't scale. My new solution was to enter the reference cut-off as a matrix and use it as a look-up table.

Code:

. mat list hbRef

hbRef[10,2]
               Males  Females
   1d to 3d      128      128
   4d to 7d      133      130
  8d to 14d      110      120
 15d to 30d       98      102
 31d to 60d       90       89
61d to 180d       94       96
  6m to <2y      102      103
  2y to <6y      107      107
 6y to <12y      113      112
12y to <18y      124      114

Since I coded my age and sex variables to match the rows and columns of the matrix, the following line works:

Code:

replace anaemia = 1 if path_hb < hbRef[path_age, sex] & !missing(path_hb, path_age, sex)

I hope that was clearer. This might be an obvious solution, and I'm sure it isn't the best one. I'm keen to know how others would have achieved the same thing.


Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#4

16 Dec 2022, 23:03

I think this is a perfectly reasonable approach, at least for one-time or occasional use.

If you need to do this on a recurring basis, the drawback is that you need to code the creation of that matrix over and over again. So a more convenient approach would be to save the same information that is in the matrix in a data set in long layout. Then you could -merge- a patient data set on the age (suitably coded) and sex whenever you needed to do anemia classification.
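
A rough sketch of that data set approach, using the cut-offs from the matrix in #1. The file name hb_cutoffs and the variable name hb_cutoff are assumptions made for illustration, and the code assumes anaemia has not already been created in the patient data.

```stata
* Build the cut-off table once, in long layout, and save it
preserve
clear
input byte path_age byte sex int hb_cutoff
 1 1 128
 1 2 128
 2 1 133
 2 2 130
 3 1 110
 3 2 120
 4 1  98
 4 2 102
 5 1  90
 5 2  89
 6 1  94
 6 2  96
 7 1 102
 7 2 103
 8 1 107
 8 2 107
 9 1 113
 9 2 112
10 1 124
10 2 114
end
save hb_cutoffs, replace
restore

* Attach the matching cut-off to each patient record and classify
merge m:1 path_age sex using hb_cutoffs, keep(master match) nogenerate
gen byte anaemia = path_hb < hb_cutoff if !missing(path_hb, hb_cutoff)
```

Records with a missing age category or sex find no match in the cut-off file, so hb_cutoff stays missing and anaemia is left missing for them.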

And if this were something that you were going to do on a very frequent basis, it would make sense to write a general program that would take the names of the age and sex variables in any data set as arguments, the name of a new variable to hold the anemia classification results as an argument, code the age and sex compatibly with your data set that has the hemoglobin cutoffs, and do the -merge-. That would require a small investment of time to set up, but then any time you needed to classify anemia it would just be a single command to do it.
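
Such a program might look something like the sketch below. Everything named here is hypothetical: the command name anaemia_classify, its options, and a saved cut-off file hb_cutoffs.dta containing the variables path_age, sex and hb_cutoff. It also assumes the age variable passed in already uses the same 1 to 10 coding as the cut-off file, and that the data in memory has no variable called hb_cutoff.

```stata
capture program drop anaemia_classify
program define anaemia_classify
    version 14
    syntax newvarname, hb(varname numeric) age(varname numeric) sex(varname numeric) ///
        [CUToffs(string)]
    if `"`cutoffs'"' == "" local cutoffs "hb_cutoffs"

    * Make a temporary copy of the cut-off file whose key names match the caller's variables
    tempfile lookup
    preserve
    use `"`cutoffs'"', clear
    if "`age'" != "path_age" rename path_age `age'
    if "`sex'" != "sex"      rename sex `sex'
    save `"`lookup'"'
    restore

    * Look up the cut-off for each observation, classify, then tidy up
    merge m:1 `age' `sex' using `"`lookup'"', keep(master match) nogenerate
    gen byte `varlist' = `hb' < hb_cutoff if !missing(`hb', hb_cutoff)
    drop hb_cutoff
end
```

With the data from #1 in memory, a call would then be a single line such as `anaemia_classify anaemia, hb(path_hb) age(path_age) sex(sex)`.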