Cumulative frequencies and names of observations calculated across rows of different variables

Thekke Purakkal

Join Date: Dec 2015

Posts: 95
#1

Cumulative frequencies and names of observations calculated across rows of different variables

11 Mar 2019, 22:17

Hi Statalisters,

Could someone help me with the below or direct me to any previous thread (which I could not identify) indicating a solution?

I am trying to retrieve the name and frequency of N observations under 3 variables (presented in columns), where by frequencies are calculated across rows of the 3 variables. For example, the data below gives information on 3 tests conducted on 15 participants to identify a bacteria from different type of specimens to diagnose a particular disease. "SpecimenTest1" lists the specimens used to conduct the first test from the 15 participants. The results of first test could come as positive or negative. If the bacteria was not identified in first test for patients whose clinical symptoms indicted presence of disease/possibility of infection, the doctor advised a second test on same or other specimens ("SpecimenTest2") collected from such participants (e.g., participant Nos. 4, 8 and 11). If the second test also failed, a 3^rd test with specimens ("SpecimenTest3") available was conducted to be extra sure (e.g., participant nos. 4 and 8) . From the data given below, I want to get the names and frequencies of total samples tested for these 15 participants across test 1, test 2 and test 3. In the given example, my final output table should look something like:.

Type of Specimen Freq

Blood 7

Bone 4

Gastric 1

Lymph 2

Pus 4

Urine 2

Total 20

I got the frequencies (= 20) by running the below code. However I also want the names of the type of specimens (as given under Type of Specimen column in the above table) as I have to do further analysis stratified by this "Type of Specimen" variable.

Code:

gen SpecimenTest1_n=0 replace SpecimenTest1_n=1 if SpecimenTest1!="NA" gen SpecimenTest2_n=0 replace SpecimenTest2_n=1 if SpecimenTest2!="NA" gen SpecimenTest3_n=0 replace SpecimenTest3_n=1 if SpecimenTest3!="NA" gen TypeofSpecimen_123n=SpecimenTest1_n+SpecimenTest2_n+SpecimenTest3_n sum TypeofSpecimen_123n display r(sum)

Given below is the sample data set produced by -dataex-. I use Stata/SE 13 on Windows 10

Thankyou

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long CaseID float Age str2 Sex str24(SpecimenTest1 SpecimenTest2 SpecimenTest3) 1 10 "F" "Bone" "NA" "NA" 2 5 "M" "Pus" "NA" "NA" 3 3 "M" "Blood" "NA" "NA" 4 10 "F" "Blood" "Bone" "Bone" 5 8 "M" "Blood" "NA" "NA" 6 8 "M" "Lymph" "NA" "NA" 7 9 "F" "Bone" "NA" "NA" 8 11 "F" "Blood" "Lymph" "Pus" 9 9 "F" "Urine" "NA" "NA" 10 7 "F" "Pus" "NA" "NA" 11 2 "M" "Blood" "Blood" "NA" 12 6 "M" "Gastric" "NA" "NA" 13 14 "F" "Pus" "NA" "NA" 14 9 "M" "Blood" "NA" "NA" 15 5 "M" "Urine" "NA" "NA" end
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35792

12 Mar 2019, 01:26

Thanks for the data example.

You can do some useful things with your present data structure. On the whole I recommend a different structure.

I would first replace your "NA" by string missing "" for cleaner tables. I will add nicer variable labels at the same time.

Code:

quietly forval j = 1/3  {
     replace SpecimenTest`j' = "" if SpecimenTest`j' == "NA"
     label var SpecimenTest`j' "Test `j'"
}

tabm from tab_chi (SSC) can be of some help for tabulation, but you must install the package first.

Code:

ssc inst tab_chi

 
tabm SpecimenTest* , transpose

                      |             variable
               values |    Test 1     Test 2     Test 3 |     Total
----------------------+---------------------------------+----------
                Blood |         6          1          0 |         7
                 Bone |         2          1          1 |         4
              Gastric |         1          0          0 |         1
                Lymph |         1          1          0 |         2
                  Pus |         3          0          1 |         4
                Urine |         2          0          0 |         2
----------------------+---------------------------------+----------
                Total |        15          3          2 |        20

But a different data structure is advisable.

Code:

reshape long SpecimenTest , i(CaseID) j(TestNo)
drop if SpecimenTest == ""

tab SpecimenTest TestNo

                      |              TestNo
         SpecimenTest |         1          2          3 |     Total
----------------------+---------------------------------+----------
                Blood |         6          1          0 |         7
                 Bone |         2          1          1 |         4
              Gastric |         1          0          0 |         1
                Lymph |         1          1          0 |         2
                  Pus |         3          0          1 |         4
                Urine |         2          0          0 |         2
----------------------+---------------------------------+----------
                Total |        15          3          2 |        20

tab SpecimenTest TestNo, rowsort

                      |              TestNo
         SpecimenTest |         1          2          3 |     Total
----------------------+---------------------------------+----------
                Blood |         6          1          0 |         7
                 Bone |         2          1          1 |         4
                  Pus |         3          0          1 |         4
                Lymph |         1          1          0 |         2
                Urine |         2          0          0 |         2
              Gastric |         1          0          0 |         1
----------------------+---------------------------------+----------
                Total |        15          3          2 |        20

 tab SpecimenTest, sort

            SpecimenTest |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
                   Blood |          7       35.00       35.00
                    Bone |          4       20.00       55.00
                     Pus |          4       20.00       75.00
                   Lymph |          2       10.00       85.00
                   Urine |          2       10.00       95.00
                 Gastric |          1        5.00      100.00
-------------------------+-----------------------------------
                   Total |         20      100.00

See also https://en.wiktionary.org/wiki/bacterium

Comment

Thekke Purakkal

Join Date: Dec 2015

Posts: 95
#3

12 Mar 2019, 12:13

Hi Nick, Thank you very much for both solutions, and also for the wiki link. The tabm from tab_chi (SSC) does get me the output I want but I will have to use the long format to use the SpecimenTest variable for further analysis involving other variables. Best regards

Last edited by Thekke Purakkal; 12 Mar 2019, 12:18.
Comment

Announcement

Cumulative frequencies and names of observations calculated across rows of different variables

Comment

Comment