Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cumulative frequencies and names of observations calculated across rows of different variables

    Hi Statalisters,

    Could someone help me with the below or direct me to any previous thread (which I could not identify) indicating a solution?

    I am trying to retrieve the name and frequency of N observations under 3 variables (presented in columns), where by frequencies are calculated across rows of the 3 variables. For example, the data below gives information on 3 tests conducted on 15 participants to identify a bacteria from different type of specimens to diagnose a particular disease. "SpecimenTest1" lists the specimens used to conduct the first test from the 15 participants. The results of first test could come as positive or negative. If the bacteria was not identified in first test for patients whose clinical symptoms indicted presence of disease/possibility of infection, the doctor advised a second test on same or other specimens ("SpecimenTest2") collected from such participants (e.g., participant Nos. 4, 8 and 11). If the second test also failed, a 3rd test with specimens ("SpecimenTest3") available was conducted to be extra sure (e.g., participant nos. 4 and 8) . From the data given below, I want to get the names and frequencies of total samples tested for these 15 participants across test 1, test 2 and test 3. In the given example, my final output table should look something like:.
    Type of Specimen Freq
    Blood 7
    Bone 4
    Gastric 1
    Lymph 2
    Pus 4
    Urine 2
    Total 20










    I got the frequencies (= 20) by running the below code. However I also want the names of the type of specimens (as given under Type of Specimen column in the above table) as I have to do further analysis stratified by this "Type of Specimen" variable.

    Code:
    gen SpecimenTest1_n=0
    replace SpecimenTest1_n=1 if SpecimenTest1!="NA"
    
    gen SpecimenTest2_n=0 
    replace SpecimenTest2_n=1 if SpecimenTest2!="NA" 
    
    gen SpecimenTest3_n=0 
    replace SpecimenTest3_n=1 if SpecimenTest3!="NA" 
    
    gen TypeofSpecimen_123n=SpecimenTest1_n+SpecimenTest2_n+SpecimenTest3_n
    sum TypeofSpecimen_123n
    display r(sum)
    Given below is the sample data set produced by -dataex-. I use Stata/SE 13 on Windows 10

    Thankyou

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long CaseID float Age str2 Sex str24(SpecimenTest1 SpecimenTest2 SpecimenTest3)
     1 10 "F" "Bone"    "NA"    "NA"  
     2  5 "M" "Pus"     "NA"    "NA"  
     3  3 "M" "Blood"   "NA"    "NA"  
     4 10 "F" "Blood"   "Bone"  "Bone"
     5  8 "M" "Blood"   "NA"    "NA"  
     6  8 "M" "Lymph"   "NA"    "NA"  
     7  9 "F" "Bone"    "NA"    "NA"  
     8 11 "F" "Blood"   "Lymph" "Pus" 
     9  9 "F" "Urine"   "NA"    "NA"  
    10  7 "F" "Pus"     "NA"    "NA"  
    11  2 "M" "Blood"   "Blood" "NA"  
    12  6 "M" "Gastric" "NA"    "NA"  
    13 14 "F" "Pus"     "NA"    "NA"  
    14  9 "M" "Blood"   "NA"    "NA"  
    15  5 "M" "Urine"   "NA"    "NA"  
    end

  • #2
    Thanks for the data example.

    You can do some useful things with your present data structure. On the whole I recommend a different structure.

    I would first replace your "NA" by string missing "" for cleaner tables. I will add nicer variable labels at the same time.

    Code:
    quietly forval j = 1/3  {
         replace SpecimenTest`j' = "" if SpecimenTest`j' == "NA"
         label var SpecimenTest`j' "Test `j'"
    }
    tabm from tab_chi (SSC) can be of some help for tabulation, but you must install the package first.

    Code:
    ssc inst tab_chi
    
     
    tabm SpecimenTest* , transpose
    
                          |             variable
                   values |    Test 1     Test 2     Test 3 |     Total
    ----------------------+---------------------------------+----------
                    Blood |         6          1          0 |         7
                     Bone |         2          1          1 |         4
                  Gastric |         1          0          0 |         1
                    Lymph |         1          1          0 |         2
                      Pus |         3          0          1 |         4
                    Urine |         2          0          0 |         2
    ----------------------+---------------------------------+----------
                    Total |        15          3          2 |        20
    But a different data structure is advisable.

    Code:
    reshape long SpecimenTest , i(CaseID) j(TestNo)
    drop if SpecimenTest == ""
    
    tab SpecimenTest TestNo
    
                          |              TestNo
             SpecimenTest |         1          2          3 |     Total
    ----------------------+---------------------------------+----------
                    Blood |         6          1          0 |         7
                     Bone |         2          1          1 |         4
                  Gastric |         1          0          0 |         1
                    Lymph |         1          1          0 |         2
                      Pus |         3          0          1 |         4
                    Urine |         2          0          0 |         2
    ----------------------+---------------------------------+----------
                    Total |        15          3          2 |        20
    
    tab SpecimenTest TestNo, rowsort
    
                          |              TestNo
             SpecimenTest |         1          2          3 |     Total
    ----------------------+---------------------------------+----------
                    Blood |         6          1          0 |         7
                     Bone |         2          1          1 |         4
                      Pus |         3          0          1 |         4
                    Lymph |         1          1          0 |         2
                    Urine |         2          0          0 |         2
                  Gastric |         1          0          0 |         1
    ----------------------+---------------------------------+----------
                    Total |        15          3          2 |        20
    
     tab SpecimenTest, sort
    
                SpecimenTest |      Freq.     Percent        Cum.
    -------------------------+-----------------------------------
                       Blood |          7       35.00       35.00
                        Bone |          4       20.00       55.00
                         Pus |          4       20.00       75.00
                       Lymph |          2       10.00       85.00
                       Urine |          2       10.00       95.00
                     Gastric |          1        5.00      100.00
    -------------------------+-----------------------------------
                       Total |         20      100.00
    See also https://en.wiktionary.org/wiki/bacterium

    Comment


    • #3
      Hi Nick, Thank you very much for both solutions, and also for the wiki link. The tabm from tab_chi (SSC) does get me the output I want but I will have to use the long format to use the SpecimenTest variable for further analysis involving other variables. Best regards
      Last edited by Thekke Purakkal; 12 Mar 2019, 12:18.

      Comment

      Working...
      X