Counting observations across multiple variables

Vijay Kumar

Join Date: Jul 2016
Posts: 24

25 Apr 2017, 20:56

Dear Statalisters,

I have a classroom dataset of four variables: the first is the name of the child (c_name), and the next three (c_1, c_2, c_3) are the names of the children chosen by the child in the first column. I want to generate a variable that counts the number of times each child is chosen by the other children. That is, for every c_name, I want to search all the observations of c_1, c_2 and c_3 and store the number of matches in a count variable. Here is a sample of my data. I appreciate your help. Thank you.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str18 c_name str17(c_1 c_2) str18 c_3
"Bhawani"            "Aithal Vishruth"   "Gowda Dhaiwik .M"  "P. Pravith"        
"Sirishree"          "Sankarshan .AA"    "Hegde Shoorthi"    "R.Rishika"         
"Thrisha"            "Sirishree"         "Hegde Shoorthi"    "P. Pravith"        
"Sankarshan .AA"     "Jaya Mohan Aneesh" "Jonna Victoria. A" "Modak .M"          
"Jonna Victoria. A"  "Sirishree"         "Thrisha"           "S.Lavanya"         
"B.R.Brunda"         "Sirishree"         "Hegde Shoorthi"    "R.Rishika"         
"Alur Akshata"       "Hegde Shoorthi"    "Kashyap Amrutha"   "R. Saanvi"         
"Aithal Vishruth"    "Gowda Dhaiwik .M"  "P. Pravith"        "S. Hemanth Kumar"  
"B.Y. Sujan"         "Aithal Vishruth"   "J.S. Dhwani"       "Rao Gorakshith .D" 
"Babu Mukul .CS"     "Aithal Vishruth"   "Modak .M"          "P. Pravith"        
"D.Poorvaja"         "Kashyap Amrutha"   "R.Rishika"         "R. Saanvi"         
"Donthi Ahan .N"     "Aithal Vishruth"   "Gowda Dhaiwik .M"  "Rao Gorakshith .D" 
"Gowda Dhaiwik .M"   "Aithal Vishruth"   "P. Pravith"        "S. Hemanth Kumar"  
"Guru Raj Parnika"   "Hegde Shoorthi"    "Kashyap Amrutha"   "R. Saanvi"         
"Hegde Shoorthi"     "Sirishree"         "R.Rishika"         "S.G. Ahana"        
"J.S. Dhwani"        "Sankarshan .AA"    "Hegde Shoorthi"    "R.Rishika"         
"Jaya Mohan Aneesh"  "Thrisha"           "Sankarshan .AA"    "Jois  Prathyush .V"
"Jois  Prathyush .V" "Sirishree"         "Hegde Shoorthi"    "R.Rishika"         
""                   "Alur Akshata"      "D.Poorvaja"        "R. Saanvi"         
"Modak .M"           "Sankarshan .AA"    "Aithal Vishruth"   "P. Pravith"        
end

Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#2

25 Apr 2017, 21:12

To get the counts like what you've asked for, you could try something like

Code:

isid c_name
reshape long c_, i(c_name) j(discard)
contract c_, freq(count)

I assume that the children's names are made up for the illustration dataset.
Vijay Kumar

Join Date: Jul 2016

Posts: 24
#3

30 Apr 2017, 05:15

Thank you so much Joseph! This works. However, is there a way to do this without changing the structure of the dataset?

Joseph Coveney

Join Date: Apr 2014
Posts: 4423
#4

30 Apr 2017, 17:57

If each chosen child's name is also present in the first variable, then you could merge the count dataset back into the original.

Code:

isid c_name
generate long row = _n
preserve
drop row
quietly reshape long c_, i(c_name) j(discard)
contract c_, freq(count)
rename c_ c_name
tempfile tmpfil0
quietly save `tmpfil0'

restore
merge 1:1 c_name using `tmpfil0', assert(match master) nogenerate noreport
sort row
drop row
order c_name c_?
quietly replace count = 0 if missing(count)
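
If you would rather not reshape or merge at all, the count can also be built in place with a loop over the observations. The following is only a sketch and is not taken from the replies in this thread: the variable name n_chosen is made up for illustration, and it assumes the dataset is small enough that one pass over the data per observation is acceptable. It does not need c_name to be unique or every chosen name to appear in c_name. Note that it counts choosing rows, so a row that names the same child in two of c_1, c_2, c_3 adds one to that child's total, whereas the reshape approach would add two.

```stata
* Sketch only: for each child, count the rows in which that child is chosen.
* n_chosen is a hypothetical variable name, not one used in the thread.
generate long n_chosen = 0
forvalues i = 1/`=_N' {
    * rows where any of the three choice variables equals the i-th c_name
    quietly count if c_1 == c_name[`i'] | c_2 == c_name[`i'] | c_3 == c_name[`i']
    quietly replace n_chosen = r(N) in `i'
}
```

The original variables and sort order are left untouched; only n_chosen is added.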

Vijay Kumar

Join Date: Jul 2016

Posts: 24
#5

04 May 2017, 23:03

Thank you so much Joseph! This works.