  • Counting observations across multiple variables

    Dear Statalisters,

    I have a classroom dataset with four variables: the first is the name of the child (c_name), and the next three (c_1, c_2, c_3) are the names of the children chosen by that child. I want to generate a variable that counts the number of times each child is chosen by the other children. That is, for every c_name, I want to count its occurrences across all observations of c_1, c_2, and c_3. Here is a sample of my data. I appreciate your help. Thank you.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str18 c_name str17(c_1 c_2) str18 c_3
    "Bhawani"            "Aithal Vishruth"   "Gowda Dhaiwik .M"  "P. Pravith"        
    "Sirishree"          "Sankarshan .AA"    "Hegde Shoorthi"    "R.Rishika"         
    "Thrisha"            "Sirishree"         "Hegde Shoorthi"    "P. Pravith"        
    "Sankarshan .AA"     "Jaya Mohan Aneesh" "Jonna Victoria. A" "Modak .M"          
    "Jonna Victoria. A"  "Sirishree"         "Thrisha"           "S.Lavanya"         
    "B.R.Brunda"         "Sirishree"         "Hegde Shoorthi"    "R.Rishika"         
    "Alur Akshata"       "Hegde Shoorthi"    "Kashyap Amrutha"   "R. Saanvi"         
    "Aithal Vishruth"    "Gowda Dhaiwik .M"  "P. Pravith"        "S. Hemanth Kumar"  
    "B.Y. Sujan"         "Aithal Vishruth"   "J.S. Dhwani"       "Rao Gorakshith .D" 
    "Babu Mukul .CS"     "Aithal Vishruth"   "Modak .M"          "P. Pravith"        
    "D.Poorvaja"         "Kashyap Amrutha"   "R.Rishika"         "R. Saanvi"         
    "Donthi Ahan .N"     "Aithal Vishruth"   "Gowda Dhaiwik .M"  "Rao Gorakshith .D" 
    "Gowda Dhaiwik .M"   "Aithal Vishruth"   "P. Pravith"        "S. Hemanth Kumar"  
    "Guru Raj Parnika"   "Hegde Shoorthi"    "Kashyap Amrutha"   "R. Saanvi"         
    "Hegde Shoorthi"     "Sirishree"         "R.Rishika"         "S.G. Ahana"        
    "J.S. Dhwani"        "Sankarshan .AA"    "Hegde Shoorthi"    "R.Rishika"         
    "Jaya Mohan Aneesh"  "Thrisha"           "Sankarshan .AA"    "Jois  Prathyush .V"
    "Jois  Prathyush .V" "Sirishree"         "Hegde Shoorthi"    "R.Rishika"         
    ""                   "Alur Akshata"      "D.Poorvaja"        "R. Saanvi"         
    "Modak .M"           "Sankarshan .AA"    "Aithal Vishruth"   "P. Pravith"        
    end

  • #2
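    To get the counts you've asked for, you could try something like the following.
    Code:
    * Confirm that c_name uniquely identifies the observations
    isid c_name
    * Stack c_1, c_2 and c_3 into a single variable c_
    reshape long c_, i(c_name) j(discard)
    * Collapse to one row per chosen name, with its frequency in count
    contract c_, freq(count)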
    I assume that the children's names are made up for the illustration dataset.
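
    To see the shape of the result, here is a minimal sketch on toy data. The one-letter names are hypothetical and only for illustration; each child picks the other three, so every name should end up with a count of 3.

    ```stata
    * Toy data with hypothetical names, only to show what the result looks like
    clear
    input str1 c_name str1(c_1 c_2 c_3)
    "A" "B" "C" "D"
    "B" "A" "C" "D"
    "C" "A" "B" "D"
    "D" "A" "B" "C"
    end
    * One row per (chooser, choice slot); the choices end up in c_
    reshape long c_, i(c_name) j(discard)
    * One row per chosen name, with the number of times it was chosen
    contract c_, freq(count)
    list, noobs
    ```

    Note that the result has one row per chosen name, so the original c_name column is gone at this point.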

    • #3
      Thank you so much Joseph! This works. However, is there a way to do this without changing the structure of the dataset?

      • #4
        If each chosen child's name is also present in the first variable, then you could merge the count dataset back into the original.
        Code:
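        isid c_name
        * Remember the original sort order so it can be restored after the merge
        generate long row = _n
        preserve
        drop row
        * Build one row per chosen name with its frequency
        quietly reshape long c_, i(c_name) j(discard)
        contract c_, freq(count)
        rename c_ c_name
        tempfile tmpfil0
        quietly save `tmpfil0'

        restore
        * Attach the counts; children who were never chosen are master-only
        merge 1:1 c_name using `tmpfil0', assert(match master) nogenerate noreport
        sort row
        drop row
        order c_name c_?
        * Children nobody chose get a count of zero
        quietly replace count = 0 if missing(count)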
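
        If some chosen names never appear in c_name, the merge above will stop with an error because of assert(match master). As a minimal sketch of an alternative that leaves the dataset untouched, you could loop over the observations instead. This assumes the variable names from the example; n_chosen is a hypothetical name for the new variable.

        ```stata
        * Sketch: count nominations in place, without reshape or merge.
        * Assumes c_name, c_1, c_2, c_3 as in the example; n_chosen is a made-up name.
        generate long n_chosen = 0
        forvalues i = 1/`=_N' {
            foreach v of varlist c_1 c_2 c_3 {
                * Number of rows where this choice variable names child i
                * (blank choices are skipped so a blank c_name is not matched)
                quietly count if `v' == c_name[`i'] & !missing(`v')
                quietly replace n_chosen = n_chosen + r(N) in `i'
            }
        }
        ```

        This counts every appearance in c_1 to c_3, including a child choosing himself or herself, and names that are chosen but absent from c_name are simply ignored. The work grows with the square of the number of observations, which should be fine for a classroom-sized dataset.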

        • #5
          Thank you so much Joseph! This works.
