
  • Mean, median etc. of frequency of string variable

    Hello,


    I am new to Stata and could use some help, please. My dataset looks like this:
    Individual Observation Category
    1 1 A
    1 1 B
    1 1 A
    1 2 D
    1 2 F
    2 1 F
    2 1 F
    2 2 A
    2 2 C
    I would like summary statistics for the string variable "Category": how many times a specific letter comes up for a given individual (say, individual 1), and the mean and median of the frequency of each letter across observations.
    I am not sure whether there are commands for that or whether I have to collapse the dataset. I tried collapsing, but that seems more difficult with string variables.

    Thank you so much!






  • #2
    What is the role of the variable Observation here? For example, in the row of your table that has 1 2 D, does this mean that we should count this as 2 appearances of D? If that's the case, then I think you want:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(individual observation) str1 category
    1 1 "A"
    1 1 "B"
    1 1 "A"
    1 2 "D"
    1 2 "F"
    2 1 "F"
    2 1 "F"
    2 2 "A"
    2 2 "C"
    end
    
    collapse (sum) observation, by(individual category)
    
    by category, sort: summ observation, detail
    In the future, please use the -dataex- command to post example Stata data, as I have done here. While your table was simple enough to import into Stata, and it contained all the necessary information, frequently this type of display is difficult to work with or leaves out important details. If you are running Stata version 15.1, -dataex- is part of the official installation. If not, you can get it by running -ssc install dataex-. Either way, run -help dataex- to read the simple instructions for using it. Going forward, use -dataex- to show example data every time you request help with code.

    Added: If you want a more compact presentation of the results, instead of the -by category, sort: summ...- command, try -tabstat observation, statistics(mean p50) by(category)-
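As a rough sketch of the compact variant, assuming the same nine example rows as above and that the values of observation are meant to be summed, the whole sequence might look like this:

```stata
* Example data, as in the -dataex- listing above
clear
input byte(individual observation) str1 category
1 1 "A"
1 1 "B"
1 1 "A"
1 2 "D"
1 2 "F"
2 1 "F"
2 1 "F"
2 2 "A"
2 2 "C"
end

* One row per individual-category pair, holding the sum of observation
collapse (sum) observation, by(individual category)

* Mean and median of those sums, one table row per category
tabstat observation, statistics(mean p50) by(category)
```

This gives one line per category rather than the full -summarize, detail- output for each.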



    • #3
      Hello Clyde,
      thank you so much for your response! And I apologize for not inserting the data properly :-/
      The role of observation: there are several observations per individual, so it is essentially a repeated-measures design. During each observation, individuals get several ratings (= categories). So observation refers to the study design and not to anything that was actually measured. I guess that if I want to summarize the frequency of the categories at the individual level, I could just ignore it here. How would I do that? I tried to modify your command, but what I got does not make sense.
      Thank you so much again!



      • #4
        So, if I understand you, the variable observation really plays no role here. Each row in your table counts once and only once. In that case:

        Code:
        collapse (count) freq = observation, by(individual category)
        by category, sort: summ freq, detail
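One caveat with counting this way: an individual who never received a given category contributes no row for it, so the statistics for that category are taken only over individuals with at least one occurrence. Below is a minimal sketch of one way to include the zeros, assuming that is what is wanted. It swaps in -contract- with its zero option in place of -collapse-; the variable name freq is simply carried over from the code above.

```stata
* Same example data as earlier in the thread
clear
input byte(individual observation) str1 category
1 1 "A"
1 1 "B"
1 1 "A"
1 2 "D"
1 2 "F"
2 1 "F"
2 1 "F"
2 2 "A"
2 2 "C"
end

* Count rows per individual-category pair; the zero option also keeps
* pairs that never occur, with a frequency of 0
contract individual category, zero freq(freq)

* Mean and median frequency per category, zeros included
tabstat freq, statistics(mean p50) by(category)
```

Whether the zeros belong in the mean depends on the question being asked, so treat this as an alternative rather than a correction.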



        • #5
          Thank you so much again, and please don't be too annoyed with me. I did what you suggested and I am a huge step closer. But I think what I get now is the mean number of "A" across all individuals. What I need is the mean number of "A" across all individuals per observation. I should have been clearer.
          So, let's say individuals 1 and 2 have 10 observations each, and across those observations each of them has the rating "A" 10 times in total. Right now my mean is 10, but the mean number of "A" per observation per individual would be 1. Does that make sense? Again, I tried to adapt your command, but I am just not very intuitive about it ;(
          Any help is very much appreciated!!



          • #6
            I don't understand at all what you are looking for. I suspect if you could explain it clearly you could probably find the code yourself. Part of the confusion is that the word observation here has two different meanings and I can't tell which one you are talking about. One meaning is that you have a variable in your data set called observation (though earlier in the thread you seemed to imply that it was irrelevant). And then there is the standard meaning of an observation in the Stata data set. So there is, at least, that confusion.

            Why don't you give the explanation one more try, post a new, short data example (please use -dataex- this time!), and show what the results from that example should be? Perhaps then it will become clear.



            • #7
              I don't understand exactly what is needed either. But I played a bit to show that different data structures are possible here. If this helps, good. Otherwise, just ignore it.

              But do note the use of (an equivalent of) dataex (as urged by Clyde!).

              Code:
              clear 
              input individual    Observation    str1 Category
              1    1    A
              1    1    B
              1    1    A
              1    2    D
              1    2    F
              2    1    F
              2    1    F
              2    2    A
              2    2    C
              end 
              
              contract i O C 
              reshape wide _freq , i(i O) j(Category) string
              rename (_freq*) (*)
              mvencode A-F, mv(0)
              list 
              
                   +-----------------------------------------+
                   | indivi~l   Observ~n   A   B   C   D   F |
                   |-----------------------------------------|
                1. |        1          1   2   1   0   0   0 |
                2. |        1          2   0   0   0   1   1 |
                3. |        2          1   0   0   0   0   2 |
                4. |        2          2   1   0   1   0   0 |
                   +-----------------------------------------+
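With the data in this wide layout, one possible reading of "mean number of A per observation per individual" can be computed directly. The sketch below is only a guess at that reading, not something confirmed in the thread: it repeats the reshape steps with unabbreviated variable names and then averages each category's count over the observations of each individual.

```stata
clear
input individual Observation str1 Category
1 1 A
1 1 B
1 1 A
1 2 D
1 2 F
2 1 F
2 1 F
2 2 A
2 2 C
end

* One row per individual-Observation pair, one count column per category
contract individual Observation Category
reshape wide _freq, i(individual Observation) j(Category) string
rename (_freq*) (*)
mvencode A-F, mv(0)

* Assumed target: mean count per Observation, separately by individual
collapse (mean) A-F, by(individual)
list
```

Replacing by(individual) with no by() option would instead give the mean per observation pooled over all individuals.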



              • #8
                Thank you so much again, Clyde and Nick!
                After reshaping the dataset the way Nick suggested I was able to get what I needed. I am sorry I could not be more clear about my problem. I appreciate your time!

