  • Summary of categorical variable

    Hi all,
    I would like to know how to calculate the average and standard deviation (SD) of a categorical variable. For example, suppose I have a variable that gives the education level in a village, where 1 = none and primary level, 2 = secondary, 3 = technical and university, and 4 = postgraduate, and I only want the average and SD for the people at the secondary level. How do I do this in Stata?
    I know it is a simple question, but I don't remember how to do it; please point me in the right direction.

    Thanks...

  • #2
    Code:
    summarize <your variable of interest here> if education_level == 2


    • #3
      Joseph, thanks for replying to my post.

      When I have a binary variable, the <sum> command shows me the average and SD for the group with the attribute, but with a categorical variable it is more difficult. I ran the command as you suggested, but the reported average is the sum of all the ones (one being the attribute), and the SD is shown as missing ("." in Stata). I don't know if you understand me, Joseph?

      Thanks again.


      • #4
        Camilo:
        the best way to be understood is to post what you typed and what Stata gave you back (as per the FAQ).
        As far as I can understand your query, Stata reported a missing standard deviation for your variable. I'd bet that the mean, minimum and maximum are the same, too.
        This occurs when the summarized sample holds a single observation, so there is no variation from which to compute a standard deviation:
        Code:
        . set obs 1
        number of observations (_N) was 0, now 1
        
        . g var=1
        
        . su
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
                 var |          1           1           .          1          1
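        As a contrast (a minimal sketch with made-up data, not the original poster's dataset): a variable that is constant over several observations gets a standard deviation of 0 rather than a missing value, since the missing value arises only when there is a single observation:

        ```stata
        * Five observations, all with the same value (toy data)
        clear
        set obs 5
        generate var = 1

        * summarize reports Std. Dev. = 0 here, not "."
        summarize var
        display r(sd)
        ```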
        That said, it is meaningless to calculate descriptive statistics for a categorical variable such as education level, for at least two reasons:
        - each level of the variable is defined at the researcher's convenience (i.e., the codes do not measure anything);
        - as discussed above, you cannot have any variance for a constant value (such as education level 1, 2 or any other).
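
        For a categorical variable, a frequency table is usually the more informative summary. The sketch below uses made-up data; the variable name education_level and its values are assumptions for illustration, not the original poster's dataset:

        ```stata
        * Toy data (hypothetical): education_level coded 1-4 as in the original question
        clear
        input education_level
        1
        2
        2
        3
        4
        end

        * Frequencies and percentages per category, instead of a mean and SD
        tabulate education_level
        ```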

        Probably your goal was different, namely calculating descriptive statistics for another variable under the condition:
        Code:
        if education_level==2
        This is easy to do in Stata:
        Code:
        . use "C:\Program Files (x86)\Stata14\ado\base\a\auto.dta", clear
        (1978 Automobile Data)
        
        . sum price if foreign==0
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
               price |         52    6072.423    3097.104       3291      15906
        
        . bysort foreign: sum price
        
        -----------------------------------------------------------------------------------------------------------------
        -> foreign = Domestic
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
               price |         52    6072.423    3097.104       3291      15906
        
        -----------------------------------------------------------------------------------------------------------------
        -> foreign = Foreign
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
               price |         22    6384.682    2621.915       3748      12990
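
        If the goal is a compact table of the same statistics for every group, tabstat is one option. This is only a sketch on the auto dataset shipped with Stata (loaded here with sysuse, which assumes a standard installation); for the original question, price and foreign would be replaced by the poster's own outcome variable and education level variable:

        ```stata
        * Load the example dataset shipped with Stata
        sysuse auto, clear

        * One row per level of foreign: N, mean, SD, min and max of price
        tabstat price, by(foreign) statistics(n mean sd min max)
        ```
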
        Kind regards,
        Carlo
        (Stata 18.0 SE)
