  • How to create aggregated variable from individual data

    Hello everyone!

    I have a dataset of individual observations, one per respondent to a questionnaire, for a single year. Because the datasets for other years do not follow individuals over time and there is no individual id, I want to compute the mean of the variable of interest (p32xx) by province (ccaa).

    When I create the variable with egen ca30=mean(p32xx), by(ccaa), I get a variable with as many distinct values as there are provinces, but I don't get the province names alongside their mean values. Instead, the mean shows up as the label of each value and the number of observations as its frequency. I would like a variable that shows the name of each province (ccaa) together with the mean of p32xx within that province.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ca30 double(ccaa p32xx)
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 1
    2.0045533 2 6
    end
    label values ccaa ccaa
    label def ccaa 2 "Aragón", modify
    label values p32xx p32xx
    label def p32xx 1 "Sí", modify
    label def p32xx 6 "No", modify

    Hope you can help me, thanks!

  • #2
    In ca30 something like 2.0045533 is a value, not a label. ca30 has no value labels. To see both ccaa and ca30 you can do e.g.

    Code:
    tabdisp ccaa, c(ca30) 
    
    tabulate ccaa, su(ca30) nost
    You don't need new variables for that. Indeed the request to have a name as value and a number as label is invalid in Stata as string variables can't have value labels.
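
    As a minimal sketch of the point above, the toy data below (not the poster's data) attach a value label to ccaa, so that tabdisp shows the province name next to its mean. The label text for code 3 is a hypothetical placeholder; only the label for code 2 comes from the original post.

    ```stata
    * Toy data, invented for illustration
    clear
    input double(ccaa p32xx)
    2 1
    2 1
    2 6
    3 1
    3 6
    3 6
    end
    * Value labels go on the integer-coded province variable, not on the mean
    label define ccaa 2 "Aragón" 3 "Province 3"
    label values ccaa ccaa
    * Group mean, repeated on every observation of the same province
    egen ca30 = mean(p32xx), by(ccaa)
    * One row per province, labelled by name, with the mean as the cell
    tabdisp ccaa, c(ca30)
    ```

    The table rows are labelled by ccaa's value label, while ca30 stays an ordinary numeric variable.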


    • #3
      Thanks! So I would have to convert the variable from string to a categorical one in order to have it the way the tabdisp ccaa, c(ca30) command displays it? How can I get this variable (ca30) to look like the table that command gives me? Thanks again!


      • #4
        I don't know what you mean by categorical variable in this context, but you can't attach a value label to a non-integer. The commands I gave in #2 are commands you can try now with existing variables; there is no need for, and no scope for, creating a new variable such as you seem to think you need.


        • #5
          What I think the original post misses is that, for any observation, the value of ca30 is the mean of p32xx over all the observations with the same value of ccaa.
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input double(ccaa p32xx)
          2 1
          2 1
          2 1
          2 1
          2 1
          3 6
          3 1
          3 1
          3 1
          3 1
          end
          egen ca30=mean(p32xx), by(ccaa)
          list, sepby(ccaa)
          tabstat ca30, by(ccaa) statistics(n min max)
          Code:
          . list, sepby(ccaa)
          
               +---------------------+
               | ccaa   p32xx   ca30 |
               |---------------------|
            1. |    2       1      1 |
            2. |    2       1      1 |
            3. |    2       1      1 |
            4. |    2       1      1 |
            5. |    2       1      1 |
               |---------------------|
            6. |    3       6      2 |
            7. |    3       1      2 |
            8. |    3       1      2 |
            9. |    3       1      2 |
           10. |    3       1      2 |
               +---------------------+
          
          . tabstat ca30, by(ccaa) statistics(n min max)
          
          Summary for variables: ca30
          Group variable: ccaa 
          
              ccaa |         N       Min       Max
          ---------+------------------------------
                 2 |         5         1         1
                 3 |         5         2         2
          ---------+------------------------------
             Total |        10         1         2
          ----------------------------------------
          From the output of tabstat we see that all 5 observations with ccaa==2 have ca30==1 and all 5 observations with ccaa==3 have ca30==2, as we saw in the listing.
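
          If the underlying aim is a dataset with one observation per province, for instance to combine with the other years, one option not mentioned above is collapse. This is only a sketch under that assumption; it reuses the example data from this post and replaces the data in memory, so it should be run on a copy or wrapped in preserve and restore.

          ```stata
          * Same example data as above
          clear
          input double(ccaa p32xx)
          2 1
          2 1
          2 1
          2 1
          2 1
          3 6
          3 1
          3 1
          3 1
          3 1
          end
          * Reduce to one observation per value of ccaa, holding the mean of p32xx
          collapse (mean) ca30 = p32xx, by(ccaa)
          list, noobs
          ```

          With these data the result has two observations, with ca30 equal to 1 for ccaa==2 and 2 for ccaa==3, matching the listing above.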
