Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Table of Counts by Variable

    Hello, I have a panel dataset of people and various characteristics. I want to generate a table that displays counts for each of these variables and group them by year. It's easy to do for one variable using the 'tab' function, but I cannot seem to find an option for multiple variables. For some given data (ID, Year, Sex, Age, Education), I have:
    ID Year Age Sex Education
    1 2000 25 1 PhD
    1 2002 27 1 PhD
    2 2000 20 0 --
    2 2001 21 0 --
    3 2000 23 1 College
    3 2003 24 1 College
    And I'd like to generate an output similar to the one below, where I'm counting the frequency of observations by year:
    Sex Age Education
    2000 3 3 2
    2001 1 1 0
    2002 1 1 1
    2003 1 1 1

    Is there a feasible way to do this in Stata? I can't seem to do it with the tab function. Any help would be appreciated!

  • #2
    Please follow FAQ Advice #12 and use dataex to give a data example.

    You seem to want counts of non-missing values. If so, string values of -- are not helpful. This may help:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte d int year byte(age sex) str7 education
    1 2000 25 1 "PhD"    
    1 2002 27 1 "PhD"    
    2 2000 20 0 "--"     
    2 2001 21 0 "--"     
    3 2000 23 1 "College"
    3 2003 24 1 "College"
    end
    
    replace education = "" if education == "--"
    preserve
    gen educ = cond(education == "", ., 1) 
    collapse (count) age sex educ, by(year)
    list
    restore
    Results

    Code:
     
        +-------------------------+
         | year   age   sex   educ |
         |-------------------------|
      1. | 2000     3     3      2 |
      2. | 2001     1     1      0 |
      3. | 2002     1     1      1 |
      4. | 2003     1     1      1 |
         +-------------------------+

    Comment


    • #3
      Thank you, Nick! This worked greatly.

      Luckily, my variables are all doubles, so the missing value conversion is not necessary for my dataset. However, my actual variable list has 10+ variables, so I don't suppose there is a more clean way to display these tables in the way you've set it up? So far, the output appears to be an individual table for each year (probably because my actual var list is quite large). Any further suggestions? Thank you!

      Comment


      • #4
        Further suggestions would benefit from a data example, as requested in #2,

        Comment


        • #5
          Sure, here is an example with more variables.

          Code:
          clear
          input byte d int year byte(age sex income industry vehicles household_size city jobs circle square) str7 education
          1 2000 25 1 70 30 1 2 2 3 4 5 "PhD"    
          1 2002 27 1 70 30 1 2 2 3 4 5 "PhD"    
          2 2000 20 0 55 12 2 1 1 1 5 2 "--"     
          2 2001 21 0 55 12 3 1 1 1 5 2 "--"     
          3 2000 23 1 65 21 2 3 1 3 6 3 "College"
          3 2003 24 1 65 21 2 4 1 3 6 3 "College"
          end
          
          replace education = "" if education == "--"
          preserve
          gen educ = cond(education == "", ., 1) 
          collapse (count) age sex income industry vehicles household_size city jobs circle square educ, by(year)
          list
          restore
          The resulting tables (which I'm not sure how to display here) are not as organized as before. I believe that it has to do with the amount of variables that I'm trying to count.

          Comment


          • #6
            Thanks for your data example in #5 which flatly contradicts the statement in #3 that "my variables are all doubles".

            #5 is just a variation on what I guessed in #2, so that is welcome confirmation, but the answer is not going to be much different.

            Once more, what to just about any researcher is obviously an indication of missing -- namely "--" -- will not be regarded as missing by Stata, and if you have any other similar conventions they will frustrate your goal.

            The only extra trick here is that you can segregate string variables.

            The code here assumes that you have no variable work, but any other name not in use would be fine, as would using a temporary variable.


            Code:
            clear
            input byte d int year byte(age sex income industry vehicles household_size city jobs circle square) str7 education
            1 2000 25 1 70 30 1 2 2 3 4 5 "PhD"    
            1 2002 27 1 70 30 1 2 2 3 4 5 "PhD"    
            2 2000 20 0 55 12 2 1 1 1 5 2 "--"     
            2 2001 21 0 55 12 3 1 1 1 5 2 "--"     
            3 2000 23 1 65 21 2 3 1 3 6 3 "College"
            3 2003 24 1 65 21 2 4 1 3 6 3 "College"
            end
            
            preserve 
            ds, has(type string)
            
            foreach v in `r(varlist)' {
                replace `v' = "" if `v' == "--"
                gen work = cond(missing(`v'), ., 1)
                drop `v'
                rename work `v'
            }
            
            ds year, not 
            collapse (count) `r(varlist)', by(year)
            list
            restore

            Comment


            • #7
              Thank you Nick! Yes, my example does contradict what you said before, as this is just a toy example with multiple variables. This should be helpful in other scenarios.

              It appears that the output tables are similar to what I've generated before; they are separate tables instead of one large organized table. I suppose that is fine if Stata cannot aggregate the output tables into a single large table.

              Comment


              • #8
                You showed what you wanted in =1.In principle and in practice, that will be so many rows and so many columns.

                If you really or also want something else, you also need to specify what that is. For example, you can reshape the dataset produced by collapse to another layout

                Much of the difficulty in coding here arises from needing to fix a non-standard method of holding missing values.
                Last edited by Nick Cox; 23 Feb 2023, 01:42.

                Comment

                Working...
                X