  • frequency table in panel counting only id and not id-time

    dear users,

    I'm working on a dataset in panel format: id is the individual and time is the year. I need to describe some characteristics of the dataset in terms of the number of individuals, that is, frequency tables that count individuals rather than individual-time observations.

    For instance:
    id time var1
    1 2000 100
    1 2001 200
    1 2002 300

    2 2001 200
    2 2002 55

    3 2000 150
    3 2001 60
    3 2002 70
    3 2003 80

    For example, with tab time

    I would like the output to be:

    2000 2
    2001 3
    2002 3
    2003 1
    ...

    Then, how can I obtain the mean of var1?

    Thank you for any help!
    Elena

  • #2
    Elena,
    welcome to the list.
    Do you mean something like this?
    Code:
    . use "http://www.stata-press.com/data/r14/nlswork.dta", clear
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . sort year
    
    . tab year
    
      interview |
           year |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             68 |      1,375        4.82        4.82
             69 |      1,232        4.32        9.14
             70 |      1,686        5.91       15.05
             71 |      1,851        6.49       21.53
             72 |      1,693        5.93       27.47
             73 |      1,981        6.94       34.41
             75 |      2,141        7.50       41.91
             77 |      2,171        7.61       49.52
             78 |      1,964        6.88       56.40
             80 |      1,847        6.47       62.88
             82 |      2,085        7.31       70.18
             83 |      1,987        6.96       77.15
             85 |      2,085        7.31       84.45
             87 |      2,164        7.58       92.04
             88 |      2,272        7.96      100.00
    ------------+-----------------------------------
          Total |     28,534      100.00
    Kind regards,
    Carlo
    (Stata 19.0)
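-tab year- counts individuals rather than individual-years whenever each individual has at most one observation per year. A minimal sketch of the same idea on Elena's first example, assuming id and time jointly identify observations (the -isid- line stops with an error if they do not); the last line answers the follow-up about the mean of var1 by year with -tabstat-:

```stata
* Elena's first example, typed in by hand
clear
input id time var1
1 2000 100
1 2001 200
1 2002 300
2 2001 200
2 2002 55
3 2000 150
3 2001 60
3 2002 70
3 2003 80
end

* stops with an error if any id appears more than once in the same year
isid id time

* with one observation per id-year, each count is also a count of distinct ids
tab time

* number of ids and mean of var1 in each year
tabstat var1, by(time) stat(n mean)
```

With these data, -tab time- reports 2, 3, 3 and 1 observations for 2000 to 2003, which are exactly the numbers of ids asked for above.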



    • #3
      Dear Carlo,
      Thanks a lot for your reply. Actually, I need to change the example:
      id time grup var1
      1 2011 Gruppo D 0
      1 2012 Gruppo D 0
      1 2013 Gruppo D 0
      2 2006 Gruppo C 1
      2 2007 Gruppo C 1
      2 2008 Gruppo C 1
      2 2009 Gruppo C 1
      2 2010 Gruppo C 1
      2 2011 Gruppo C 1
      2 2012 Gruppo C 1
      2 2013 Gruppo C 1
      3 2006 Gruppo C 1
      3 2007 Gruppo C 1
      3 2008 Gruppo C 1
      3 2009 Gruppo C 1
      3 2010 Gruppo C 1
      3 2011 Gruppo C 1
      3 2012 Gruppo C 1
      3 2013 Gruppo C 1
      4 2006 Gruppo D 0
      4 2007 Gruppo D 0
      4 2008 Gruppo D 0
      4 2009 Gruppo D 0
      4 2010 Gruppo D 0
      4 2011 Gruppo D 0
      4 2012 Gruppo D 0
      4 2013 Gruppo D 0

      obs=27

      var1 is a 0/1 dummy. If I run tab var1, the 27 observations are distributed between 0 and 1,
      while I need the ids to be distributed instead: that is, the 4 ids, 2 with 0 and 2 with 1.
      I don't know if I am being clear.

      Thank you again for the help,
      Elena



      • #4
        Code:
        help tabstat
        
        help tabsum



        • #5
          Nick, thanks for your reply. I tried this:
          tabstat var1, by(grup) stat(n mean sd ...)
          but it counts observations, not ids.
          I really need help; do you have any other suggestions?



          • #6
            So you need to pick out one observation per id:

            Code:
            egen flag = tag(id)
            tabstat var1 if flag, by(grup) stat(n mean sd)
            Do read -help egen- and find the -tag()- function for more details.

            Important: Note also that tabulating a variable by id when there are multiple observations per id only makes sense if the variable is constant within id. If var1 can take on different values in different observations for the same id, then the results will be neither consistent nor sensible. Your example data in the first post of this thread suggests that var1 does in fact change among observations with the same id. So it isn't clear to me how to sensibly do what you appear to want to do. Perhaps I am misunderstanding what you want here.
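Clyde's point about constancy can be checked before tagging. A minimal sketch using the data from #3, typed in by hand with grup stored as a string: the -bysort- line sorts var1 within each id and stops with an error if the first and last values differ, and the last two commands then count each id once.

```stata
* data from #3, typed in by hand
clear
input id time str8 grup var1
1 2011 "Gruppo D" 0
1 2012 "Gruppo D" 0
1 2013 "Gruppo D" 0
2 2006 "Gruppo C" 1
2 2007 "Gruppo C" 1
2 2008 "Gruppo C" 1
2 2009 "Gruppo C" 1
2 2010 "Gruppo C" 1
2 2011 "Gruppo C" 1
2 2012 "Gruppo C" 1
2 2013 "Gruppo C" 1
3 2006 "Gruppo C" 1
3 2007 "Gruppo C" 1
3 2008 "Gruppo C" 1
3 2009 "Gruppo C" 1
3 2010 "Gruppo C" 1
3 2011 "Gruppo C" 1
3 2012 "Gruppo C" 1
3 2013 "Gruppo C" 1
4 2006 "Gruppo D" 0
4 2007 "Gruppo D" 0
4 2008 "Gruppo D" 0
4 2009 "Gruppo D" 0
4 2010 "Gruppo D" 0
4 2011 "Gruppo D" 0
4 2012 "Gruppo D" 0
4 2013 "Gruppo D" 0
end

isid id time

* halts with an error if var1 is not constant within some id
bysort id (var1): assert var1[1] == var1[_N]

* flag = 1 on exactly one observation per id
egen flag = tag(id)

* each id counted once
tab var1 if flag
tabstat var1 if flag, by(grup) stat(n mean)
```

Here -tab var1 if flag- shows 2 ids with 0 and 2 with 1, the split asked for in #3.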



            • #7
              Clyde, thanks! It works if, as you said, the variable is constant within id. However, var1 could also be a variable that varies across observations within the same id. Is there an alternative to tag() for this case?
              Thanks a lot!



              • #8
                Well, I don't understand what you want to get when the variable varies among observations with the same id. I can't make sense of it. Can you show an example of what you have (with var1 varying) and what the result you want would look like?
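If var1 varies within id, each id has to be reduced to a single value before ids can be counted once. A minimal sketch of one such reduction on Elena's first example, assuming the within-id mean is an acceptable summary; this is a hypothetical choice, since the thread never settles what result is wanted. The variable name id_mean is likewise illustrative.

```stata
* Elena's first example, where var1 changes within id
clear
input id time var1
1 2000 100
1 2001 200
1 2002 300
2 2001 200
2 2002 55
3 2000 150
3 2001 60
3 2002 70
3 2003 80
end

* hypothetical summary: the mean of var1 within each id, copied to all its rows
egen id_mean = mean(var1), by(id)

* flag = 1 on exactly one observation per id
egen flag = tag(id)

* n is now the number of ids; mean is the average of the id-level means
tabstat id_mean if flag, stat(n mean sd)
```

Other id-level summaries (for example -egen ..., max()- to record whether var1 was ever 1) fit the same pattern; only the -egen- line changes.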



                • #9
                  I am in a similar situation with panel data.
                  The command
                  bysort id: gen nyear = _N
                  creates, for each observation, the number of observations (years) its ID has in the dataset.
                  However, I am looking for the number of IDs and how many years they appear in the dataset. How can I do that?
                  Example
                  id year var1
                  1 2000 3
                  2 2000 5
                  2 2001 6
                  3 2001 4
                  3 2002 6
                  3 2003 7
                  4 2000 2
                  4 2001 4
                  4 2002 7
                  4 2003 8
                  5 2001 6
                  5 2003 8
                  I am looking for the number of IDs by the number of years each ID has data:
                  # of IDs   # of years the ID has data
                  1          1
                  2          2
                  1          4
                  1          3



                  • #10
                    Code:
                    tab ID
                    Note. Your example is confusing to me. Identifier 1 occurs 1 time; 2 occurs 2 times in your example. But 3 occurs 3 times, and so after that the data you give and their summary part company.
                    Last edited by Nick Cox; 02 Oct 2019, 11:08.



                    • #11
                      Thank you very much for the reply.
                      The code you provided gives frequency of each ID. I am looking for aggregate by frequency (How many appear only once; how many appear twice and so forth.)

                      There are 4 periods in the panel data (2000-2003). ID 1 occurs only once; ID 2 occurs twice; ID 3 occurs three times; ID 4 occurs four times; and ID 5 occurs twice.
                      What I want is the number of IDs for each number of occurrences, as in the example I tried above.
                      Number of occurrences (IDs)   Number of IDs
                      1 (ID 1)                      1
                      2 (ID 2 and ID 5)             2
                      3 (ID 3)                      1
                      4 (ID 4)                      1
                      Total                         5 (total number of IDs)



                      • #12
                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input byte id int year byte var1
                        1 2000 3
                        2 2000 5
                        2 2001 6
                        3 2001 4
                        3 2002 6
                        3 2003 7
                        4 2000 2
                        4 2001 4
                        4 2002 7
                        4 2003 8
                        5 2001 6
                        5 2003 8
                        end
                        
                        isid id year, sort
                        by id: gen n_years_this_id = _N
                        egen flag = tag(id)
                        tab n_years if flag
                        will give you what you want.

                        In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
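An alternative to the code above reduces the panel to one row per id with -collapse-. A sketch on the data from #9, assuming losing the id-year observations is acceptable: -collapse- replaces the data in memory, so -preserve- (or save) first if you still need the panel. The variable name n_years follows #12.

```stata
* data from #9, as entered in #12
clear
input byte id int year byte var1
1 2000 3
2 2000 5
2 2001 6
3 2001 4
3 2002 6
3 2003 7
4 2000 2
4 2001 4
4 2002 7
4 2003 8
5 2001 6
5 2003 8
end

isid id year, sort

* one row per id: n_years = number of nonmissing year values for that id
* note: this replaces the panel in memory
collapse (count) n_years = year, by(id)

* distribution of ids by number of years observed
tab n_years
```

-tab n_years- gives the same result as the -tag()- approach: 1 id with 1 year, 2 with 2, 1 with 3 and 1 with 4.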



                        • #13
                          Thanks so much. It gave me what I was looking for.

