
  • Tabulate top frequencies of one variable for each value of another variable

    Hi guys,

    I am trying to tabulate the frequencies (top 5) of a variable 'edate' for each value of another variable 'userid'. The following command:
    Code:
    groups edate, order(h) select(5)
    would work if I added an if-clause naming each value of 'userid' manually. I have tried combining the groups command with a foreach loop:
    Code:
    foreach x in userid {
        groups edate, order(h) select(5)
    }
    but the output is for the whole dataset, while I need separate output for each value of 'userid'.

    Simply using
    Code:
    tab userid edate
    does not help, as there are too many distinct values for this.

    I would be grateful for any help.

    Kind regards,
    Petar

  • #2
    groups is a community-contributed command from the Stata Journal; please explain where such commands come from, as the FAQ asks (Advice #12).

    The problem calls for the by: prefix. This is documented and exemplified in the help and in https://www.stata-journal.com/articl...article=st0496 Here is another example:

    Code:
    . webuse nlswork, clear
    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
    
    
    .  bysort collgrad : groups grade, order(high) select(5)
    
    ---------------------------------------------------------------------------------------------------------------
    -> collgrad = 0
    
      +---------------------------------+
      | grade   Freq.   Percent     %<= |
      |---------------------------------|
      |    12   14242     60.00   60.00 |
      |    11    1775      7.48   67.48 |
      |    13    1708      7.20   74.67 |
      |    14    1636      6.89   81.56 |
      |    10    1518      6.40   87.96 |
      +---------------------------------+
    
    ---------------------------------------------------------------------------------------------------------------
    -> collgrad = 1
    
      +---------------------------------+
      | grade   Freq.   Percent     %<= |
      |---------------------------------|
      |    16    2681     55.91   55.91 |
      |    18     921     19.21   75.12 |
      |    17     851     17.75   92.87 |
      |    15     185      3.86   96.73 |
      |    14     115      2.40   99.12 |
      +---------------------------------+
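    For the variables in the question, the same pattern would be bysort userid : groups edate, order(high) select(5). If groups is not installed, a rough equivalent can be sketched with official commands only: contract the data to one row per combination of the two variables, sort each group by descending frequency, and keep the first five rows. This is a minimal sketch using the nlswork example above; negfreq is a hypothetical helper variable, and no percentages are shown. Note that contract replaces the dataset in memory, so use preserve/restore (or save first) if you need the original data afterwards.

```stata
* Minimal sketch: top five values of grade within each level of collgrad,
* using only official Stata commands (no -groups- needed).
webuse nlswork, clear

* One row per (collgrad, grade) pair, with its count in -freq-.
* -nomiss- drops observations missing on either variable.
* Warning: -contract- replaces the dataset in memory.
contract collgrad grade, freq(freq) nomiss

* Sort each collgrad group by descending frequency (ties broken by grade)
* and keep the first five rows of each group.
* negfreq is a hypothetical helper variable used only for sorting.
generate negfreq = -freq
bysort collgrad (negfreq grade): keep if _n <= 5
drop negfreq

list collgrad grade freq, sepby(collgrad) noobs
```

    The by: prefix with groups remains the simpler route; this version is only worth having when installing community-contributed commands is not an option.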

    Your code shows a common misconception about foreach. A loop in Stata like

    Code:
    foreach x in userid {
        ...
    }
    does not somehow loop over the distinct values of userid and issue output for each. Stata doesn't do this and doesn't claim to, so it's a puzzle why people think it is on offer; perhaps the reasons include wishful thinking or a guess that Stata behaves like some other software that does offer this. Oddly, the loop is perfectly legal, as it is not essential to refer to the loop element x within the loop. It's a loop over one item, which happens to be a variable name, but that's as may be. So the loop body runs exactly once, and because groups inside it never mentions x, it runs once on the whole dataset, which is why you got the total.
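    A small sketch makes the point concrete. The first loop shows what the loop in the question actually iterates over. The second shows one way to loop over the distinct values of a variable, using levelsof to store them in a local macro (levels and v are hypothetical names). It uses collgrad from nlswork and the official tabulate, sort in place of groups; with groups installed, groups grade if collgrad == `v', order(high) select(5) could be substituted. This assumes a numeric variable: for a string variable such as a text userid, the comparison would need quotes, e.g. if userid == "`v'". For this problem the by: prefix is still the better tool; a levelsof loop is mainly useful with commands that don't support by:.

```stata
* What the loop in the question does: it runs once, and the local
* macro x holds the literal text "userid", not any value of it.
foreach x in userid {
    display "x = `x'"
}

webuse nlswork, clear

* What was intended: loop over the distinct values of a variable.
* -levelsof- puts those values into the local macro -levels-.
levelsof collgrad, local(levels)
foreach v of local levels {
    display _newline "-> collgrad = `v'"
    * -sort- lists grades in descending order of frequency
    tabulate grade if collgrad == `v', sort
}
```

    The first loop prints x = userid once and nothing else, which is the whole story of the original attempt.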
