  • Problem with tag(id)

    Hello,

    Suppose I have the following dataset (apologies for the formatting; I was trying to make the columns clearly defined and don't know any other way to do that here). There are multiple observations per person, taken over different periods of time, and some people have been observed more times than others.

    I would like to know how many people enrolled in a course post-college. If a person is listed as having enrolled post-college, that takes precedence over any observation that says no post-college enrollment. To account for multiple observations per person, I generated tag = tag(id1). My question is: would the command "tab tag postcollegeenrollment" say that two out of the three people here have enrolled in a post-college course? Or would it incorrectly say that no one enrolled in a post-college course, because each of the rows where tag = 1 has the postcollegeenrollment value "No Post-college enrollment"? If the second case is true, what other Stata commands can I use to answer my initial question? (Of course, my full dataset has thousands of observations, so I can't just count like I did here.)

    major          postcollegeenrollment         id1   tag

    French         No Post-college enrollment      1     1
    French         No Post-college enrollment      1     0
    French         Post-college enrolled           1     0
    French         Post-college enrolled           1     0
    French         Post-college enrolled           1     0
    French         Post-college enrolled           1     0
    History        No Post-college enrollment      2     1
    History        No Post-college enrollment      2     0
    Neuroscience   No Post-college enrollment      3     1
    Neuroscience   No Post-college enrollment      3     0
    Neuroscience   Post-college enrolled           3     0

    Thank you for your help.
    Last edited by Margo Channing; 19 Sep 2018, 17:52.

  • #2
    It is very unclear to me what you want. And your example data table requires major surgery to convert it into something that can be imported into Stata to try code out with. You put all that effort into creating it, with unhelpful results, when you could have, in just a matter of a few seconds, posted a -dataex- example. For example data, it doesn't matter whether it aligns visually in the Forum: what matters is being easy to import into Stata. Good visual alignment is important when showing code or Stata results, but not for data sets.

    If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Please post back with a data example generated by -dataex-. Then clarify whether you want, for each different course, a count of the number of people who enrolled in it post-college, or whether you want an overall total of all the distinct people who enrolled post-college for any course at all. Also, your multiple observations per person are, at least in your example, all exact duplicates, so is there some reason you don't just apply -duplicates drop- and get the extra, redundant, observations out of the way?

    • #3
      Clyde gives excellent advice as always, but while we await a fuller explanation it's worth underlining that the purpose of the tag() function of egen (which, NB, can't be used with generate) is to tag just one of several repeated instances, so that we don't end up showing or using the same values again and again.

      Here is a silly example. For a groupwise calculation where the result is the same for every observation in a group, all we need to see is one result from each group. The idiom if tag (possible once a tag variable exists) deliberately gives exactly the same result as if tag == 1.
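      The equivalence holds because a tag variable contains only 0s and 1s, and in an if qualifier Stata treats any nonzero value as true. A minimal sketch of a check on the same auto data (the local macro name n_tag is just an illustrative choice):

```stata
* -if tag- and -if tag == 1- should select the same observations,
* because tag holds only 0 and 1 and any nonzero value counts as true
sysuse auto, clear
egen tag = tag(rep78)

count if tag
local n_tag = r(N)

count if tag == 1
assert r(N) == `n_tag'
```

      By default tag() gives 0 to observations with missing values in the grouping variable, which is why the missing values of rep78 do not show up as a group of their own in the listing.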

      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . egen tag = tag(rep78)
      
      . egen iqr = iqr(mpg), by(rep78)
      
      . list rep78 iqr if tag
      
           +-------------+
           | rep78   iqr |
           |-------------|
        1. |     3     4 |
        5. |     4     7 |
       12. |     2   6.5 |
       20. |     5    17 |
       40. |     1     6 |
           +-------------+
      
      . sort rep78
      
      . list rep78 iqr if tag, noobs
      
        +-------------+
        | rep78   iqr |
        |-------------|
        |     1     6 |
        |     2   6.5 |
        |     3     4 |
        |     4     7 |
        |     5    17 |
        +-------------+
      In this example and in many others there are several equivalent ways of getting the same result, but that is not a problem: the aim of tag() is to produce a variable that allows use of just one clone out of several.
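      Applied to the question's example, a minimal sketch follows. It assumes postcollegeenrollment is a string variable holding exactly the text shown in the table (if it is really a numeric variable with value labels, the comparison would need the underlying numeric code instead), and the names enrolled and ever are made up for illustration. The point is that tagging alone only marks one observation per person, whose value need not be an enrollment; the "ever enrolled" information first has to be spread across all of a person's observations, here with egen's max(), before the tagged observations are counted.

```stata
* The table from the question, typed by hand (types and names assumed)
clear
input str12 major str26 postcollegeenrollment byte id1
"French"       "No Post-college enrollment" 1
"French"       "No Post-college enrollment" 1
"French"       "Post-college enrolled"      1
"French"       "Post-college enrolled"      1
"French"       "Post-college enrolled"      1
"French"       "Post-college enrolled"      1
"History"      "No Post-college enrollment" 2
"History"      "No Post-college enrollment" 2
"Neuroscience" "No Post-college enrollment" 3
"Neuroscience" "No Post-college enrollment" 3
"Neuroscience" "Post-college enrolled"      3
end

* One tagged observation per person; on its own this only reflects
* whatever postcollegeenrollment value that single row happens to have
egen byte tag = tag(id1)
tab postcollegeenrollment if tag

* 1 if this row records post-college enrollment, 0 otherwise
gen byte enrolled = postcollegeenrollment == "Post-college enrolled"

* 1 on every row of a person who was ever recorded as enrolled
egen byte ever = max(enrolled), by(id1)

* Count distinct people (one tagged row each) who ever enrolled
count if tag & ever

* Per-major breakdown, assuming each person has a single major
tab major ever if tag
```

      On the example data, count if tag & ever finds 2 of the 3 people (ids 1 and 3), matching the hand count in the question.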
