
  • How to identify if each ID has two unique values for a variable

    Hello! I am somewhat new to Stata and have looked for an answer in other threads but have not found anything that works with my dataset. I have attached a snippet from my dataset. I have upwards of 6,000 observations, so you can see why I am trying to find a way to code for this issue.

    This is a dietary recall dataset. I am beginning data cleaning, and one of my criteria is that I only want to include subjects who have recalls from two days (as opposed to only one). I am having trouble coming up with a way to create a variable coded 0/1 for whether a subject has recalls for two days. As you can see, the number of entries for each ID can vary greatly. Some have 6 while others have up to 14. This doesn't matter when I just need to verify that I have recalls for two days, but it does make it a little tricky to handle since I can't reshape the data. Can someone lend some insight on how to handle this? I am also open to other suggestions that may be better than what I'm trying to do.

    I appreciate any and all help I can get with this.

  • #2
    Please read the Forum FAQ, with particular attention to #12, for excellent advice on how to give information in the most helpful ways. There you will learn, among other things, that attachments are deprecated, particularly attachments that are capable of transmitting malware, or that are opened with proprietary software other than Stata.

    The most helpful way to show example data is to use the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
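
    As a rough illustration, assuming the data set is already loaded in memory, something like the following would produce a listing of the first 20 observations that can be pasted into a post (the range 1/20 is an arbitrary choice, not a requirement):

    ```stata
    * List all variables for the first 20 observations in a form
    * that others can copy and run to recreate the example data.
    dataex in 1/20
    ```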

    Now, I will imagine that your data set has a variable that uniquely identifies subjects, which I will call subject_id, and another variable that specifies the date. You wish to retain only those subjects who have at least two distinct dates. This can be done simply with:

    Code:
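    * The comparison below assumes no missing dates; stop if any exist.
    assert !missing(date)
    * Sort by date within subject; first == last means only one distinct date.
    by subject_id (date), sort: drop if date[1] == date[_N]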
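
    If a 0/1 indicator is wanted rather than dropping observations, the same comparison can be stored in a new variable. The following is only a sketch on made-up data: subject_id and date are the assumed names used above, two_days is a hypothetical name for the indicator, and the toy values are invented purely for illustration.

    ```stata
    * Toy data (invented): subject 1 has two distinct dates, subject 2 has one.
    clear
    input subject_id date
    1 21000
    1 21000
    1 21003
    2 21001
    2 21001
    end
    format date %td

    * The comparison assumes no missing dates.
    assert !missing(date)

    * After sorting by date within subject, the first and last dates differ
    * exactly when the subject has more than one distinct date.
    bysort subject_id (date): gen byte two_days = date[1] != date[_N]

    list, sepby(subject_id)
    ```

    With the indicator in place, observations can be kept or excluded later with a condition such as -if two_days == 1-, without discarding anything up front.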



    • #3
      The code you provided worked! I greatly appreciate your help with my issue as well as the information about posting questions on the forum. I will be sure to read the material you suggested.
