  • STATA Newbie

    Hi everyone,

    I'm new to STATA. I'm trying to do something fairly simple but am having trouble because I have very limited experience with STATA.

    I have a dataset with the variables "date_of_birth" and "ID". There are more than 1 million observations, and some IDs have several records, so dob entries can be duplicated within an ID. My goal is to calculate the percentage of individuals who have more than one value for the variable dob. I also have to calculate the percent of individuals with multiple records that have more than one value for the dob. Should I consider missing values?

    Any help/direction is greatly appreciated!

  • #2
    "My goal is to calculate the percentage of individuals who have more than one value for the variable dob."
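    Code:
    * Within each ID, sort by date_of_birth and keep a running count of distinct values
    by ID (date_of_birth), sort: gen n_dobs = sum(date_of_birth != date_of_birth[_n-1])
    * Flag every record of an ID whose final count exceeds 1
    by ID (date_of_birth): gen byte multiple_dobs = n_dobs[_N] > 1
    * Tag one record per ID so that each individual is counted once
    by ID (date_of_birth): gen byte id_tag = (_n == _N)
    * Percentage of individuals with more than one dob
    tab multiple_dobs if id_tag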
    "I also have to calculate the percent of individuals with multiple records that have more than one value for the dob."
    This is the same question. It is not possible for a person to have more than one value for the dob if they only have one record in the data set.
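
    One possible reading of the second question keeps the same numerator but restricts the denominator to individuals with more than one record, in which case the two percentages can differ. A minimal sketch under that assumption, using made-up toy data (the variable multi_record and the toy values are illustrative and are not from the original post):

    ```stata
    clear
    input long ID str10 dob_str
    1 "1990-01-01"
    1 "1990-01-01"
    2 "1985-05-05"
    2 "1986-05-05"
    3 "1970-03-03"
    end
    gen date_of_birth = date(dob_str, "YMD")
    format date_of_birth %td

    * Same steps as the code above
    by ID (date_of_birth), sort: gen n_dobs = sum(date_of_birth != date_of_birth[_n-1])
    by ID (date_of_birth): gen byte multiple_dobs = n_dobs[_N] > 1
    by ID (date_of_birth): gen byte id_tag = (_n == _N)

    * Assumption: "multiple records" means more than one observation for the ID
    by ID (date_of_birth): gen byte multi_record = (_N > 1)

    * Denominator: all individuals
    tab multiple_dobs if id_tag
    * Denominator: only individuals with more than one record
    tab multiple_dobs if id_tag & multi_record
    ```

    In this toy data, one of the three individuals has more than one dob, and so does one of the two individuals with multiple records.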

    "Should I consider missing values?"
    The above code treats missing values as a distinct value. Whether you should do this or ignore missing values depends on what you will use the results for, so no answer is possible.
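
    If you decide to ignore missing values instead, one option is to count only changes to a non-missing value. This is a sketch, assuming date_of_birth is a numeric Stata date, so that missing values sort last within each ID. The names n_dobs_nm and multiple_dobs_nm and the toy data are made up for illustration:

    ```stata
    clear
    input long ID double date_of_birth
    1 10000
    1 .
    2 10000
    2 12000
    2 .
    3 .
    end
    format date_of_birth %td

    * Running count of distinct non-missing values within each ID
    by ID (date_of_birth), sort: gen n_dobs_nm = sum((date_of_birth != date_of_birth[_n-1]) & !missing(date_of_birth))
    * Flag IDs with more than one distinct non-missing dob
    by ID (date_of_birth): gen byte multiple_dobs_nm = n_dobs_nm[_N] > 1
    * Tag one record per ID so that each individual is counted once
    by ID (date_of_birth): gen byte id_tag = (_n == _N)
    tab multiple_dobs_nm if id_tag
    ```

    Here only ID 2 is flagged. ID 1, with one date and one missing value, is not flagged, whereas the code above would count it as having two values.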

    Note: As no example data set was provided, the code is untested and may contain typos or other errors. In the future, when asking for help with code, it is wise to include example data. The helpful way to do that is with the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
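
    For this thread, a -dataex- call might look like the sketch below. The toy data stand in for the real file, which was not posted, and the range in 1/4 is arbitrary:

    ```stata
    clear
    input long ID double date_of_birth
    1 10000
    1 10000
    2 9000
    2 9500
    end
    format date_of_birth %td

    * List the first 4 observations of the two relevant variables
    * in a form that can be pasted directly into a forum post
    dataex ID date_of_birth in 1/4
    ```

    Pasting the resulting output between code delimiters lets others recreate the example exactly, including storage types and display formats.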

    Also, most people who come to the list to answer questions scan the thread titles to pick which ones they will respond to. A title like "Stata newbie" is really uninformative: the question could be about anything at all. You will attract more views, and therefore be likely to get a helpful response sooner, if you give your threads informative titles. That will also make it easier for others who come to the list searching for answers to a question of their own that is similar to yours to find your thread and perhaps avail themselves of the solution(s) that others have already posted.
    Last edited by Pete Huckelba (StataCorp); 10 Oct 2023, 07:19. Reason: Fixed formatting.
