STATA Newbie

Rani Mohamed

Join Date: Oct 2023

Posts: 2
#1

09 Oct 2023, 11:01

Hi everyone,

I'm new to Stata and have very limited experience with it, so I'm having trouble with something that should be fairly simple.

I have a dataset with the variables "date_of_birth" and "ID". There are more than 1 million observations, and some IDs have duplicate date_of_birth entries. My goal is to calculate the percentage of individuals who have more than one value for the variable dob. I also have to calculate the percent of individuals with multiple records that have more than one value for the dob. Should I consider missing values?

Any help/direction is greatly appreciated!
Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#2

09 Oct 2023, 12:24

My goal is to calculate the percentage of individuals who have more than one value for the variable dob.

Code:

by ID (date_of_birth), sort: gen n_dobs = sum(date_of_birth != date_of_birth[_n-1])
by ID (date_of_birth): gen byte multiple_dobs = n_dobs[_N] > 1
by ID (date_of_birth): gen id_tag = (_n == _N)
tab multiple_dobs if id_tag

I also have to calculate the percent of individuals with multiple records that have more than one value for the dob.

This is the same question. It is not possible for a person to have more than one value for the dob if they only have one record in the data set.
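
The set of individuals with more than one dob is the same either way. If the second percentage is meant to be taken only among individuals who have more than one record, the denominator would differ. A minimal, untested sketch of that restriction follows; it assumes multiple_dobs and id_tag were already created by the code above, and n_records is a variable name introduced here purely for illustration.

```stata
* Sketch only: count records per ID, then tabulate among IDs with > 1 record.
* n_records is a hypothetical name; multiple_dobs and id_tag come from the code above.
by ID, sort: gen long n_records = _N
tab multiple_dobs if id_tag & n_records > 1
```

Because id_tag marks exactly one observation per ID, the percentages in this table are percentages of individuals, not of records.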

Should I consider missing values?

The above code treats missing values as a distinct value. Whether you should do this or ignore missing values depends on what you will use the results for, so no answer is possible.
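
If ignoring missing values turns out to be appropriate, one possible variant counts only nonmissing dates of birth. This is an untested sketch that assumes date_of_birth is a numeric Stata date (so missing values sort last within each ID); the variable names ending in _nm are introduced here for illustration.

```stata
* Sketch only: count distinct NONMISSING dates of birth per ID.
* Assumes date_of_birth is numeric; *_nm names are hypothetical.
by ID (date_of_birth), sort: gen n_dobs_nm = sum(date_of_birth != date_of_birth[_n-1] & !missing(date_of_birth))
by ID (date_of_birth): gen byte multiple_dobs_nm = n_dobs_nm[_N] > 1
by ID (date_of_birth): gen byte id_tag_nm = (_n == _N)
tab multiple_dobs_nm if id_tag_nm
```

Under this variant, an ID with one recorded date of birth plus some missing entries is counted as having a single dob, and an ID whose entries are all missing is counted as having none.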

Note: As no example data set was provided, the code is untested and may contain typos or other errors. In the future, when asking for help with code, it is wise to include example data. The helpful way to do that is with the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
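
As a rough illustration of the -dataex- step described above, something along these lines would produce a small, pasteable example of the two variables from the question. The choice of the first 20 observations is arbitrary, and -help dataex- remains the authoritative description of the syntax and options.

```stata
* Sketch only: show the first 20 observations of the two variables in question.
* On older installations, first run: ssc install dataex
dataex ID date_of_birth in 1/20
```

The output can then be copied from the Results window and pasted directly into a post.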

Also, most people who come to the list to answer questions scan the thread titles to pick which ones they will respond to. A title like "Stata newbie" is really uninformative: the question could be about anything at all. You will attract more views, and therefore be likely to get a helpful response sooner, if you give your threads informative titles. That will also make it easier for others who come to the list searching for answers to a question of their own that is similar to yours to find your thread and perhaps avail themselves of the solution(s) that others have already posted.

Last edited by Pete Huckelba (StataCorp); 10 Oct 2023, 07:19. Reason: Fixed formatting.