Repeated variables

Clarissa Gallegos

Join Date: Jun 2022

Posts: 10
#1

17 Jan 2023, 15:11

I have a data set with a variable called houseID, but the same houseID is repeated for different members of a house. For example:

houseID income tax total_tax_by_house
001 10 2.3 54.4
001 200 52.1 54.4
001 0 0 54.4
002 ...

I just want to keep one observation for each house number, or create a new variable that lists the members in the same houseID. Any ideas?
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#2

17 Jan 2023, 15:36

So the first question is why you have multiple records for the same houseID. I'm guessing that the houseID just identifies a house, and that there can be multiple people in that house. Is there no variable in your data set that identifies the people within the houseID?

If there is no such variable, you might consider doing something like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 houseid int income float(tax total_tax_by_house)
"001"  10  2.3 54.4
"001" 200 52.1 54.4
"001"   0    0 54.4
"002"   .    .    .
end

* Number the people within each house, ordered by income
by houseid (income), sort: gen int person_num = _n

* One observation per house, with one set of variables per person
reshape wide income tax total_tax_by_house, i(houseid) j(person_num)
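If all you need is a single row per house with the house-level total and a count of members, something along the following lines might do instead. This is only a sketch: it assumes the same example data as above, and n_members and house_tag are names I made up for illustration.

```stata
* Assumption: same example data as above
clear
input str3 houseid int income float(tax total_tax_by_house)
"001"  10  2.3 54.4
"001" 200 52.1 54.4
"001"   0    0 54.4
"002"   .    .    .
end

* Count the observations (members) in each house
bysort houseid: gen int n_members = _N

* Flag exactly one observation per house
egen byte house_tag = tag(houseid)

* Keep the flagged observation and only the house-level variables
keep if house_tag
keep houseid total_tax_by_house n_members
list, noobs
```

Note that this discards the person-level income and tax values, so it only makes sense if you no longer need them.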

That said, why do you want to reduce the data set to one observation per houseID? If you are going to be doing calculations with the values of income, tax, and total_tax_by_house, you will probably find it is much easier to do if you leave the data in the multiple observations per houseID layout you already have.
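To illustrate what I mean, here is a rough sketch of two house-level calculations done without changing the layout. It assumes the example data above; total_income and tax_share are hypothetical variable names.

```stata
* Assumption: same example data as above
clear
input str3 houseid int income float(tax total_tax_by_house)
"001"  10  2.3 54.4
"001" 200 52.1 54.4
"001"   0    0 54.4
"002"   .    .    .
end

* House-level total of income, repeated on every member's row
* (egen's total() treats missing values as zero)
bysort houseid: egen total_income = total(income)

* Each member's share of the house's tax; missing if either value is missing
gen tax_share = tax / total_tax_by_house

list, sepby(houseid) noobs
```

Doing the same thing after -reshape wide- would mean writing expressions over income1, income2, income3, and so on, and the number of those variables depends on the largest house in the data.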

In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
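As a minimal sketch of typical use, assuming your data are already in memory (the example data are recreated here only so the commands run on their own), you name the variables and a small range of observations, then paste the output into your post:

```stata
* Assumption: stand-in for your own data set already loaded in memory
clear
input str3 houseid int income float(tax total_tax_by_house)
"001"  10  2.3 54.4
"001" 200 52.1 54.4
"001"   0    0 54.4
"002"   .    .    .
end

* Show the listed variables for the first four observations
dataex houseid income tax total_tax_by_house in 1/4
```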