How to unite observations based on identical values in different variables?

Ben Ronen

Join Date: May 2018
Posts: 26

How to unite observations based on identical values in different variables?

12 May 2018, 02:17

Hi everyone,
so first here is an example of my data:
/
/

locality code	city name	statistical area	pop	pop_18	potential	voters	_merge	new locality code	new statistical area	new city name
1053	EN HAZEVA	1	43.08881	22.98231	0	0	1	13	1	HAZEVA
13	HAZEVA	1	561.4722	388.3561	474	362	3
1056	EN KAREM-B.S.HAQLA'I	1	161.2622	37.2866	0	0	1	3000	1352	JERUSALEM
3000	JERUSALEM	1351	3565.347	2688.487	2781	2129	3
3000	JERUSALEM	1352	2727.852	2036.572	2742	1564	3
3000	JERUSALEM	1353	2438.363	1814.717	1804	1122	3
4000	HAIFA	111	2125.539	1416.01	1768	1185	3
4000	HAIFA	112	5445.942	3247.853	3250	2640	3

/
/
/
These are cities (with areas inside them) and small areas (villages) from israel, with some demografic information and number of voters (and potential voters) in the elections. Every city has its own "locality code", and a few statistical areas if it's a big city or just one statistical area if it's a small village (so every observation is a statistical area).
/
What I'm trying to do is to join the population (pop) in some small areas with that of different areas, and then to get rid of the observations of small areas. I want to use the variables: "new locality code" and "new statistical area", so that if they are identical to "locality code" and "statistical area" of a different observation respectively, I would like to add the values of "pop" and "pop_18" from the first observation to those of the second observation, and then loose the first observation.

For instance, the village "EN KAREM-B.S.HAQLA'I", should be deleted after its pop and pop_18 was joined to sttistical area 1352 in the city of JERUSALEM (of which locality code is 3000). And the same goes for the Village EN HAZEVA, that should be joined to HAZEVA (sorry for the confusing names).

The last two observations of HAIFA are to demonstrate that there are cities to which I won't join any small village. Actually, only about 25 small villages should be joined and deleted, while the total observations (statistical areas) in my data is about 2800.

Any help with recommended commands?

Thanks ahead,
Ben Ronen.

Last edited by Ben Ronen; 12 May 2018, 02:19.

Tags: None

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17703
#2

12 May 2018, 02:33

Ben:
welcome to this forum.
If you really have only 25 villages to include in your database reshaping, I would:
- save your original data source in a new .dta file and store it in a safe directory (or, even better, on an external device);
- divide the working copy of your original dataset in smaller datasets, according to the numeber of small villages to join to larger areas (see -keep- and -drop- command);
- use -collapse- in each oner of the abovementioned smaller datasets to get what you're after;
- finally use -append- to reconstruct the new data framework.

For the future, as recommended by the FAQ, please post example/excerpt of your dataset via -dataex- or share waht you typed and what Stata gave you back via CODE delimiters. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Comment
Ben Ronen

Join Date: May 2018

Posts: 26
#3

12 May 2018, 06:19

Hey Carlo,
Thanks s lot for your advice. Just a follow-up question: Why should I divide the data before collapsing?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17703
#4

13 May 2018, 03:11

Ben:
for your data excerpt. I was not sure whether working on an unique database was feasible/advisable.
As sometimes -collapse- does not reach what you're after at the first attempt, working on smaller datasets (and then -append-) may be (for me .at least) easier.

Kind regards,
Carlo
(Stata 19.0)
Comment

Announcement

How to unite observations based on identical values in different variables?

Comment

Comment

Comment