Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to unite observations based on identical values in different variables?

    Hi everyone,
    so first here is an example of my data:
    /
    /
    locality code city name statistical area pop pop_18 potential voters _merge new locality code new statistical area new city name
    1053 EN HAZEVA 1 43.08881 22.98231 0 0 1 13 1 HAZEVA
    13 HAZEVA 1 561.4722 388.3561 474 362 3
    1056 EN KAREM-B.S.HAQLA'I 1 161.2622 37.2866 0 0 1 3000 1352 JERUSALEM
    3000 JERUSALEM 1351 3565.347 2688.487 2781 2129 3
    3000 JERUSALEM 1352 2727.852 2036.572 2742 1564 3
    3000 JERUSALEM 1353 2438.363 1814.717 1804 1122 3
    4000 HAIFA 111 2125.539 1416.01 1768 1185 3
    4000 HAIFA 112 5445.942 3247.853 3250 2640 3
    /
    /
    /
    These are cities (with areas inside them) and small areas (villages) from israel, with some demografic information and number of voters (and potential voters) in the elections. Every city has its own "locality code", and a few statistical areas if it's a big city or just one statistical area if it's a small village (so every observation is a statistical area).
    /
    What I'm trying to do is to join the population (pop) in some small areas with that of different areas, and then to get rid of the observations of small areas. I want to use the variables: "new locality code" and "new statistical area", so that if they are identical to "locality code" and "statistical area" of a different observation respectively, I would like to add the values of "pop" and "pop_18" from the first observation to those of the second observation, and then loose the first observation.

    For instance, the village "EN KAREM-B.S.HAQLA'I", should be deleted after its pop and pop_18 was joined to sttistical area 1352 in the city of JERUSALEM (of which locality code is 3000). And the same goes for the Village EN HAZEVA, that should be joined to HAZEVA (sorry for the confusing names).

    The last two observations of HAIFA are to demonstrate that there are cities to which I won't join any small village. Actually, only about 25 small villages should be joined and deleted, while the total observations (statistical areas) in my data is about 2800.

    Any help with recommended commands?

    Thanks ahead,
    Ben Ronen.
    Last edited by Ben Ronen; 12 May 2018, 02:19.

  • #2
    Ben:
    welcome to this forum.
    If you really have only 25 villages to include in your database reshaping, I would:
    - save your original data source in a new .dta file and store it in a safe directory (or, even better, on an external device);
    - divide the working copy of your original dataset in smaller datasets, according to the numeber of small villages to join to larger areas (see -keep- and -drop- command);
    - use -collapse- in each oner of the abovementioned smaller datasets to get what you're after;
    - finally use -append- to reconstruct the new data framework.

    For the future, as recommended by the FAQ, please post example/excerpt of your dataset via -dataex- or share waht you typed and what Stata gave you back via CODE delimiters. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Hey Carlo,
      Thanks s lot for your advice. Just a follow-up question: Why should I divide the data before collapsing?

      Comment


      • #4
        Ben:
        for your data excerpt. I was not sure whether working on an unique database was feasible/advisable.
        As sometimes -collapse- does not reach what you're after at the first attempt, working on smaller datasets (and then -append-) may be (for me .at least) easier.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment

        Working...
        X