Merging Back a Dataset That Had to Be Reduced

Maxence Morlet

Join Date: Mar 2021

Posts: 653
#1

12 Sep 2023, 10:37

Hi all,

In order to generate spatial lags of variables, I have had to reduce the following dataset, which is id-region-time specific (I've omitted the irrelevant variables from the data extract to remain concise):

Code:

input float(tok_cant token1) byte canton float time
27 16 1  7
27 16 1  8
27 16 1 10
27 16 1 11
27 16 1 13
27 16 1 14
27 16 1 16
27 16 1 17
28 16 2  7
28 16 2  8
28 16 2 10
28 16 2 11
28 16 2 13
28 16 2 14
28 16 2 16
28 16 2 17
29 16 3  7
29 16 3  8
29 16 3 10
29 16 3 11
29 16 3 13
29 16 3 14
29 16 3 16
29 16 3 17
end

to a region-time-specific dataset:

Code:

input byte region float time
1  1
1  2
1  3
1  4
1  5
1  6
1  7
1  8
1  9
1 10
1 11
1 12
1 13
1 14
1 15
1 16
1 17
1 18
1 19
2  1
2  2
2  3
2  4
2  5
2  6
2  7
2  8
2  9
2 10
2 11
2 12
2 13
2 14
2 15
2 16
2 17
2 18
2 19
end

I had to do this in order to generate spatially lagged variables, using the code outlined here: https://www.statalist.org/forums/for...ing-spgenerate. After creating region-time averages of the data, I reduced the dataset like this:

Code:

bys region time: keep if _N==_n

However, I need these spatial lags in the original dataset, the one I had before reducing it to a region-time-specific dataset. The best course of action would seem to be to merge the reduced dataset back into the original one and bring over the newly generated spatial lags, which would then simply repeat within each region-time cell of the original id-region-time-specific dataset.

I tried the preserve and restore commands, but once restore has run, I lose the changes made and therefore the spatially lagged variables created.

I also tried the following, with the reduced dataset as the master file:

Code:

merge 1:m region time using "originaldataset"

However, this gave me nonsensical results, even though every observation matched (_merge==3 for all observations).

Please could someone let me know where I've gone wrong and how I could get back to the original dataset with the newly generated spatially lagged variables? I might be going about it wrong, merging may perhaps not be the way to go...

Many thanks in advance!
Andrew Musau

Join Date: Oct 2014

Posts: 10274
#2

12 Sep 2023, 12:37

Originally posted by Maxence Morlet

I also tried the following, with the reduced dataset as the master file:

Code:

merge 1:m region time using "originaldataset"

However, this gave me nonsensical results, even though every observation matched (_merge==3 for all observations).

Can you elaborate on this? I cannot follow the example provided because there is no variable named "region" in the first dataset; perhaps it is the variable "canton". In any case, it may be easier from your own perspective to start with the original dataset as the master and merge the reduced dataset into it, although this should not change anything.

Code:

merge m:1 region time using "usingdataset", keep(master match) nogen
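
For the preserve and restore attempt mentioned in #1, one way to avoid losing the new variables is to save the reduced data to a temporary file before restore and then merge that file back m:1. The following is only a minimal sketch on made-up toy data: Y_bar_lag is a hypothetical name, and the generate line is a placeholder for the spatial-lag step, not the actual spgenerate call.

```stata
* Toy id-region-time data (made up for illustration)
clear
input byte id byte region float time float Y
1 1 7 10
1 2 7 20
2 1 7 30
2 2 7 40
1 1 8 11
1 2 8 21
2 1 8 31
2 2 8 41
end

* Region-time average, as described in the thread
bysort region time: egen Y_bar = mean(Y)

preserve
    * Reduce to one row per region-time cell
    bysort region time: keep if _n == _N

    * Placeholder for the spatial-lag step; Y_bar_lag is a
    * hypothetical stand-in so that the sketch runs
    generate Y_bar_lag = Y_bar

    * Confirm that region and time uniquely identify the reduced rows
    isid region time

    * Save the reduced data before restore discards the changes
    tempfile lags
    save `lags'
restore

* Original data as master: bring over only the new variable
merge m:1 region time using `lags', keepusing(Y_bar_lag) assert(match) nogenerate
```

The keepusing() option limits what is copied from the reduced file, and assert(match) stops with an error if any region-time cell fails to match.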
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#3

12 Sep 2023, 14:04

Yes, canton = region, sorry. I constantly confuse the two.

In the original dataset, which is id-region-time specific, there are variables Y and X. Before reducing the dataset to region-time level, I generate averages of Y and X that are specific to each region-time cell. These new averages are Y_bar and X_bar.

Then I reduced the dataset to region-time level and generated the spatially lagged variables. However, when I tried merging it back into the original, "uncollapsed" dataset, observations from the latter did not map onto the right regions and times at all. Furthermore, each respondent (id) was assigned only one region (they should all have 26 regions) and was duplicated many times.

As you said, it was unlikely to matter, but your solution worked! Thank you very much, Andrew! I went back to using the original dataset as the master file and ran the merge in that direction, m:1.

It then mapped observations correctly onto the right region and time values.
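
A likely explanation for the difference, assuming the reduced file still contained id and the other original variables: when a variable exists in both files, merge keeps the master's values for matched observations unless update or replace is specified. With the reduced file as master, every row in a region-time cell would then carry the id of the single row retained there. A toy sketch of this, using made-up data and a hypothetical variable named lag in place of a real spatial lag:

```stata
* Toy original data: three respondents in one region-time cell
clear
input byte id byte region float time
1 1 7
2 1 7
3 1 7
end
tempfile original
save `original'

* Reduced file: one row per region-time cell, with id still present
bysort region time: keep if _n == _N
generate lag = 99   // hypothetical spatial lag
tempfile reduced
save `reduced'

* (a) Reduced file as master, id present in both files
merge 1:m region time using `original'
list id region time lag

* (b) Same merge after dropping the non-key variable id from the master
use `reduced', clear
keep region time lag
merge 1:m region time using `original'
list id region time lag
```

In (a) all three rows should show the same id, the one kept in the reduced file. In (b) the ids 1, 2, and 3 come back from the original file. Merging m:1 with the original dataset as master avoids the issue because the original values are the ones retained.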