Using a Dataset as Varlist

Cameron Chan

Join Date: Feb 2019

Posts: 3
#1


14 Feb 2019, 13:30

I'm trying to use one data set as the reference list to replace values in another. I suspect the code looks something like this:

foreach i in `filename' {
gen var_new if var_old == var_reference;
}

The data set in mind has a numerical ID tied to other data; data set 2 has a string label for that numerical ID that I'd like to use. Tips?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

14 Feb 2019, 17:02

Welcome to Statalist.

I'm afraid the code looks nothing like what you imagine.

What you want to do is merge your two datasets. Here's an example.

Code:

// invent example data
clear
input float var_old var_new
1 2
4 3
5 42
end
tempfile changes
save `changes'

clear
input float id var
101 1
102 4
103 17
104 1
end
tempfile master
save `master'

// demonstrate technique
use `master', clear
rename var var_old
merge m:1 var_old using `changes'
list, clean
drop if _merge==2
replace var_old = var_new if _merge==3
drop _merge var_new
rename var_old var
sort id
list, clean

Code:

. // demonstrate technique
. use `master', clear
. rename var var_old
. merge m:1 var_old using `changes'

    Result                           # of obs.
    -----------------------------------------
    not matched                             2
        from master                         1  (_merge==1)
        from using                          1  (_merge==2)

    matched                                 3  (_merge==3)
    -----------------------------------------

.
. list, clean

        id   var_old   var_new            _merge
  1.   101         1         2       matched (3)
  2.   104         1         2       matched (3)
  3.   102         4         3       matched (3)
  4.   103        17         .   master only (1)
  5.     .         5        42    using only (2)

.
. drop if _merge==2
(1 observation deleted)

. replace var_old = var_new if _merge==3
(3 real changes made)

. drop _merge var_new
. rename var_old var
. sort id
.
. list, clean

        id   var
  1.   101     2
  2.   102     3
  3.   103    17
  4.   104     2

With that said, let me offer you some advice as an apparently new user of Stata.

I'm sympathetic to you as a new user of Stata: it's a lot to absorb, and even more so if you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.
Cameron Chan

Join Date: Feb 2019

Posts: 3
#3

16 Feb 2019, 11:48

I'm not brand new, but a relative rookie indeed. I'll take a look at the documentation.

A question on the above example: what I have is 14M data points where each unique ID is tied to multiple observations. Correct me if I'm wrong, but I was under the impression that the merge command only works on a 1-to-1 basis. That's where I was getting stuck. I know how to use merge for unique IDs, but I thought it didn't work if they recur.
Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#4

16 Feb 2019, 12:07

Correct me if I'm wrong, but I was under the impression that the merge command only works on a 1-to-1 basis.

You are, indeed, wrong. -merge- works on a 1-to-1, 1-to-m, or m-to-1 basis.

It is also legal syntax to use -merge- on an m-to-m basis, but the result is data salad, so you should not do that.

Do read -help merge-.
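
To decide which form of -merge- applies, it can help to check whether the key variable is unique in each dataset. The following is a minimal sketch with invented data; the variable names id and value are assumptions, not taken from the original poster's files.

```stata
// Made-up data in which id repeats, as in the situation described in post #3
clear
input float id float value
1 10
1 11
2 20
3 30
3 31
end

// duplicates report tabulates how many ids occur once, twice, and so on
duplicates report id

// isid exits with an error when id does not uniquely identify the observations;
// capture traps that error and stores the return code in _rc
capture isid id
display "isid return code: " _rc   // nonzero here, because id repeats
```

A dataset in which the key repeats is the "m" side of the merge; the dataset in which -isid- succeeds is the "1" side.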
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

16 Feb 2019, 12:33

Clyde writes the answer I would have written.

But to that let me add the following, on reflection. The code I presented in post #2 used the merge m:1 command. This allows for multiple observations with the same id in the master dataset, even though the example data I made up in that post (in the absence of example data in post #1) had distinct ids. This means that the var_new for a given id in the changes dataset will be matched to every observation with the same id in the master dataset. That is usually what is wanted for problems like the one you described.
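
Applied to the situation in post #1 (a numeric ID that recurs in the main data, and a second dataset holding one string label per ID), the same technique might look like the sketch below. The data and the variable names id, amount, and id_label are hypothetical, chosen only for illustration.

```stata
// Hypothetical lookup dataset: one string label per id
clear
input float id str12 id_label
101 "North"
102 "South"
end
tempfile labels
save `labels'

// Hypothetical master dataset: each id may appear on several observations
clear
input float id float amount
101 5
101 7
102 9
103 4
end

// m:1 because id repeats in the master but is unique in the lookup;
// keep(master match) discards lookup rows that match nothing in the master
merge m:1 id using `labels', keep(master match)
list, clean
```

Both observations with id 101 receive the same label, and id 103, which has no entry in the lookup dataset, is kept with an empty id_label and _merge equal to 1.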
Cameron Chan

Join Date: Feb 2019

Posts: 3
#6

16 Feb 2019, 14:05

Ah, I see. Thanks so much, all!