  • Using a Dataset as Varlist

    I'm trying to use one data set as the reference list to replace values in another. I suspect the code looks something like this:

    foreach i in `filename' {
    gen var_new if var_old == var_reference;
    }

    The data set I have in mind has a numerical ID tied to other data; data set 2 has a string label for that numerical ID that I'd like to use. Tips?

  • #2
    Welcome to Statalist.

    I'm afraid the code looks nothing like what you imagine.

    What you want to do is merge your two datasets. Here's an example.
    Code:
    // invent example data
    clear
    input float var_old var_new
    1 2
    4 3
    5 42
    end
    tempfile changes
    save `changes'
    
    clear
    input float id var
    101 1 
    102 4
    103 17
    104 1
    end
    tempfile master
    save `master'
    
    // demonstrate technique
    use `master', clear
    rename var var_old
    merge m:1 var_old using `changes'
    
    list, clean
    
    drop if _merge==2
    replace var_old = var_new if _merge==3
    drop _merge var_new
    rename var_old var
    sort id
    
    list, clean
    Code:
    . // demonstrate technique
    . use `master', clear
    
    . rename var var_old
    
    . merge m:1 var_old using `changes'
    
        Result                           # of obs.
        -----------------------------------------
        not matched                             2
            from master                         1  (_merge==1)
            from using                          1  (_merge==2)
    
        matched                                 3  (_merge==3)
        -----------------------------------------
    
    . 
    . list, clean
    
            id   var_old   var_new            _merge  
      1.   101         1         2       matched (3)  
      2.   104         1         2       matched (3)  
      3.   102         4         3       matched (3)  
      4.   103        17         .   master only (1)  
      5.     .         5        42    using only (2)  
    
    . 
    . drop if _merge==2
    (1 observation deleted)
    
    . replace var_old = var_new if _merge==3
    (3 real changes made)
    
    . drop _merge var_new
    
    . rename var_old var
    
    . sort id
    
    . 
    . list, clean
    
            id   var  
      1.   101     2  
      2.   102     3  
      3.   103    17  
      4.   104     2
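    A slightly more compact variant of the same technique is sketched below. It is an editorial addition, not part of the original reply. It uses merge's keep(master match) option, so the using-only observation is never kept and the separate -drop if _merge==2- step is unnecessary. The block recreates the invented data above so it can run on its own as one do-file, because tempfiles only last while the do-file is running.

    ```stata
    * Recreate the invented "changes" data: old value -> new value
    clear
    input float var_old var_new
    1 2
    4 3
    5 42
    end
    tempfile changes
    save `changes'

    * Recreate the invented master data and keep it in memory
    clear
    input float id var
    101 1
    102 4
    103 17
    104 1
    end

    * keep(master match) discards using-only observations during the merge
    rename var var_old
    merge m:1 var_old using `changes', keep(master match)
    * Replace only where a new value was found; unmatched values stay as they were
    replace var_old = var_new if _merge == 3
    drop _merge var_new
    rename var_old var
    sort id
    list, clean
    ```

    It should end with the same final listing as the longer version.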
    With that said, let me offer you some advice as an apparently new user of Stata.

    I'm sympathetic to you as a new user of Stata - it's a lot to absorb. And it's even worse if you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

    When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

    All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

    Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.


    • #3
      I'm not brand new, but a relative rookie indeed. I'll take a look at the documentation.

      A question on the above example: what I have is 14M data points where each unique ID is tied to multiple observations. Correct me if I'm wrong, but I was under the impression that the merge command only works on a 1-to-1 basis. That's where I was getting stuck. I know how to use merge for unique IDs, but I thought it didn't work if they recur.


      • #4
        Quote:
        Correct me if I'm wrong, but I was under the impression that the merge command only works on a 1-to-1 basis.
        You are, indeed, wrong. -merge- works on a 1-to-1, 1-to-m, or m-to-1 basis.

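        It is also legal syntax to use -merge- on an m-to-m basis, but the result is data salad: within each key value, observations are paired by their position in the sort order rather than by any real relationship, so you should not do that.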

        Do read -help merge-.
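        To illustrate the 1-to-m direction, here is a minimal sketch with made-up data. The variable names id, value and label are hypothetical. The dataset in memory has one row per id, the using dataset has several rows per id, and each label is copied onto every matching row.

        ```stata
        * Hypothetical "many" side: several observations per id
        clear
        input long id float value
        1 10
        1 11
        2 20
        end
        tempfile obs
        save `obs'

        * Hypothetical "one" side: a single label per id, kept in memory
        clear
        input long id str8 label
        1 "alpha"
        2 "beta"
        end

        * 1:m because the data in memory are unique on id, the using data are not
        merge 1:m id using `obs'
        sort id value
        list, clean
        ```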


        • #5
          Clyde writes the answer I would have written.

          But to that let me add the following, on reflection. The code I presented in post #2 included the merge m:1 command. This allows for multiple observations with the same value of the matching variable (in your case, the numerical ID) in the master dataset, even though the id values in the example data I made up in that post (in the absence of example data in post #1) were all distinct. This means that the var_new for a given value in the changes dataset will be matched to every observation with that same value in the master dataset. That is usually what is wanted for problems like the one you described.
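          Applied to the setup described in post #1, a sketch might look like the following. The dataset and variable names (obs, labels, value, label) are hypothetical stand-ins for your data. -isid- checks that the label file has one row per id, which merge m:1 requires of the using dataset. The keep(master match) and nogenerate options drop label rows whose id never appears among the observations and skip creating _merge.

          ```stata
          * Hypothetical master data: many observations per numeric id
          clear
          input long id float value
          1 10
          1 11
          2 20
          3 30
          3 31
          3 32
          end
          tempfile obs
          save `obs'

          * Hypothetical lookup data: one string label per id
          clear
          input long id str10 label
          1 "alpha"
          2 "beta"
          3 "gamma"
          4 "delta"
          end
          * Stops with an error if any id appears more than once
          isid id
          tempfile labels
          save `labels'

          * Every observation with a given id receives that id's label
          use `obs', clear
          merge m:1 id using `labels', keep(master match) nogenerate
          sort id value
          list, clean
          ```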


          • #6
            Ah, I see. Thanks so much, all!