Problem With Drop After Merge

Monica Clodius

Join Date: Aug 2015

Posts: 4
#1

Problem With Drop After Merge

24 Aug 2015, 11:47

I am basically updating/creating a bunch of data tables and graphs for a presentation, so nothing too complex. However, I am having some issues dropping observations after a merge on Stata 11.2, and I can't figure out what the issue is exactly. I've searched online but I couldn't find anything for my error message that wasn't directly related to the merge itself, which my error doesn't seem to be. Any help or direction towards further resources is greatly appreciated.

After merging two data sets 1:1 by month and year (there are 12 months listed numerically for every year, but in combination they work as unique identifiers), when I try to drop values from before 1980 as well as values after month 7 and year 2015 I get the error message "variables year month do not uniquely identify observations in the using data."

Here is a copy of the section of the code where the error occurs:

Code:

use ${outputdir2}temp_ir , replace ; sort year month; program define readin; sort year month; merge 1:1 year month using ${outputdir2}temp_`1'; drop _merge; save "${outputdir2}scrap1.dta" , replace ; end; readin `ei'; readin `cpi'; readin `hpi_us_ofheo'; readin `hpi_us_cs'; readin `fedtax'; readin `bea_income_nat' drop if year > ${current_y}; drop if year == ${current_y} & month > ${current_m}; drop if year < 1980; drop if year == .; sort year month;

The error message pops up right after "drop if year > ${current_y}".

Concerned about duplicates, I ran a duplicates report for year month on the merged dataset before and after the readin commands. There were no duplicates. I also checked "isid year month" and had no error messages, which makes sense considering the 1:1 merge. I also checked for duplicates from the original datasets before the initial merge, as I had just recently updated it. There were no duplicates. Here is a screen shot of the output after the final readin.

Considering this, I figured stata was having issues with year and month working as unique identifiers on their own, as I am dropping observations based on their response in year and month as variables on their own not together. When I ran similar checks on duplicates and isid on year and month independently, I unsurprisingly had error messages and copies galore thanks to how I organized my data. A partial screenshot of 5 of my observations shows a little more clearly how I have the data organized and why I have duplicate months and years independently and not in combination.

I never considered that stata would require observations to have unique identifiers for a drop as I am trying to accomplish here. Do you all know what the reasoning behind this is, if that is even the problem here? Also, if within reason, is there some way for me to get around that? Otherwise, I am thinking of dropping variables by using _n, but that would be less than ideal as I hope to use this code to update my presentation with new data as required. Do you have any recommendations on how to accomplish this in a more efficient manner?

Thanks for your help!
Tags: None
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#2

24 Aug 2015, 12:33

Originally posted by M Clodius View Post

The error message pops up right after "drop if year > ${current_y}".

Could you please copy and paste some lines from the Results window that show that the error message is indeed caused by the drop command? Please use CODE tags instead of screenshots (see section 12 in the FAQ). Please also see FAQ section 6, which asks list members to post under their full real names.
Comment

Monica Clodius

Join Date: Aug 2015
Posts: 4

24 Aug 2015, 14:04

Sorry about the name. I thought I had sent the email already to get that changed after registering, but maybe not? My apologies regardless.

I'm using Stata in a batch mode in linux, so I don't have a results window. I have copied the results from a log file. I hope that works.

Here is the section of the file involving the error. I included the section involving the merge and readin files in case the issue has to do with the merge and you wanted to look at that. Otherwise, the error occurs at the very end of the included results.

Code:

 use ${outputdir2}temp_ir
  >         ,
  >         replace
  >         ;
          
  . sort year month;
          
  . program define readin;
    1.         sort year month;
    2.         merge 1:1 year month using ${outputdir2}temp_`1';
    3.         drop _merge;
    4.         save "${outputdir2}scrap1.dta"
  >         ,
  >         replace
  >         ;
    5. end;
          
  . readin `ei';
          
      Result                           # of obs.
      -----------------------------------------
      not matched                            11
          from master                         0  (_merge==1)
          from using                         11  (_merge==2)
          
      matched                               416  (_merge==3)
      -----------------------------------------
  (note: file /projects/DATA/new_output/data/scrap1.dta not found)
  file /projects/DATA/new_output/data/scrap1.dta saved
          
  . readin `mr';
          
      Result                           # of obs.
      -----------------------------------------
      not matched                             0
      matched                               427  (_merge==3)
      -----------------------------------------
  file /projects/DATA/new_output/data/scrap1.dta saved
          
  . readin `cpi';
          
      Result                           # of obs.
      -----------------------------------------
      not matched                             1
          from master                         1  (_merge==1)
          from using                          0  (_merge==2)
          
      matched                               426  (_merge==3)
      -----------------------------------------
  file /projects/DATA/new_output/data/scrap1.dta saved
          
  . readin `hpi_us_ofheo';
          
      Result                           # of obs.
      -----------------------------------------
      not matched                             4
          from master                         4  (_merge==1)
          from using                          0  (_merge==2)
          
      matched                               423  (_merge==3)
      -----------------------------------------
  file /projects/DATA/new_output/data/scrap1.dta saved
          
  . readin `hpi_us_cs';
          
      Result                           # of obs.
      -----------------------------------------
      not matched                            97
          from master                        97  (_merge==1)
          from using                          0  (_merge==2)
          
      matched                               330  (_merge==3)
      -----------------------------------------
  file /projects/DATA/new_output/data/scrap1.dta saved
          
  . readin `fedtax';
          
      Result                           # of obs.
      -----------------------------------------
      not matched                             7
          from master                         7  (_merge==1)
          from using                          0  (_merge==2)
          
      matched                               420  (_merge==3)
      -----------------------------------------
  file /projects/DATA/new_output/data/scrap1.dta saved
          
  . readin `bea_income_nat'
  >       
  > drop if year > ${current_y};
  variables year month do not uniquely identify observations in the using data
  r(459); 
          
  end of do-file
          
  r(459);

Please tell me if you need more info. Thank you!

Comment

Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#4

24 Aug 2015, 14:30

Could you set trace on and then show the log file, starting with the line readin `bea_income_nat'? The error must be caused by the merge command.
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#5

24 Aug 2015, 14:44

There's a missing semicolon here

Code:

readin `bea_income_nat' drop if year > ${current_y};
Comment

Announcement