Append Error; r106; How can I solve this problem step-by-step? I have Stata version 15

Roman Johnson

Join Date: Mar 2018

Posts: 87
#1


30 Mar 2018, 17:57

Hi, I am trying to append files from the same dataset but different years.

This is my line of code in my do file:

use "C:\Users\rjohn123\Documents\hh02dta_b3a\iiia_tb.dta", clear
keep tb21_2 tb02_1
append using "C:\Users\rjohn123\Documents\hh05dta_b3a\iiia_tb.dta", nol
append using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta", nol
sort folio ls
save "C:\Users\rjohn123\Documents\employementappend.dta", replace
keep tb21_2 tb02_1

This is my output:

. use "C:\Users\rjohn123\Documents\hh02dta_b3a\iiia_tb.dta", clear
(VERSION (ene 19))

. keep tb21_2 tb02_1

. append using "C:\Users\rjohn123\Documents\hh05dta_b3a\iiia_tb.dta", nol
(note: variable tb02_1 was byte, now float to accommodate using data's values)
(note: variable tb21_2 was long, now double to accommodate using data's values)

. append using "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta", nol
variable tb20_1 is byte in master but str1 in using data
You could specify append's force option to ignore this numeric/string mismatch. The using variable would then be treated as
if it contained numeric missing value.
r(106);

end of do-file

r(106);

Can someone please tell me step-by-step how to solve this problem with the append command that I am having?
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

30 Mar 2018, 18:20

So, the message is self-explanatory. In the files in the hh_02 and hh_05 directories, the data is set up with variable tb20_1 as a numeric variable. For some reason in hh_09 it is a string variable. So you have to fix the hh_09 data set and convert it to numeric. Without knowing what's in variable tb20_1 in those data sets, it is hard to say what is the correct way to do this.

Please post example data from hh_02 and also from hh_09, including the variable tb20_1. In order for it to be helpful, you must use the -dataex- command to do this. Do not post an HTML table or something from a spreadsheet. And do not post a screenshot. Use -dataex- so that you give a complete and faithful replica of all the important details.

If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Roman Johnson

Join Date: Mar 2018
Posts: 87

30 Mar 2018, 18:36

Sure thing, Clyde. I will show the data from now on.

data from hh_02 for tb20_1

dataex tb20_1

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte tb20_1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
end

------------------ copy up to and including the previous line ------------------

Listed 100 out of 19755 observations
Use the count() option to list more

.
data from hh_09 for tb20_1

use "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta"
(Ennvih-3 Libro 3a_portad)

. dataex tb20_1

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 tb20_1
"."
"."
"."
"."
"."
"."
"."
"3"
"."
"."
"."
"."
"3"
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"3"
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"3"
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
"."
end

------------------ copy up to and including the previous line ------------------

Listed 100 out of 24944 observations
Use the count() option to list more


Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

30 Mar 2018, 19:01

OK, in the hh_09 data set, run

Code:

destring tb20_1, replace

Then re-save the data set. Now you can run your -append- code.
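As a rough sketch of those two steps, something like the following might do. The input path follows post #1; the output file name iiia_tb_num.dta is a hypothetical choice, and the -confirm- line is an extra check that is not part of the advice above.

```stata
* Sketch only: input path as in post #1, output file name is hypothetical.
use "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb.dta", clear

* Convert the string variable to numeric in place.
destring tb20_1, replace

* Stop here with an error if tb20_1 is still a string,
* so that an unconverted file is never saved by mistake.
confirm numeric variable tb20_1

* Re-save under a new name so the original file stays untouched.
save "C:\Users\rjohn123\Documents\hh09dta_b3a\iiia_tb_num.dta", replace
```

If the converted data is saved under a new name like this, the -append using- line for hh09 has to point at the new file.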

This is, by the way, a very common problem. Even when data sets are obtained from reliable sources with strong reputations for being good curators of data, one frequently finds data sets where the same variable is stored differently in two data sets from the same family of data sets. Or where two data sets have different names for what should be the same variable, or the same name for what should be two different variables, or different numerical encoding of short-answer variables in different data sets. These things create no end of headaches, but I guess it creates full employment for data analysts.

Anyway, if you have even more of these files, don't be surprised if a similar problem arises farther down the line!

When I have a bunch of data sets that I need to combine, my own practice is to review each one separately first, and then decide on what I want the combined data set to look like in terms of data storage types (string vs numeric, type of numeric), variable names, encoding of short-answer variables. Then I go through each one separately and make whatever changes are needed to enforce conformity with that plan. Only then do I append (or merge) them together. It's tedious, but I find it less frustrating than having the append sequence repeatedly break in the middle. Some people prefer to put the code that cleans up the data sets inside the loop of appends. The drawback to that is that usually there are several different incompatibilities that need to be corrected overall, but any individual file typically only needs 1 or 2 of them. So I think it makes the code unnecessarily complicated to do it all at once. But admittedly, it is tedious writing a cleaning script for every separate file.
Roman Johnson

Join Date: Mar 2018

Posts: 87
#5

02 Apr 2018, 13:56

Hi, Clyde:

I destringed that variable in hh09 and I am still getting the same error message. What other things could you offer?
Roman Johnson

Join Date: Mar 2018

Posts: 87
#6

02 Apr 2018, 13:56

And I went through each file hh02, hh05, and of course, hh09 to make sure each was destringed.
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#7

02 Apr 2018, 14:13

I destringed that variable in hh09 and I am still getting the same error message. What other things could you offer?

I can think of a few possibilities. One is that you didn't save the hh09 data set after you did the -destring-ing. So you were just -append-ing the original data anyway.

The other possibility is that the -destring- operation failed. Did you get any message after -destring-? If any of the values of that hh09 variable contains something that is not actually a number, then -destring- will refuse to proceed, and will give you its complaint. This happens when, for example, some values are recorded as "N/A" or "-" or the like. In this case, with the hh09 data in memory, run:

Code:

browse tb20_1 if missing(real(tb20_1))

and Stata will show you all of the offending items. You can then figure out how to fix them. This might be as simple as using the -ignore()- option in -destring- or might require some more complicated data cleaning with -replace- commands.
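To illustrate the -replace- route on something small, here is a minimal sketch with toy data. The variable name x and the codes "N/A" and "-" are made up for the example; the real offending values have to be read off your own data first.

```stata
* Toy data standing in for a string variable with non-numeric codes.
* The variable name x and its values are invented for illustration.
clear
input str3 x
"1"
"N/A"
"3"
"-"
end

* Tabulate only the values that real() cannot turn into a number.
tabulate x if missing(real(x))

* Blank out the known non-numeric codes, then convert in place.
replace x = "" if inlist(x, "N/A", "-")
destring x, replace

* Stops with an error if the conversion did not happen.
confirm numeric variable x
```

Using -tabulate- instead of -browse- gives a compact count of each offending value, which can be easier to read when there are many observations.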
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#8

02 Apr 2018, 14:25

Considering the difficulties, how about using the -force- option? You could then check the observations where the new variable is missing but the old one is not.
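A minimal sketch of that check, using toy data: the variable names v and v_raw and the tempfiles are hypothetical stand-ins, and keeping a raw copy of the string before appending is just one possible way to make the comparison.

```stata
* Toy stand-in for the master file: v is numeric.
clear
input byte v
1
.
end
tempfile m
save `m'

* Toy stand-in for the using file: v is a string.
clear
input str1 v
"3"
"."
end
* Keep a copy of the raw string under another name so it survives the append.
generate str1 v_raw = v
tempfile u
save `u'

* With -force-, the string values of v from the using file become numeric missing.
use `m', clear
append using `u', force

* Observations where information was lost: v is missing
* although the raw string was neither empty nor ".".
list v v_raw if missing(v) & !inlist(v_raw, "", ".")
```

Any observations listed at the end would need to be recovered from the raw copy, for example with -real()-, before the data can be trusted.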

Best regards,

Marcos
Ayesha Irfan

Join Date: May 2020

Posts: 5
#9

20 May 2020, 09:33

Originally posted by Clyde Schechter

This is, by the way, a very common problem. Even when data sets are obtained from reliable sources with strong reputations for being good curators of data, one frequently finds data sets where the same variable is stored differently in two data sets from the same family of data sets. Or where two data sets have different names for what should be the same variable, or the same name for what should be two different variables, or different numerical encoding of short-answer variables in different data sets. These things create no end of headaches, but I guess it creates full employment for data analysts.

Anyway, if you have even more of these files, don't be surprised if a similar problem arises farther down the line!

When I have a bunch of data sets that I need to combine, my own practice is to review each one separately first, and then decide on what I want the combined data set to look like in terms of data storage types (string vs numeric, type of numeric), variable names, encoding of short-answer variables. Then I go through each one separately and make whatever changes are needed to enforce conformity with that plan. Only then do I append (or merge) them together. It's tedious, but I find it less frustrating than having the append sequence repeatedly break in the middle. Some people prefer to put the code that cleans up the data sets inside the loop of appends. The drawback to that is that usually there are several different incompatibilities that need to be corrected overall, but any individual file typically only needs 1 or 2 of them. So I think it makes the code unnecessarily complicated to do it all at once. But admittedly, it is tedious writing a cleaning script for every separate file.

This is interesting, because I am facing the exact same numeric/string issue in files I am trying to append. I would manually inspect them all but there are close to 200 files so I'm looking for a better solution. But not only that, I had already written very precise destringing commands along with other arithmetic operations on the offending variables in a nested loop. It never returned an error with those even though, according to the append error, some files have the variable as a string (numeric in others) which should have returned an error when I ran commands to add/multiply stuff to it.

Though I cannot figure out why this is happening, I would at least like to know which file(s) is(are) causing trouble.

The error I get is:
variable Volume is double in master but strL in using data. You could specify append's force option to ignore this string/numeric mismatch. The using variable would then be treated as if it contained "".

The append code I used was:

Code:

fs "`c'*.dta"
return list
append using `r(files)'

This worked perfectly fine for 20 previous files that the loop ran and appended from the list c.

At this point, I just wanna know which file has Volume as string instead of numeric.
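One possible way to find the offending file(s) is to loop over the files and test the storage type of Volume in each. The sketch below assumes the .dta files sit in the current working directory (adjust the folder and pattern to match the `c' prefix used above), and it replaces whatever data is in memory.

```stata
* Sketch only: assumes the .dta files are in the current working directory
* and that the variable of interest is called Volume, as in the error message.
* Warning: this clears the data in memory on each pass.
local files : dir "." files "*.dta"

foreach f of local files {
    * Load a single observation of Volume to keep the check fast.
    capture use Volume in 1 using "`f'", clear
    if _rc {
        * For example, Volume is absent from this file or the file is empty.
        display as text "`f': could not read Volume (rc = " _rc ")"
        continue
    }

    * -confirm- returns 0 when Volume is a string (str# or strL).
    capture confirm string variable Volume
    if !_rc {
        display as error "`f': Volume is stored as `: type Volume'"
    }
}
```

Only the files flagged in the output would then need a closer look before the -append- is rerun.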