Error in dictionary file

Irene Solmone

Join Date: Jul 2021

Posts: 20
#1

29 Oct 2021, 07:05

I have a dataset in .txt format that I need to read into Stata with a dictionary file. I wrote the dictionary file following instructions from the owners of the dataset. However, there seems to be a problem with at least one variable (there may be others I have not noticed). The variable STDIND comes out missing for about 80% of observations, even though it should have no missing values. The equivalent code in R reads the data without producing missing values, so the problem must be in my Stata code (and I have to use Stata). This is how I wrote the dictionary file:

Code:

dictionary {
    _column(1)   int ANNO    %4f "ANNO"
    _column(5)   int TRIM    %1f "TRIM"
    _column(6)   int REG     %2f "REG"
    _column(8)   int numcff  %2f "SG4"
    (...)
    _column(587) int STDFAM  %6f "STDFAM"
    _column(593) int STDIND  %6f "STDIND"
    _column(599) int NN2     %1f "NN2"
    _column(600) int RPN2    %1f "RPN2"
    _column(601) int TF      %2f "TF"
    _column(603) int TN2     %1f "TN2"
    _column(604) int F0_14   %1f "F0_14"
    _column(605) int CP0_7   %1f "CP0_7"
    _column(606) int CITTAD  %1f "CITTAD"
    _column(607) int WAVQUA  %1f "WAVQUA"
    _column(608) int nasita  %1f "SG13"
    _column(609) int citita  %1f "SG16"
    _column(610) int annres  %3f "SG18"
    _column(613) int NASSES  %3f "NASSES"
    _column(616) int CITSES  %3f "CITSES"
    _column(619) int RAPSES  %3f "RAPSES"
}

And this is the code I used to apply the dictionary file (2005_Q2_dict.dct is the dictionary file, sta_2005_2.txt is the dataset in txt):

Code:

clear
infile using "$PathDict/2005_Q2_dict.dct", using("$Path05Q2/sta_2005_2.txt")

Is there a problem with the code? Am I doing something wrong, or missing something?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

29 Oct 2021, 07:37

To start on solving this problem, try changing

Code:

_column(593) int STDIND %6f "STDIND"

to

Code:

_column(593) str6 STDIND %6s "STDIND"

to read STDIND as a string. Then, look at the values of STDIND to see what does not look like a number. Perhaps

Code:

tab STDIND if real(STDIND)==.

will help you find values of STDIND that Stata translates into missing values when converting text to numeric values.
Irene Solmone

Join Date: Jul 2021

Posts: 20
#3

29 Oct 2021, 08:13

Thank you so much!! This works.
Irene Solmone

Join Date: Jul 2021

Posts: 20
#4

29 Oct 2021, 08:54

William Lisowski, could you tell me how to decide between int and str when I have numeric values? Is there some criterion for choosing?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

29 Oct 2021, 09:39

You wrote in post #3: "This works."

I don't think you understood what was suggested in post #2.

Of course reading STDIND as a string "works" in the sense that it doesn't create any missing values when columns 593-598 contain something that Stata cannot interpret as a number. But if STDIND is supposed to have numeric values, then having it as a string does not "work" if you need to use it as a variable in your analysis.

If STDIND "has numeric values" then you must get it to be a numeric variable. It is most convenient to read it as a number, but if that is not possible, you will have to read it as a string and then perhaps change certain values before converting the string to a number. How you do that depends on what values it has. What were the results when you ran

Code:

tab STDIND if real(STDIND)==.
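
Once that tabulation shows which entries are not numeric, the conversion might look something like the sketch below. It assumes STDIND was read as str6 as in post #2, and that the only offending entries are blanks or a placeholder such as "NA". That placeholder is hypothetical, not something reported in this thread, and STDIND_num is a made-up name for illustration.

```stata
* Sketch only: "NA" is a hypothetical placeholder; substitute whatever tab shows
replace STDIND = "" if strtrim(STDIND) == "NA"

* real() returns missing for anything it cannot parse as a number
generate double STDIND_num = real(strtrim(STDIND))

* This count should be zero once every nonblank entry converts cleanly
count if missing(STDIND_num) & strtrim(STDIND) != ""
```

Keeping the original string variable alongside the new numeric one makes it easy to check what each missing value came from before dropping anything.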

You also wrote in post #1

"there seems to be a problem with one variable (that I know of, but there might be other problems too)"

You will learn of all potential problems by running

Code:

misstable summarize, all

after your data have been read. This shows which numeric variables have missing values, along with a few details about the values each one takes.
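
One possibility worth ruling out, which nobody in this thread confirmed: Stata's int storage type only holds values from -32,767 to 32,740, and as far as I know a number outside that range that is read into an int is stored as missing. STDIND occupies six columns, so values above 32,740 are at least possible. A minimal sketch of how to check, assuming STDIND is currently the str6 variable from post #2 (STDIND_chk is a made-up name):

```stata
* Sketch: assumes STDIND was read as str6, as suggested in post #2
generate double STDIND_chk = real(STDIND)

* How many values parse as numbers but would not fit in an int?
count if !missing(STDIND_chk) & (STDIND_chk > 32740 | STDIND_chk < -32767)

* The min and max show the range the storage type has to cover
summarize STDIND_chk
```

If that count is large, declaring the variable as long instead of int in the dictionary (for example, _column(593) long STDIND %6f "STDIND") may let it be read as a number directly. The same check would apply to any other wide numeric field declared as int.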
Irene Solmone

Join Date: Jul 2021

Posts: 20
#6

30 Oct 2021, 02:36

Yes, I understood what you meant, but thank you for the suggested command!