Destring returns "contains nonnumeric characters; no replace"

Andrew Musau

Join Date: Oct 2014

Posts: 10197
#31

16 Mar 2020, 14:38

It is indeed interesting! Gotcha2 explains it:

One feature of preserve that catches many users by surprise is that, if you preserve data in a do file, the data is automatically and silently restored when the do file finishes even if no restore command has been reached (including when the do file crashes!)

Adding the line

Code:

tab enquetecible

within the do-file should clear up the mystery.
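The behaviour in the quotation can be reproduced on a small scale. Stata also restores preserved data automatically when a program ends, so a tiny program behaves like the do-file described above. This is a minimal sketch using the auto dataset that ships with Stata; the program name peek_foreign is hypothetical.

```stata
* Hypothetical demo program: preserve, subset, and never call restore
capture program drop peek_foreign
program define peek_foreign
    preserve
    keep if foreign == 1
    display "observations inside the program: " _N
    * No restore here: Stata puts the preserved data back when the program ends
end

sysuse auto, clear
peek_foreign
display "observations after the program: " _N
```

Inside the program only the 22 foreign cars remain; afterwards all 74 observations are back. This is why a tab run after the do-file has finished can disagree with one run just before restore.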
Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#32

16 Mar 2020, 14:50

I've tried, but I am still getting the same 95 observations for every variable I try to tabulate.

Andrew Musau

Join Date: Oct 2014
Posts: 10197

#33

16 Mar 2020, 15:05

Add the following two commands before restore at the end of the do-file and copy and paste the output as I do below. Make sure the last statement that you see is "end of do-file".

Code:

tab Educ_1
tab enquetecible

My output:

Code:

. tab Educ_1

 Ecoles: organisée & |
         bien gérées |      Freq.     Percent        Cum.
---------------------+-----------------------------------
 En  total désaccord |          9        9.47        9.47
       En  désaccord |         35       36.84       46.32
              Neutre |         19       20.00       66.32
            D'accord |         28       29.47       95.79
Tout à fait d’accord |          4        4.21      100.00
---------------------+-----------------------------------
               Total |         95      100.00

. tab enquetecible

                Enquêté cîble |      Freq.     Percent        Cum.
------------------------------+-----------------------------------
                       Ménage |      1,258       92.98       92.98
Ménage et Unité de production |         95        7.02      100.00
------------------------------+-----------------------------------
                        Total |      1,353      100.00

. *********************************************************************
. /*
>
> restore
> *************************************
> ** MODULE CIVISME FISCAL **
> *************************************
>

end of do-file


Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#34

16 Mar 2020, 19:06

Dear Andrew, Thank you for the response. The added part is already in my full do-file, which contains another section that I didn't include in the version I uploaded here. However, the output above clearly shows that although tab Educ_1 should report 1,353 observations, it reports only 95, a small part of the units that responded to the questionnaire. That is my real problem. I have no idea why the rest of the observations are excluded.
Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#35

16 Mar 2020, 21:49

In fact, if I try

Code:

keep if inlist(enquetecible,1)

or

Code:

keep if inlist(enquetecible, 2)

all the variables lose their observations, even those that have no missing observations!

Andrew Musau

Join Date: Oct 2014
Posts: 10197

#36

17 Mar 2020, 03:19

I suggest that you look at the responses for the variable "Educ_1". While it is true that you have 1353 observations for the variable "enquetecible", you have only 95 responses for the former.

Code:

. tab Educ_1 if enquetecible==3

                   Educ_1 |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
                D'accord  |         28       29.47       29.47
    En  total
désaccord   |          9        9.47       38.95
          En  désaccord   |         35       36.84       75.79
                   Neutre |         19       20.00       95.79
     Tout à fait d’accord |          4        4.21      100.00
--------------------------+-----------------------------------
                    Total |         95      100.00

. tab Educ_1 if enquetecible==1
no observations

.


Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#37

17 Mar 2020, 14:15

Andrew Musau wrote (#36): "I suggest that you look at the responses for the variable 'Educ_1'. While it is true that you have 1353 observations for the variable 'enquetecible', you have only 95 responses for the former."

No. In the dataset I attached in post #21, even "Educ_1" has more than 1,200 non-missing observations. That is what I find really hard to understand.

Andrew Musau

Join Date: Oct 2014
Posts: 10197

#38

17 Mar 2020, 14:23

Then we have different datasets. After importing the dataset, without doing anything else, this is what I get. From my side, I see nothing inconsistent.

Code:

. import excel "C:\Users\709554\Desktop\BD_all1_versions_25.01.2020.xlsx", sheet("perception_qlty") firstrow cl
> ear
(549 vars, 1,682 obs)

. tab Educ_1

                   Educ_1 |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
                D'accord  |         28       29.47       29.47
    En  total
désaccord   |          9        9.47       38.95
          En  désaccord   |         35       36.84       75.79
                   Neutre |         19       20.00       95.79
     Tout à fait d’accord |          4        4.21      100.00
--------------------------+-----------------------------------
                    Total |         95      100.00

EDIT: Maybe the confusion is caused by how you define an observation. Yes, the dataset has 1682 observations, but when you tabulate, you get the frequency of non-missing observations for a given variable. The above shows that Educ_1 has 95 non-missing observations.
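The distinction can be checked directly by comparing the row count _N with the total that tabulate reports. A minimal sketch using Stata's bundled auto dataset, where rep78 (which has some missing values) stands in for Educ_1; the thread's own data are not reproduced here.

```stata
sysuse auto, clear

* _N is the number of rows (observations) in the dataset
display "rows in the dataset: " _N

* tabulate counts only non-missing values unless asked to include missing
tabulate rep78
tabulate rep78, missing

* The same split, counted directly
count if !missing(rep78)
count if missing(rep78)
```

Here _N is 74 while tabulate rep78 totals 69, because 5 cars have no rep78 value. With the thread's data, count if !missing(Educ_1) should agree with the 95 that tab Educ_1 reports.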

Last edited by Andrew Musau; 17 Mar 2020, 14:37.


Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#39

17 Mar 2020, 16:59

Dear Andrew,
Many thanks for everything. Some variables were duplicated in the dataset, and I did not realize that when I wrote my do-file. So while I had in mind the duplicate copies of those variables, with their full number of observations, Stata was using the original ones, which were restricted to a subsample. Now that I am aware of this, I think everything is fine. I've just obtained this:

Code:

. tab contact12m_Mairie, nolab

contact12m_ |
     Mairie |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,488       94.30       94.30
          1 |         59        3.74       98.04
          3 |         31        1.96      100.00
------------+-----------------------------------
      Total |      1,578      100.00

Many thanks.
Reetamarghya Dey

Join Date: Jul 2024

Posts: 2
#40

24 Jan 2025, 20:58

I have a similar question. I want to convert a string variable named iid1 to numeric. For example, one of the values of iid1 is "Q1V1110002110101". I have also tried encode (for instance, encode iid1, gen(iidnew)), but Stata refuses, saying "too many values". Can anyone suggest how to fix this? I need it to run a nested logit model.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#41

25 Jan 2025, 01:20

#40 You should not try to destring a variable with values like "Q1V1110002110101". It does not qualify as having numeric content presented as string.

As encode fails given too many distinct values, and you need a numeric identifier, try

Code:

egen long numid = group(iid1)
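A minimal sketch of this on a toy dataset; the three identifier values are hypothetical, patterned on the example in #40. group() assigns 1, 2, ... to the distinct values of iid1 in sorted order, so repeated identifiers share a number, and, unlike encode, it does not build a value label by default.

```stata
clear
input str16 iid1
"Q1V1110002110101"
"Q1V1110002110101"
"Q1V1110003110102"
end

* One integer per distinct value of iid1, assigned in sort order;
* long leaves room for far more distinct values than byte or int
egen long numid = group(iid1)
list, clean
```

The first two rows share numid 1 and the third gets 2; numid could then be used wherever a numeric identifier is required.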

Last edited by Nick Cox; 25 Jan 2025, 01:22.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#42

25 Jan 2025, 03:38

Since this thread started, Clyde Schechter and I wrote https://journals.sagepub.com/doi/epd...867X1801800413 as an attempt to bring together the most common details you might need to know in this territory.
Nils Enevoldsen

Join Date: Oct 2014

Posts: 296
#43

27 Jan 2025, 14:57

Hi Reetamarghya Dey! That looks a lot like a string from the PLFS. I think it's encoding something along the lines of:

Quarter: Q1
Visit: V1
Sector: 1 (Rural)
State: 10 (Bihar)
District: 0021 (Khagaria)
Region: 101 (Northern)
Stratum: 1

(I don't understand why district appears to be four digits instead of two digits, but maybe it's been through a processing step?)

This is an unusual set of characteristics to group together into a single variable. Most of the time you'd be better served by importing these bytes as seven different variables. Perhaps if you let us know what you intended to do with this variable we'd be better able to advise you how to deal with it. For example, maybe what you really want is a variable that takes a unique value for each stratum?
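If this reading of the layout is right, substr() can pull the seven pieces out by position. A minimal sketch; the character positions are inferred from the example value in #40 and the breakdown above (treating the last two characters as the stratum), and should be checked against the survey's own documentation before use.

```stata
clear
input str16 iid1
"Q1V1110002110101"
end

* Assumed fixed-width layout: 16 characters per identifier
assert strlen(iid1) == 16

* Positions follow the breakdown in #43 (an assumption to verify)
generate str2 quarter  = substr(iid1, 1, 2)
generate str2 visit    = substr(iid1, 3, 2)
generate byte sector   = real(substr(iid1, 5, 1))
generate byte state    = real(substr(iid1, 6, 2))
generate int  district = real(substr(iid1, 8, 4))
generate int  region   = real(substr(iid1, 12, 3))
generate byte stratum  = real(substr(iid1, 15, 2))

list quarter-stratum, clean
```

For the example value this gives quarter Q1, visit V1, sector 1, state 10, district 21, region 101 and stratum 1, matching the breakdown above.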