Replacing incorrect DOB values

Waewnet Sukkasem

Join Date: Sep 2017

Posts: 13
#1


26 Sep 2017, 16:54

Hi Guys,

I'm very new here, and this is my first question. I am trying to replace some date of birth values that were entered incorrectly. I have tried several approaches, but none of them works. Can you please help me? Many thanks.

The storage type and display format of DATEOFBIRTH are str10 and %10s, respectively.

replace DATEOFBIRTH =="01/07/1964" if DATEOFBIRTH =="01/07/0964"
== invalid name
r(198);

Many thanks.
Waewnet
Tags: data, string, syntax
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#2

26 Sep 2017, 17:24

Code:

replace DATEOFBIRTH = "01/07/1964" if DATEOFBIRTH == "01/07/0964"

In Stata you must always distinguish between = and ==. A single = is used only for assignment: what is on the right is calculated and stored in the place indicated by the left-hand side. A double == is used only as the relational operator "is equal to." They are never interchangeable.* If you use either where the other is called for, you will always get an error.

This command requires both: the first is a single =, and the second is a double ==.

*That's a slight overstatement. There are a few odd places in Stata syntax where both will be accepted, but they are exceptional, and it is best to form your programming habits as if you can never do that. Always use = for storing the result of a calculation and use == for "is equal to."
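As a minimal sketch of the distinction, assuming a hypothetical two-observation toy dataset (the second date and the variable name fixed are made up for illustration):

```stata
* Hypothetical toy data: one mistyped date of birth stored as a str10 string
clear
input str10 DATEOFBIRTH
"01/07/0964"
"15/03/1970"
end

* Single = assigns the new value; double == selects the observations to change
replace DATEOFBIRTH = "01/07/1964" if DATEOFBIRTH == "01/07/0964"

* Inside an expression, == evaluates to 1 (true) or 0 (false)
generate byte fixed = (DATEOFBIRTH == "01/07/1964")
list
```

Only the first observation is changed by the replace command; the indicator variable fixed simply shows the result of the == comparison for each observation.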

Last edited by Clyde Schechter; 26 Sep 2017, 17:27.
Waewnet Sukkasem

Join Date: Sep 2017

Posts: 13
#3

26 Sep 2017, 18:57

Thank you very much, Clyde. I really appreciate your explanation.

Best,
Waewnet

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

27 Sep 2017, 04:34

Clyde Schechter Yesterday, I was about to comment on the "rule" that = and == are "never interchangeable". Well, to demonstrate it is truly a rule, we cannot help but try hard to find an exception.

Below is an exception to this rule. Interestingly enough, it concerns a frequently used test:

Code:

. set obs 100
number of observations (_N) was 0, now 100

. gen pre = rnormal()*1.5

. gen post = rnormal()*2

. ttest pre = post

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |     100   -.2168539    .1445549    1.445549   -.5036822    .0699744
    post |     100    .1363868    .1941968    1.941968   -.2489418    .5217153
---------+--------------------------------------------------------------------
    diff |     100   -.3532407    .2333843    2.333843   -.8163258    .1098445
------------------------------------------------------------------------------
     mean(diff) = mean(pre - post)                                t =  -1.5136
 Ho: mean(diff) = 0                              degrees of freedom =       99

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.0667         Pr(|T| > |t|) = 0.1333          Pr(T > t) = 0.9333

. ttest pre == post

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |     100   -.2168539    .1445549    1.445549   -.5036822    .0699744
    post |     100    .1363868    .1941968    1.941968   -.2489418    .5217153
---------+--------------------------------------------------------------------
    diff |     100   -.3532407    .2333843    2.333843   -.8163258    .1098445
------------------------------------------------------------------------------
     mean(diff) = mean(pre - post)                                t =  -1.5136
 Ho: mean(diff) = 0                              degrees of freedom =       99

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.0667         Pr(|T| > |t|) = 0.1333          Pr(T > t) = 0.9333

Best regards,

Marcos


Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#5

27 Sep 2017, 09:32

Marcos Almeida Yes. In fact the -ttest- command was what I had in mind when I added my disclaimer at the end of #2. There are some others as well. I believe -ranksum- also accepts them interchangeably, just like -ttest-. I can't think of any others. I think it is also worth noting that both of these commands are ancient: they have been in Stata at least since I started using it (version 4), and I suspect they might even date back to version 1! I'm almost certain there are no modern Stata commands that allow this, and I think it would be foolish for StataCorp to introduce any at this point. Keeping a sharp distinction between = and == is, I think, an important part of the coherent design of programming languages.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

27 Sep 2017, 09:45

I fully agree with Clyde Schechter.

Actually, it was by serendipity that I "discovered" this peculiarity of the -ttest- command: a handout for a group of students was printed with the "mistake" and, lo and behold, when I checked it, the command worked perfectly.

With regard to "new" commands, once (more than a year ago, I gather) I noticed the same "issue" with a new command. I cannot say for sure, but I believe it was something concerning margins. A forum member had received an error message and, at first, I thought it was due to the wrong choice of "=". To my surprise, it wasn't, and both forms gave the same results. I'm not sure whether I can find that message, but I'll give it a try.

Once again, these are surely exceptions and, as I remarked, they in fact confirm the rule you underlined.

Best regards,

Marcos
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#7

27 Sep 2017, 15:34

Found it!

Here, oddly enough, we get a new exception, this time with a truly "modern" command, as I remarked in this link in #3.

Then (as it would be if posted now), my recommendation was the same as Clyde's, i.e., to stick to proper Stata grammar.

Edited to clarify an important issue: the -ranksum- command won't accept any kind of equal sign (double or single), because it only accepts "by".

With regard to the counterpart of the paired t test, we observe that the grammar is truly orthodox for the -signrank- command:

Code:

. webuse fuel

. signrank mpg1 = mpg2

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |        3        13.5        38.5
    negative |        8        63.5        38.5
        zero |        1           1           1
-------------+---------------------------------
         all |       12          78          78

unadjusted variance      162.50
adjustment for ties       -1.63
adjustment for zeros      -0.25
                     ----------
adjusted variance        160.63

Ho: mpg1 = mpg2
             z =  -1.973
    Prob > |z| =   0.0485

. signrank mpg1 == mpg2
== invalid name
r(198);

I hope to have helped.

Last edited by Marcos Almeida; 27 Sep 2017, 15:48.

Best regards,

Marcos
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#8

27 Sep 2017, 18:30

Marcos, you are right. I was misremembering: it was -signrank-, not -ranksum-, that I had in mind.
