Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • converting string date to numeric and getting inconsistent result

    I have a new data set with string date variable entered as DD/MM/YYYY (i.e., 05/24/1975).

    The inconsistent date conversion results I'm getting in Stata is that only about half the cases in the date variable are converting, with the other half being blank and generating this error:
    (26 missing values generated)
    The syntax used (Stata 14; windows 7 sys)

    Code:
    gen date2 = date(birthdate, "DMY")
    format date2 %td
    Details:
    The data was digitally "captured" from scanned (PDFs & TIFs) questionnaires (nd exported to Microsoft Access, then into Stata format via Stat/Transfer. The data capture software was "Remark Office OMR 2014". This is my first experience using Remark (FYI - mostly positive so far). I am working with a small sample of data (n=47).
    • The "birthdate" variable was entered manually into Remark as a "textual" variable (their term for alphanumeric variable).
    • In Access the birthdate variable properties are "Text" with filed size = 25; no input mask or other properties are set. I have confirmed that all birthdates are correctly entered (dd/mm/yyyy)
    • In Stata the birthdate variable is "str12" type , with format "%12s". All data in this variable looks as expected.
    • In Stata the new variable generated to convert birthdate to a numeric date (with half the cases successfully generated) is a "float" type with format "%tdCCYY-NN-DD"
    I do not understand why only cases/rows #1 thru 21 will convert the date from strring to numeric date (with rows #22-47 left blank).

    One other oddity: IN this small data set I manually inserted a new string variable and manually entered string dates in to the column. Then I tried to convert the new var string date to a new variable )using the same syntax as above. Surprisingly, only rows 1-21 were converted with rows 22-47 being blank, just like the imported variable.

    Any suggestions as to what I am doing wrong? Could this be a Stata anomaly?

    . . . or could the Remark software be embedding something I can't spot?

    Thanks very much (in advance),
    Wendy


    Cheers, wg
    ~ ~
    sapere aude ~~

  • #2
    My guess would be a data anomaly. Your best bet would be to contact Stata's techsupport with that dataset, so they can look at what is going on.

    You can find them here: http://www.stata.com/support/tech-support/contact/
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      You might also want to post your data here; just the column(s) with dates that are misbehaving, so that others can have a look as well.

      Comment


      • #4
        You stated that you have

        DD/MM/YYYY (i.e., 05/24/1975)
        However, 05/24/1975 is clearly MM/DD/YYYY.

        There is no automatic error correction in Stata such as people apply. If you try simple examples

        Code:
        . di daily("05/24/1975", "DMY")
        .
        
        . di daily("05/24/1975", "MDY")
        5622
        
        . di %td  daily("05/24/1975", "MDY")
        24may1975
        you will see that Stata just plays dumb with such a date presented as DMY (month 24? no such month, so missing) rather than flip around the day and month on your behalf.

        That may account for some of your problems. What will be worse for you is if there is inconsistency in the raw data, as then you will have to fight hard to work out whether "03/07/1975" means March 7 or July 3 (or 7 March or 3 July, whichever anyone prefers).

        (Incidentally, after reading the help, my second tip for working with dates is to use display with simple examples to understand what is going on.)

        Comment


        • #5
          Nick -- thanks for pointing out my "typo". That is the result of my sloppy typing in this post, and not an actual example of one of my data values. Before attempting the conversion, I did verify that all string dates were valid values. I apologize for the confusion.
          Cheers, wg
          ~ ~
          sapere aude ~~

          Comment


          • #6
            OK, but in that case Jorrit's advice to post the data remains cogent, as so far the flavour is that you have puzzling results, and don't know why.

            Comment


            • #7
              Attached is data file (n = 47) with the birthdate, and the newvar date2 that resulted in this post.

              Code:
              gen date2 = date(birthdate, "DMY") 
               format date2 %td
              Much thanks,
              wg
              Attached Files
              Cheers, wg
              ~ ~
              sapere aude ~~

              Comment


              • #8
                Wendy:
                I probably spotted the issue:
                Stata legally returns missing values when day and month to be translated from string to date are inverted (eg.: 05/22/1981 which should be rewritten as 22/05/1981).
                Kind regards,
                Carlo
                (Stata 19.0)

                Comment


                • #9
                  Thanks for posting the data. I agree with Carlo.

                  The problem I guessed at in #4 really is the issue. Your data mix DMY and MDY.

                  It's striking that this is for the same interviewer. Sometimes it is too late to drill down and discover the reason for data problems. Perhaps they were using different conventions on different days of interview.

                  Code:
                    
                  gen date2 = daily(birthdate, "DMY")
                  replace date2 = daily(birthdate, "MDY") if missing(date2)
                  Notes:

                  1. I use daily() not date() here. it's the same function with the different names. I have seen people treat date() as a generic date function too many times here to want to recommend its use.

                  2. The problem of ambiguous dates remains. However, at worst it means your people are a few months off in their age. For most medical or epidemiological problems for adults, my wild guess is that that is just noise.
                  Last edited by Nick Cox; 20 Jan 2016, 05:34.

                  Comment


                  • #10
                    Thanks Nick and Carlo -- The daily() function is a good idea.

                    Also, I thought I had taken the MDY vs DMY into account already, but I am relieved that y'all spotted this.
                    Cheers, wg
                    ~ ~
                    sapere aude ~~

                    Comment


                    • #11
                      This positively-closed thread reminds me that posting Stata codes and attachments in Stata format(s) worths more than thousand words attempting to describe what the problem is.
                      Kind regards,
                      Carlo
                      (Stata 19.0)

                      Comment

                      Working...
                      X