
  • Converting Year response into 4 digit year using the century and if condition

    Hi family,
    I have a data set in which the year of interview was recorded using only its last two digits. For example, a reported year of 1999 is stored as 99, 2002 is stored as 2, and 2000 is stored as 0. I want a command that converts these two-digit values back to four-digit years. I have tried several times, but Stata keeps returning an error.

    Below is what I used:
    replace yearcent= (200*10+year, if year<10)|(19*100+year, if year>10)

    The error message was "year,ifyear invalid name"


    The data set is below:

    input float(id day month year)
    103358 4 10 2
    103460 20 2 1
    103740 17 12 97
    102922 19 7 97
    103018 21 11 98
    103362 3 5 99
    103552 19 1 99
    102112 1 12 2
    102388 17 8 1
    102998 27 5 1
    103742 8 9 1
    103752 1 7 0
    103152 13 12 99
    102044 28 9 2
    102298 16 3 97
    102946 26 4 97
    103498 6 7 1
    103528 18 10 0
    103564 8 2 0
    103660 30 10 2
    102046 8 1 98
    102478 3 12 97
    103126 13 1 97
    103316 23 8 99
    102274 27 11 1
    102300 14 9 1
    102476 13 9 98
    102654 28 3 1
    103062 1 8 98
    103282 21 9 99
    103364 22 12 98
    103786 22 6 2
    102304 7 12 97
    102348 3 2 1
    102410 19 11 1
    103390 11 6 98
    102772 25 5 1
    103202 3 3 0
    103492 18 7 1
    102070 3 1 1
    102152 31 8 0
    102372 6 8 99
    102384 8 4 99
    102788 5 9 2
    102802 24 3 2
    103472 11 7 0
    103720 12 8 2
    102218 30 8 98
    102600 18 1 97
    102820 25 9 99
    102994 2 4 0
    103122 26 3 97
    103266 8 5 99
    103640 14 6 2
    102706 4 11 0
    103010 7 4 97
    103674 1 7 2
    103770 5 9 0
    102262 9 10 97
    102190 20 1 99
    103470 28 11 1
    103486 15 8 97
    102928 7 5 97
    103400 30 4 2
    103280 28 7 99
    103726 27 3 1
    102254 6 3 97
    102620 20 3 98
    103620 23 9 98
    102650 17 7 0
    102424 2 5 1
    102132 12 1 98
    102694 1 7 1
    102234 25 12 2
    102892 13 3 99
    102942 16 7 0
    103404 11 5 0
    102986 9 7 99
    102968 13 10 98
    103168 6 9 97
    103210 31 5 99
    103542 6 7 0
    102284 9 7 97
    102658 7 3 2
    102078 19 3 1
    102592 2 7 1
    103214 17 9 97
    103684 22 4 98
    102872 8 5 1
    103676 6 11 97
    102932 15 5 98
    103338 19 1 0
    102436 6 5 2
    103416 27 7 0
    102412 22 2 0
    102500 11 2 98
    102728 11 10 98
    103538 8 9 2
    103246 26 4 98
    103436 31 5 97
    end

    Kindly help me out.

    2. How do I build a date of interview from day, month and year (for example, 22nd January 1998) using the above data set?

    Thank you.

  • #2
    Shamsudini:
    1) the simplest way is to code as follows:
    Code:
    . g yearcent= (200*10+year) if year<10
    
    
    . replace yearcent=(19*100+year) if year>10
    
    
    . list in 1/10
    
         +----------------------------------------+
         |     id   day   month   year   yearcent |
         |----------------------------------------|
      1. | 103358     4      10      2       2002 |
      2. | 103460    20       2      1       2001 |
      3. | 103740    17      12     97       1997 |
      4. | 102922    19       7     97       1997 |
      5. | 103018    21      11     98       1998 |
         |----------------------------------------|
      6. | 103362     3       5     99       1999 |
      7. | 103552    19       1     99       1999 |
      8. | 102112     1      12      2       2002 |
      9. | 102388    17       8      1       2001 |
     10. | 102998    27       5      1       2001 |
         +----------------------------------------+
    
    .
    2) see -help f_date- and related entries in the Stata .pdf manual.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Here is another way to do it:


      Code:
      gen wanted = cond(year <= 10, 2000 + year, 1900 + year)
      . tab year
      
             year |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         16       16.00       16.00
                1 |         20       20.00       36.00
                2 |         15       15.00       51.00
               97 |         19       19.00       70.00
               98 |         16       16.00       86.00
               99 |         14       14.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
      
      . tab wanted
      
           wanted |      Freq.     Percent        Cum.
      ------------+-----------------------------------
             1997 |         19       19.00       19.00
             1998 |         16       16.00       35.00
             1999 |         14       14.00       49.00
             2000 |         16       16.00       65.00
             2001 |         20       20.00       85.00
             2002 |         15       15.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
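
For readers adapting the cond() approach in #3, here is a minimal sketch with the cutoff pulled out into a local macro. The name pivot and the value 10 are assumptions for illustration and do not come from the posts above: any two-digit year at or below the cutoff is treated as 20xx, anything above it as 19xx. The command in #1 failed because if is a qualifier that belongs at the end of a command; it cannot be placed inside an expression. The assert line is an optional check against the range of years seen in this thread's data.

```stata
* Minimal sketch; pivot is a hypothetical cutoff chosen for illustration.
clear
input float year
2
0
97
99
end

local pivot = 10

* Years at or below the cutoff become 20xx, the rest become 19xx.
gen wanted = cond(year <= `pivot', 2000 + year, 1900 + year)

* Optional sanity check: stop with an error if any result is out of range.
assert inrange(wanted, 1997, 2002)

list
```

Because the cutoff sits in one place, it can be changed for data covering different years without editing the expression itself.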



      • #4
        Waoo! Carlo and Nick,

        I really appreciate your help. It is working perfectly.

        I have been able to do the four-digit year conversion. Now, how do I combine the three variables "day", "month" and "yrad" into a single variable called dateofInterview, with values like 11 Jan 2022?
        I read the help file that Carlo suggested, but I am still having challenges.

        Below are my commands and the feedback I am getting.
        I changed day and month from numeric to string and continued with the commands below:

        gen dateofInterview=.
        replace actualdate= date( day ,"MD19Y"[,Y])

        The feedback I got was

        "0 real changes made"

        Please guide me on this as well.

        I have attached a sample for your review and further guidance.
        clear
        input float id byte(day month) float yrad
        103358 4 10 2002
        103460 20 2 2001
        103740 17 12 1997
        102922 19 7 1997
        103018 21 11 1998
        103362 3 5 1999
        103552 19 1 1999
        102112 1 12 2002
        102388 17 8 2001
        102998 27 5 2001
        103742 8 9 2001
        103752 1 7 2000
        103152 13 12 1999
        102044 28 9 2002
        102298 16 3 1997
        102946 26 4 1997
        103498 6 7 2001
        103528 18 10 2000
        103564 8 2 2000
        103660 30 10 2002
        102046 8 1 1998
        102478 3 12 1997
        103126 13 1 1997
        103316 23 8 1999
        103472 11 7 2000
        103720 12 8 2002
        102218 30 8 1998
        102600 18 1 1997
        102820 25 9 1999
        102994 2 4 2000
        103122 26 3 1997
        103266 8 5 1999
        103640 14 6 2002
        end


        Thank you



        • #5
          According to your data example, day, month and yrad are all numeric, and that is how they should be.

          Code:
          help datetime 
          
          help datetime display formats 
          
          gen dateofinterview = mdy(month, day, yrad) 
          
          format dateofinterview  %td
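
To connect #5 back to the display asked for in #4 (values like 11 Jan 2022), here is a minimal sketch on two rows taken from the sample data. The display format %tdDD_Mon_CCYY and the variable name alt are choices made for this illustration, not something stated in the thread. The last two lines show, as an assumption about what the string attempt in #4 was aiming at, how date() could be used by first building a "day month year" string; mdy() on the numeric variables is the more direct route.

```stata
* Minimal sketch on two rows of the sample data from #4.
clear
input byte(day month) float yrad
4 10 2002
17 12 1997
end

* Build a daily date from the numeric components.
gen dateofinterview = mdy(month, day, yrad)

* Show it as day, abbreviated month name, four-digit year.
format dateofinterview %tdDD_Mon_CCYY
list

* Alternative via strings; alt is a hypothetical name for illustration.
gen alt = date(string(day) + " " + string(month) + " " + string(yrad), "DMY")
assert alt == dateofinterview
```

Using Month in place of Mon in the format would spell the month name out in full. See -help datetime display formats- for the full list of codes.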



          • #6
            Nick!
            I am most grateful for your time and for the swift feedback.
            It is working so well!

            Finally, sir, I want to drop all observations whose data were captured before 31st December 1998, i.e. (dateofinterview < 31st December 1998).

            Before running the format command, I typed "drop if dateofinterview<14244", since the numeric value of 31st December 1998 is 14244, and then ran the format command.

            Out of curiosity, after running the format command I also tried to drop the observations where dateofinterview is before 31st December 1998, this time using the formatted value.
            Below is the command I used:

            drop if dateofinterview<31dec1998
            The error message was "31dec1998 invalid name"

            So is there a better way to do this?

            input float id byte(day month)
            103358 31 12
            103460 20 2
            103362 3 5
            103552 19 1
            102112 1 12
            102388 17 8
            102998 27 5
            103742 8 9
            103752 1 7
            103152 13 12
            102044 28 9
            103498 6 7
            103528 18 10
            103564 8 2
            103660 30 10
            103316 23 8
            103472 11 7
            103720 12 8
            102820 25 9
            102994 2 4
            103266 8 5
            103640 14 6
            end

            Thank you sir.



            • #7
              What Stata told you is correct. What you typed is not a valid name, nor is it a valid numeric value. mdy(12, 31, 1998) would be another way to specify a particular date.
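
As a minimal sketch of the point in #7, using three made-up rows in the layout of the sample from #4: a date can be written inside an expression either with mdy() or with the td() pseudofunction, which accepts a literal such as 31dec1998. The use of td() is an addition here and is not mentioned in the thread. Both return the underlying daily date number, so the comparison works whether or not a display format has been applied.

```stata
* Both lines display the same number, the value 14244 quoted in #6.
display td(31dec1998)
display mdy(12, 31, 1998)

* Three illustrative rows; the values are chosen for this sketch.
clear
input byte(day month) float yrad
31 12 1998
19 1 1999
17 12 1997
end

gen dateofinterview = mdy(month, day, yrad)
format dateofinterview %td

* Drop interviews dated before 31 December 1998; that day itself is kept.
drop if dateofinterview < td(31dec1998)
list
```

Writing the date as td(31dec1998) or mdy(12, 31, 1998) avoids looking up and hard-coding the numeric value.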



              • #8
                Hi Nick,
                I am most grateful for your time.
                I have tried it and it is working so well.
