  • Transfer the b_date of CHNS into age

    Hello guys,

    I am so glad I found this forum, as I am new to working with STATA and am struggling a lot right now. If anyone could help, I would very much appreciate it.

    QUESTION:

    I am trying to analyse data from the China Health and Nutrition Survey (CHNS). In the survey, the birth date of each child (b_date) is recorded like this:
    [Attached image: QQ截图20161018150903.png]
    I want to get the age of each child. The interview date is 891220, so I used "if b_date<=890000, gen age=b_date-890000", but I can't get the right answer. What should I do?
    I also need to split age into the groups 0-1 year, 1-2 years and 2-4 years, and I have no clue how.
    Please help, I feel very helpless.
    Thanks!

  • #2
    No; to Stata these are just large integers and you have not told Stata anything about their being dates. There is no homunculus inside Stata that will see that these are (representations of) dates and do the smart thing on your behalf. (How could it know whether 060715 was 6 July 1915, 6 July 2015, 7 June 1915, 7 June 2015, 15 July 2006, etc.? How could it tell when 60715 means 60,715?)

    You have to learn something about how Stata works with dates. There is no short cut avoiding that that does not produce pain and puzzlement.
    Start with help dates

    I can't copy and paste your data example (please see FAQ Advice #12; also do study all of that up to #18) but your interview date is evidently

    Code:
    .  di  daily(string(891220), "19YMD")
    10946
    
    .  di  %td daily(string(891220), "19YMD")
    20dec1989
    What I did there with display was

    1. feed an integer date, converted to a string by string(), to the date function daily(), with information about its structure: the century 19?? is tacit, then we have year, month and day elements. daily() expects a string argument.

    2. push the resulting integer 10946 through a daily date display format, here %td

    With a variable you need similar but slightly different syntax. Here is a minimal example:

    Code:
    clear
    set obs 1 
    gen b_date = 860513 
    gen nb_date = daily(string(b_date), "19YMD") 
    format nb_date %td 
    list 
    
         +--------------------+
         | b_date     nb_date |
         |--------------------|
      1. | 860513   13may1986 |
         +--------------------+
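
    One caveat, sketched under assumptions: string() applied to a number drops any leading zero, so a date stored numerically as 60715 (meant as 060715) becomes the five-character string "60715", which may not parse as intended. Passing a display format to string() pads the value back to six digits. The value 60715 and the "20YMD" mask below are made up for illustration and do not come from the CHNS data.

```stata
* Sketch: keep leading zeros when converting a numeric yymmdd date to string.
* 60715 is a hypothetical value standing for 15 July 2006.
clear
set obs 1
gen long b_date = 60715
* "%06.0f" pads to six digits, giving "060715"
gen nb_date = daily(string(b_date, "%06.0f"), "20YMD")
format nb_date %td
list
```

    For the birth years in this thread (the 1980s) the plain string(b_date) call is enough, since no value starts with zero.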

    Age is a little trickier. A common convention in epidemiology is to be unfussy about the precise length of years:

    Code:
     
    gen age1 = (10946 - nb_date) / 365.25
    A more exact calculation is provided by personage (SSC), which you must install before you can use it:

    Code:
    ssc inst personage
    Code:
     
    personage nb_date, currdate(10946) gen(age2y age2d)
    
    list 
    
         +-----------------------------------------------+
         | b_date     nb_date       age1   age2y   age2d |
         |-----------------------------------------------|
      1. | 860513   13may1986   3.605749       3     221 |
         +-----------------------------------------------+

    Please note that personage is currently on my to-check list, as a user reports small problems with it.

    But it gives you the number of completed years, which is what you want for your classification.
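
    If installing a user-written command is not an option, completed years can also be computed with built-in date functions alone. The following is a minimal sketch, not a replacement for personage: idate and before_bday are hypothetical variable names, and the interview date is built with mdy() rather than typed as 10946.

```stata
* Sketch: completed years of age using only built-in date functions.
* idate (interview date) and before_bday are hypothetical names.
clear
set obs 1
gen nb_date = mdy(5, 13, 1986)
gen idate = mdy(12, 20, 1989)
format nb_date idate %td
* 1 if the birthday has not yet been reached in the interview year, else 0
gen byte before_bday = (month(idate) < month(nb_date)) | (month(idate) == month(nb_date) & day(idate) < day(nb_date))
gen age_y = year(idate) - year(nb_date) - before_bday
list
```

    For the example child born on 13 May 1986 this gives 3 completed years, in agreement with age2y above.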


    • #3
      Thanks so much for your patient and perfect answer.
      I tried the code :
      Code:
      di  daily(string(891220), "19YMD")
      di  %td daily(string(891220), "19YMD")
      gen nb_date = daily(string(b_date), "19YMD")
      format nb_date %td 
      list
      gen age1 = (10946 - nb_date) / 365.25
      personage nb_date, currdate(10946) gen(age2y age2d)
      list nb_date age2y age2d
      and it comes out:
           +---------------------------+
           |   nb_date   age2y   age2d |
           |---------------------------|
        1. | 13may1986       3     221 |
        2. | 04nov1987       2      46 |
        3. | 06aug1987       2     136 |
        4. | 28apr1988       1     236 |
        5. | 24feb1988       1     299 |
           |---------------------------|
        6. | 01sep1985       4     110 |
        7. | 09jan1988       1     345 |
        8. | 25apr1989       0     239 |
        9. | 05jun1989       0     198 |
       10. | 28aug1989       0     114 |
           |---------------------------|
       11. | 25nov1987       2      25 |
       12. | 03jan1985       4     351 |
       13. | 30nov1987       2      20 |
       14. | 23jan1985       4     331 |
       15. | 01may1985       4     233 |
           |---------------------------|
       16. | 29apr1983       6     235 |
       17. | 30may1986       3     204 |
       18. | 20may1983       6     214 |
       19. | 11jul1986       3     162 |
       20. | 05jan1986       3     349 |
      --more--
      That's exactly what I want. But I have a small question: what does the 19 in "19YMD" stand for?
      PS: I need to split age into the groups 0-1 year, 1-2 years and 2-4 years. What should I do?


      • #4
        19 stands for the century: the two-digit year 86 is to be read as 1986, which starts 19??.

        The classification is easy. See recode or use something like

        Code:
         
        gen class = cond(age2y < 2, age2y, 2)
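
        The recode route can be sketched as follows. Note that cond(age2y < 2, age2y, 2) puts every child aged 2 or more into class 2, including children older than 4 and any with missing age, because Stata treats numeric missing as larger than any number. The version below sets those cases to missing instead. The age2y values and the name age_class are made up for illustration.

```stata
* Sketch: three age classes via recode, with value labels.
* The age2y values below are hypothetical test cases.
clear
input age2y
0
1
3
6
.
end
* ages above 4 and missing ages fall under (else = .)
recode age2y (0 = 0 "0-1 years") (1 = 1 "1-2 years") (2/4 = 2 "2-4 years") (else = .), generate(age_class)
list
```

        Whether children older than 4 should be dropped or kept in the top class depends on the analysis; this is only one possible choice.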


        • #5
          I tried this code:
          Code:
          gen age_group = 0 if age2y< 1
          replace age_group = 1 if age2y>= 1 & age2y < 2
          replace age_group = 2 if age2y >= 2 & age2y <=4
          tabulate age_group
          It comes out:
            age_group |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |        197       15.25       15.25
                    1 |        271       20.98       36.22
                    2 |        824       63.78      100.00
          ------------+-----------------------------------
                Total |      1,292      100.00

          That's amazing. But if I want to use age_group in a regression, how should I use this variable?


          • #6
            My code was more concise, if also more cryptic possibly....

            Why use age group for regression at all? You would be throwing away information and implying that children change by jumps on their 1st and 2nd birthdays and never in between. Sounds like nonsense to me.
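
            The two choices can be compared in a small sketch on simulated data. Everything here is an assumption for illustration: y is a hypothetical outcome, the data are random rather than CHNS, and the linear model is only a placeholder. Age enters as a continuous predictor in the first regression; if grouped age is wanted regardless, factor-variable notation (i.age_group) enters it as a set of indicators.

```stata
* Sketch with simulated data; y is a hypothetical outcome, not a CHNS variable.
clear
set seed 12345
set obs 200
gen age1 = 5 * runiform()
gen y = 10 + 2 * age1 + rnormal()
gen age_group = cond(age1 < 2, floor(age1), 2)
* age as a continuous predictor: uses all the information in age
regress y age1
* age group as a categorical predictor: one step per group
regress y i.age_group
```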


            • #7
              Originally posted by Nick Cox
              My code was more concise, if also more cryptic possibly....

              Why use age group for regression at all? You would be throwing away information and implying that children change by jumps on their 1st and 2nd birthdays and never in between. Sounds like nonsense to me.
              Sorry to intrude, but can I ask a question? I converted a SAS-format file to a .dta file so that I can open it in STATA, but the format of the date-of-birth variable changed: the original format is year-month-day, but the .dta file shows only the year. What should I do to solve this problem? Thanks very much.


              • #8
                This looks like a completely new question unrelated to the title of the thread, so please start a new thread.

                I don't know anything about SAS which I have never used.

                My advice is limited to pointing out that Stata is so spelled.
