Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating variables based on date

    Dear Statalisters,
    I am sorting out the database, I post as follows.
    It offers firm-year data, including ID, Date of birth, Status(Active/Failed), Date of Status change, and year. For "Date of Status change", taking ID=001 as an example, it means on 10/21/2021, it failed.

    I want to generate two variables.

    The first is "Month_in_business", for example, for ID=001, it is failed on 10/21/2021, so in end of 2019, Month_in_business=(2019-2018)*12+(12-01)=23; in end of 2020, Month_in_business=(2020-2018)*12+(12-01)=35; in 2021, it failed on 10/21/2021, Month_in_business=(2021-2018)*3+(10-01)=45.

    The second is "Failure", for example, for ID=001, it failed in 2021, so in 2019, Failure=0; in 2020, Failure=0; in 2021 Failure=1.

    I know it is tricky, but I need to use the Cox model for survival analysis, I need to have these two variables.

    I appreciate your help in advance!

    Best regards,
    Josh



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID str10 Dateofbirth str38 Status int(Date_of_status_change year)
    001 "2018-01-04" "Failed" 10/21/2021 2019
    001 "2018-01-04" "Failed" 10/21/2021 2020
    001 "2018-01-04" "Failed" 10/21/2021 2021
    002 "2018-01-04" "Active"            . 2018
    002 "2018-01-04" "Active"            . 2019
    002 "2018-01-04" "Active"            . 2020
    002 "2018-01-04" "Active"            . 2021
    003 "2018-07-03" "Failed" 11/5/2021 2018
    003 "2018-07-03" "Failed" 11/5/2021 2019
    003 "2018-07-03" "Failed" 11/5/2021 2020
    004 "2018-05-03" "Failed" 11/29/2019 2018
    004 "2018-05-03" "Failed" 11/29/2019 2019
    end
    format %tdnn/dd/CCYY Date_of_status_change
    ------------------ copy up to and including the previous line ------------------


  • #2
    The following code works with your example data modified (see below):
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long ID str10 Dateofbirth str38 Status str10 Date_of_status_change int year
    1 "2018-01-04" "Failed" "10/21/2021" 2019
    1 "2018-01-04" "Failed" "10/21/2021" 2020
    1 "2018-01-04" "Failed" "10/21/2021" 2021
    2 "2018-01-04" "Active" ""           2018
    2 "2018-01-04" "Active" ""           2019
    2 "2018-01-04" "Active" ""           2020
    2 "2018-01-04" "Active" ""           2021
    3 "2018-07-03" "Failed" "11/5/2021"  2018
    3 "2018-07-03" "Failed" "11/5/2021"  2019
    3 "2018-07-03" "Failed" "11/5/2021"  2020
    4 "2018-05-03" "Failed" "11/29/2019" 2018
    4 "2018-05-03" "Failed" "11/29/2019" 2019
    end
    
    gen date_of_birth = daily(Dateofbirth, "YMD"), after(Dateofbirth)
    assert missing(date_of_birth) == missing(Dateofbirth)
    drop Dateofbirth
    format date_of_birth %tdnn/dd/CCYY
    
    gen date_of_status_change = daily(Date_of_status_change, "MDY"), ///
        after(Date_of_status_change)
    assert missing(date_of_status_change) == missing(Date_of_status_change)
    drop Date_of_status_change
    format date_of_status_change %tdnn/dd/CCYY
    
    gen end_of_year = mdy(12, 31, year)
    format end_of_year %tdnn/dd/CCYY
    gen months_in_business ///
        = datediff(date_of_birth, min(date_of_status_change, end_of_year), "m")
    gen byte failed = (year == yofd(date_of_status_change))
    Now, your example data is messed up and is not usable--try running it yourself and you'll see. It is not -dataex- output: you must have edited it. Never do that!It took me longer to fix up your example data to make it usable than to solve your problem and post my response. The problem arises in Date_of_status_change which, in the -input- command was designated as an int, but in fact what you have instead is, for Stata purposes, gibberish--it's a string, but it's not bound in quotes, so it's useless. I suspect that in your real data set, it is, in fact an integer with a nice date format. -dataex- handles that by showing the actual integer value and issues a format command at the end. But, I guess your motivation was to make the date readable to human eyes so as to match what you say in your explanation. Well meant, but in the end it did more harm than good. The whole point of -dataex- is that those who want to help you can run the code and immediately have a faithful replication of your example data. Nobody really plans to read the -dataex- with their eyes; that's not what it's for, and particularly with Stata date variables, it isn't possible. But once valid -dataex- output code is run in the responder's set up, if we need to see the data with our eyes we can always -browse- or -list- it. So always leave -dataex- output exactly as it comes.

    By the way, if Date_of_status_change really is a formatted Stata internal format numeric daily date variable in your real data set, as I suspect it is, then there is no need to run the second "paragraph" of the code I posted here. And you will also need to change -date_of_status_change- to -Date_of_status_change- throughout the code I posted.

    Comment


    • #3
      Thank you Clyde Schechter
      Your codes really help me!
      I will keep in mind your comments in the non-edition of dataex. Just like you mentioned, I want to make it more readable, but it takes you more extra work to fix up. I am sorry for that.

      Best regards,
      Josh

      Comment

      Working...
      X