Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to choose one variable with non-missing value among 3 variables

    Dear all,

    I have a set of data for firm deals. For each deal, there are 3 completion dates, Completeddate, Expectedcompletiondate, and Assumedcompletiondate. I have a thorough inspection of the data and found that in most of the cases, only 1 of the 3 dates has a value, the other 2 are missing. In some cases, 2 out of the 3 dates have value, but the 2 dates are the same. So I want to create a new variable, complete_date, which takes the non-missing value of the 3 dates, or chooses one value if there are 2 non-missing dates (given that the 2 dates are the same, it doesn't matter which date to choose). How can I achieve this? Thanks a lot for any help!
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long DealNumber int(Completeddate Expectedcompletiondate Assumedcompletiondate)
    1907206228     .     .     .
    1907063377 20087     .     .
    1943130728 21643     .     .
    1633010357     .     . 19288
    1907140101     .     . 21220
    1907243411 21155     .     .
    1943102519 21909     .     .
    1907094570     .     . 20910
    1907203574 20884     .     .
    1943042795 21543     .     .
    1943040193     .     . 22089
    1943152252 22071     .     .
    1907196308 20948     .     .
    1907000052     . 19900 19900
    end
    format %tdnn/dd/CCYY Completeddate
    format %tdnn/dd/CCYY Expectedcompletiondate
    format %tdnn/dd/CCYY Assumedcompletiondate
    The dates look as if they are random numbers but if you put it in stata it will back to normal mm/dd/yyyy format. The last row of the data is an example of there are 2 non-missing dates and they are the same.

  • #2
    try the following:
    Code:
    egen int complete_date=rowmax(Completeddate Expectedcompletiondate Assumedcompletiondate)
    note that this assumes that if an observation has at least two dates, you want the max of those 2 or 3 (if they differ); if you want something else, please see
    Code:
    h egen
    and look for the various options that start with "row"

    since you know how to format date time variables I have not shown a format command here

    Comment


    • #3
      Thanks Rich, rowmax works well!

      Comment


      • #4
        Either max() or min() would work fine here too. The syntax is slightly different: variable names should be separated by commas.

        Comment


        • #5
          Thanks Nick for your suggestion. They both works!

          Comment

          Working...
          X