Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate new variable = time difference between two first dates

    Hi All,

    Hi All,

    I have a large dataset with dates of visits and dates of purchases for each customer. The data looks like this:

    HTML Code:

    customer_no visit date purchase date 10 20-10-2014 NA 10 NA 12-01-2013 10 17-06-2011 NA 10 NA 18-02-2012 30 22-12-2013 NA 30 14-07-2012 NA


    I would like to generate a new variable that gives me the days of difference between FIRST visit and FIRST purchase. So, STATA should be able to identify which date is the first in the visits column for each unique customer_no, identify which date is first in the purchase column for the same customer_no and then calculate the difference.

    Presumably the new column will be all NA and one value per customer.

    Thanks!!

  • #2
    So, first of all, within Stata you cannot have "NA" in a date variable. Date variables in Stata are numeric. Next, to do calculations with dates, as you require, they must be Stata date variables, not strings that read like dates to humans. Next, you cannot have a variable named visit date or purchase date because blank spaces within variable names are not legal. So I'm going to assume for starters that you have a Stata data set with variables named customer_no, visit_date, and purchase_date. I'll also assume that the visit_date and purchase_date variables are, for now, strings.

    Code:
    // STEP 1: MAKE THE DATA USABLE
    replace visit_date = "" if visit_date == "NA"
    replace purchase_date = "" if purchase_date == "NA"
    gen long visit = date(visit_date, "MDY")
    assert missing(visit) == missing(visit_date) // VERIFY ALL VISIT DATES WERE VALID
    gen long purchase = date(purchase_date, "MDY")
    assert missing(purchase) == missing(purchase_date) // VERIFY PURCHASE DATES ALL VALID
    
    // GET FIST VISIT & PURCHASE DATES PER CUSTOMER
    egen long first_visit = min(visit), by(customer_no)
    egen long first_purchase = min(purchase), by(customer_no)
    
    // AND SUBTRACT
    gen interval_to_purchase = first_purchase - first_visit
    The variable interval_to_purchase is what you want, except that it is present in all the observations for a customer rather than just one. It isn't clear how to identify which one of the observations for any customer enjoys the privilege of having that observation. So I'll leave it to you to take it from here. And don't try to replace it with "NA." This is a number, so you replace it with missing value, denoted just as a period in commands.

    Comment

    Working...
    X