Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape data to long by month-year using data collection date range

    Hi!

    I am having trouble reshaping my data to long by month-year for each month-year an observation is in my dataset and hoping this forum can help. Currently the data are long, but by round. Each round has a beginning date (BEDGDATE) and an end date (ENDDATE), which can span any number of months/years. Important to note is that the beginning date of a round is the same as the end date of the previous round, and each observation can have anywhere from 1-9 total rounds.

    Here is what my date currently look like:
    STUDYID PANEL ROUND BEGDATE (YYMM) ENDDATE (YYMM) RACE MARITAL INCOME INS
    12112 24 1 2018m1 2018m3 ASIAN SING 120K PUB
    12112 24 2 2018m3 2018m5 ASIAN MAR 150K PRIV
    12112 24 3 2018m6 2018m8 ASIAN MAR 173K PRIV
    Etc….


    Here is what I would like the final product to look like. To note, once the data have been reshaped to long, variables like RACE, INC, INS, MARITAL should be filled in for all the month-years within each round.
    DUPERSID PANEL Round DATE RACE INCOME INS MARITAL
    12112 24 1 2018m1 Asian 120K PUB SINGLE
    12112 24 1 2018m2 Asian 120K PUB SINGLE
    12112 24 1 2018m3 Asian 120K
    PUB SINGLE
    12112
    24 2 2018m4 Asian 150K
    PRIV MARRIED
    12112 24 2 2018m5 Asian 150K PRIV MARRIED
    12112
    24 3 2018m6 Asian
    173K PRIV
    MARRIED
    12112 24 3 2018m7 Asian
    173K
    PRIV
    MARRIED
    12112 24 3 2018m8 Asian
    173K
    PRIV
    MARRIED

    Thank you in advance for the assistance!


  • #2
    the beginning date of a round is the same as the end date of the previous round
    But this is not true in the example you show. There, the end date for round 2 is 2018m5, but the begin date for round 3 is 2018m6. I will assume that you will fix these data errors in your data set, because the code I am writing requires this assumption to be true (and verifies it before proceeding).

    This is not a -reshape- problem at all. It is a problem for -tsfill-. But to use -tsfill-, we first have to prepare the data by creating a proper Stata numeric monthly date variable to replace begdateyymm, as -tsfill- will not work with a string date variable.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int studyid byte(panel round) str7(begdateyymm enddateyymm) str6 race str5(marital income) str4 ins
    12112 24 1 "2018m1 " "2018m3 " "ASIAN " "SING " "120K " "PUB"
    12112 24 2 "2018m3 " "2018m6 " "ASIAN " "MAR "  "150K " "PRIV"
    12112 24 3 "2018m6 " "2018m8 " "ASIAN " "MAR "  "173K " "PRIV"
    end
    
    //    VERIFY ABSENCE OF GAPS IN DATES BETWEEN ROUNDS
    by studyid (begdateyymm), sort: assert begdateyymm == enddateyymm[_n-1] if _n > 1
    
    
    //    CREATE A STATA DATE VARIABLE FROM THE STRING DATE IN THE DATA
    gen begin = monthly(begdateyymm, "YM"), after(begdateyymm)
    assert missing(begin) == missing(begdateyymm)
    drop *yymm
    format begin %tm
    
    xtset studyid begin
    tsfill
    ds studyid begin, not
    foreach v of varlist `r(varlist)' {
        by studyid (begin), sort: replace `v' = `v'[_n-1] if missing(`v')
    }
    Note: I assume that studyid is the participant id variable for the study. That isn't completely clear, as panel, or the pair of panel and studyid might be instead. Modify accordingly if I have this wrong.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Please give an example using dataex. FAQ Advice #12 explains.

      By all means fudge the data values but using the same variable names and storage types as in your real Stata dataset is important.

      (@Clyde Schechter gave a better answer. At best this gives emphasis to a major point he made.)

      Comment


      • #4
        Hi Clyde, thanks for your response! Sorry about not being more clear, I am still getting aquainted with the norms of the forum. Please find example data from the dataset below using dataex.

        I understand what you mean about this not being a reshape problem, I just didn't know a better way to phrase it. The participant ID is dupersid, round indicates the panel survey round, and beg_date and end_date are the round beginning and end variables. So currently, a dupersid can appear from 1-9 times depending on the number of rounds they have been interviewed for the survey.

        When I start with testing this assumption from the code you shared, I get an error return that it is false
        // VERIFY ABSENCE OF GAPS IN DATES BETWEEN ROUNDS by dupersid (beg_date), sort: assert beg_date == end_date[_n-1] if _n > 1



        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str10 dupersid byte round float(beg_date end_date married) byte(age racet) float female
        "2293550104" 4 698 702 .b 14 1 0
        "2294263102" 3 692 700 1 58 2 0
        "2295294101" 3 692 696 0 71 2 0
        "2295952101" 4 696 703 1 44 2 0
        "2296168108" 3 693 697 .b 2 1 0
        "2296686103" 3 693 699 .b 11 2 0
        "2297492102" 3 692 696 1 59 2 1
        "2297758103" 5 702 707 1 70 3 0
        "2299032103" 5 702 707 .b 11 3 1
        "2320145102" 3 705 709 1 74 1 0
        "2320715104" 2 699 706 0 16 1 1
        "2321806101" 7 727 732 1 41 2 1
        "2322104104" 3 703 711 0 16 3 1
        "2322675101" 1 696 697 0 47 2 0
        "2322845102" 2 698 703 1 53 2 0
        "2322900102" 3 703 709 1 67 2 1
        "2324106106" 2 698 703 .b 12 1 0
        "2324335101" 3 704 712 1 75 2 0
        "2324339104" 1 696 697 .b 10 5 0
        "2324422103" 2 699 704 0 85 3 0
        "2325851103" 3 703 710 0 18 1 0
        "2326149102" 9 739 743 0 49 1 1
        "2326464103" 7 729 734 .b 5 2 1
        "2327512101" 7 729 733 0 59 2 0
        "2327906101" 5 717 719 1 50 3 0
        "2327917105" 4 711 716 .b 15 5 1
        "2327988102" 1 696 699 0 20 2 0
        "2328024104" 1 696 701 .b 9 2 1
        "2329280103" 3 705 710 0 20 5 0
        "2461126102" 2 709 715 .b 13 2 0
        "2461853101" 2 714 718 1 48 3 1
        "2461956101" 6 732 741 0 68 1 1
        "2462234101" 3 715 725 0 42 3 1
        "2462378102" 4 721 728 1 78 2 0
        "2464406102" 1 708 709 1 49 4 1
        "2464613101" 4 724 728 0 85 2 1
        "2465175101" 1 708 714 1 52 2 1
        "2466979103" 1 708 709 .b 14 2 0
        "2467042102" 2 711 715 1 50 4 1
        "2468700102" 5 726 732 0 58 2 1
        "2468853101" 5 727 733 1 41 3 1
        "2571022101" 1 720 722 1 36 3 1
        "2571062101" 2 724 728 0 85 2 1
        "2573512101" 5 739 743 1 57 3 1
        "2573734105" 4 733 738 0 29 3 1
        "2576843107" 3 728 734 .b 5 2 0
        "2578897103" 1 720 724 .b 11 2 1
        "2680199101" 3 739 745 1 50 2 1
        "2688070103" 3 742 746 0 42 2 0
        "2688974102" 3 741 749 .b 15 1 1
        end
        format %tm beg_date
        format %tm end_date
        label values racet H2090874X
        label def H2090874X 1 "1 HISPANIC", modify
        label def H2090874X 2 "2 NON-HISPANIC WHITE ONLY", modify
        label def H2090874X 3 "3 NON-HISPANIC BLACK ONLY", modify
        label def H2090874X 4 "4 NON-HISPANIC ASIAN ONLY", modify
        label def H2090874X 5 "5 NON-HISPANIC OTHER RACE OR MULTIPLE RACE", modify
        label values female yesno
        label def yesno 0 "No", modify
        label def yesno 1 "Yes", modify
        [/CODE]

        Again, thanks so much for your help!

        Comment


        • #5
          So the question now becomes, what do you want to do. You have learned that there are gaps in the data. There may be time intervals between the end date of one observation and the begin date of the next. So how do you want to handle those gaps? You might want to just ignore the gaps, and create observations for the gap years and fill them in anyway. Or you might decide to skip over those years. Which do you prefer?

          When you answer that question, please also show a different set of example data. The example you have given in #4 contains only a single observation for each dupersid, So to deal with the gap issue, we need an example where at least some of them have multiple observations, and at least one of them has a gap.

          To find the dupersid's that have a gaps, you can do this:
          Code:
          by dupersid (beg_date), sort: egen byte has_gap = max(beg_date != end_date[_n-1] & _n > 1)
          browse if has_gap

          Comment


          • #6
            Hi Clyde,
            Yes it appears that is the case. Thank you for this code, it was super helpful in diagnosing the issue. It looks like there are two possible scenarios: 1) gaps in the dates or 2) overlap in the dates (that I did not realize were there before). What we want is a consecutive month-year rows for the entire window an observtion is in the survey.(i.e., from the beginning date of their round 1 to the end date of their final listed round). So to do that I think we need to use the end date from a previous round and assign that as the NEXT round's start date (i.e., previous round end month-year + 1 month). For example, dupersid 111 round 1 begins on 2018m1 and ends on 2018m3. We pull 2018m4 as the start date for dupersid 111 round 2, and so forth.

            I've included a lot more observations so let me know if that works!

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str10 dupersid byte round float(beg_date end_date married) byte(age racet) float female
            "2290001101" 3 692 697  0 26 2 1
            "2290001101" 4 697 704  1 26 2 1
            "2290001101" 5 704 707  1 27 2 1
            "2290001102" 3 692 697  0 25 2 0
            "2290001102" 4 697 704  1 25 2 0
            "2290001102" 5 704 707  1 25 2 0
            "2290002101" 3 692 699  1 33 1 1
            "2290002101" 4 699 704  1 33 1 1
            "2290002101" 5 704 707  1 34 1 1
            "2290002102" 3 692 699  1 38 1 0
            "2290002102" 4 699 704  1 39 1 0
            "2290002102" 5 704 707  1 39 1 0
            "2290002103" 3 692 699 .b 11 1 0
            "2290002103" 4 699 704 .b 11 1 0
            "2290002103" 5 704 707 .b 11 1 0
            "2290002104" 3 692 699 .b  8 1 0
            "2290002104" 4 699 704 .b  8 1 0
            "2290002104" 5 704 707 .b  8 1 0
            "2290002105" 3 692 699 .b  3 1 0
            "2290002105" 4 699 704 .b  4 1 0
            "2290002105" 5 704 707 .b  4 1 0
            "2290002106" 3 692 699 .b  2 1 0
            "2290002106" 4 699 704 .b  2 1 0
            "2290002106" 5 704 707 .b  2 1 0
            "2290003101" 3 692 700  1 36 2 1
            "2290003101" 4 700 703  1 36 2 1
            "2290003101" 5 703 707  1 36 2 1
            "2290003102" 3 692 700  1 35 2 0
            "2290003102" 4 700 703  1 36 2 0
            "2290003102" 5 703 707  1 36 2 0
            "2290003103" 3 692 700 .b  5 2 1
            "2290003103" 4 700 703 .b  6 2 1
            "2290003103" 5 703 707 .b  6 2 1
            "2290003104" 3 692 700 .b  4 2 0
            "2290003104" 4 700 703 .b  4 2 0
            "2290003104" 5 703 707 .b  4 2 0
            "2290005101" 3 692 697  1 29 2 0
            "2290005101" 4 697 702  1 30 2 0
            "2290005101" 5 702 707  1 30 2 0
            "2290005102" 3 692 697  1 31 2 1
            "2290005102" 4 697 702  1 31 2 1
            "2290005102" 5 702 707  1 32 2 1
            "2290005103" 3 692 697 .b 11 2 1
            "2290005103" 4 697 702 .b 11 2 1
            "2290005103" 5 702 707 .b 12 2 1
            "2290005104" 3 692 697 .b  8 2 1
            "2290005104" 4 697 702 .b  9 2 1
            "2290005104" 5 702 707 .b  9 2 1
            "2290005105" 3 692 697 .b  1 2 1
            "2290005105" 4 697 702 .b  1 2 1
            "2290005105" 5 702 707 .b  2 2 1
            "2290006101" 3 691 697  0 51 1 1
            "2290006101" 4 697 703  0 51 1 1
            "2290006101" 5 703 707  0 52 1 1
            "2290006102" 3 691 697  0 29 1 0
            "2290006102" 4 697 703  0 30 1 0
            "2290006102" 5 703 707  0 30 1 0
            "2290006103" 3 691 696  0 27 1 1
            "2290006103" 4 696 703  0 28 1 1
            "2290006103" 5 703 707  0 28 1 1
            "2290006104" 3 691 696 .b  1 1 1
            "2290006104" 4 696 703 .b  2 1 1
            "2290006104" 5 703 707 .b  2 1 1
            "2290007101" 3 694 700  1 60 2 0
            "2290007101" 4 700 705  1 60 2 0
            "2290007101" 5 705 707  1 60 2 0
            "2290007102" 3 694 700  1 54 2 1
            "2290007102" 4 700 705  1 54 2 1
            "2290007102" 5 705 707  1 54 2 1
            "2290008101" 3 691 697  1 34 2 0
            "2290008101" 4 697 704  1 35 2 0
            "2290008101" 5 704 707  1 35 2 0
            "2290008102" 3 691 697  1 33 2 1
            "2290008102" 4 697 704  1 34 2 1
            "2290008102" 5 704 707  1 34 2 1
            "2290008103" 3 691 697 .b 15 2 1
            "2290008103" 4 697 704  0 16 2 1
            "2290008103" 5 704 707  0 16 2 1
            "2290008104" 3 691 697 .b 11 2 0
            "2290008104" 4 697 704 .b 12 2 0
            "2290008104" 5 704 707 .b 12 2 0
            "2290009101" 3 691 697  0 71 2 1
            "2290009101" 4 697 703  0 71 2 1
            "2290009101" 5 703 707  0 71 2 1
            "2290009102" 3 691 697  0 67 2 0
            "2290009102" 4 697 703  0 67 2 0
            "2290009102" 5 703 707  0 68 2 0
            "2290010101" 3 691 697  1 56 2 1
            "2290010101" 4 697 702  1 56 2 1
            "2290010101" 5 702 707  1 57 2 1
            "2290010102" 3 691 697  1 58 2 0
            "2290010102" 4 697 702  1 58 2 0
            "2290010102" 5 702 707  1 59 2 0
            "2290010103" 3 691 697 .b 11 2 1
            "2290010103" 4 697 702 .b 11 2 1
            "2290010103" 5 702 707 .b 12 2 1
            "2290010105" 4 698 702 .b 11 2 1
            "2290010105" 5 702 707 .b 11 2 1
            "2290010106" 4 698 702 .b 12 2 0
            "2290010106" 5 702 707 .b 12 2 0
            "2290015101" 3 690 697  0 82 2 1
            "2290015101" 4 697 702  0 82 2 1
            "2290015101" 5 702 707  0 83 2 1
            "2290016101" 3 691 697  1 53 1 0
            "2290016101" 4 697 703  1 53 1 0
            "2290016101" 5 703 707  1 54 1 0
            "2290016102" 3 691 697  1 39 1 1
            "2290016102" 4 697 703  1 40 1 1
            "2290016102" 5 703 707  1 40 1 1
            "2290016103" 3 691 697  0 18 1 1
            "2290016103" 4 697 703  0 18 1 1
            "2290016103" 5 703 707  0 18 1 1
            "2290016104" 3 691 697 .b  8 1 1
            "2290016104" 4 697 703 .b  9 1 1
            "2290016104" 5 703 707 .b  9 1 1
            "2290016105" 3 691 697 .b  6 1 0
            "2290016105" 4 697 703 .b  7 1 0
            "2290016105" 5 703 707 .b  7 1 0
            "2290019101" 3 691 700  1 48 3 0
            "2290019101" 4 700 704  1 49 3 0
            "2290019101" 5 704 707  1 49 3 0
            "2290019102" 3 691 700  1 48 3 1
            "2290019102" 4 700 704  1 48 3 1
            "2290019102" 5 704 707  1 49 3 1
            "2290019103" 3 691 700  0 19 3 1
            "2290019103" 4 700 704  0 19 3 1
            "2290019103" 5 704 707  0 20 3 1
            "2290019104" 3 691 700  0 19 3 1
            "2290019104" 4 700 704  0 19 3 1
            "2290019104" 5 704 707  0 20 3 1
            "2290019105" 3 691 700 .b 11 3 1
            "2290019105" 4 700 704 .b 12 3 1
            "2290019105" 5 704 707 .b 12 3 1
            "2290019106" 3 691 700 .b 13 3 0
            "2290019106" 4 700 704 .b 13 3 0
            "2290019106" 5 704 707 .b 13 3 0
            "2290020101" 3 691 697  0 52 1 1
            "2290020101" 4 697 703  0 53 1 1
            "2290020101" 5 703 707  0 53 1 1
            "2290020102" 3 691 697  0 19 2 0
            "2290020102" 4 697 704  0 20 2 0
            "2290020102" 5 704 707  0 20 2 0
            "2290024101" 3 691 699  0 50 2 0
            "2290024101" 4 699 704  0 50 2 0
            "2290024101" 5 704 707  0 51 2 0
            "2290025101" 3 691 699  1 67 2 1
            "2290025101" 4 699 702  1 67 2 1
            "2290025101" 5 702 707  1 68 2 1
            "2290025102" 3 691 699  1 64 2 0
            "2290025102" 4 699 702  1 64 2 0
            "2290025102" 5 702 707  1 65 2 0
            "2290027101" 3 693 698  1 43 3 1
            "2290027101" 4 698 702  1 43 3 1
            "2290027101" 5 702 707  1 43 3 1
            "2290027102" 3 693 698  1 42 3 0
            "2290027102" 4 698 702  1 42 3 0
            "2290027102" 5 702 707  1 43 3 0
            "2290027103" 3 693 698  0 18 3 0
            "2290027103" 4 698 702  0 19 3 0
            "2290027103" 5 702 707  0 19 3 0
            "2290027104" 3 693 698 .b 11 3 1
            "2290027104" 4 698 702 .b 11 3 1
            "2290027104" 5 702 707 .b 12 3 1
            "2290030101" 3 691 696  1 42 4 0
            "2290030101" 4 696 706  1 43 4 0
            "2290030101" 5 706 707  1 43 4 0
            "2290030102" 3 691 696  1 42 4 1
            "2290030102" 4 696 706  1 43 4 1
            "2290030102" 5 706 707  1 43 4 1
            "2290030104" 3 691 696 .b 14 4 1
            "2290030104" 4 696 706 .b 15 4 1
            "2290030104" 5 706 707 .b 15 4 1
            "2290032101" 3 691 696  0 61 2 1
            "2290032101" 4 696 702  0 61 2 1
            "2290032101" 5 702 707  0 61 2 1
            "2290032102" 3 691 696  0 60 2 0
            "2290032102" 4 696 702  0 61 2 0
            "2290032102" 5 702 707  0 61 2 0
            "2290032103" 3 691 696  0 31 2 1
            "2290032103" 4 696 702  0 32 2 1
            "2290032103" 5 702 707  0 32 2 1
            "2290034101" 3 690 697  1 51 1 1
            "2290034101" 4 697 702  1 52 1 1
            "2290034101" 5 702 707  1 52 1 1
            "2290034102" 3 690 697  1 44 1 0
            "2290034102" 4 697 702  1 44 1 0
            "2290034102" 5 702 707  1 45 1 0
            "2290035101" 3 691 697  1 39 2 0
            "2290035101" 4 697 702  1 40 2 0
            "2290035101" 5 702 707  1 40 2 0
            "2290035102" 3 691 697  1 38 2 1
            "2290035102" 4 697 702  1 38 2 1
            "2290035102" 5 702 707  1 38 2 1
            "2290035103" 3 691 697 .b  7 2 0
            "2290035103" 4 697 702 .b  7 2 0
            "2290035103" 5 702 707 .b  8 2 0
            "2290037101" 3 692 700  1 52 4 1
            "2290037101" 4 700 705  1 52 4 1
            "2290037101" 5 705 707  1 52 4 1
            "2290037102" 3 692 700  1 53 4 0
            "2290037102" 4 700 705  1 53 4 0
            "2290037102" 5 705 707  1 54 4 0
            "2290038101" 3 693 699  1 66 2 1
            "2290038101" 4 699 703  1 66 2 1
            "2290038101" 5 703 707  1 67 2 1
            "2290038102" 3 693 699  1 83 2 0
            "2290038102" 4 699 703  1 84 2 0
            "2290038102" 5 703 707  1 84 2 0
            "2290041101" 3 690 696  1 44 1 0
            "2290041101" 4 696 703  1 45 1 0
            "2290041101" 5 703 707  1 45 1 0
            "2290041102" 3 690 696  1 36 1 1
            "2290041102" 4 696 703  1 36 1 1
            "2290041102" 5 703 707  1 37 1 1
            "2290041103" 3 690 696 .b 11 1 0
            "2290041103" 4 696 703 .b 12 1 0
            "2290041103" 5 703 707 .b 12 1 0
            "2290041104" 3 690 696 .b  6 1 0
            "2290041104" 4 696 703 .b  6 1 0
            "2290041104" 5 703 707 .b  7 1 0
            "2290041105" 3 690 696 .b  2 1 0
            "2290041105" 4 696 703 .b  3 1 0
            "2290041105" 5 703 707 .b  3 1 0
            "2290042101" 3 692 697  1 27 2 0
            "2290042101" 4 697 704  1 28 2 0
            "2290042101" 5 704 707  1 28 2 0
            "2290042102" 3 692 697  1 27 2 1
            "2290042102" 4 697 704  1 28 2 1
            "2290042102" 5 704 707  1 28 2 1
            "2290044101" 3 691 701  1 35 2 0
            "2290044101" 4 701 704  1 35 2 0
            "2290044101" 5 704 707  1 35 2 0
            "2290044102" 3 691 701  1 35 2 1
            "2290044102" 4 701 704  1 36 2 1
            "2290044102" 5 704 707  1 36 2 1
            "2290044103" 3 691 701 .b  1 2 1
            "2290044103" 4 701 704 .b  1 2 1
            "2290044103" 5 704 707 .b  2 2 1
            "2290046101" 3 690 696  0 47 2 1
            "2290046101" 4 696 702  0 47 2 1
            "2290046101" 5 702 707  0 48 2 1
            "2290046102" 3 690 696  0 53 2 0
            "2290046102" 4 696 702  0 53 2 0
            "2290046102" 5 702 707  0 54 2 0
            "2290049101" 3 691 697  0 74 1 1
            "2290049101" 4 697 702  0 75 1 1
            "2290049101" 5 702 707  0 75 1 1
            "2290050101" 3 694 699  0 32 1 0
            "2290050101" 4 699 706  0 32 1 0
            "2290050101" 5 706 707  0 33 1 0
            "2290050102" 3 694 699  0 57 1 1
            "2290050102" 4 699 706  0 58 1 1
            "2290050102" 5 706 707  0 58 1 1
            "2290050103" 3 694 699  0 30 1 0
            "2290050103" 4 699 706  0 30 1 0
            "2290050103" 5 706 707  0 30 1 0
            "2290051101" 3 692 697  1 38 2 1
            "2290051101" 4 697 703  1 38 2 1
            "2290051101" 5 703 707  1 39 2 1
            "2290051102" 3 692 697  1 36 1 0
            "2290051102" 4 697 703  1 36 1 0
            "2290051102" 5 703 707  1 37 1 0
            "2290051103" 3 692 697  0 67 2 1
            "2290051103" 4 697 703  0 68 2 1
            "2290051103" 5 703 707  0 68 2 1
            "2290051104" 3 692 697 .b  5 1 0
            "2290051104" 4 697 703 .b  6 1 0
            "2290051104" 5 703 707 .b  6 1 0
            "2290051105" 3 692 697 .b  3 1 0
            "2290051105" 4 697 703 .b  4 1 0
            "2290051105" 5 703 707 .b  4 1 0
            "2290053101" 3 692 697  1 54 2 1
            "2290053101" 4 697 703  1 54 2 1
            "2290053101" 5 703 707  1 55 2 1
            "2290053102" 3 692 697  1 53 2 0
            "2290053102" 4 697 703  1 54 2 0
            "2290053102" 5 703 707  1 54 2 0
            "2290053103" 3 692 697  0 16 2 1
            "2290053103" 4 697 703  0 17 2 1
            "2290053103" 5 703 707  0 17 2 1
            "2290053104" 3 692 697 .b 13 2 0
            "2290053104" 4 697 703 .b 14 2 0
            "2290053104" 5 703 707 .b 14 2 0
            "2290054101" 3 691 697  0 69 2 1
            "2290054101" 4 697 702  0 70 2 1
            "2290054101" 5 702 707  0 70 2 1
            "2290055101" 3 690 697  0 85 2 1
            "2290055101" 4 697 702  0 85 2 1
            "2290055101" 5 702 707  0 85 2 1
            "2290056101" 3 692 700  1 27 2 1
            "2290056101" 4 700 703  1 27 2 1
            "2290056101" 5 703 707  1 28 2 1
            "2290056102" 3 692 700  1 28 2 0
            "2290056102" 4 700 703  1 28 2 0
            "2290056102" 5 703 707  1 28 2 0
            "2290057101" 3 691 698  1 41 3 1
            "2290057101" 4 698 702  1 41 3 1
            "2290057101" 5 702 707  1 41 3 1
            "2290057102" 3 691 698  1 43 3 0
            "2290057102" 4 698 702  1 43 3 0
            "2290057102" 5 702 707  1 44 3 0
            "2290061101" 3 691 696  1 80 2 1
            "2290061101" 4 696 702  1 80 2 1
            "2290061101" 5 702 707  1 81 2 1
            "2290061102" 3 691 696  1 82 2 0
            "2290061102" 4 696 702  1 82 2 0
            "2290061102" 5 702 707  1 83 2 0
            "2290063101" 3 693 699  1 63 2 1
            "2290063101" 4 699 702  1 63 2 1
            "2290063101" 5 702 707  1 64 2 1
            "2290063102" 3 693 699  1 65 2 0
            "2290063102" 4 699 702  1 66 2 0
            "2290063102" 5 702 707  1 66 2 0
            "2290064101" 3 691 698  0 30 3 1
            "2290064101" 4 698 702  0 30 3 1
            "2290064101" 5 702 707  0 30 3 1
            "2290064102" 3 691 698 .b 10 3 0
            "2290064102" 4 698 702 .b 10 3 0
            "2290064102" 5 702 707 .b 11 3 0
            "2290064103" 3 691 698 .b  8 3 0
            "2290064103" 4 698 702 .b  9 3 0
            "2290064103" 5 702 707 .b  9 3 0
            "2290064104" 3 695 698  1 50 3 1
            "2290064104" 4 698 702  1 50 3 1
            "2290064104" 5 702 707  1 51 3 1
            "2290064105" 3 695 698  1 52 3 0
            "2290064105" 4 698 702  1 53 3 0
            "2290064105" 5 702 707  1 53 3 0
            "2290064201" 3 691 697  0 29 3 1
            "2290064201" 4 697 703  0 30 3 1
            "2290064201" 5 703 707  0 30 3 1
            "2290064202" 3 691 697 .b  9 3 1
            "2290064202" 4 697 703 .b  9 3 1
            "2290064202" 5 703 707 .b 10 3 1
            "2290064203" 3 691 697 .b  9 3 0
            "2290064203" 4 697 703 .b  9 3 0
            "2290064203" 5 703 707 .b 10 3 0
            "2290065101" 3 690 696  1 70 2 0
            "2290065101" 4 696 704  1 70 2 0
            "2290065101" 5 704 707  1 71 2 0
            "2290065102" 3 690 696  1 70 2 1
            "2290065102" 4 696 704  1 71 2 1
            "2290065102" 5 704 707  1 71 2 1
            "2290067101" 3 691 698  1 46 3 0
            "2290067101" 4 698 702  1 46 3 0
            "2290067101" 5 702 707  1 47 3 0
            "2290067102" 3 691 698  1 45 3 1
            "2290067102" 4 698 702  1 45 3 1
            "2290067102" 5 702 707  1 46 3 1
            "2290067103" 3 691 698  0 18 3 0
            "2290067103" 4 698 702  0 18 3 0
            "2290067103" 5 702 707  0 18 3 0
            "2290068101" 3 691 697  1 34 2 1
            "2290068101" 4 697 703  1 34 2 1
            "2290068101" 5 703 707  1 35 2 1
            "2290068102" 3 691 697  1 32 2 0
            "2290068102" 4 697 703  1 33 2 0
            "2290068102" 5 703 707  1 33 2 0
            "2290068103" 3 691 697 .b  8 2 1
            "2290068103" 4 697 703 .b  8 2 1
            "2290068103" 5 703 707 .b  8 2 1
            "2290068104" 3 691 697 .b  4 2 1
            "2290068104" 4 697 703 .b  4 2 1
            "2290068104" 5 703 707 .b  4 2 1
            "2290072101" 3 691 697  0 50 3 1
            "2290072101" 4 697 703  0 51 3 1
            "2290072101" 5 703 707  1 51 3 1
            "2290072102" 3 691 697  0 26 3 1
            "2290072102" 4 697 704  0 27 3 1
            "2290072102" 5 704 707  0 27 3 1
            "2290072103" 3 691 697 .b  5 3 0
            "2290072103" 4 697 704 .b  6 3 0
            "2290072103" 5 704 707 .b  6 3 0
            "2290073101" 3 692 698  0 64 2 0
            "2290073101" 4 698 703  0 64 2 0
            "2290073101" 5 703 707  0 65 2 0
            "2290074101" 3 690 696  1 43 2 0
            "2290074101" 4 696 703  1 44 2 0
            "2290074101" 5 703 707  1 44 2 0
            "2290074102" 3 690 696  1 40 1 1
            "2290074102" 4 696 703  1 40 1 1
            "2290074102" 5 703 707  1 41 1 1
            "2290074103" 3 690 696 .b  4 2 0
            "2290074103" 4 696 703 .b  5 2 0
            "2290074103" 5 703 707 .b  5 2 0
            "2290074104" 3 690 696 .b  2 2 1
            "2290074104" 4 696 703 .b  2 2 1
            "2290074104" 5 703 707 .b  3 2 1
            "2290077101" 3 691 697  0 22 1 0
            "2290077101" 4 697 703  0 22 1 0
            "2290077101" 5 703 707  0 22 1 0
            "2290078101" 3 690 696  0 73 3 0
            "2290078101" 4 696 703  0 74 3 0
            "2290078101" 5 703 707  0 74 3 0
            "2290078102" 3 691 696  0 58 3 1
            "2290078102" 4 696 704  0 58 3 1
            "2290078102" 5 704 707  0 59 3 1
            "2290079101" 3 692 696  0 28 2 1
            "2290079101" 4 696 702  0 28 2 1
            "2290079101" 5 702 707  0 29 2 1
            "2290079102" 3 692 696  0 27 1 0
            "2290079102" 4 696 702  0 27 1 0
            "2290079102" 5 702 707  0 28 1 0
            "2290081101" 3 691 699  1 33 1 1
            "2290081101" 4 699 703  1 33 1 1
            "2290081101" 5 703 707  1 34 1 1
            "2290081102" 3 691 699  1 34 1 0
            "2290081102" 4 699 703  1 35 1 0
            "2290081102" 5 703 707  1 35 1 0
            "2290081103" 3 691 699 .b 12 1 1
            "2290081103" 4 699 703 .b 13 1 1
            "2290081103" 5 703 707 .b 13 1 1
            "2290081104" 3 691 699 .b 10 1 1
            "2290081104" 4 699 703 .b 11 1 1
            "2290081104" 5 703 707 .b 11 1 1
            "2290082101" 3 691 697  0 58 5 1
            "2290082101" 4 697 702  0 58 5 1
            "2290082101" 5 702 707  0 59 5 1
            "2290082102" 3 691 697  1 27 5 0
            "2290082102" 4 697 702  1 28 5 0
            "2290082102" 5 702 707  1 28 5 0
            "2290083101" 3 691 698  1 66 2 0
            "2290083101" 4 698 704  1 67 2 0
            "2290083101" 5 704 707  1 67 2 0
            "2290083102" 3 691 698  1 54 2 1
            "2290083102" 4 698 704  1 55 2 1
            "2290083102" 5 704 707  1 55 2 1
            "2290084101" 3 692 698  1 55 3 1
            "2290084101" 4 698 703  1 56 3 1
            "2290084101" 5 703 707  1 56 3 1
            "2290084102" 3 692 698  1 59 3 0
            "2290084102" 4 698 703  1 60 3 0
            "2290084102" 5 703 707  1 60 3 0
            "2290084103" 3 692 698  0 24 3 0
            "2290084103" 4 698 703  0 25 3 0
            "2290084103" 5 703 707  0 25 3 0
            "2290084104" 3 692 698  0 20 3 1
            "2290084104" 4 698 703  0 21 3 1
            "2290084104" 5 703 707  0 21 3 1
            "2290085101" 3 692 698  0 29 5 1
            "2290085101" 4 698 703  0 29 5 1
            "2290085101" 5 703 707  1 29 5 1
            "2290085103" 3 692 698 .b  3 5 0
            "2290085103" 4 698 703 .b  4 5 0
            "2290085103" 5 703 707 .b  4 5 0
            "2290085104" 3 692 698 .b  1 5 1
            "2290085104" 4 698 703 .b  1 5 1
            "2290085104" 5 703 707 .b  1 5 1
            "2290086101" 3 691 696  1 25 2 1
            "2290086101" 4 696 702  1 26 2 1
            "2290086101" 5 702 707  1 26 2 1
            "2290086102" 3 691 696  1 27 2 0
            "2290086102" 4 696 702  1 27 2 0
            "2290086102" 5 702 707  1 28 2 0
            "2290086103" 3 691 696 .b  1 2 0
            "2290086103" 4 696 702 .b  2 2 0
            "2290086103" 5 702 707 .b  2 2 0
            "2290086104" 3 691 696 .b  4 2 0
            "2290086104" 4 696 702 .b  5 2 0
            "2290086104" 5 702 707 .b  5 2 0
            "2290086105" 3 693 696 .b  0 2 0
            "2290086105" 4 696 702 .b  0 2 0
            "2290086105" 5 702 707 .b  1 2 0
            "2290087101" 3 691 697  1 58 3 0
            "2290087101" 4 697 702  1 59 3 0
            "2290087101" 5 702 707  1 59 3 0
            "2290087102" 3 691 697  1 50 3 1
            "2290087102" 4 697 702  1 51 3 1
            "2290087102" 5 702 707  1 51 3 1
            "2290088101" 3 691 698  0 23 1 1
            "2290088101" 4 698 702  0 24 1 1
            "2290088101" 5 702 707  0 24 1 1
            "2290088102" 3 691 698  1 47 1 1
            "2290088102" 4 698 702  1 48 1 1
            "2290088102" 5 702 707  1 48 1 1
            "2290088103" 3 691 698  1 49 1 0
            "2290088103" 4 698 702  1 50 1 0
            "2290088103" 5 702 707  1 50 1 0
            "2290088104" 3 691 698 .b 13 1 0
            "2290088104" 4 698 702 .b 14 1 0
            "2290088104" 5 702 707 .b 14 1 0
            "2290090101" 3 691 699  1 48 3 0
            "2290090101" 4 699 705  1 48 3 0
            "2290090101" 5 705 707  1 48 3 0
            "2290090102" 3 691 699  1 48 3 1
            "2290090102" 4 699 705  1 48 3 1
            "2290090102" 5 705 707  1 48 3 1
            "2290090103" 3 691 699 .b 15 3 0
            "2290090103" 4 699 705 .b 15 3 0
            "2290090103" 5 705 707 .b 15 3 0
            "2290091101" 3 692 698  0 61 1 0
            "2290091101" 4 698 705  0 61 1 0
            "2290091101" 5 705 707  0 62 1 0
            "2290091102" 3 692 697  0 48 2 1
            "2290091102" 4 697 702  0 49 2 1
            "2290091102" 5 702 707  0 49 2 1
            "2290091103" 3 692 698  0 18 1 0
            "2290091103" 4 698 702  0 19 1 0
            "2290091103" 5 702 707  0 19 1 0
            "2290094101" 3 694 698  0 49 2 0
            "2290094101" 4 698 702  0 50 2 0
            "2290094101" 5 702 707  0 50 2 0
            "2290094102" 3 694 698 .b 11 2 0
            "2290094102" 4 698 702 .b 12 2 0
            "2290094102" 5 702 707 .b 12 2 0
            "2290095101" 3 691 696  0 58 2 0
            "2290095101" 4 696 702  0 58 2 0
            "2290095101" 5 702 707  0 59 2 0
            "2290095103" 3 691 696  0 36 2 0
            "2290095103" 4 696 702  0 36 2 0
            "2290095103" 5 702 707  0 37 2 0
            "2290095104" 3 691 696  0 26 2 1
            "2290095104" 4 696 702  0 27 2 1
            "2290095104" 5 702 707  0 27 2 1
            "2290095105" 3 691 696 .b  8 2 0
            "2290095105" 4 696 702 .b  9 2 0
            "2290095105" 5 702 707 .b  9 2 0
            "2290095106" 3 691 696 .b  5 2 0
            "2290095106" 4 696 702 .b  5 2 0
            "2290095106" 5 702 707 .b  6 2 0
            "2290096101" 3 692 696  0 66 2 0
            "2290096101" 4 696 702  0 66 2 0
            "2290096101" 5 702 707  0 67 2 0
            "2290097101" 3 692 698  1 34 2 1
            "2290097101" 4 698 703  1 34 2 1
            "2290097101" 5 703 707  0 35 2 1
            "2290097102" 3 692 698  1 38 5 0
            "2290097102" 4 698 703  1 39 5 0
            "2290097102" 5 703 707  0 39 5 0
            "2290098101" 3 692 696  1 70 2 0
            "2290098101" 4 696 702  1 70 2 0
            "2290098101" 5 702 707  1 71 2 0
            "2290098102" 3 692 696  1 69 2 1
            "2290098102" 4 696 702  1 70 2 1
            "2290098102" 5 702 707  1 70 2 1
            "2290099101" 3 692 699  0 35 2 0
            "2290099101" 4 699 702  0 35 2 0
            "2290099101" 5 702 707  0 36 2 0
            "2290099102" 3 691 696  0 35 2 1
            "2290099102" 4 696 702  1 35 2 1
            "2290099102" 5 702 707  1 35 2 1
            "2290099103" 3 691 696 .b  8 2 1
            "2290099103" 4 696 702 .b  9 2 1
            "2290099103" 5 702 707 .b  9 2 1
            "2290099104" 3 691 696 .b  6 2 0
            "2290099104" 4 696 702 .b  7 2 0
            "2290099104" 5 702 707 .b  7 2 0
            "2290099106" 4 701 702  1 30 2 0
            "2290099106" 5 702 707  1 30 2 0
            "2290100101" 3 693 699  1 46 1 0
            "2290100101" 4 699 703  1 46 1 0
            "2290100101" 5 703 707  1 47 1 0
            "2290100102" 3 693 699  1 45 1 1
            "2290100102" 4 699 703  1 45 1 1
            "2290100102" 5 703 707  1 46 1 1
            "2290100103" 3 693 699  0 17 1 1
            "2290100103" 4 699 703  0 18 1 1
            "2290100103" 5 703 707  0 18 1 1
            "2290100104" 3 693 699 .b 13 1 1
            "2290100104" 4 699 703 .b 13 1 1
            "2290100104" 5 703 707 .b 13 1 1
            "2290102101" 3 691 697  1 56 3 0
            "2290102101" 4 697 703  1 57 3 0
            "2290102101" 5 703 707  1 57 3 0
            "2290102102" 3 691 697  1 58 3 1
            "2290102102" 4 697 703  1 58 3 1
            "2290102102" 5 703 707  1 58 3 1
            "2290102103" 3 691 697  1 58 3 0
            "2290102103" 4 697 703  1 58 3 0
            "2290102103" 5 703 707  1 59 3 0
            "2290102104" 3 691 697  1 50 3 1
            "2290102104" 4 697 703  1 51 3 1
            "2290102104" 5 703 707  1 51 3 1
            "2290103101" 3 692 697  1 41 2 0
            "2290103101" 4 697 703  1 41 2 0
            "2290103101" 5 703 707  1 41 2 0
            "2290103102" 3 692 697  1 44 2 1
            "2290103102" 4 697 703  1 44 2 1
            "2290103102" 5 703 707  1 45 2 1
            "2290103103" 3 692 697  0 85 2 1
            "2290103103" 4 697 703  0 85 2 1
            "2290103103" 5 703 707  0 85 2 1
            "2290107101" 3 693 697  1 65 2 0
            "2290107101" 4 697 704  1 65 2 0
            "2290107101" 5 704 707  1 66 2 0
            "2290107102" 3 693 697  1 69 2 1
            "2290107102" 4 697 704  1 70 2 1
            "2290107102" 5 704 707  1 70 2 1
            "2290108101" 3 691 697  0 73 2 0
            "2290108101" 4 697 702  0 74 2 0
            "2290108101" 5 702 707  0 74 2 0
            "2290110101" 3 691 696  1 49 2 0
            "2290110101" 4 696 702  1 49 2 0
            "2290110101" 5 702 707  1 50 2 0
            "2290110102" 3 691 696  1 51 2 1
            "2290110102" 4 696 702  1 52 2 1
            "2290110102" 5 702 707  1 52 2 1
            "2290110103" 3 691 696 .b 10 2 1
            "2290110103" 4 696 702 .b 11 2 1
            "2290110103" 5 702 707 .b 11 2 1
            "2290110104" 3 691 696  0 16 2 1
            "2290110104" 4 696 702  0 17 2 1
            "2290110104" 5 702 707  0 17 2 1
            "2290110105" 3 691 696 .b 15 2 1
            "2290110105" 4 696 702 .b 15 2 1
            "2290110105" 5 702 707 .b 15 2 1
            "2290111101" 3 691 699  0 70 2 0
            "2290111101" 4 699 703  0 71 2 0
            "2290111101" 5 703 707  0 71 2 0
            "2290111102" 3 691 699  0 59 2 1
            "2290111102" 4 699 703  0 59 2 1
            "2290111102" 5 703 707  0 60 2 1
            "2290112102" 3 693 699  1 44 2 1
            "2290112102" 4 699 703  1 44 2 1
            "2290112102" 5 703 707  1 44 2 1
            "2290112104" 3 693 699  0 16 2 0
            "2290112104" 4 699 703  0 17 2 0
            "2290112104" 5 703 707  0 17 2 0
            "2290112105" 3 693 699 .b 15 2 1
            "2290112105" 4 699 703 .b 15 2 1
            "2290112105" 5 703 707 .b 15 2 1
            "2290114101" 3 691 697  0 56 2 0
            "2290114101" 4 697 702  0 56 2 0
            "2290114101" 5 702 707  0 56 2 0
            "2290116101" 3 691 698  0 56 3 1
            "2290116101" 4 698 702  0 56 3 1
            "2290116101" 5 702 707  0 57 3 1
            "2290117101" 3 693 698  1 50 2 1
            "2290117101" 4 698 702  1 50 2 1
            "2290117101" 5 702 707  1 51 2 1
            "2290117102" 3 693 698  1 47 2 0
            "2290117102" 4 698 702  1 48 2 0
            "2290117102" 5 702 707  1 48 2 0
            "2290117103" 3 693 698 .b 15 2 1
            "2290117103" 4 698 702 .b 15 2 1
            "2290117103" 5 702 707  0 16 2 1
            "2290117104" 3 693 698 .b 13 2 0
            "2290117104" 4 698 702 .b 13 2 0
            "2290117104" 5 702 707 .b 14 2 0
            "2290118101" 3 691 698  0 82 3 1
            "2290118101" 4 698 702  0 82 3 1
            "2290118101" 5 702 707  0 83 3 1
            "2290119101" 3 692 697  1 39 4 0
            "2290119101" 4 697 703  1 39 4 0
            "2290119101" 5 703 707  1 40 4 0
            "2290119102" 3 692 697  1 39 4 1
            "2290119102" 4 697 703  1 40 4 1
            "2290119102" 5 703 707  1 40 4 1
            "2290119103" 3 692 697 .b 11 4 0
            "2290119103" 4 697 703 .b 11 4 0
            "2290119103" 5 703 707 .b 12 4 0
            "2290119104" 3 692 697 .b 11 4 1
            "2290119104" 4 697 703 .b 12 4 1
            "2290119104" 5 703 707 .b 12 4 1
            "2290120101" 3 692 697  0 68 2 0
            "2290120101" 4 697 702  0 69 2 0
            "2290120101" 5 702 707  0 69 2 0
            "2290120102" 3 692 697  0 60 2 1
            "2290120102" 4 697 702  0 60 2 1
            "2290120102" 5 702 707  0 61 2 1
            "2290121101" 3 691 696  1 48 2 1
            "2290121101" 4 696 703  1 49 2 1
            "2290121101" 5 703 707  1 49 2 1
            "2290121102" 3 691 696  1 48 5 0
            "2290121102" 4 696 703  1 48 5 0
            "2290121102" 5 703 707  1 49 5 0
            "2290124101" 3 691 697  1 56 3 0
            "2290124101" 4 697 703  1 57 3 0
            "2290124101" 5 703 707  1 57 3 0
            "2290124102" 3 691 697  1 57 3 1
            "2290124102" 4 697 703  1 57 3 1
            "2290124102" 5 703 707  1 58 3 1
            "2290125101" 3 691 696  0 49 2 1
            "2290125101" 4 696 704  0 50 2 1
            "2290125101" 5 704 707  0 50 2 1
            "2290125102" 3 691 696 .b 12 2 0
            "2290125102" 4 696 704 .b 13 2 0
            "2290125102" 5 704 707 .b 13 2 0
            "2290126101" 3 691 696  0 58 3 1
            "2290126101" 4 696 702  0 58 3 1
            "2290126101" 5 702 707  0 59 3 1
            "2290127101" 3 692 697  1 67 3 0
            "2290127101" 4 697 704  1 67 3 0
            "2290127101" 5 704 707  1 67 3 0
            "2290127102" 3 692 697  1 66 3 1
            "2290127102" 4 697 704  1 66 3 1
            "2290127102" 5 704 707  1 66 3 1
            "2290130102" 3 691 701  0 50 1 1
            "2290130102" 4 701 705  0 50 1 1
            "2290130102" 5 705 707  0 50 1 1
            "2290130103" 3 691 701  0 16 1 0
            "2290130103" 4 701 705  0 16 1 0
            "2290130103" 5 705 707  0 16 1 0
            "2290130104" 3 691 701 .b 12 1 0
            "2290130104" 4 701 705 .b 13 1 0
            "2290130104" 5 705 707 .b 13 1 0
            "2290134101" 3 693 699  0 45 1 1
            "2290134101" 4 699 704  0 45 1 1
            "2290134101" 5 704 707  0 45 1 1
            "2290134102" 3 693 699  0 66 1 1
            end
            format %tm beg_date
            format %tm end_date
            label values racet H2090874X
            label def H2090874X 1 "1 HISPANIC", modify
            label def H2090874X 2 "2 NON-HISPANIC WHITE ONLY", modify
            label def H2090874X 3 "3 NON-HISPANIC BLACK ONLY", modify
            label def H2090874X 4 "4 NON-HISPANIC ASIAN ONLY", modify
            label def H2090874X 5 "5 NON-HISPANIC OTHER RACE OR MULTIPLE RACE", modify
            label values female yesno
            label def yesno 0 "No", modify
            label def yesno 1 "Yes", modify

            Comment


            • #7
              OK, we have a much better grasp of the data now. The discovery of overlaps in the data raises another issue. Suppose two observations of the same dupersid with overlapping dates contain conflicting values for some of the other variables. You appear to want to end up with one observation per month per dupersid from the first beg_date to the last end_date: where there are two (or more) candidate values for one of these variables, which do we pick? In the code below, I have, quite arbitrarily, chosen to keep the one that appears first in the order of the original data set. You may want to change that to some other rule for harmonizing the data.

              In the code below, to fill in the gaps, I have chosen to simply carry forward the last observed value for racet and female, as it is quite reasonable to expect that these don't typically change. For age and married, however, I am leaving them as missing because the former definitely will, and the latter may, change over time--but in neither case can we know from the available data which month the change happens.

              Code:
              //    VERIFY MINIMUM CONDITIONS FOR CODE TO WORK
              assert end_date >= beg_date
              assert !missing(beg_date, end_date)
              
              //    IDENTIFY THE FIRST AND LAST OBSERVED DATES FOR EACH PERSON
              by dupersid (beg_date), sort: gen first = beg_date[1]
              by dupersid (end_date), sort: gen last = end_date[_N]
              format first last %tm
              
              //    CREATE A DATA SET THAT IS A FILLED-IN AND DUPLICATE-FREE LIST OF
              //    DUPERSID AND ALL DATES FROM FIRST TO LAST
              preserve
              keep dupersid first last
              duplicates drop
              expand last - first + 1
              by dupersid, sort: gen mdate = first + _n - 1
              by dupersid: assert mdate[1] == first & mdate[_N] == last
              drop first last
              isid dupersid mdate, sort
              tempfile reference_frame
              save `reference_frame'
              
              //    NOW EXPAND THE ORIGINAL DATA SET TO ONE OBSERVATION FOR EVERY
              //    DATE BETWEEN beg_date AND end_date
              restore
              drop first last
              gen `c(obs_t)' obs_no = _n
              expand end_date - beg_date + 1
              by obs_no, sort: gen mdate = beg_date + _n - 1
              format mdate %tm
              drop beg_date end_date
              
              //    MERGE THE TWO DATA SETS
              merge m:1 dupersid mdate using `reference_frame', assert(match using) nogenerate
              
              //    WHERE THERE WAS OVERLAP OF ROUND DATES IN THE ORIGINAL, WE WILL NOW
              //    HAVE DUPLICATE OBSERVATIONS.  KEEP THE VERSION THAT APPEARED EARLIEST
              //    IN THE ORIGINAL DATA SET
              by dupersid mdate (obs_no), sort: keep if _n == 1
              drop obs
              
              //    WHERE THERE WERE GAPS IN THE ORIGINAL DATA SET, FILL IN VARIABLES AS APPROPRIATE
              //    TO THEIR CONTENT
              foreach v of varlist racet female {
                  replace `v' = `v'[_n-1] if missing(`v')
              }
              Note: This code uses the -preserve- command as well as some local macros. So you must not try to run it line-by-line or in chunks. It must be run without interruption from beginning to end.

              Added: Moral of the story. You begin this thread by stating you had a data set that satisfied a certain condition, namely that each round's begin date was the same as the preceding round's end date. You have now learned that this is not so, and that the data contain both some gaps and some overlaps. You don't say, and nobody asked, why you wrongly believed the contrary. Perhaps it was the results of a visual exploration of the data. Perhaps it was (mis)information given to you by whoever supplied the data to you. But there is a lesson here. Schechter's First and Second Laws of Data management are: 1. Never trust anybody else's data. 2. Never trust your own data.

              The fact is that even well curated data sets that come from sources with considerable expertise in data management frequently contain errors. All the more so data sets created by a research team. So before attempting to analyze any data set, you owe it to yourself to first check it carefully for errors and then fix them or arrange for the data provider to fix them. In particular, every variable should be checked for values that are impossible, either in their own right, or because they are inconsistent with the values of other variables or of the same variable in different observations. Such errors should be corrected if the correct values can be determined somehow, or should be replaced with missing values if not. And any relationships among observations and variables that need to be true for you to do a proper analysis should be verified. Moreover these data checks should be done, not by visual inspection, nor by taking somebody else's word for it, but by the liberal use of -assert- statements in your data management process.
              Last edited by Clyde Schechter; 10 Sep 2024, 14:19.

              Comment


              • #8
                Thank you again for your help on this, but it is still not quite setup to address what I would like to do with the data. To provide more context, the end date of each round is the survey date for that round (i.e., the date when demographic data were collected in a survey to represent demographic data for that entire round). So, although there are currently overlapping dates, we can use the end date of each round to better define the start date of the subsequent round as end_date+1month. As I am still familiarizing myself with the coding techniques you suggested in the prior post, would you mind helping me update it to achieve what I just described?

                Added: I am very appreciative of the time and assistance! Although I can also appreciate your final thoughts and they are of course true, I think they go beyond the scope of my post and our discussion and are perhaps suggestive to the fact that I am not approaching this data or project correctly. There is of course much more context that was left out of this entire post in order to simplify my questions around this coding issue. I am in the exploratory phases of getting to know this data and data cleaning and I just need help acheiving a coding goal. Rather than criticizing or calling out assumptions or data discovers in a negative light, I think having this back and forth discussion and learning new things about the data from learning new coding techniques and talking through things "out loud" is normal and should be appreciated as part of any coding process. It can be quite intimidating to post one’s questions here. Just something to be mindful of while helping coders of all experience levels on this platform.

                Comment


                • #9
                  OK. You want to fill in any gaps between the end date of one round and the begin date of the next by treating the end of the earlier round as the start of the new round for your purposes. You also indicated earlier in the thread that there may be overlaps in the original data, whereby the begin date of one round actually precedes the end date of the the preceding one. I'm not sure how you want to handle that situation, but for the code below, I "excise" the beginning of such a round and designate its start as the month following the end of the preceding round. In short, where there is overlap between rounds, the data from the earlier round is retained. And where there is a gap between rounds, the earlier data is carried forward. The end result is a data set with a separate observation for each month, with neither gaps nor overlaps in the overall inter from the first beginning date to the final end date.
                  Code:
                  by dupersid (beg_date), sort: assert end_date[_n-1] <= beg_date if _n > 1
                  assert beg_date <= end_date
                  
                  by dupersid (beg_date), sort: gen beg_date2 = min(end_date[_n-1]+1, beg_date)
                  by dupersid (beg_date): replace beg_date2 = beg_date2+1 if beg_date2  <= end_date[_n-1]
                  expand end_date - beg_date2 + 1
                  by dupersid beg_date2, sort: gen mdate = beg_date2 + _n - 1
                  by dupersid (mdate), sort: assert mdate == mdate[1] + _n - 1
                  format beg_date2 mdate %tm
                  Although I can also appreciate your final thoughts and they are of course true, I think they go beyond the scope of my post and our discussion and are perhaps suggestive to the fact that I am not approaching this data or project correctly. There is of course much more context that was left out of this entire post in order to simplify my questions around this coding issue. I am in the exploratory phases of getting to know this data and data cleaning and I just need help acheiving a coding goal. Rather than criticizing or calling out assumptions or data discovers in a negative light, I think having this back and forth discussion and learning new things about the data from learning new coding techniques and talking through things "out loud" is normal and should be appreciated as part of any coding process. It can be quite intimidating to post one’s questions here. Just something to be mindful of while helping coders of all experience levels on this platform.
                  I didn't intend my "Added:" comment as a criticism of your work, and I certainly did not want it to intimidate you. I'm sorry that happened. Thank you for speaking up about it. My intent, to make a teaching point directed to anyone who might read it, was to advocate a general approach to data management, namely, always being skeptical of the correctness of the data and to always question and verify assumptions about it. In the future I will find a better way of expressing it.


                  Comment


                  • #10
                    Thanks so much! This final code chunk helped me operationalize what I needed!

                    Comment

                    Working...
                    X