Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating new rows from existing by date

    Hi all!

    Long time reader, first time poster here.

    I have a dataset where each row is a state * timeframe. Some rows contain partial years (e.g., 1/1/09-7/23/09) and some contain multiple years (e.g., 7/24/09-12/31/11; see "what I have" in below screenshot). I would like to reformat the data so that there is one row per state per year (see "what I want").

    So basically, I need to: (1) figure out when two or more rows make up a single year, and then combine them, averaging out the mw variable; and (2) figure out when a row contains multiple years, break the multi-year rows into separate rows of partial years, and then combine those partial years to form a single full year, again averaging out the mw variable.

    This is how it would go, I think:

    Click image for larger version

Name:	mw data screenshot excel 2.png
Views:	1
Size:	40.8 KB
ID:	1673414


    (I set this up in Excel, but I obviously want to do it in Stata.)

    I honestly can't even wrap my head around how I'd get started doing this with code. I could do it by hand in Excel, but there are ~500 rows, so I'd rather not!

    Does anyone have any thoughts??
    Last edited by Madeline Montgomery; 13 Jul 2022, 11:56.

  • #2
    Welcome to Statalist. For your future posts, please familiarize yourself with the dataex command for presenting data examples. See FAQ Advice #12 for details. The crucial aspect is that Stata recognizes your date variables once you import from MS Excel. If this is the case, the rest is not too complicated. Note that the effective and through dates are not needed for your final dataset as the pair identifies a year. Therefore, a year variable will do.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 state float(eff_date through_date mw)
    "AZ" 17898 18101 7.25
    "AZ" 18102 18262 7.35
    "CA" 17532 17744    8
    "CA" 17737 18101  8.1
    "CA" 18102 18992  8.2
    end
    format %td eff_date
    format %td through_date
    
    l
    gen missing= year(through_date)- year(eff_date) +1
    expand missing, g(new)
    bys state eff_date through_date (new): gen year=year(eff_date)+_n-1
    l, sepby(state)
    collapse mw, by(state year)
    l

    Res.:

    Code:
    . l
    
         +--------------------------------------+
         | state    eff_date   through~e     mw |
         |--------------------------------------|
      1. |    AZ   01jan2009   23jul2009   7.25 |
      2. |    AZ   24jul2009   31dec2009   7.35 |
      3. |    CA   01jan2008   31jul2008      8 |
      4. |    CA   24jul2008   23jul2009    8.1 |
      5. |    CA   24jul2009   31dec2011    8.2 |
         +--------------------------------------+
    
    
    . 
    . l, sepby(state)
    
         +-------------------------------------------------------------+
         | state    eff_date   through~e     mw   missing   new   year |
         |-------------------------------------------------------------|
      1. |    AZ   01jan2009   23jul2009   7.25         1     0   2009 |
      2. |    AZ   24jul2009   31dec2009   7.35         1     0   2009 |
         |-------------------------------------------------------------|
      3. |    CA   01jan2008   31jul2008      8         1     0   2008 |
      4. |    CA   24jul2008   23jul2009    8.1         2     0   2008 |
      5. |    CA   24jul2008   23jul2009    8.1         2     1   2009 |
      6. |    CA   24jul2009   31dec2011    8.2         3     0   2009 |
      7. |    CA   24jul2009   31dec2011    8.2         3     1   2010 |
      8. |    CA   24jul2009   31dec2011    8.2         3     1   2011 |
         +-------------------------------------------------------------+
    
    
    
    . 
    . l
    
         +---------------------+
         | state   year     mw |
         |---------------------|
      1. |    AZ   2009    7.3 |
      2. |    CA   2008   8.05 |
      3. |    CA   2009   8.15 |
      4. |    CA   2010    8.2 |
      5. |    CA   2011    8.2 |
         +---------------------+
    
    .
    Last edited by Andrew Musau; 13 Jul 2022, 14:10.

    Comment


    • #3
      Andrew Musau THANK YOU! And thank you for the note about using dataex, that's very helpful.

      This is truly amazing, I wasn't sure it was even possible. Thank you so much!!

      Comment

      Working...
      X