Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help reshaping a dataset from long to wide

    Dear Statalist, I have a dataset in which each citing_year (within cited family) adds a new set of citing_firms not present in a previous year. I need your help for reshape the below dataset as wide in which each row is a cited_family - citing_year, but the trick is that the following rows (within a given cited_family) must to add the citing_firms in the previous years (here rows).

    That is, cited_family==1000065 has 9 unique firms in 2003. So, these 9 firms will be in a single row (9 new variables).

    In 2004, I need the same firms present in the row from 2003 plus the new firm from 2004. So, this new row will have in this case one more variable.

    In 2005, I need all the previous firms (in this case would be the 10 previous: 9 from 2003 and 1 from 2004 firms) plus the new firms from 2005, all in a new row for cited_family – citing_year.

    Can you help me with this?
    Thanks in advance for your help!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(cited_han_id cited_family) int citing_year
      2456791 1000065 2003
      4063354 1000065 2003
    106357619 1000065 2003
    112493276 1000065 2003
    112589240 1000065 2003
    113008532 1000065 2003
    113011155 1000065 2003
    113011156 1000065 2003
    113011157 1000065 2003
    112589245 1000065 2004
    105475384 1000065 2005
    105889591 1000065 2005
    105889594 1000065 2005
    105889596 1000065 2005
    105889597 1000065 2005
    105895227 1000065 2005
    105895228 1000065 2005
    105895229 1000065 2005
    105895230 1000065 2005
    100951950 1000065 2011
    100951952 1000065 2011
    100951953 1000065 2011
      1981214 1001348 2004
    106051498 1001348 2004
    106051499 1001348 2004
    106051500 1001348 2004
    106051501 1001348 2004
    125793616 1001348 2004
    125797054 1001348 2004
    125799278 1001348 2004
    125804427 1001348 2004
    113013746 1001348 2006
    113013747 1001348 2006
    113013748 1001348 2006
    113013749 1001348 2006
    100959356 1001348 2008
    100959357 1001348 2008
    100959358 1001348 2008
    100959359 1001348 2008
    105892621 1001348 2010
    185353888 1001348 2012
    185367070 1001348 2012
    185367071 1001348 2012
    185532401 1001348 2012
    185606878 1001348 2012
      2433998 1001348 2014
    118295562 1001348 2014
    118295563 1001348 2014
    139562940 1001348 2014
    139562941 1001348 2014
      3131499 1001348 2017
    101363461 1001348 2017
    101363462 1001348 2017
    101363463 1001348 2017
    101363464 1001348 2017
    end

  • #2
    Code:
    * Generate an index:
    bysort cited_family: gen seq = _n
    * Reshape
    reshape wide cited_han_id, i(cited_family citing_year) j(seq)
    
    * Move family and year to front:
    order cited_family citing_year
    
    * Fill out missing vertically:
    foreach x of varlist cited_han_id*{
        replace `x' = `x'[_n-1] if cited_family == cited_family[_n-1] & `x' == .
    }

    Comment


    • #3
      Dear Ken Chui, thanks a lot. It perfectly worked!

      Comment

      Working...
      X