Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dynamic reshaping issue

    Hi all -

    I'm back with more patent data - this time an odd reshaping issue. The data I've got (comes in excel, hence the field names) has a patent number followed by a dynamic list of columns containing ID numbers. Note, the columns are dynamic because I did text to column in excel. If it's easier, I could do a single column containing all the numbers that are either CHAR(10) or comma delimited. My actual rows go up to AS, but I only exported to S for illustration.

    As you can see, the numbers in the columns are sometimes the same, sometimes different. What I want is a a unique row for each patent/unique ID. So, even if there are 5 columns, if there are 3 unique IDs, I would like 3 Finalnumber/ID pair rows.

    Tips on how to get there much appreciated. Thanks!

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 Finalnumber long(K L M N O P Q R S)
    "6579981"   20629      .      .      .      .      .      .      .      .
    "10000480" 214958      .      .      .      .      .      .      .      .
    "10004700" 205029      .      .      .      .      .      .      .      .
    "10004717" 210361      .      .      .      .      .      .      .      .
    "10004729" 208090 208090 208090 208090 208090      .      .      .      .
    "10004743" 201292 201292 201292      .      .      .      .      .      .
    "10004746" 210563 205552 210563 210563 210563 210563 217003      .      .
    "10004746" 205552 205552 210563 210563 210563 210563 217003      .      .
    "10004746" 217003 205552 210563 210563 210563 210563 217003      .      .
    "10004750" 209445      .      .      .      .      .      .      .      .
    "10005761" 210498 210496 210498      .      .      .      .      .      .
    "10005761" 210496 210496 210498      .      .      .      .      .      .
    "10005783" 210861 210861 211710      .      .      .      .      .      .
    "10006924" 202107      .      .      .      .      .      .      .      .
    "10010507" 210563 210563 210563 210563      .      .      .      .      .
    "10010517" 202057 202057      .      .      .      .      .      .      .
    "10010530" 212099      .      .      .      .      .      .      .      .
    "10010533" 205580 208194      .      .      .      .      .      .      .
    "10010537"  22156  22156  22156      .      .      .      .      .      .
    "10010575" 204485 204485 204485 204485      .      .      .      .      .
    "10010612" 210655 210655      .      .      .      .      .      .      .
    "10010632" 208054      .      .      .      .      .      .      .      .
    "10011633" 215498 215498 215498 215498      .      .      .      .      .
    "10011637" 208745      .      .      .      .      .      .      .      .
    "10016372" 212304 212304      .      .      .      .      .      .      .
    "10016393" 219208      .      .      .      .      .      .      .      .
    "10016396"  21038  21038      .      .      .      .      .      .      .
    "10016403" 202788 202788 202788 202788 202788 202788 202788      .      .
    "10016404" 203858 203858 203858 203858 203858 203858      .      .      .
    "10016407" 209575 209963      .      .      .      .      .      .      .
    "10016415" 209830      .      .      .      .      .      .      .      .
    "10016429" 202192 202192 202192 202192 202192      .      .      .      .
    "10016435" 205552 205552 210563 210563 210563 210563 217003      .      .
    "10016443" 211746      .      .      .      .      .      .      .      .
    "10016504" 209381      .      .      .      .      .      .      .      .
    "10017491" 213137 213137 216157      .      .      .      .      .      .
    "10017536" 214916      .      .      .      .      .      .      .      .
    "10022264" 218201      .      .      .      .      .      .      .      .
    "10022344"  21920      .      .      .      .      .      .      .      .
    "10022352" 210491 210491 212273 212273 217660 217660 218730 218730      .
    "10022379" 201281 201281 201281 208026 208026 212614 212614 212614 212614
    "10022445" 212268 212268 212268      .      .      .      .      .      .
    "10022447" 209501 209501 209501      .      .      .      .      .      .
    "10022460"  21064      .      .      .      .      .      .      .      .
    "10022502" 208912      .      .      .      .      .      .      .      .
    "10022509"  21457 207921 207921  20911  20911      .      .      .      .
    "10022510"  21457 207921 207921 208798 208798 208798 208798 208798 208799
    "10023560" 214985 214985      .      .      .      .      .      .      .
    "10028858" 206229      .      .      .      .      .      .      .      .
    "10028910" 208471 208471      .      .      .      .      .      .      .
    "10028912" 209401      .      .      .      .      .      .      .      .
    "10028920" 217370      .      .      .      .      .      .      .      .
    "10028925"  21920      .      .      .      .      .      .      .      .
    "10028937" 209394 215110      .      .      .      .      .      .      .
    "10028944" 207318 210793 207318      .      .      .      .      .      .
    "10028946" 202880 202880 202880 202880 202880 202880      .      .      .
    "10028963" 206829      .      .      .      .      .      .      .      .
    "10028965" 208912      .      .      .      .      .      .      .      .
    "10028995" 209360 209360 209360      .      .      .      .      .      .
    "10029010" 215431      .      .      .      .      .      .      .      .
    "10029011" 208673      .      .      .      .      .      .      .      .
    "10030005" 213721      .      .      .      .      .      .      .      .
    "10034841"  22395      .      .      .      .      .      .      .      .
    "10034867" 210595 202450      .      .      .      .      .      .      .
    "10034873" 208692 208692 208692      .      .      .      .      .      .
    "10034877" 201280      .      .      .      .      .      .      .      .
    "10034879" 213137 213137 216157      .      .      .      .      .      .
    "10035788" 208051      .      .      .      .      .      .      .      .
    "10035822" 215014 217171      .      .      .      .      .      .      .
    "10039718" 203094 203100 206353 207561 210455 205395 205395      .      .
    "10039719" 205831 205831 205831 205831 205831 205831 205831      .      .
    "10039728" 205029      .      .      .      .      .      .      .      .
    "10039745" 208686      .      .      .      .      .      .      .      .
    "10039754" 209394 215110      .      .      .      .      .      .      .
    "10039757" 208692 208692 208692      .      .      .      .      .      .
    "10039766" 218197 218197      .      .      .      .      .      .      .
    "10039779" 205834 205834      .      .      .      .      .      .      .
    "10039780" 204958      .      .      .      .      .      .      .      .
    "10039800" 208401      .      .      .      .      .      .      .      .
    "10039804" 211962 211962 211962 211962 211962 211962 211962 213895      .
    "10040872" 207155      .      .      .      .      .      .      .      .
    "10045958" 203284      .      .      .      .      .      .      .      .
    "10045959" 203284      .      .      .      .      .      .      .      .
    "10045991" 211710      .      .      .      .      .      .      .      .
    "10046031"  22472  22472  22472      .      .      .      .      .      .
    "10047053" 218730 218730      .      .      .      .      .      .      .
    "10047097" 210861 210861 211710      .      .      .      .      .      .
    "10047117" 207999 207999      .      .      .      .      .      .      .
    "10052267" 210361      .      .      .      .      .      .      .      .
    "10052314" 210951      .      .      .      .      .      .      .      .
    "10052334" 212295      .      .      .      .      .      .      .      .
    "10052337" 207999 207999      .      .      .      .      .      .      .
    "10052385" 208194      .      .      .      .      .      .      .      .
    "10052386" 210132 210132      .      .      .      .      .      .      .
    "10058504" 214835 214835      .      .      .      .      .      .      .
    "10058511" 210565 210933      .      .      .      .      .      .      .
    "10058518" 215430      .      .      .      .      .      .      .      .
    "10058536" 213801      .      .      .      .      .      .      .      .
    "10058546" 210491 210491 218730      .      .      .      .      .      .
    "10058554" 210655 210655      .      .      .      .      .      .      .
    end

  • #2
    You have two obstacles to overcome in -reshape-ing this data. First, the variable names K, L, M... do not lend themselves to the -reshape- syntax. Then you have the problem that there are multiple observations for some values of Finalnumber.

    Code:
    sort Finalnumber, stable
    by Finalnumber: gen row = _n
    rename (K-S) ID#, addnumber
    reshape long ID, i(Finalnumber row)
    drop if missing(ID)
    by Finalnumber (row _j), sort: replace _j = _n
    drop row

    Comment


    • #3
      Am I correct in surmising that OP would like each row to be a unique combination of Finalnumber and the IDs in rows K-S? If so, here is some code that slightly simplifies #2 and adds a line to drop the duplicates:

      Code:
      gen `c(obs_t)' x = _n
      rename (K-S) ID#, addnumber
      reshape long ID, i(Finalnumber x)
      drop if missing(ID)
      drop x _j
      duplicates drop Finalnumber ID, force
      For the example data, #2 produces a dataset with 256 observations, while this produces 131 observations , having gotten rid of many duplicate pairs of Finalnumber ID (before the last duplicates drop line, the observation count is the same 256).
      Last edited by Hemanshu Kumar; 31 Jul 2025, 16:30.

      Comment


      • #4
        That worked - thank you both - you never cease to amaze me. Hemanshu was correct - I do want to delete duplicates but that was secondary to the reshape.

        Comment

        Working...
        X