Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a dummy out of a string

    Hi all,

    my db looks as follows:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long docdb_family_id strL tech_fields_cited int earliest_publn_year
    1187498 "['Y02P  20/10']"                                                                                                                                       1998
    1226468 "['Y10T 156/1023']"                                                                                                                                     1992
    1236571 "['Y02B  30/13' 'Y02B  30/12' 'Y02E  60/14' 'Y02B  10/70']"                                                                                             1991
    1239098 "['Y10S 426/805' 'Y02A  40/81']"                                                                                                                        1992
    1239277 "['Y02E  60/10' 'Y02T  90/12' 'Y02T  10/7072' 'Y02T  90/14' 'Y02T  10/70']"                                                                             1993
    1239483 "['Y10T  29/49014']"                                                                                                                                    1996
    1239622 "['Y10T  29/49833' 'Y10S 100/913' 'Y10S  52/06' 'Y10T  29/5343']"                                                                                       1997
    1239624 "['Y10S 128/903']"                                                                                                                                      1994
    1239749 "['Y10T 428/31938' 'Y10T 428/31692' 'Y10T 428/24355' 'Y10T 428/31']"                                                                                    1993
    1340405 "['Y10T 137/7759' 'Y10T 137/7761']"                                                                                                                     1992
    1340418 "['Y02T  10/7072' 'Y02T  90/12' 'Y02T  90/14' 'Y02T  10/70']"                                                                                           1995
    1340462 "['Y10T 156/1052']"                                                                                                                                     1994
    1340471 "['Y10T  83/9317' 'Y10S  76/02']"                                                                                                                       1993
    1340485 "['Y10S 362/80' 'Y10S 362/806']"                                                                                                                        1995
    1340488 "['Y10T 428/249981' 'Y10T 442/15' 'Y10T 428/249985' 'Y10T 428/249984' 'Y10T 428/249982']"                                                               1994
    1340508 "['Y10T 428/30' 'Y10T 428/31663' 'Y10T 428/249928' 'Y10S 428/902']"                                                                                     1993
    1340519 "['Y10S 297/08' 'Y10T 137/86517' 'Y10T 137/86509' 'Y10T 137/86501']"                                                                                    1995
    1340541 ""                                                                                                                                                      1994
    1340647 "['Y10S 435/968']"                                                                                                                                      1996
    1340659 "['Y02T  10/72' 'Y02T  10/64']"                                                                                                                         1995
    1340673 "['Y10T 428/2998' 'Y10T 428/2991']"                                                                                                                     1996
    1340826 "['Y02T  10/30' 'Y02T  10/12' 'Y02E  20/14']"                                                                                                           1995
    1341006 "['Y02E  10/30']"                                                                                                                                       1997
    1341170 "['Y10S 148/124']"                                                                                                                                      1996
    1341175 "['Y10T  29/49146' 'Y10T  29/49171']"                                                                                                                   1998
    1578920 "['Y10T 403/472' 'Y10T 403/4602']"                                                                                                                      1971
    1763337 "['Y10T 428/3154' 'Y10T 428/31663' 'Y10T 428/31794' 'Y10T 428/31507']"                                                                                  1995
    2463680 "['Y10S 526/91' 'Y10S 526/911']"                                                                                                                        1995
    3411930 "['Y10S 242/03']"                                                                                                                                       1996
    3459918 "['Y10S 428/909' 'Y10T 428/233' 'Y10T 428/24355' 'Y10T 428/24479' 'Y10T 428/249953' 'Y10T 428/24496' 'Y10T 428/24182' 'Y10T 428/239' 'Y10T 428/24174']" 1998
    3460237 "['Y02P  70/50']"                                                                                                                                       1995
    3460621 "['Y02E  10/50']"                                                                                                                                       2008
    3460785 "['Y02T  30/00']"                                                                                                                                       2001
    3460796 "['Y02T  10/12']"                                                                                                                                       2001
    3460811 "['Y10T  70/7904' 'Y10T  70/7057' 'Y10T  70/5863']"                                                                                                     2002
    3460812 "['Y10S 514/963' 'Y10S 514/965' 'Y10T 428/2985']"                                                                                                       2004
    3460829 "['Y02T  10/30']"                                                                                                                                       2002
    3460972 "['Y10S 426/806' 'Y10S 203/18']"                                                                                                                        1998
    3460979 "['Y02E  60/10']"                                                                                                                                       2000
    3460980 "['Y10T  24/12' 'Y10T  24/44769']"                                                                                                                      2000
    3460981 "['Y10T 403/42' 'Y10T 403/347' 'Y10T 403/341' 'Y10T 403/342' 'Y10T 403/73' 'Y10T 403/343']"                                                             2000
    3461010 "['Y10T 428/24124' 'Y10T 428/26' 'Y10T 428/24' 'Y10T 428/12333']"                                                                                       2000
    3461018 "['Y02P  10/20']"                                                                                                                                       2002
    3461040 "['Y10T  74/1967']"                                                                                                                                     2001
    3461044 "['Y10T  74/1836' 'Y10T  74/2057']"                                                                                                                     1999
    3461088 "['Y10S 606/907']"                                                                                                                                      1999
    3461099 "['Y10T 137/9464']"                                                                                                                                     2001
    3461127 "['Y10T 428/26' 'Y10T 428/12924' 'Y10T 428/12569']"                                                                                                     2003
    3461172 "['Y10S   5/928' 'Y10S   5/90']"                                                                                                                        2000
    3461192 "['Y02W  30/62']"                                                                                                                                       2002
    3461205 "['Y02T  10/90' 'Y02E  10/728' 'Y02T  10/7072' 'Y02E  10/74']"                                                                                          2005
    3461208 "['Y02E  10/72' 'Y02B  10/30']"                                                                                                                         2001
    3464855 "['Y10S 516/906']"                                                                                                                                      1974
    3466402 "['Y02T  10/12']"                                                                                                                                       1974
    3467767 "['Y02E  10/30' 'Y02E  10/20']"                                                                                                                         1980
    3467833 "['Y02E  10/46']"                                                                                                                                       1980
    3467977 "['Y10T 428/218']"                                                                                                                                      1976
    3469171 "['Y10T  24/2593' 'Y10T  24/2598']"                                                                                                                     1976
    3469272 "['Y10S 623/90']"                                                                                                                                       1978
    3470490 "['Y10T 403/7003']"                                                                                                                                     1979
    3471419 "['Y10T  74/19121']"                                                                                                                                    1979
    3471545 "['Y02T  10/12']"                                                                                                                                       1978
    3472136 "['Y10T 137/3127' 'Y10T 137/86236']"                                                                                                                    1980
    3472994 "['Y10S 292/57' 'Y10T 292/0934']"                                                                                                                       1981
    3473428 "['Y10S 623/90']"                                                                                                                                       1980
    3474014 "['Y02B  10/20' 'Y02B  10/70']"                                                                                                                         1986
    3474470 "['Y10T  29/53783']"                                                                                                                                    1980
    3475793 "['Y10T 403/7015' 'Y10T 403/16']"                                                                                                                       1984
    3477524 "['Y10S 507/931' 'Y10S 507/93']"                                                                                                                        1984
    3477979 "['Y02T  10/12']"                                                                                                                                       1985
    3478203 "['Y02E  10/30' 'Y10T 428/1324' 'Y02E  10/20']"                                                                                                         1991
    3478301 "['Y10S 165/90' 'Y10S 165/905']"                                                                                                                        1989
    3478373 "['Y02A  40/90']"                                                                                                                                       1987
    3478451 "['Y02T  70/10']"                                                                                                                                       1988
    3478460 "['Y10T 137/87957']"                                                                                                                                    1990
    3478484 "['Y02T  70/10']"                                                                                                                                       1989
    3478514 "['Y10S  52/10']"                                                                                                                                       1992
    3478520 "['Y10S  55/30' 'Y02A  50/20' 'Y02T  10/12']"                                                                                                           1990
    3478534 "['Y02B  20/40']"                                                                                                                                       1992
    3478653 "['Y10S 150/90']"                                                                                                                                       1990
    3478700 "['Y02T  10/12' 'Y02T  10/30']"                                                                                                                         1991
    3478703 "['Y02A  10/23']"                                                                                                                                       1991
    3478717 "['Y02T  10/12']"                                                                                                                                       1993
    3478720 "['Y10T 137/86445' 'Y10T 137/87161' 'Y10T 137/87716']"                                                                                                  1992
    3478738 "['Y10S 424/08' 'Y10S  71/04' 'Y10T 428/2982']"                                                                                                         1993
    3478763 "['Y02W  30/80']"                                                                                                                                       1991
    3478796 "['Y02P  10/20']"                                                                                                                                       1991
    3478814 "['Y10T  29/53913']"                                                                                                                                    1994
    3478822 "['Y10S 315/05']"                                                                                                                                       1994
    3478830 "['Y10T 137/86823' 'Y10T 137/5109']"                                                                                                                    1994
    3478833 "['Y10S 427/105' 'Y10S 427/103']"                                                                                                                       1994
    3478863 "['Y02E  60/10']"                                                                                                                                       1994
    3478875 "['Y10S  55/17']"                                                                                                                                       1997
    3478921 "['Y10S   5/934']"                                                                                                                                      1998
    3478946 "['Y10S 128/918']"                                                                                                                                      1996
    3478957 "['Y10S 160/90']"                                                                                                                                       1998
    3478981 "['Y10T  16/3543']"                                                                                                                                     1999
    3478991 "['Y02W  30/62']"                                                                                                                                       1997
    3479012 "['Y10S 128/918']"                                                                                                                                      1998
    3479021 "['Y02E  60/10']"                                                                                                                                       1998
    end
    In particular, as you can see I have a string called tech_fields_cited which consists of a string composed by many acronyms (e.g.: ['Y02B 30/13' 'Y02B 30/12' 'Y02E 60/14' 'Y02B 10/70'] has 'Y02B 30/13', 'Y02B 30/12', 'Y02E 60/14' and 'Y02B 10/70'). What I would like to do is to create N dummy variables (being N the total number of acronyms in the string tech_fields_cited) taking 1 whenever a docdb has that acronym inside the variable tech_fields_cited.
    So, for instance, taking only the first 3 observations for the sake of simplicity, the result should be something like:

    Code:
    docdb       tech_fields_cited                                                         dummy_Y02P_20/10      dummy_Y10T_156/1023   dummy_Y02B_30/13    dummy_Y02B  30/12    dummy_Y02E_60/14   dummy_Y02B_10/70          
    1187498 "['Y02P  20/10']"                                                                   1                                    0                                         0                                  0                                    0                                 0                                                            
    1226468 "['Y10T 156/1023']"                                                             0                                   1                                         0                                  0                                    0                                  0                                                          
    1236571 "['Y02B  30/13' 'Y02B  30/12' 'Y02E  60/14' 'Y02B  10/70']"              0                                   0                                         1                                 1                                    1                                 1
    Of course it might be that two docdb has the same tech_field in the string. In that case, only one dummy should be created having a 1 for both of them.
    Thank you
    Last edited by Federico Nutarelli; 16 Jan 2023, 07:47.

  • #2
    Perhaps this does what you want, although for my own use I would not reshape the long dataset into a wide dataset. In any event, the reshape wide may well fail on your entire dataset if you have too many distinct values of citations.

    Do note that variable names cannot contain a "/" in the way that you specified.
    Code:
    generate tfc = ustrregexra(tech_fields_cited,"' *'",";")
    replace  tfc = ustrregexra(tfc,"[\[\]']","")
    split tfc, parse(";")
    drop tfc
    reshape long tfc, i(docdb_family_id) j(j)
    replace tfc = "none" if missing(tfc) & j==1
    drop if missing(tfc)
    drop j
    replace tfc = ustrregexra(tfc,"[ /]+","_")
    generate dummy_ = 1
    reshape wide dummy_, i(docdb_family_id) j(tfc) string
    quietly recode dummy_* (.=0)
    Last edited by William Lisowski; 16 Jan 2023, 09:32.

    Comment

    Working...
    X