Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sorting data by transposing rows in columns and deleting duplicates

    Hello all,

    I’m having a problem with my data. I have a complicated panel dataset that I somehow need to transform into a form suitable for use.
    My data looks like this
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int fyear str3 groups double values
    1966 "SH1"  -.22325588762760162
    1966 "MN1" -.006087541114538908
    1966 "MH1"  -.18323348462581635
    1966 "MH1"  -.18323348462581635
    1966 "BL1"  -.25000011920928955
    1967 "SH1"    .3140767812728882
    1967 "SN1"   .37736019492149353
    1967 "SH1"    .3140767812728882
    1967 "MH1"    .3423275351524353
    1967 "MH1"    .3423275351524353
    1967 "MH1"    .3423275351524353
    1967 "MH1"    .3423275351524353
    1967 "ML1"   .15968555212020874
    1967 "ML1"   .15968555212020874
    1967 "MN1"   .02143719792366028
    1967 "MH1"    .3423275351524353
    1967 "MN1"   .02143719792366028
    1967 "MN1"   .02143719792366028
    1967 "MH1"    .3423275351524353
    1967 "MH1"    .3423275351524353
    1967 "MN1"   .02143719792366028
    1967 "MN1"   .02143719792366028
    1967 "MH1"    .3423275351524353
    1967 "ML1"   .15968555212020874
    1967 "MN1"   .02143719792366028
    1967 "MN1"   .02143719792366028
    1967 "MH1"    .3423275351524353
    1967 "BN1"   .07543724030256271
    1967 "BN1"   .07543724030256271
    1967 "BH1"   .21663467586040497
    1967 "BH1"   .21663467586040497
    1967 "BN1"   .07543724030256271
    1968 "SH1"    .3208755850791931
    1968 "SN1"   .47685182094573975
    1968 "SH1"    .3208755850791931
    1968 "SH1"    .3208755850791931
    1968 "SH1"    .3208755850791931
    1968 "MN1"    .1500938981771469
    1968 "MN1"    .1500938981771469
    1968 "MN1"    .1500938981771469
    1968 "MN1"    .1500938981771469
    1968 "MH1"  .023950520902872086
    1968 "MH1"  .023950520902872086
    1968 "MN1"    .1500938981771469
    1968 "MH1"  .023950520902872086
    1968 "MN1"    .1500938981771469
    1968 "MN1"    .1500938981771469
    1968 "MH1"  .023950520902872086
    1968 "MN1"    .1500938981771469
    1968 "MH1"  .023950520902872086
    1968 "MN1"    .1500938981771469
    1968 "MN1"    .1500938981771469
    1968 "MN1"    .1500938981771469
    1968 "ML1"   .17206278443336487
    1968 "ML1"   .17206278443336487
    1968 "MH1"  .023950520902872086
    1968 "MH1"  .023950520902872086
    1968 "MH1"  .023950520902872086
    1968 "MH1"  .023950520902872086
    1968 "MN1"    .1500938981771469
    1968 "MH1"  .023950520902872086
    1968 "MH1"  .023950520902872086
    1968 "MH1"  .023950520902872086
    1968 "MH1"  .023950520902872086
    1968 "MH1"  .023950520902872086
    1968 "BN1"  .012622889131307602
    1968 "BN1"  .012622889131307602
    1968 "BL1"   .07391373813152313
    1968 "BL1"   .07391373813152313
    1968 "BH1"   .03639631345868111
    1968 "BH1"   .03639631345868111
    1968 "BN1"  .012622889131307602
    1968 "BL1"   .07391373813152313
    1969 "SH1"   -.3491247296333313
    1969 "SH1"   -.3491247296333313
    1969 "SH1"   -.3491247296333313
    1969 "SH1"   -.3491247296333313
    1969 "MN1" -.025574471801519394
    1969 "MN1" -.025574471801519394
    1969 "MH1"    -.227068230509758
    1969 "MH1"    -.227068230509758
    1969 "MN1" -.025574471801519394
    1969 "MH1"    -.227068230509758
    1969 "MN1" -.025574471801519394
    1969 "MN1" -.025574471801519394
    1969 "MN1" -.025574471801519394
    1969 "MN1" -.025574471801519394
    1969 "MH1"    -.227068230509758
    1969 "MH1"    -.227068230509758
    1969 "MN1" -.025574471801519394
    1969 "MH1"    -.227068230509758
    1969 "MN1" -.025574471801519394
    1969 "MN1" -.025574471801519394
    1969 "MH1"    -.227068230509758
    1969 "MN1" -.025574471801519394
    1969 "MH1"    -.227068230509758
    1969 "ML1"  -.25227147340774536
    1969 "MN1" -.025574471801519394
    1969 "MN1" -.025574471801519394
    1969 "MH1"    -.227068230509758
    end

    [/CODE]



    I need data that looks like this:



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float month double(SH S2L M3L M32 B4L B4L)
    249   .013085293278829024   .043213834593154976    .13737395181636264     .04166338588578065    .03745413505824836    .03745413505824836
    250  -.010761078309431446    .04028426379002013    .04205906121972218    .028631436166792486   .023240340433235886   .023240340433235886
    251  -.025670348720414855   -.05948658632200557  .0013524813327250454   -.035209416818928065  -.005922137570485326  -.005922137570485326
    252   .025518569384421833  -.002152466919142072    .01825286937343858    -.01835541474068528   -.01586764432687183   -.01586764432687183
    253    .05409068825446821    .04549920831454522    .05081598798537572     .06737072630693533    .04823178734279552    .04823178734279552
    254     .0750646423781339     .1057460498261569    .09809143373687417     .08048440998835306    .06260052676985978    .06260052676985978
    255    .12778312742039516    .02664871543993569   .005422599683839136     .12280553982899449    .07373820621476752    .07373820621476752
    256   .008686744202333544  -.023969139794021193  -.018676894072132154    -.03620668204086163   -.02748494397501544   -.02748494397501544
    257   .010906223745199256   .004452031792416378    .03161545576555367   -.009479526668080949    .03190275940377528    .03190275940377528
    258  -.004453132344389902   .001604264445831363  .0005575674718975613   -.015536869466580783  -.032951059256289904  -.032951059256289904
    259    .03842314190685747   .035391050238481915   .022842736562956812      .0199845938073069    .05247411633694113    .05247411633694113
    260    -.1359653271900494   -.18113038799349146    -.2099599805752256    -.18774235165889203   -.17709589978986195   -.17709589978986195
    261   .021973283202953255   .027219322007348334   .017842078552630185    .031135376384360584   .043805574111499594   .043805574111499594
    262    .07418239753475817      .127990906527218    .08996045716168435     .11158472232038748    .13274028547092634    .13274028547092634
    263   .044947107060367655   .019048566583838675   .031434760625073215   -.006378781377640107    .00655050680873448    .00655050680873448
    264    .07721667833214972    .05168969534023581    .07182622292685276     .07976432148878221   .056662729684025526   .056662729684025526
    265  -.009902664881596298  -.005229382606337723   -.02895773257390701    -.03387245405452595   -.05211300449495741   -.05211300449495741
    266   .017350505359865227   .058336933487940335   .044675066314672204    -.00695239205768014    .08610701739109355    .08610701739109355
    267    -.0143515579753484   .006000942901202135  -.011387061696302634   -.019686918945995866    .04390413823033178    .04390413823033178
    268   .011086857373257908    .03446907464851589   .008214832849294865    .018083191011739653    .04946719789703794    .04946719789703794
    269  -.034510546892982816  -.028455502338013004  -.054751364413745035    -.02630914718657717   -.03491548634732413   -.03491548634732413
    270  -.005024180478258432  .0032960354994681805    .06406823011416893   .0030719826264091685    .05997609859917422    .05997609859917422
    271   -.01405349136176335   .016680786428534093  -.004002054579585363     .01940945355880182    .01058552157680562    .01058552157680562
    272  .0020244656463696917   .027095637762064015    .06375296607151949    .037711660993637784    .08618185747775722    .08618185747775722
    273   .023184706901685795    .04771840100157344 -.0022234269397921544     .03952893505699534   .019739575130015573   .019739575130015573
    274    .06800372511153134    .05762335046755877    .08913408089546183     .04803368988401342   .018637278547448885   .018637278547448885
    275   .016797121720875375   -.03231449622412838    .01596297789430707    .008372106853425522 -.0018624634602148736 -.0018624634602148736
    276    .05802958901914752    .05688608183373093   .016958490654300246    .011412316150891052   .014649689809565708   .014649689809565708
    277    .06750318715088904    .04631179574830378    .05473496072092335    .051586417380578654    .02322460573960605    .02322460573960605
    278     .0563515742934172    .05926706652341497   .011875657097565509     .05767402861621516    .02829345155636642    .02829345155636642
    279    .08509477725829391    .09498273957710798   .037276059867680005      .0667429255330694    .09341548290610403    .09341548290610403
    280   .003556912806899746  -.010445300092251574  -.011123060940845308 -.00039168975627573417  -.026478917383886143  -.026478917383886143
    281    .03316125623629502   .019642182500002024    .07690264988331053    .030246161019149464    .04155693784432545    .04155693784432545
    282   .013708633832429982   .006729926157992514  -.029216424968571346     -.0199193723949751   -.05200942183616758   -.05200942183616758
    283   .044686537416901956   .029736596817330874   .005737534634672095   .0045426046611280494  -.005727512180660616  -.005727512180660616
    284   .024073572786044398  -.049350569306873766  -.008796933075507614   -.017872252568627397  -.007415661973655681  -.007415661973655681
    285 -.0027347286095505846  -.016941421344635173    -.0438101719995445   -.015979672575567778   -.05397896180081212   -.05397896180081212
    286   .050540130716912074    .05226640656135462    .08725252836243383    .053529873782192344    .04231391167586649    .04231391167586649
    287    .05433277248504494    .06343612608002257   .039172595793047486    .030435922753765846  .0003820783452133975  .0003820783452133975
    288    .10277847648114499    .04593689322767183     .0424199738172978     .07038246749702022    .06308698523203497    .06308698523203497
    289    .05801715816121845  -.004827321619877023  -.006740680263919464     .01827833193041155   -.03291100027235284   -.03291100027235284
    290    .05811094388361548   .044852653058495964    .06221077124384147    .013155843972419468    .08149994627191463    .08149994627191463
    291   .005781728647378333    .06185748848230609  .0014240542434305905   -.010306472920005345    .05276607615098789    .05276607615098789
    292   -.06528027952983607   -.14523084400456748   -.09548521108434924    -.10254704154358152   -.14798305786577565   -.14798305786577565
    293   .023726384446179638    .03895554946471631  -.014962320060009976    .012026415127858148   .012342807117401006   .012342807117401006
    294   -.04624199598565815  -.055901101626789365   -.07104761519450166   -.012640646334742966  -.053000641126315265  -.053000641126315265
    295    .05359165892504389    .12539867923482068     .1186055840381931       .104971375817556    .09732996663957122    .09732996663957122
    296   .014901206746153878    .04719088505395034    .02432745342497175   -.001755837132807574   .046785571346625504   .046785571346625504
    297    .03254359000004548   -.02373613410037965  .0036691094842736804     .04807590074767893    .03613321883596372    .03613321883596372
    298      .031361262944543   .039425949291760375    .06877284757341658    .024816504738325307   .011244519266599357   .011244519266599357
    299    .05499670044335564    .04186204818431049   .023573369796313987     .04546027697861555    .01730154130533991    .01730154130533991
    300    .04678318495101721     .0053781790937244   .014864580597430163    .010020312923232251    .01376634105424641    .01376634105424641
    301   .025503396256475214  .0018161019808185241   .025361752680606055   -.005401331211493745   -.01828354676020467   -.01828354676020467
    302   .046640515561882985   .014736342546045963    .03834748290154105     .03360201949173653   .004548424722990824   .004548424722990824
    303    .02896630093379964   -.02533370604894376   .020346796323145577    .019311443376553786   .023215771036701253   .023215771036701253
    304    .04716395311953942 -.0050575510878749135  -.015260965872649596     .01974863385618529  -.016659308868899613  -.016659308868899613
    305   -.03753573901379101   -.09094436183437503   -.17112155967717493    -.05550994714920964   -.12382954208880002   -.12382954208880002
    306    .02776512726068985  -.009101781648870063   .038735192561823575    .026733082559168928  -.016527056982145592  -.016527056982145592
    307    .05964045073232712     .0810787464607854    .12637136628194923     .06184588486024515    .10310591956532511    .10310591956532511
    308   .023504953931679593   -.07795177115692957   -.05031385974419384    .000584508620759631   -.06881267557965907   -.06881267557965907
    309   .056244742984553016    .09014614681732468    .09800337245587205     .07330367475822509    .10500273744008758    .10500273744008758
    310    .04012898794237876    .06294463865175666   .016580563508647884     .05840778643158667   .022983783720635004   .022983783720635004
    311     .0092170348874242   .010850365163936266  -.034934306308784635    -.02865467044187225  -.014720929410922373  -.014720929410922373
    312   .045628208347674976   .058953089884113595    .03010351183052394    .027941730387958733  -.006738022861841171  -.006738022861841171
    313    .09771657057312991   .046746312101231126    .13679522092542398     .10941975507918375   .059550727412862284   .059550727412862284
    314    .04229907982048955    .07127324707759987    .10611568526412621     .06044605279129428    .03959011828179901    .03959011828179901
    315   .060481820703882035    .06220901110746151   .049280223437120754     .06336486877300034  -.025624922266221407  -.025624922266221407
    316   .016054084741766967   -.03607509576501517  -.013872107701448345    -.05741245283093963  -.025532505509810073  -.025532505509810073
    317     .0627639533095392   .023176095355540455   .022174087190704416     .02249024967798668  -.014975035599480761  -.014975035599480761
    318 .00010812688317715691  -.042225403565166575   -.05509641402048942    -.05656869189742485  -.040030824282917966  -.040030824282917966
    319    .02620598073435621   .016638627588479216   .032446342607592105    .009399380322661832    .09329625769288454    .09329625769288454
    320  -.004691655624783534   -.04384402523730465   -.03652470225883008    -.02807499600819354   -.06009856156404521   -.06009856156404521
    321    .06924299821477752    .05164751810367868   .023578286633306217    .026817339286555865  -.013009338381035305  -.013009338381035305
    322    .06426385210863778 .00039311164330313453   -.01231313240483168     .04639721956349197    .01185676613854345    .01185676613854345
    323    .05130952452859716   .030826633644287598   .007220811196914111    .019018369431183086   .008765029377115144   .008765029377115144
    324    .09440981569658911    .09717989577323265     .0891089497416361     .09457219541735532    .08510123379633967    .08510123379633967
    325    .10959823503083678    .10666974930905732     .0798158867939887       .087573298132891    .07552298012062808    .07552298012062808
    326    .10594966000441573    .09241258493079473    .06994563215371587    .047659201043082716    .02915262352758573    .02915262352758573
    327    .05592019255231646   .026791310091287955    .05474054299127984     .01991038965862163   .021942420481683926   .021942420481683926
    328    .10692756857192112    .08504418774022329    .10331214267383254     .06505360752743553    .07805625936894589    .07805625936894589
    329    .14923940249351683    .10478185889354355    .07186140829151685     .06466583828099695   .057503424959341076   .057503424959341076
    330     .2357858206240616   .017374827646117544     .0734156810054387     .06430904401181901     .0779366365673496     .0779366365673496
    331  -.026523043627809845  -.024680372603949563   -.06382544120794884   -.055849841943913764   -.06467869335887248   -.06467869335887248
    332   .055374274809966116     .0421406978396363   .045636116626878755     .04777613965051476    .04871604982507179    .04871604982507179
    333   -.15716309340963597    -.2763494000088027    -.3164510427377467     -.2893827337711866    -.3288171204972565    -.3288171204972565
    334   -.13519003435828833   -.18010083479677766   -.19028332235586623    -.10906413841915724   -.12269510344206447   -.12269510344206447
    335    .03692319639711508    .14945650230764543    .10058439164479384       .141509104936201    .13323449165738066    .13323449165738066
    336    .07982122225829237    .09546397041573394    .09633667072096026     .05497131160364418    .08256136397007738    .08256136397007738
    337    .02487250178247317  -.013125019594807724   .024615241144945038   -.008797869927619597  .0016645075586112648  .0016645075586112648
    338   .007327707521404316 -.0001666939118576758  -.017112423493923695   -.027466555816139833  -.006465386786613976  -.006465386786613976
    339    .04631333843943443   .059161511448835656   .020501962867003024    .011673444098716613    .03692600891489216    .03692600891489216
    340   .038035624560501025   .028810464120642542   .014686293362589753   -.004820081712869195  .0031548844599746865  .0031548844599746865
    341    .05914303487416045     .0300501116334989    .09603910169311419     .06092176246945325    .04373728024355297    .04373728024355297
    342      .039280797190077   .009507104970949925    .00380132244907766     .02062655129216157      .029800892983492      .029800892983492
    343  -.006753780590978213   -.08538206582196387  -.049960744175810624   -.056983295933788355   -.06781405500967166   -.06781405500967166
    344   .004999850116851208   .010216853318943037   .005638312328979003    .013766080287731238   .015252539594494649   .015252539594494649
    345   .032111642649071855   .015401023799762461    .06269977044118774     .04130339891963157    .02605305480046469    .02605305480046469
    346  -.004458680841601492    -.0754133551041799    -.0400444556129314    -.03516153758303091  -.046263543186999954  -.046263543186999954
    347  -.020202847550834072 -.0033074352989729485   -.06425668840690041   -.024437622936054216   -.02663165126117664   -.02663165126117664
    348    .08466564254059829    .10469605532495581    .08710191355269122     .13001532674776045     .1463244343031201     .1463244343031201
    end
    format %tm month

    The substantive problem is that the rows have duplicates of years. Because of duplicated years, you will see duplicated groups and values for each year. I want no rows that have duplicates of years, but a dataset that has multiple rows representing the years and columns that represent e.g. MH1, where the rows of that particular column gives the value of that group in a particular year. My real dataset has somewhat 60 groups... I really appreciate any kind of help. Thank you in advance.

  • #2
    duplicates drop will do what it says. Otherwise you're better off with this layout unless you can tell us why not.

    Comment


    • #3
      Originally posted by Nick Cox View Post
      duplicates drop will do what it says. Otherwise you're better off with this layout unless you can tell us why not.
      Hello Nick,

      Even if I drop duplicates, duplicating years will still be there due to multiple groups e.g. 1968.

      You need the other layout because I will need to make new variables which are based on some specific groups each times and this layout won't help me get what I want later on with variable creation per year and regressions etc.
      Last edited by Jonathan Smits; 08 May 2017, 09:11.

      Comment


      • #4
        Jonathan: I am confused.

        Are Jan Filips and Jonathan Smits one and the same person?

        If so, why post under two accounts? Everyone is asked to use a full real name here.

        If not, then you're answering in terms of someone else's problem.

        When people new to Stata tell me that variable creation will be difficult with a long layout, my usual answer is to ask for details, as in my experience wide poses many more difficulties than long.

        Comment


        • #5
          Originally posted by Nick Cox View Post
          Jonathan: I am confused.

          Are Jan Filips and Jonathan Smits one and the same person?

          If so, why post under two accounts? Everyone is asked to use a full real name here.

          If not, then you're answering in terms of someone else's problem.

          When people new to Stata tell me that variable creation will be difficult with a long layout, my usual answer is to ask for details, as in my experience wide poses many more difficulties than long.
          No Im replying you. If I would like to create variables and make regressions with that layout it won't help me get what I want, so I think having the layout from the first dataex. isn't doing nobody any good.

          EDIT: Try to drop the duplicates, there still will be multiple years due to multiple groups.
          Last edited by Jonathan Smits; 08 May 2017, 09:13.

          Comment


          • #6
            I don't see that you've answered the question about who you are.

            You're preferring your guess to mine, so go with it. Given that, I think I am absolved from further advice here.

            Comment


            • #7
              Originally posted by Nick Cox View Post
              I don't see that you've answered the question about who you are.

              You seem to be preferring your guess to mine, so go with it. Given that, I think I am absolved from further advice here.
              I am Jonathan Smits, not the OP.

              Comment


              • #8
                OK, but it was confusing to me at least that you answered without explaining that.

                If you've got the same question, then the same advice applies so far as I am concerned.

                Sorry if I seem awkward here, but what is different about your situation is still unclear to me: "creating variables" could mean anything and regression with panel data requires long layout unless you explain otherwise.

                So, I still can't add anything to what I said in #2.

                Comment


                • #9
                  Originally posted by Nick Cox View Post
                  OK, but it was confusing to me at least that you answered without explaining that.

                  If you've got the same question, then the same advice applies so far as I am concerned.

                  Sorry if I seem awkward here, but what is different about your situation is still unclear to me: "creating variables" could mean anything and regression with panel data requires long layout unless you explain otherwise.

                  So, I still can't add anything to what I said in #2.
                  English isn't my native language, so excuse me if I am not clear enough. I do not have the same question. What I meant with ''creating variables'' is that when you create new variables, STATA puts the new variables automatically in the columns, right?

                  I think the wide format is more practical, what makes it better compared to a chaotic long format.

                  Maybe I do not know anything about it, but a long-format seems totally impractical to me;

                  http://stats.idre.ucla.edu/stata/mod...-long-to-wide/



                  Comment


                  • #10
                    New variables are indeed, in spreadsheet terms,,new columns. If you want or expect something different, then I don't understand the question.

                    I don't especially like invoking experience rather than giving a concrete answer, but far from being "totally impractical" long layout is, in my experience, immensely preferable for panel data. What's chaotic about it?

                    To get a better answer from me or anybody else, you'd need to give details of your variables and a data example in this thread and explain what your new variables are to be.

                    Comment


                    • #11
                      Originally posted by Nick Cox View Post
                      New variables are indeed, in spreadsheet terms,,new columns. If you want or expect something different, then I don't understand the question.

                      I don't especially like invoking experience rather than giving a concrete answer, but far from being "totally impractical" long layout is, in my experience, immensely preferable for panel data. What's chaotic about it?

                      To get a better answer from me or anybody else, you'd need to give details of your variables and a data example in this thread and explain what your new variables are to be.
                      I do not have data to give an example, but I think it's best if I stop replying because it isn't helping OP.

                      Comment


                      • #12
                        Wow, so much posts.


                        I tried to do:

                        Code:
                        reshape wide groups, i(fyear)  j(values)
                        But it gave me an error:

                        Code:
                        groups-.387081116437912 invalid variable name
                        r(198);

                        That number is one of the 'values' in one of the rows...

                        Comment


                        • #13
                          As already pointed out in #2 you need to drop duplicates first.

                          Your syntax is the wrong way round as it's the values of values that are being reshaped and it's fyear and groups that tell you about their identifiers. This works.

                          Code:
                          duplicates drop 
                          reshape wide values, i(fyear) j(groups) string
                          Whether this is a good idea has already been wondered.

                          Comment


                          • #14
                            Originally posted by Nick Cox View Post
                            As already pointed out in #2 you need to drop duplicates first.

                            Your syntax is the wrong way round as it's the values of values that are being reshaped and it's fyear and groups that tell you about their identifiers. This works.

                            Code:
                            duplicates drop
                            reshape wide values, i(fyear) j(groups) string
                            Whether this is a good idea has already been wondered.
                            Hi,

                            I will try if it worked out..
                            Last edited by Jan Filips; 08 May 2017, 11:02.

                            Comment


                            • #15
                              Jan:

                              I can't read your screenshot easily as the column headers have disappeared. We do ask that you don't post screenshots. FAQ Advice #12 applies as always.

                              The implication of a missing value is that for one or more of your groups there was no value for particular years. In a long layout that could be just tacit as a value not in the dataset.

                              This isn't a limitation of the reshape. It can't use values it can't see in your data.

                              Comment

                              Working...
                              X