Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matched two datasets to create a control sample

    Dear statlist,

    I am using stata 12
    I am having trouble with two datasets I am trying to match (one cross sectional one panel data)

    I would like to create a control sample to the data on my cross sectional data thanks to the data on my panel data.
    The control group need to be macthed based on the size(the closest), variable TOTALASSET) and the sector (ICB2). ; The final purpose is to use the ttest command between the two groups
    I tried the command rangejoin but it didn’t work
    Here the two datasets

    The data of interest (Crossectional)


    [CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 ISIN int YEAR str10 TOTALASSET float ICB2 str8(EBITDAN EBITDAN_2 EBITDAN_3 EBITDAN_4)
    "FR0000120404" 2007 "10635000" 5 "1639000" "1053000" "1034000" "1084000"
    "FR0000120404" 2007 "10635000" 5 "1639000" "1053000" "1034000" "1084000"
    "FR0000051732" 2007 "4918100" 6 "376300" "529500" "393400" "311700"
    "FR0000130213" 2007 "12613000" 5 "1077000" "1349000" "1337400" "1183200"
    "FR0011056027" 2007 "4317000" 1 "594000" "315000" "129000" "-474000"
    "FR0000064388" 2007 "10961" 5 "-2471" "-2255" "583" "-3593"
    "FR0000127771" 2007 "43657000" 5 "6650000" "6166000" "4901000" "3587000"
    "FR0000076861" 2008 "9702" 6 "618" "-31" "16" "NA"
    "FR0010478248" 2008 "298300" 3 "-2400" "24400" "21200" "31500"
    "FR0000060873" 2008 "880253" 3 "21835" "61710" "12627" "20167"
    "FR0010095596" 2008 "42367" 4 "-20845" "-13025" "-6876" "-4700"
    "FR0000061129" 2008 "445665" 4 "79333" "38001" "44635" "52339"
    "FR0000121725" 2008 "9744147" 2 "586202" "490277" "511093" "538145"
    "FR0000121857" 2008 "2232153" 3 "154168" "171671" "166094" "162758"
    "FR0000124414" 2008 "540052" 2 "31038" "41522" "-9202" "30699"
    "FR0000033888" 2008 "218729" 2 "15292" "15235" "14329" "23566"
    "FR0012612646" 2008 "979646" 5 "86909" "133228" "126265" "146888"
    "FR0004007813" 2008 "1306124" 3 "69039" "164137" "128254" "98424"
    "FR0000120537" 2008 "40427000" 2 "4526000" "3746000" "3310000" "2776000"
    "FR0000032278" 2008 "891263" 2 "26136" "51739" "45202" "39856"
    "FR0000053225" 2008 "1419200" 5 "272200" "305300" "315200" "320300"
    "FR0004048734" 2008 "6361" 3 "74" "1526" "-303" "-895"
    "FR0000052870" 2008 "1813828" 2 "223206" "168593" "158721" "135390"
    "FR0000121691" 2008 "695118" 5 "42240" "101421" "152945" "118076"
    "FR0000124570" 2008 "1940763" 3 "139980" "222564" "208243" "151474"
    "FR0004044600" 2008 "48666" 5 "5231" "5333" "6063" "4654"
    "FR0000131906" 2008 "63579000" 3 "4077000" "6366000" "6737000" "7036000"
    "FR0010451203" 2008 "9946800" 2 "669700" "588500" "360400" "312000"
    "FR0000073272" 2008 "16541000" 2 "496000" "735000" "5000" "293008"
    "FR0000120107" 2008 "2774491" 3 "188556" "264779" "206518" "251316"
    "FR0000121972" 2008 "23875000" 2 "3263000" "2485800" "1990100" "1637400"
    --more--



    The data used to create the control group (cross sectional)


    dataex Name ISIN year TOTALASSET ICB2 NETSALES OPERATINGINCOME CURRENTLIABILITIES PPE TOTALASSET EBITDA SICCODE1

    [CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str21 Name str12 ISIN int year double TOTALASSET float ICB2 long(NETSALES OPERATINGINCOME CURRENTLIABILITIES PPE EBITDA) str4 SICCODE1
    "SOLVAY SA " "BE0003470755" 2006 10595000 1 9399000 1120000 3825000 3869000 1478000 "2821"
    "SOLVAY SA " "BE0003470755" 2007 10656000 1 9572000 1203000 4180000 3885000 1741000 "2821"
    "SOLVAY SA " "BE0003470755" 2008 11615000 1 9490000 893000 4513000 4218000 1201000 "2821"
    "SOLVAY SA " "BE0003470755" 2009 12059000 1 5694000 198000 7471000 3921000 . "2821"
    "SOLVAY SA " "BE0003470755" 2010 13428000 1 6796000 892000 8809000 3698000 653000 "2821"
    "SOLVAY SA " "BE0003470755" 2011 18657000 1 8109000 706000 7373000 5652000 973000 "2821"
    "SOLVAY SA " "BE0003470755" 2012 17782000 1 12831000 1036000 6728000 5393000 1916000 "2821"
    "SOLVAY SA " "BE0003470755" 2013 17931000 1 10367000 740000 7242000 4679000 1552000 "2821"
    "SOLVAY SA " "BE0003470755" 2014 17184000 1 10629000 978000 6365000 5386000 1245000 "2821"
    "SOLVAY SA " "BE0003470755" 2015 24270000 1 11047000 1025000 6613000 6946000 1559000 "2821"
    "SOLVAY SA " "BE0003470755" 2016 23255000 1 11403000 1063000 6597000 6472000 1838000 "2821"
    "SOLVAY SA " "BE0003470755" 2017 20375000 1 10891000 1117000 6057000 5433000 1804000 "2821"
    "SOLVAY SA " "BE0003470755" 2018 20877000 1 11299000 1142000 6574000 5454000 1844000 "2821"
    "SOLVAY SA " "BE0003470755" 2019 20238000 1 11227000 1129000 6272000 5919000 1247000 "2821"
    "SOLVAY SA " "BE0003470755" 2020 16004000 1 9714000 829000 4484000 5122000 1686000 "2821"
    "MONTEA SCA " "BE0003853703" 2006 . 2 . . . . . "6512"
    "MONTEA SCA " "BE0003853703" 2007 143812 2 11329 6405 . 136439 13640 "6512"
    "MONTEA SCA " "BE0003853703" 2008 224281 2 17548 9136 . 211015 -4162 "6512"
    "MONTEA SCA " "BE0003853703" 2009 216264 2 19186 10176 . 206490 -7314 "6512"
    "MONTEA SCA " "BE0003853703" 2010 258799 2 20530 11580 . 234447 10084 "6512"
    "MONTEA SCA " "BE0003853703" 2011 269482 2 23591 13117 . 253218 2511 "6512"
    "MONTEA SCA " "BE0003853703" 2012 307498 2 25353 13907 . 289983 -64 "6512"
    "MONTEA SCA " "BE0003853703" 2013 339797 2 24766 16725 . 320196 19489 "6512"
    "MONTEA SCA " "BE0003853703" 2014 453867 2 26995 17111 . 421660 11480 "6512"
    "MONTEA SCA " "BE0003853703" 2015 549685 2 34295 22762 . 517434 29934 "6512"
    "MONTEA SCA " "BE0003853703" 2016 594759 2 46809 35549 . 545234 40875 "6512"
    "MONTEA SCA " "BE0003853703" 2017 748426 2 41562 30088 . 719308 43814 "6512"
    "MONTEA SCA " "BE0003853703" 2018 949477 2 49886 37787 . 910022 70658 "6512"
    "MONTEA SCA " "BE0003853703" 2019 1193698 2 65497 50739 . 1160820 117420 "6512"
    "MONTEA SCA " "BE0003853703" 2020 1398921 2 69596 54607 . 1359665 169221 "6512"
    "CELYAD ONCO " "BE0974260896" 2006 . 4 . . . . . "8099"
    "CELYAD ONCO " "BE0974260896" 2007 . 4 . . . . . "8099"
    "CELYAD ONCO " "BE0974260896" 2008 . 4 . . . . . "8099"
    "CELYAD ONCO " "BE0974260896" 2009 . 4 . . . . . "8099"
    "CELYAD ONCO " "BE0974260896" 2010 17495 4 1516 -8711 6476 592 -5528 "8099"
    "CELYAD ONCO " "BE0974260896" 2011 13813 4 0 -12010 3650 355 -8377 "8099"
    "CELYAD ONCO " "BE0974260896" 2012 12485 4 54 -14994 2337 383 -12003 "8099"
    "CELYAD ONCO " "BE0974260896" 2013 32386 4 0 -13054 22602 243 -11035 "8099"
    "CELYAD ONCO " "BE0974260896" 2014 43976 4 146 -20850 32935 598 -15561 "8099"
    "CELYAD ONCO " "BE0974260896" 2015 159525 4 3 -29994 109419 1136 -28037 "8099"
    "CELYAD ONCO " "BE0974260896" 2016 138806 4 8523 -30591 85367 3563 -22040 "8099"
    "CELYAD ONCO " "BE0974260896" 2017 77626 4 3540 -29234 36394 3290 -55789 "8099"
    "CELYAD ONCO " "BE0974260896" 2018 94299 4 3115 -39066 51692 3014 -36575 "8099"
    "CELYAD ONCO " "BE0974260896" 2019 89836 4 6 -34451 42836 5061 -28085 "8099"
    "CELYAD ONCO " "BE0974260896" 2020 . 4 . . . . . "8099"
    "ALLIANCE DEVELOP " "BE0974269012" 2006 53769 2 2438 -571 11374 42324 1463 "6513"
    "ALLIANCE DEVELOP " "BE0974269012" 2007 62416 2 2872 -4820 12344 50021 -2809 "6513"
    "ALLIANCE DEVELOP " "BE0974269012" 2008 57572 2 2479 -1990 17561 39975 -105 "6513"
    "ALLIANCE DEVELOP " "BE0974269012" 2009 42513 2 2115 -1143 8979 31199 1556 "6513"
    "ALLIANCE DEVELOP " "BE0974269012" 2010 50079 2 495 -1782 7349 30744 -1635 "6513"
    end




    I would be very gratefull if you could help me with this

    Thanks a lot,
    Najiba

  • #2
    In principle this is not difficult. It is made somewhat harder because the data sets themselves seem to be somewhat haphazardly created and need to be cleaned before you try to do anything with them.

    For starters, the first data set contains a duplicate observation for ISIN FR0000120404 in year 2007. This should not happen and indicates that something went wrong in the data management that created this data set. While it is easy enough to remove duplicate observations with the -duplicates drop- command, you should not just do that. When one thing has gone wrong, there may be other errors that have not yet surfaced. So you should carefully review how these data sets were created and try to correct errors. In addition to the duplicate observation, the variable TOTALASSETS in the first data set is a string variable. That is unworkable for computations in general, and, in particular, will prevent you from using it to match to the corresponding variable in the second data set, which, properly, is numeric. So that variable hast to be destrung. I imagine the same is true of the EBIDTA* variables: they are stored as strings but need to be numeric for your purposes. Finally, the variable year is called YEAR in one data set and year in the other. You need to have them consistent.

    Concerning the matching itself, you do not say whether you also want the matched observations to be from the same year. Typically, that is what is wanted, so the code below assumes you want this, too.

    In the code below, I have put a few lines of code to patch the problems in the first data set just before I save it. But, again, you should not do that. You should properly fix the data management that created this data set.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 ISIN int YEAR str10 TOTALASSET float ICB2 str8(EBITDAN EBITDAN_2 EBITDAN_3 EBITDAN_4)
    "FR0000120404" 2007 "10635000" 5 "1639000" "1053000" "1034000" "1084000"
    "FR0000120404" 2007 "10635000" 5 "1639000" "1053000" "1034000" "1084000"
    "FR0000051732" 2007 "4918100" 6 "376300" "529500" "393400" "311700"
    "FR0000130213" 2007 "12613000" 5 "1077000" "1349000" "1337400" "1183200"
    "FR0011056027" 2007 "4317000" 1 "594000" "315000" "129000" "-474000"
    "FR0000064388" 2007 "10961" 5 "-2471" "-2255" "583" "-3593"
    "FR0000127771" 2007 "43657000" 5 "6650000" "6166000" "4901000" "3587000"
    "FR0000076861" 2008 "9702" 6 "618" "-31" "16" "NA"
    "FR0010478248" 2008 "298300" 3 "-2400" "24400" "21200" "31500"
    "FR0000060873" 2008 "880253" 3 "21835" "61710" "12627" "20167"
    "FR0010095596" 2008 "42367" 4 "-20845" "-13025" "-6876" "-4700"
    "FR0000061129" 2008 "445665" 4 "79333" "38001" "44635" "52339"
    "FR0000121725" 2008 "9744147" 2 "586202" "490277" "511093" "538145"
    "FR0000121857" 2008 "2232153" 3 "154168" "171671" "166094" "162758"
    "FR0000124414" 2008 "540052" 2 "31038" "41522" "-9202" "30699"
    "FR0000033888" 2008 "218729" 2 "15292" "15235" "14329" "23566"
    "FR0012612646" 2008 "979646" 5 "86909" "133228" "126265" "146888"
    "FR0004007813" 2008 "1306124" 3 "69039" "164137" "128254" "98424"
    "FR0000120537" 2008 "40427000" 2 "4526000" "3746000" "3310000" "2776000"
    "FR0000032278" 2008 "891263" 2 "26136" "51739" "45202" "39856"
    "FR0000053225" 2008 "1419200" 5 "272200" "305300" "315200" "320300"
    "FR0004048734" 2008 "6361" 3 "74" "1526" "-303" "-895"
    "FR0000052870" 2008 "1813828" 2 "223206" "168593" "158721" "135390"
    "FR0000121691" 2008 "695118" 5 "42240" "101421" "152945" "118076"
    "FR0000124570" 2008 "1940763" 3 "139980" "222564" "208243" "151474"
    "FR0004044600" 2008 "48666" 5 "5231" "5333" "6063" "4654"
    "FR0000131906" 2008 "63579000" 3 "4077000" "6366000" "6737000" "7036000"
    "FR0010451203" 2008 "9946800" 2 "669700" "588500" "360400" "312000"
    "FR0000073272" 2008 "16541000" 2 "496000" "735000" "5000" "293008"
    "FR0000120107" 2008 "2774491" 3 "188556" "264779" "206518" "251316"
    "FR0000121972" 2008 "23875000" 2 "3263000" "2485800" "1990100" "1637400"
    end
    tempfile cases
    // THE THREE FOLLOWING COMMANDS ARE PATCHES TO ERRORS IN THIS DATA SET
    // IN YOUR REAL DATA, DO NOT DO THESE: INSTEAD GO BACK AND FIX THE DATA MANAGEMENT
    // THAT CREATED THIS DATA SET
    rename YEAR year
    duplicates drop
    destring TOTALASSET, replace
    save `cases'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str21 Name str12 ISIN int year double TOTALASSET float ICB2 long(NETSALES OPERATINGINCOME CURRENTLIABILITIES PPE EBITDA) str4 SICCODE1
    "SOLVAY SA " "BE0003470755" 2006 10595000 1 9399000 1120000 3825000 3869000 1478000 "2821"
    "SOLVAY SA " "BE0003470755" 2007 10656000 1 9572000 1203000 4180000 3885000 1741000 "2821"
    "SOLVAY SA " "BE0003470755" 2008 11615000 1 9490000 893000 4513000 4218000 1201000 "2821"
    "SOLVAY SA " "BE0003470755" 2009 12059000 1 5694000 198000 7471000 3921000 . "2821"
    "SOLVAY SA " "BE0003470755" 2010 13428000 1 6796000 892000 8809000 3698000 653000 "2821"
    "SOLVAY SA " "BE0003470755" 2011 18657000 1 8109000 706000 7373000 5652000 973000 "2821"
    "SOLVAY SA " "BE0003470755" 2012 17782000 1 12831000 1036000 6728000 5393000 1916000 "2821"
    "SOLVAY SA " "BE0003470755" 2013 17931000 1 10367000 740000 7242000 4679000 1552000 "2821"
    "SOLVAY SA " "BE0003470755" 2014 17184000 1 10629000 978000 6365000 5386000 1245000 "2821"
    "SOLVAY SA " "BE0003470755" 2015 24270000 1 11047000 1025000 6613000 6946000 1559000 "2821"
    "SOLVAY SA " "BE0003470755" 2016 23255000 1 11403000 1063000 6597000 6472000 1838000 "2821"
    "SOLVAY SA " "BE0003470755" 2017 20375000 1 10891000 1117000 6057000 5433000 1804000 "2821"
    "SOLVAY SA " "BE0003470755" 2018 20877000 1 11299000 1142000 6574000 5454000 1844000 "2821"
    "SOLVAY SA " "BE0003470755" 2019 20238000 1 11227000 1129000 6272000 5919000 1247000 "2821"
    "SOLVAY SA " "BE0003470755" 2020 16004000 1 9714000 829000 4484000 5122000 1686000 "2821"
    "MONTEA SCA " "BE0003853703" 2006 . 2 . . . . . "6512"
    "MONTEA SCA " "BE0003853703" 2007 143812 2 11329 6405 . 136439 13640 "6512"
    "MONTEA SCA " "BE0003853703" 2008 224281 2 17548 9136 . 211015 -4162 "6512"
    "MONTEA SCA " "BE0003853703" 2009 216264 2 19186 10176 . 206490 -7314 "6512"
    "MONTEA SCA " "BE0003853703" 2010 258799 2 20530 11580 . 234447 10084 "6512"
    "MONTEA SCA " "BE0003853703" 2011 269482 2 23591 13117 . 253218 2511 "6512"
    "MONTEA SCA " "BE0003853703" 2012 307498 2 25353 13907 . 289983 -64 "6512"
    "MONTEA SCA " "BE0003853703" 2013 339797 2 24766 16725 . 320196 19489 "6512"
    "MONTEA SCA " "BE0003853703" 2014 453867 2 26995 17111 . 421660 11480 "6512"
    "MONTEA SCA " "BE0003853703" 2015 549685 2 34295 22762 . 517434 29934 "6512"
    "MONTEA SCA " "BE0003853703" 2016 594759 2 46809 35549 . 545234 40875 "6512"
    "MONTEA SCA " "BE0003853703" 2017 748426 2 41562 30088 . 719308 43814 "6512"
    "MONTEA SCA " "BE0003853703" 2018 949477 2 49886 37787 . 910022 70658 "6512"
    "MONTEA SCA " "BE0003853703" 2019 1193698 2 65497 50739 . 1160820 117420 "6512"
    "MONTEA SCA " "BE0003853703" 2020 1398921 2 69596 54607 . 1359665 169221 "6512"
    "CELYAD ONCO " "BE0974260896" 2006 . 4 . . . . . "8099"
    "CELYAD ONCO " "BE0974260896" 2007 . 4 . . . . . "8099"
    "CELYAD ONCO " "BE0974260896" 2008 . 4 . . . . . "8099"
    "CELYAD ONCO " "BE0974260896" 2009 . 4 . . . . . "8099"
    "CELYAD ONCO " "BE0974260896" 2010 17495 4 1516 -8711 6476 592 -5528 "8099"
    "CELYAD ONCO " "BE0974260896" 2011 13813 4 0 -12010 3650 355 -8377 "8099"
    "CELYAD ONCO " "BE0974260896" 2012 12485 4 54 -14994 2337 383 -12003 "8099"
    "CELYAD ONCO " "BE0974260896" 2013 32386 4 0 -13054 22602 243 -11035 "8099"
    "CELYAD ONCO " "BE0974260896" 2014 43976 4 146 -20850 32935 598 -15561 "8099"
    "CELYAD ONCO " "BE0974260896" 2015 159525 4 3 -29994 109419 1136 -28037 "8099"
    "CELYAD ONCO " "BE0974260896" 2016 138806 4 8523 -30591 85367 3563 -22040 "8099"
    "CELYAD ONCO " "BE0974260896" 2017 77626 4 3540 -29234 36394 3290 -55789 "8099"
    "CELYAD ONCO " "BE0974260896" 2018 94299 4 3115 -39066 51692 3014 -36575 "8099"
    "CELYAD ONCO " "BE0974260896" 2019 89836 4 6 -34451 42836 5061 -28085 "8099"
    "CELYAD ONCO " "BE0974260896" 2020 . 4 . . . . . "8099"
    "ALLIANCE DEVELOP " "BE0974269012" 2006 53769 2 2438 -571 11374 42324 1463 "6513"
    "ALLIANCE DEVELOP " "BE0974269012" 2007 62416 2 2872 -4820 12344 50021 -2809 "6513"
    "ALLIANCE DEVELOP " "BE0974269012" 2008 57572 2 2479 -1990 17561 39975 -105 "6513"
    "ALLIANCE DEVELOP " "BE0974269012" 2009 42513 2 2115 -1143 8979 31199 1556 "6513"
    "ALLIANCE DEVELOP " "BE0974269012" 2010 50079 2 495 -1782 7349 30744 -1635 "6513"
    end
    
    //  PREPARE THE CONTROL DATASET FOR MATCHING
    ds ICB2 year, not
    rename (`r(varlist)') ctrl=
    tempfile controls
    save `controls'
    
    //  DO THE MATCHING
    use `cases'
    joinby ICB2 year using `controls', unmatched(master)
    gen delta = abs(TOTALASSET - ctrlTOTALASSET)
    by ISIN year (delta), sort: keep if _n == 1
    That will give you your matched pairs. You can then use the -ttest- command (-help ttest- if you are not familiar with how to use it) as needed, although you will probably have to destring some variables in the first data set in order to do that. You can't do t-tests on string variables.

    Comment


    • #3
      Tanks a lot it worked perfectly

      I would like to know if it possible to keep with the last command only the data where the variable NETSALES was different from 0
      I tried : "
      by ISIN year (delta), sort: keep if _n == 1 & ctrlNETSALES !=0 " But it didnit work Best regards
      Last edited by najiba etu; 06 May 2021, 08:45.

      Comment


      • #4
        Well, you aren't clear about what "didn't work" about that command. I will imagine that what it did was leave you with no match for any case in which the closest size match happened to be a firm with NETSALES == 0, whereas what you were hoping to do was to select the closest match from among those potential controls for which NETSALES != 0.

        If I have correctly guessed your intent, then the code would be this:

        Code:
        //  PREPARE THE CONTROL DATASET FOR MATCHING
        ds ICB2 year, not
        rename (`r(varlist)') ctrl=
        tempfile controls
        save `controls'
        
        //  DO THE MATCHING
        use `cases'
        joinby ICB2 year using `controls', unmatched(master)
        keep if ctrlNETSALES != 0
        gen delta = abs(TOTALASSET - ctrlTOTALASSET)
        by ISIN year (delta), sort: keep if _n == 1
        The only to the change is the addition of one command (shown in italics).

        At your earliest convenience, please read the Forum FAQ for excellent advice about asking questions in ways that maximize your chances of getting a helpful response. In particular, saying something "didn't work" isn't very helpful. There is only one way in which a solution will "work," but there are many ways in which it can not work, and those reading your post typically cannot guess which of the many failure modes transpired. So, going forward, never say "didn't work." Instead, always show what happened, and then, unless it is blatantly obvious, explain how what happened differs from what you wanted.

        Comment

        Working...
        X