Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching on Assets and FF 48 Industry Groups

    Dear Statalist users,

    I'm am trying to match a sample within Stata, not yet sucessfully. I have a sample of single -and dual-class share firms that I'm trying to match. To be specific 372 single-class and 127 dual class-firms.
    The goal is to match a single single-class firm with a dual-class firm based on total assets and Fama-French 48 industry groups. The total assets should be in a range of 50%-150%.
    I have tried to match with the Stata extension called -kmatch-, however this extension doesn't seem to recognize that the the FF48 groups cannot be for instance 47.6, and when I tried the exact matching command, I'm not able to set the desired range in assets (50-150) leading to no matches.
    I was hoping someone might have a suggestion for a command or method to properly match my data.

    Kind regards,

    Marnik van der Gun

  • #2
    To clarify a some of my data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte Dual float AT byte FF48
    0        .  2
    1        .  2
    0        .  2
    0        .  2
    0        .  2
    1 2168.011  2
    0        .  2
    0 5081.191  2
    1        .  2
    1        .  2
    0        .  2
    0  112.462  2
    0        .  2
    1        .  2
    0        .  2
    1        .  4
    1        .  4
    1   34.132  4
    1        .  4
    1        .  4
    0        .  5
    0        .  5
    0        .  5
    0   285.02  5
    0        .  5
    0        .  6
    0 1736.171  6
    0        .  6
    0        .  6
    0        .  6
    1        .  7
    1        .  7
    0        .  7
    1        .  7
    0        .  7
    0        .  7
    1        .  7
    1        .  7
    0        .  7
    1        .  7
    1        .  7
    1 5046.724  7
    1 1485.696  7
    0        .  7
    1  699.177  7
    1        .  7
    1        .  7
    0 2582.273  7
    1        .  7
    0    23535  7
    0        .  7
    0        .  7
    0        .  7
    1        .  7
    1        .  7
    0        .  8
    0        .  8
    0        .  8
    0        .  8
    0 2910.386  8
    0        .  9
    1        .  9
    1        .  9
    1        .  9
    1        .  9
    1     6470  9
    1  917.691  9
    0        .  9
    1        .  9
    0  157.729  9
    0        .  9
    1        .  9
    1        .  9
    0        .  9
    1        .  9
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0 2106.684 11
    0        . 11
    0        . 11
    0  229.737 11
    0        . 11
    0 2177.944 11
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0        . 11
    0    58.04 11
    0        . 11
    0   265.24 11
    0        . 11
    end
    So I want to match the Dual-Class share dummy on Assets (within a 50% to 150% range) and an exact Fama French Industry 48 groups.

    Kind regards,

    Marnik van der Gun

    Comment


    • #3
      Code:
      drop if missing(AT, FF48, Dual)
      gen long obs_no = _n
      
      preserve
      keep if Dual == 1
      tempfile duals
      save `duals'
      restore
      keep if Dual == 0
      gen lower = AT*0.5
      gen upper = AT*1.5
      rangejoin AT lower upper using `duals', by(FF48)
      gen delta = abs(AT-AT_U)
      by obs_no (delta), sort: keep if _n == 1
      -rangejoin- is written by Robert Picard and is available from SSC. To use -rangejoin- you must also have -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
      The above code will leave you with each Single matched to the Dual in the same FF48 group having the nearest AT (but always between 50% and 150% of the Single's). If there are two firms tied for closest AT, the tie is broken randomly and irreproducibily in this code.

      All of that said, I think this is going to be a painful experience for you. AT is missing in 80% of your observations! And you cannot include any such observation in the matching, for obvious reasons. Even in complete data, matching often leaves some entities without any available match. With so much missing data, I suspect you will end up with very few matches. In your example data, only one matched pair that satisfies your criteria can be found. If you are going to pursue this, I think you should go to great lengths to find the values of AT that are currently missing. If that's not possible, give some serious consideration to a different approach altogether.

      Also missing from your data is any sort of firm ID--so without that you can't tell which firm is matched with which. To work around that, I have create a variable obs_no to substitute for a firm ID. If you have a real firm ID variable in your actual data and just neglected to include it in your example, you should not create the obs_no variable and you should replace obs_no by the name of your real firmID in the final line of code.

      Comment


      • #4
        Dear Mr. Schechter,

        Thank you very much for your answer! To come back to your concerns, I seem to have posted some data with was sorted wrong, I do have most of the necessary ATs and I do also have a firm identifier.
        However, when I run your code, the end result is that only single-class firms remain in the data, there is no sign of any dual-class firm anymore. Do you by any chance know why?

        This is the code I ran:

        preserve
        keep if Dual == 1
        tempfile duals
        save `duals'
        restore
        keep if Dual == 0
        gen lower = AT*0.5
        gen upper = AT*1.5
        rangejoin AT lower upper using `duals', by(FF48)
        gen delta = abs(AT-AT_U)
        by CompanyName (delta), sort: keep if _n == 1

        Here some of the outcomes:
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str28 CompanyName float AT byte(Dual FF48)
        "2U INC"                         113.039 0 33
        "A10 NETWORKS INC"                186.98 0 35
        "AAC HOLDINGS INC"                     . 0 11
        "AC IMMUNE SA"                   153.642 0 13
        "ACACIA COMMUNICATIONS INC"      516.936 0 36
        "ACCELERON PHARMA INC"           123.732 0 13
        "ACHAOGEN INC"                    70.322 0 13
        "ACLARIS THERAPEUTICS INC"        94.076 0 13
        "ACUSHNET HOLDINGS CORP"               . 0  6
        "ADAMAS PHARMACEUTICALS INC"     161.189 0 13
        "ADESTO TECHNOLOGIES CORP"             . 0 36
        "ADMA BIOLOGICS INC"                   . 0 13
        "AEGLEA BIOTHERAPEUTICS INC"      67.063 0 13
        "AERIE PHARMACEUTICALS INC"       70.458 0 13
        "AEROHIVE NETWORKS INC"                . 0 35
        "AFFIMED NV"                           . 0 13
        "AGILE THERAPEUTICS INC"               . 0 13
        "AGIOS PHARMACEUTICALS"          201.205 0 13
        "AIMMUNE THERAPEUTICS INC"       212.361 0 13
        "AIRGAIN INC"                          . 0 36
        "AKEBIA THERAPEUTICS INC"        110.995 0 13
        "ALARM.COM HOLDINGS INC"         226.095 0 34
        "ALDER BIOPHARMACEUTICALS INC"    64.359 0 13
        "ALDEYRA THERAPEUTICS INC"             . 0 13
        "ALLEGIANCE BANCSHARES INC"     2084.579 0 44
        "ALLISON TRANSMISSION HLDGS"        4866 0 23
        "ALLY FINANCIAL INC"              151828 0 44
        "AMBER ROAD INC"                 108.231 0 34
        "AMPHASTAR PHARMACEUTICLS INC"    389.37 0 13
        "ANTERO RESOURCES CORP"         6613.581 0 30
        "APPLIED GENETIC TECHNOLOGIES"    77.407 0 13
        "APPLIED OPTOELECTRONICS INC"          . 0 36
        "AQUA METALS INC"                 47.276 0 48
        "ARAMARK"                      10267.106 0 43
        "ARATANA THERAPEUTICS"           115.536 0 13
        "ARCADIA BIOSCIENCES INC"         74.242 0 13
        "ARDELYX INC"                    113.414 0 13
        "ARDMORE SHIPPING CORP"          357.966 0 40
        "ARISTA NETWORKS INC"            811.023 0 35
        "ASPEN AEROGELS INC"             145.043 0 15
        "ATARA BIOTHERAPEUTICS INC"      106.122 0 13
        "ATENTO SA"                     1657.901 0 34
        "ATKORE INC"                           . 0 22
        "ATOMERA INC"                          . 0 36
        "ATOSSA THERAPEUTICS INC"          6.826 0 13
        "ATYR PHARMA INC"                129.675 0 13
        "AVINGER INC"                          . 0 12
        "AXALTA COATING SYSTEMS LTD"      6252.8 0 14
        "AXON ENTERPRISE INC"                  . 0 26
        "AXSOME THERAPEUTICS INC"              . 0 13
        "BANKWELL FINANCIAL GROUP INC"  1099.531 0 44
        "BELLICUM PHARMACEUTICALS INC"   195.794 0 13
        "BENEFITFOCUS INC"               139.611 0 34
        "BERRY GLOBAL GROUP INC"               . 0 15
        "BIOCEPT INC"                          . 0 13
        "BLACKLINE INC"                  420.437 0 34
        "BLOOMIN' BRANDS INC"           3016.553 0 43
        "BLUE CAP REINSURANCE"                 . 0 45
        "BLUE HILLS BANCORP INC"        1728.148 0 44
        "BLUEBIRD BIO INC"                224.39 0 13
        "BLUEPRINT MEDICINES CORP"       178.898 0 13
        "BOISE CASCADE CO"              1104.186 0 41
        "BRIGHT HORIZONS FAMILY SOLTN"         . 0 33
        "BRIGHTCOVE INC"                  96.993 0 34
        "CAESARS ENTERTAINMENT CORP"     27998.1 0  7
        "CAESARSTONE LTD"                      . 0 17
        "CAFEPRESS INC"                  157.532 0 42
        "CALITHERA BIOSCIENCES INC"       104.77 0 13
        "CANCER GENETICS INC"                  . 0 11
        "CAPSTAR FINANCIAL HLDGS INC"   1333.675 0 44
        "CARA THERAPEUTICS INC"                . 0 13
        "CARE.COM INC"                   173.104 0 34
        "CARECLOUD INC"                   23.107 0 34
        "CAREDX INC"                      61.141 0 13
        "CATALENT INC"                    3090.2 0 13
        "CDW CORP"                             . 0 41
        "CEMPRA INC"                      70.738 0 13
        "CENTURY COMMUNITIES INC"        675.979 0 18
        "CHANNELADVISOR CORP"            148.786 0 34
        "CHEGG INC"                      327.371 0 34
        "CHEMOCENTRYX INC"               122.323 0 13
        "CHENIERE ENERGY PTNRS LP LLC"  8516.783 0 31
        "CHIASMA INC"                    153.108 0 13
        "CHIMERIX INC"                   113.387 0 13
        "CHUY'S HOLDINGS INC"            129.721 0 43
        "CIDARA THERAPEUTICS INC"        109.974 0 13
        "CITIZENS FINANCIAL GROUP INC"    132857 0 44
        "CIVITAS SOLUTIONS INC"                . 0 11
        "CLEARSIDE BIOMEDICAL INC"        84.813 0 13
        "CLEARSIGN TECHNOLOGIES CORP"          . 0 37
        "COHERUS BIOSCIENCES INC"        187.221 0 13
        "COLLEGIUM PHARMACEUTICAL INC"    97.718 0 13
        "COMMSCOPE HOLDING CO INC"             . 0 36
        "CONATUS PHARMACEUTICALS INC"     56.936 0 13
        "CONCERT PHARMACEUTICLS INC"      84.454 0 13
        "CONFORMIS INC"                        . 0 12
        "CONIFER HOLDINGS INC"                 . 0 45
        "CONNECTONE BANCORP INC"        1673.082 0 44
        "CONTINENTAL BUILDING PRODS"     700.981 0 17
        "CONTROL4 CORP"                        . 0 37
        end
        Kind regards,

        Marnik van der Gun

        Comment


        • #5
          The new data example you posted is not suitable because it contains only Dual = 0 observations. So there is nothing to match with there. That may perhaps be the problem you are encountering?

          The other explanation I can think of is that you are misunderstanding the output. When run successfully, as with the data example you gave in #2, the data left in memory has observations that represent matched pairs. So obs_no represents the obs_no of the single-share class firms, and, in the same observation, obs_no_U is the obs_no of the dual-share class firm to which it is matched. If you want to get them into separate observations, you can do this:
          Code:
          rename (Dual AT obs_no) =_single
          rename *_U *_dual
          gen long pair_num = _n
          drop lower upper delta
          reshape long Dual AT obs_no, i(pair_num) j(single_dual) string
          drop if missing(obs_no)




          Comment


          • #6
            You were right, I was misinterpreting the output, thanks for the clarification! The code for the separate observations worked very well, thanks!

            Comment

            Working...
            X