Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge Compustat Annual and CRSP Daily with a linking table

    Hello Statalist-Community, I have the following problem that I have been unable to solve for quite some time. For my thesis, I have Compustat Annual Data (funda.dta) and CRSP Daily Stock Data (CRSP.dta) , and I want to merge them using the linking table (CRSP_Compstat_Link.dta). I have already successfully performed a 1:1 merge with the Compustat Annual dataset and the Linking Table using the following command:

    use "funda.dta", replace
    merge 1:1 gvkey fyear using "CRSP_Compustat_Link.dta"
    drop if _merge==1
    drop if _merge==2
    drop _merge
    save "funda_link.dta", replace


    However, I have been unable to find adequate help online for correctly merging the CRSP Daily dataset with the Annual dataset. Perhaps it's so simple with a provided Linking Table, that I couldn't find Stata instructions online. Everything I found was in a different programming language, or I couldn't relate the presented problem to my situation. This is just a small excerpt of the variables from the datasets. But from what I've read so far over the last days, I believe these are necessary ones for the merge. If anything is missing, I'd be happy to provide it.
    What I have understood so far is that I need an M:1 merge and that the Compustat data represents fiscal years, while the CRSP (stock price data) is on a calendar basis, and they need to be aligned.
    Therefore, I hope to receive your solution here.


    Compustat Annual Data (funda.dta):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 gvkey long datadate double fyear
    "001004" 15126 2000
    "001004" 15491 2001
    "001004" 16222 2003
    "001004" 16587 2004
    "001004" 16952 2005
    "001004" 18778 2010
    "001004" 19144 2011
    "001004" 19509 2012
    "001004" 19874 2013
    "001004" 20239 2014
    "001004" 20605 2015
    "001004" 20970 2016
    "001013" 14914 2000
    "001013" 15279 2001
    "001013" 15644 2002
    "001013" 16009 2003
    "001013" 16375 2004
    "001013" 16740 2005
    "001013" 17105 2006
    "001013" 17470 2007
    "001013" 17836 2008
    "001013" 18170 2009
    "001013" 18535 2010
    "001038" 15065 2000
    "001038" 15430 2001
    "001038" 15795 2002
    "001038" 16161 2003
    "001045" 19723 2013
    "001045" 20088 2014
    "001045" 20453 2015
    "001045" 20819 2016
    "001045" 21184 2017
    "001050" 14975 2000
    "001050" 15340 2001
    "001050" 15705 2002
    "001050" 16070 2003
    "001050" 16436 2004
    "001050" 16801 2005
    "001050" 17166 2006
    "001050" 17531 2007
    "001050" 17897 2008
    "001050" 18262 2009
    "001050" 18627 2010
    "001050" 18992 2011
    "001050" 19358 2012
    "001050" 19723 2013
    "001050" 20088 2014
    "001050" 20453 2015
    "001050" 20819 2016
    "001050" 21184 2017
    "001056" 14791 2000
    "001056" 15156 2001
    "001056" 15521 2002
    "001056" 15886 2003
    "001056" 16252 2004
    "001056" 16617 2005
    "001056" 16982 2006
    "001072" 15065 2000
    "001072" 15430 2001
    "001072" 15795 2002
    "001072" 16161 2003
    "001072" 16526 2004
    "001072" 16891 2005
    "001072" 17256 2006
    "001072" 17622 2007
    "001072" 17987 2008
    "001072" 18352 2009
    "001072" 18717 2010
    "001072" 19083 2011
    "001072" 19448 2012
    "001072" 19813 2013
    "001072" 20178 2014
    "001072" 20544 2015
    "001072" 20909 2016
    "001078" 14975 2000
    "001078" 15340 2001
    "001078" 15705 2002
    "001078" 16070 2003
    "001078" 16436 2004
    "001078" 16801 2005
    "001078" 17166 2006
    "001078" 17531 2007
    "001078" 17897 2008
    "001078" 18262 2009
    "001078" 18627 2010
    "001078" 18992 2011
    "001078" 19358 2012
    "001078" 19723 2013
    "001078" 20088 2014
    "001078" 20453 2015
    "001078" 20819 2016
    "001078" 21184 2017
    "001082" 15095 2000
    "001082" 15460 2001
    "001082" 15825 2002
    "001082" 16191 2003
    "001082" 16556 2004
    "001082" 16921 2005
    "001082" 17286 2006
    "001082" 17652 2007
    end
    format %td datadate
    Linking Table (CRSP_Compustat_Link.dta):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 gvkey int(datadate fyear) long GVKEY1 int(endfyr begfyr) str1 linkprim str3 liid str2 linktype long(lpermno lpermco) byte USEDFLAG int(linkdt linkenddt)
    "001004" 13300 1995 1004 13300 12935 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 13665 1996 1004 13665 13301 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 14030 1997 1004 14030 13666 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 14395 1998 1004 14395 14031 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 14761 1999 1004 14761 14396 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 15126 2000 1004 15126 14762 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 15491 2001 1004 15491 15127 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 15856 2002 1004 15856 15492 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 16222 2003 1004 16222 15857 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 16587 2004 1004 16587 16223 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 16952 2005 1004 16952 16588 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 17317 2006 1004 17317 16953 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 17683 2007 1004 17683 17318 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 18048 2008 1004 18048 17684 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 18413 2009 1004 18413 18049 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 18778 2010 1004 18778 18414 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 19144 2011 1004 19144 18779 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 19509 2012 1004 19509 19145 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 19874 2013 1004 19874 19510 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 20239 2014 1004 20239 19875 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 20605 2015 1004 20605 20240 "P" "01" "LU" 54594 20000 1  4497     .
    "001004" 20970 2016 1004 20970 20606 "P" "01" "LU" 54594 20000 1  4497     .
    "001013" 13087 1995 1013 13087 12723 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 13453 1996 1013 13453 13088 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 13818 1997 1013 13818 13454 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 14183 1998 1013 14183 13819 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 14548 1999 1013 14548 14184 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 14914 2000 1013 14914 14549 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 15279 2001 1013 15279 14915 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 15644 2002 1013 15644 15280 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 16009 2003 1013 16009 15645 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 16375 2004 1013 16375 16010 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 16740 2005 1013 16740 16376 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 17105 2006 1013 17105 16741 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 17470 2007 1013 17470 17106 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 17836 2008 1013 17836 17471 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 18170 2009 1013 18170 17806 "P" "01" "LU" 50906  2902 1  7014 18627
    "001013" 18535 2010 1013 18535 18171 "P" "01" "LU" 50906  2902 1  7014 18627
    "001021" 12964 1995 1021 12964 12600 "P" "01" "LU" 10218   109 1  7592 14322
    "001021" 13330 1996 1021 13330 12965 "P" "01" "LU" 10218   109 1  7592 14322
    "001021" 13695 1997 1021 13695 13331 "P" "01" "LU" 10218   109 1  7592 14322
    "001021" 14060 1998 1021 14060 13696 "P" "01" "LU" 10218   109 1  7592 14322
    "001034" 13148 1995 1034 13148 12784 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 13514 1996 1034 13514 13149 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 13879 1997 1034 13879 13515 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 14244 1998 1034 14244 13880 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 14609 1999 1034 14609 14245 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 14975 2000 1034 14975 14610 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 15340 2001 1034 15340 14976 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 15705 2002 1034 15705 15341 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 16070 2003 1034 16070 15706 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 16436 2004 1034 16436 16071 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 16801 2005 1034 16801 16437 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 17166 2006 1034 17166 16802 "P" "01" "LC" 65832 20007 1  8811 17895
    "001034" 17531 2007 1034 17531 17167 "P" "01" "LC" 65832 20007 1  8811 17895
    "001036" 13148 1995 1036 13148 12784 "C" "01" "LC" 65453 20008 1  8740 15126
    "001036" 13514 1996 1036 13514 13149 "C" "01" "LC" 65453 20008 1  8740 15126
    "001036" 13879 1997 1036 13879 13515 "C" "01" "LC" 65453 20008 1  8740 15126
    "001036" 14244 1998 1036 14244 13880 "C" "01" "LC" 65453 20008 1  8740 15126
    "001036" 14609 1999 1036 14609 14245 "C" "01" "LC" 65453 20008 1  8740 15126
    "001036" 14975 2000 1036 14975 14610 "C" "01" "LC" 65453 20008 1  8740 15126
    "001038" 13239 1995 1038 13239 12874 "P" "01" "LU" 66413  6301 1  8630 16436
    "001038" 13604 1996 1038 13604 13240 "P" "01" "LU" 66413  6301 1  8630 16436
    "001038" 13969 1997 1038 13969 13605 "P" "01" "LU" 66413  6301 1  8630 16436
    "001038" 14334 1998 1038 14334 13970 "P" "01" "LU" 66413  6301 1  8630 16436
    "001038" 14700 1999 1038 14700 14335 "P" "01" "LU" 66413  6301 1  8630 16436
    "001038" 15065 2000 1038 15065 14701 "P" "01" "LU" 66413  6301 1  8630 16436
    "001038" 15430 2001 1038 15430 15066 "P" "01" "LU" 66413  6301 1  8630 16436
    "001038" 15795 2002 1038 15795 15431 "P" "01" "LU" 66413  6301 1  8630 16436
    "001038" 16161 2003 1038 16161 15796 "P" "01" "LU" 66413  6301 1  8630 16436
    "001043" 12995 1995 1043 12995 12631 "C" "01" "LC" 80071 30061 1 12393 14636
    "001043" 13361 1996 1043 13361 12996 "C" "01" "LC" 80071 30061 1 12393 14636
    "001043" 13726 1997 1043 13726 13362 "C" "01" "LC" 80071 30061 1 12393 14636
    "001043" 14091 1998 1043 14091 13727 "C" "01" "LC" 80071 30061 1 12393 14636
    "001043" 14456 1999 1043 14456 14092 "C" "01" "LC" 80071 30061 1 12393 14636
    "001045" 19723 2013 1045 19723 19359 "P" "04" "LC" 21020 20010 1 19701     .
    "001045" 20088 2014 1045 20088 19724 "P" "04" "LC" 21020 20010 1 19701     .
    "001045" 20453 2015 1045 20453 20089 "P" "04" "LC" 21020 20010 1 19701     .
    "001045" 20819 2016 1045 20819 20454 "P" "04" "LC" 21020 20010 1 19701     .
    "001045" 21184 2017 1045 21184 20820 "P" "04" "LC" 21020 20010 1 19701     .
    "001045" 13148 1995 1045 13148 12784 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 13514 1996 1045 13514 13149 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 13879 1997 1045 13879 13515 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 14244 1998 1045 14244 13880 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 14609 1999 1045 14609 14245 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 14975 2000 1045 14975 14610 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 15340 2001 1045 15340 14976 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 15705 2002 1045 15705 15341 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 16070 2003 1045 16070 15706 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 16436 2004 1045 16436 16071 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 16801 2005 1045 16801 16437 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 17166 2006 1045 17166 16802 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 17531 2007 1045 17531 17167 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 17897 2008 1045 17897 17532 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 18262 2009 1045 18262 17898 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 18627 2010 1045 18627 18263 "P" "01" "LC" 21020 20010 1   761 18996
    "001045" 18992 2011 1045 18992 18628 "P" "01" "LC" 21020 20010 1   761 18996
    "001050" 13148 1995 1050 13148 12784 "P" "01" "LC" 11499   176 1  7637     .
    "001050" 13514 1996 1050 13514 13149 "P" "01" "LC" 11499   176 1  7637     .
    "001050" 13879 1997 1050 13879 13515 "P" "01" "LC" 11499   176 1  7637     .
    end
    format %td endfyr
    format %td begfyr
    CRSP Daily Stock Data (CRSP.dta):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long PERMNO int date long PERMCO
    10012 14612 7969
    10012 14613 7969
    10012 14614 7969
    10012 14615 7969
    10012 14616 7969
    10012 14619 7969
    10012 14620 7969
    10012 14621 7969
    10012 14622 7969
    10012 14623 7969
    10012 14627 7969
    10012 14628 7969
    10012 14629 7969
    10012 14630 7969
    10012 14633 7969
    10012 14634 7969
    10012 14635 7969
    10012 14636 7969
    10012 14637 7969
    10012 14640 7969
    10012 14641 7969
    10012 14642 7969
    10012 14643 7969
    10012 14644 7969
    10012 14647 7969
    10012 14648 7969
    10012 14649 7969
    10012 14650 7969
    10012 14651 7969
    10012 14654 7969
    10012 14655 7969
    10012 14656 7969
    10012 14657 7969
    10012 14658 7969
    10012 14662 7969
    10012 14663 7969
    10012 14664 7969
    10012 14665 7969
    10012 14668 7969
    10012 14669 7969
    10012 14670 7969
    10012 14671 7969
    10012 14672 7969
    10012 14675 7969
    10012 14676 7969
    10012 14677 7969
    10012 14678 7969
    10012 14679 7969
    10012 14682 7969
    10012 14683 7969
    10012 14684 7969
    10012 14685 7969
    10012 14686 7969
    10012 14689 7969
    10012 14690 7969
    10012 14691 7969
    10012 14692 7969
    10012 14693 7969
    10012 14696 7969
    10012 14697 7969
    10012 14698 7969
    10012 14699 7969
    10012 14700 7969
    10012 14703 7969
    10012 14704 7969
    10012 14705 7969
    10012 14706 7969
    10012 14707 7969
    10012 14710 7969
    10012 14711 7969
    10012 14712 7969
    10012 14713 7969
    10012 14714 7969
    10012 14717 7969
    10012 14718 7969
    10012 14719 7969
    10012 14720 7969
    10012 14724 7969
    10012 14725 7969
    10012 14726 7969
    10012 14727 7969
    10012 14728 7969
    10012 14731 7969
    10012 14732 7969
    10012 14733 7969
    10012 14734 7969
    10012 14735 7969
    10012 14738 7969
    10012 14739 7969
    10012 14740 7969
    10012 14741 7969
    10012 14742 7969
    10012 14745 7969
    10012 14746 7969
    10012 14747 7969
    10012 14748 7969
    10012 14749 7969
    10012 14752 7969
    10012 14753 7969
    10012 14754 7969
    end
    format %td date

  • #2
    I suspect it is obvious to people who work with these data sets regularly, but I cannot discern exactly how you want to pair up the observations in CRSP.dta with those in funda_link.dta. The data in CRSP are identified by variables date, PERMNO, and PERMCO. But there are no variables named PERMNO and PERMCO in funda_link. The closest match to those variable names are lpermno and lpermco. So I guess you waant PERMNO to match with lpermno and PERMCO to match with lpermco. But do they both have to match, or is just one sufficient?

    Also, you say you want to align the dates betweern the data sets. But funda_link has a large number of dates, and I can't guess which ones to use there. I take it that you want the pairing of observations between these data sets to have the value of date in CRSP.dta fall: is it between begfyr and endfyr or is it between linkdt and linkenddt? Or does datadate play a role in this somehow? (And, at least in the example data provided, datadate seems completely redundant, as it always equals endfyr.

    Anyway, if I knew what had to match with what and what variables define the range within which the date match needed to fall, this could be solved very easily using the -rangejoin- command, written by Robert Picard, and available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC. Perhaps if you do that and read -help rangejoin- you will be able to figure out how to use it. Just bear in mind that for this task, funda_link must be the "master" data set and CRSP.dta must be the "using" data set for the -rangejoin- to work correctly.

    If you need additional assistance, I'd be happy to give you more specific advice if you can resolve the ambiguities in your question for me.

    Comment


    • #3
      Hey Clyde, thanks a lot for your quick help!!
      For someone working with financial data, this is probably a piece of cake. My thought on this matter is indeed your first suggestion, but here I am unsure if this is an adequate approach to align the two datasets.

      My idea was to rename PERMNO in the CRSP dataset to lpermno to identify the company and take the Date variable as the second merge variable. By that, I mean that in the CRSP dataset, I have daily stock data (variable: date) encoded as day.month.year. In the LinkTable dataset, I have endfyr and begfyr, which are encoded as day.month.year. Now, my thought was also to say that if the CRSP variable date falls within the period of LinkTable endfyr & begfyr, it should be assigned. In theory, this should work for the matching, shouldn't it?

      I am also having difficulties with linkdt (First Effective Date of Link) and linkenddt (Last Effective Date of Link). I'm not sure if they are still necessary, but I believe I understood that they indicate the period during which the linking process is valid. Can this also be taken into account or do I have to address this afterward?

      The variable datadate, following our thought construct, is initially not needed, but I included it here for completeness. I have already installed Rangejoin and Rangestat but have never had to use them. I would need to familiarize myself with them first.

      As you can see, I have many question marks in my head and am not exactly sure how to align this data correctly without producing garbage. But it reassures me that your initial thought aligns with my consideration.

      Comment


      • #4
        OK, based on what you have explained in #3, the code you need is:
        Code:
        clear*
        use funda
        merge 1:1 gvkey fyear using CRSP_Compustat_Link, keep(match) nogenerate
        rename lpermno PERMNO
        
        rangejoin date begfyr endfyr using CRSP, by(PERMNO)
        Now, this gets us a pairing up of observations from fund.dta and CRSP.dta when the agree on PERMNO (PERMNO from fund.dta being ascertained by virtue of the linkage with CRSP_Compustat_Link.dta) and when date falls between begfyr and endfyr. If there are further requirements regarding linkdt and linkenddt, those can be imposed with an appropriate -keep- command, perhaps something like -keep if inrange(date, linkdt, linkenddt)-.

        Note that with the example data sets provided, the end result contains no actual linkages because the PERMNO's in the example CRSP data do not overlap at all with the PERMNO's in the example CRSP_Compustat_Link data that in turn match on gvkey with observations in the example funda.dta. Presumably this same problem will not occur with the full data sets.

        Comment


        • #5
          Thank you, Clyde, for your assistance again! I actually programmed something similar today and found your post from 2016 along with Robert's, addressing a similar issue. Thanks to you, I could better tackle it now. My code might not be the most elegant way, but I believe it works (my PC has been processing for quite some time) and its more or less the same rangejoin command that you gave me in #4.

          The tip about the "keep" command is excellent! I'll try it out to see how it affects the dataset. I noticed that you were also involved in the creating process of the rangejoin command when I was reading the help file. Therefore, I have a performance question.
          The CRSP dataset is only 6GB (compressed), but it has around 20 million observations, while the Funda dataset has about 50k observations. My Macbook has 8GB of RAM. Is there a trick to make this work without maxing out my memory / without the computer processing for hours? I was thinking of simplifying things by dividing the CRSP dataset into several parts based on fyear and running the rangejoin command on each of the subsets and append it afterwards. Something like:

          use "CRSP.dta", clear
          keep if fyear<=2005
          save "data1", replace

          use "CRSP.dta", clear
          keep if fyear>=2006 & fyear<=2011
          save "data2", replace
          .
          .
          .

          Is this a possible solution, or do you have another "tuning" trick that you can recommend? For example, sorting the dataset in a specific way in advance...
          As I'm waiting for my PC to process, is this even possible? From a merge command standpoint, I know that the variables that weren't matched are dropped. Since I haven't seen the results yet, does `rangejoin` do the same, or do I end up with a lot of duplicates?


          Comment


          • #6
            I was not involved in the development of -rangejoin-, nor with the closely related -rangestat-. I may be the most frequent user of both commands--they happen to fill a need that comes up very often in my work. And I did identify a bug in the original release of -rangestat-, which Robert Picard promptly fixed in the next release. But other than that, I am purely a consumer of these programs. You may be thinking of -runby- which Robert Picard and I co-authored. I had the original idea for how it would work, but Robert came up with a much better implementation, to which I added a few "finishing touches." -runby- is the only published Stata program I have contributed to.

            I think the best thing I can recommend for your speed problem is just to be patient. As for memory issues, the way -rangejoin- works, as I understand it, the memory that it grabs for the task comes early in the program--if you were going to encounter a memory problem, I think it would have happened shortly after the beginning. If it hasn't happened yet, I doubt it will. If you do hit a memory problem later, it probably means that the end result data set is too large for your memory. And in that case, you will encounter the same difficulty doing it in chunks when you try to put all the chunks together. I think the best you can do to prevent that is to -compress- all of the original data files before using them, and, before putting these data sets together, -drop- any variables that you will not need for the present purpose. If the end result data set is still too large with that, there is no way around it short of getting a computer with more RAM.

            I think it is unlikely that breaking up funda into chunks defined by fiscal year, processing them separately, and appending the results will speed things up. In fact, it is more likely to slow things down. So I would just let it keep running. Find something else to busy yourself with until it finishes. I know that finance is a high-pressure world where things change rapidly and speed is important. But in this case, I don't think that there is a faster way to code this--processor speed is probably the major limiting factor here, and if the time it takes is really too long for the end result to be useful, then you need to get a faster machine. But I doubt that is the case, as you say you've been stuck on this coding for a while, so I infer that the results are not needed urgently.

            When you run -rangejoin-, all of the observations in funda that found a match in the Compustat Link file will appear in the end results. Observations from CRSP that do not have a match there will not be included in the end results. In effect, -rangejoin- acts like -merge- with -keep(match master)-. If, for some reason, you need to include the unmatched observations from CRSP.dta, you can bring them in later by -merge-ing -rangejoin-'s output with CRSP.

            Comment

            Working...
            X