Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge annual data & daily data

    Hello Statalist members,



    I'm trying to use Compustat Global annual data and daily data to calculate a firm's market value on the last day of each fiscal year. The annual data has information on the fiscal year and last date. The ending date of a fiscal year can vary across firms. The daily data provides stock price and number of stocks on a daily basis.



    For example,

    Annual data
    *isin: Firm id
    *fyear: fiscal year
    *date: Ending date

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    
    clear
    
    input str8 isin str4 fyear str10 date
    
    "A1234567" "2001" "31dec2001"
    
    "A1234567" "2002" "31dec2002"
    
    "A1234567" "2003" "31dec2003"
    
    "B1234567" "2001" "30jun2001"
    
    "B1234567" "2002" "30jun2002"
    
    "B1234567" "2003" "30jun2003"
    
    end
    Daily data
    *sotckprice: Stock price
    *stockno: Number of stocks

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    
    clear
    
    input str8 isin str10 datadate float(stockprice stockno)
    
    "A1234567" "29dec2001" 10.2  10000
    
    "A1234567" "30dec2001" 10.1  10000
    
    "A1234567" "31dec2001" 10.3  10000
    
    "A1234567" "29dec2002"   20  11000
    
    "A1234567" "30dec2002" 20.5  11000
    
    "A1234567" "31dec2002"   21  11000
    
    "A1234567" "29dec2003"   15  11000
    
    "A1234567" "30dec2003" 15.2  11000
    
    "A1234567" "31dec2003"   15  11000
    
    "B1234567" "28jun2001"   20 100000
    
    "B1234567" "29jun2001" 20.1 100000
    
    "B1234567" "30jun2001" 20.2 100000
    
    "B1234567" "28jun2002" 25.5 100000
    
    "B1234567" "29jun2002" 25.6 100000
    
    "B1234567" "30jun2002"   26 100000
    
    "B1234567" "28jun2003"   30 110000
    
    "B1234567" "29jun2003" 30.5 110000
    
    "B1234567" "30jun2003"   31 110000
    
    end


    And I only want to have stock price & number of stocks on the last day of a fiscal year for each fiscal year. The data would look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    
    clear
    
    input str8 isin str4 fyear float(stockprice stockno)
    
    "A1234567" "2001" 10.3  10000
    
    "A1234567" "2002"   21  11000
    
    "A1234567" "2003"   15  11000
    
    "B1234567" "2001" 20.2 100000
    
    "B1234567" "2002"   26 100000
    
    "B1234567" "2003"   31 110000
    
    end


    I tried to merge the two datasets, but didn't work. Is there any other way to address this issue?

    Your suggestions and advice will be appreciated.



    Thanks,

    Anderson

  • #2
    We can't see your merge code to comment on why it didn't work but the most obvious reason is that daily dates have different variable names in the two datasets.

    Another issue that will arise sooner or later is that string dates are of limited use in Stata. This works for the data you give.

    Code:
    clear
    input str8 isin str4 fyear str10 date
    "A1234567" "2001" "31dec2001"
    "A1234567" "2002" "31dec2002"
    "A1234567" "2003" "31dec2003"
    "B1234567" "2001" "30jun2001"
    "B1234567" "2002" "30jun2002"
    "B1234567" "2003" "30jun2003"
    end 
    gen wanted = daily(date, "DMY")
    
    save tomerge 
    
    clear 
    
    clear
    
    input str8 isin str10 datadate float(stockprice stockno)
    "A1234567" "29dec2001" 10.2  10000
    "A1234567" "30dec2001" 10.1  10000
    "A1234567" "31dec2001" 10.3  10000
    "A1234567" "29dec2002"   20  11000
    "A1234567" "30dec2002" 20.5  11000
    "A1234567" "31dec2002"   21  11000
    "A1234567" "29dec2003"   15  11000
    "A1234567" "30dec2003" 15.2  11000
    "A1234567" "31dec2003"   15  11000
    "B1234567" "28jun2001"   20 100000
    "B1234567" "29jun2001" 20.1 100000
    "B1234567" "30jun2001" 20.2 100000
    "B1234567" "28jun2002" 25.5 100000
    "B1234567" "29jun2002" 25.6 100000
    "B1234567" "30jun2002"   26 100000
    "B1234567" "28jun2003"   30 110000
    "B1234567" "29jun2003" 30.5 110000
    "B1234567" "30jun2003"   31 110000
    end
    
    gen wanted = daily(datadate, "DMY")
    
    merge 1:1 isin wanted using tomerge 
    
    keep if _merge == 3
    Naturally wanted is just an emphatic variable name and you should use your own and apply a date display format too.

    Comment


    • #3
      Hello Nick,
      I do appreciate your help! Your code worked well for me. I never thought about creating a common time-related variable for the two datasets. Before my posting, I tried "merge m:1", thinking that information in the annual data could be used for multiple observations in the daily data. Also, I checked "rangejoin" command, but didn't work. Thanks again for your kind help!

      Best,
      Anderson

      Comment

      Working...
      X