  • Code debug; minor for replication

    Hi
    Each time I run the code below, I get very slightly different coefficients in the time-series regression. The code merges two panels and then creates time-series data.
    The replicability issue arises only when I merge with ib_data_prepped and create the earnings measure in two steps. With the same steps, if the earnings measure is based on alt_earn alone, no issue arises. It is very puzzling.

    Code:
    * Step 1: Prepare first data
    use data_ib.dta, clear
    
    rename PERMNO firm_id
    gen report_date = ANNDATS_ACT
    format report_date %td
    
    drop if fcast_actual == .
    drop if fcast_median == .
    drop if shares_out == .
    
    keep if report_date >= td(13dec1982) & report_date <= td(31mar2021)
    
    sort firm_id report_date
    save ib_data_prepped.dta, replace
    
    * Step 2: Merge base dataset with first data
    use base_data.dta, clear
    sort firm_id report_date
    merge 1:1 firm_id report_date using ib_data_prepped.dta
    drop if _merge == 2
    drop _merge
    
    keep if report_date >= td(13dec1982) & report_date <= td(31mar2021)
    
    merge m:1 report_date using trading_day_lookup.dta
    keep if _merge == 3
    drop _merge
    
    keep if is_trading_day == 1
    keep if shrtype == 10 | shrtype == 11
    keep if country_code == "USA"
    keep if currency_1 == "USD"
    keep if currency_2 == "USD"
    
    sort firm_code fiscal_quarter
    duplicates drop firm_code fiscal_quarter, force
    xtset firm_code fiscal_quarter, quarterly
    
    * Create new variables
    gen market_cap = share_price * shares
    drop if market_cap < 2
    drop if (report_date - fiscal_end) > 90
    
    gen l4_income = l4.income
    gen lag_assets = l.assets
    gen l4_market_cap = l4.market_cap
    
    drop if income == .
    drop if l4_income == .
    drop if l4_market_cap == .
    
    bys report_date: egen n_reporters = count(income)
    drop if n_reporters < 20
    
    * Step 3: new vars
    replace fcast_actual = round(fcast_actual, 0.01)
    replace fcast_median = round(fcast_median, 0.01)
    
    gen forecast_error = shares_out * (fcast_actual - fcast_median) / l4_market_cap
    gen forecast_error_rounded = round(forecast_error, 0.0001)
    
    gen valid_ferror = !missing(fcast_actual) & !missing(fcast_median) & !missing(shares_out) & !missing(l4_market_cap) & l4_market_cap > 0
    replace forecast_error_rounded = . if !valid_ferror
    
    gen alt_earn = (income - l4_income) / l4_market_cap
    gen tsg = cond(valid_ferror, forecast_error_rounded, alt_earn)
    
    * Step 4: Winsorization and aggregation
    pctrim tsg, p(1 99) by(year real_month) recode(bound) replace
    
    bys report_date: egen vw_tsg = wtmean(tsg), weight(lag_assets)
    
    save firm_daily_temp.dta, replace
    
    * Step 5: Collapse to one obs per day
    use firm_daily_temp.dta, clear
    gen vw_tsg_clean = round(vw_tsg, 0.0001)
    sort report_date vw_tsg_clean firm_code
    duplicates drop report_date vw_tsg_clean, force
    drop vw_tsg
    rename vw_tsg_clean vw_tsg_measure
    
    rename fiscal_quarter fiscal_qtr
    keep report_date fiscal_qtr vw_* n_reporters is_trading_day
    tsset report_date
    
    keep if report_date >= td(13dec1982) & report_date <= td(31mar2021)
    replace vw_tsg_measure = 100 * vw_tsg_measure
    save daily_series_final.dta, replace
    
    * Step 6: Merge with market data
    use full_returns.dta, clear
    merge 1:1 report_date using daily_series_final.dta
    drop _merge
    tsset report_date
    save merged_returns_panel.dta, replace
    
    * Step 7: Merge with policy data
    use merged_returns_panel.dta, clear
    keep if report_date >= td(13dec1982) & report_date <= td(31mar2021)
    
    merge 1:1 report_date using policy_events_data.dta
    keep if _merge == 3
    drop _merge
    tsset report_date
    
    
    save final_panel_for_analysis.dta, replace
    
    * Step 8: Regressions
    use final_panel_for_analysis.dta, clear
    
    pctrim vw_tsg_measure if year >= 2002 & year <= 2024, p(1 99) recode(miss) replace
    keep if policy_phase <= 15 & year >= 2002 & year <= 2024
    
    sort report_date
    gen time_index = _n
    tsset time_index
    
    
    
    xi: ivreg2 ret_excess vw_tsg_measure i.dow_group*i.week_group if condition_flag == 1, robust bw(auto) small

    xi: ivreg2 ret_excess vw_tsg_measure i.dow_group*i.week_group if condition_flag == 0, robust bw(auto) small
    Last edited by Mike Kraft; 20 Jul 2025, 22:23.

  • #2
    I count five sort and two bysort lines of code. For how many of those could you substitute isid ..., sort?

    There are also two duplicates drop ..., force lines of code. Could it be that at least one of them follows one or more indeterminate sort steps?
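
    As a rough illustration of the point about indeterminate sorts, here is a minimal sketch on made-up toy data (the values and the "keep the latest report_date" rule are assumptions for illustration, not taken from the original do-file). With sort followed by duplicates drop ..., force, which of the tied rows survives can change from run to run; making the sort key unique and choosing the surviving row explicitly removes that freedom. Note that isid rejects missing values in the key unless missok is specified.

    ```stata
    * Toy data: firm 1 has two rows in the same fiscal quarter (a tie on the key)
    clear
    input firm_code fiscal_quarter report_date income
    1 200 15000 10
    1 200 15010 12
    2 200 15005 7
    end

    * How many ties are there on the intended panel key?
    duplicates report firm_code fiscal_quarter

    * isid errors out unless the listed variables uniquely identify rows;
    * with the sort option it also leaves the data sorted on them
    isid firm_code fiscal_quarter report_date, sort

    * Hypothetical rule: keep the latest report_date within each firm-quarter
    by firm_code fiscal_quarter (report_date): keep if _n == _N

    * The panel key is now unique, so this passes without error
    isid firm_code fiscal_quarter
    list
    ```

    If isid fails on the three-variable key in the real data, that would suggest the tie-break itself is not unique and another variable is needed before any rows are dropped.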


    • #3
      I am so puzzled. I have now run the code in pieces, and the issue arises from the merge below, which seems to retain a different number of observations each time. I am not sure how to deal with this.

      Code:
      use base_data.dta, clear
      sort firm_id report_date
      merge 1:1 firm_id report_date using ib_data_prepped.dta
      drop if _merge == 2
      drop _merge


      • #4
        Basically, each time it matches a different number of firms.
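
        One way to narrow this down, offered as a diagnostic sketch rather than a diagnosis: check each input file on its own before merging, then compare the output across two fresh runs. The file and variable names are the ones used in the posts above; everything else is an assumption about what is worth checking. If the counts or signatures of an input file differ between runs, the file being merged is changing, and the merge is only reporting it.

        ```stata
        * Run this twice in fresh sessions and compare the printed output.

        use ib_data_prepped.dta, clear
        isid firm_id report_date, sort   // stops with an error if the key is not unique
        count                            // number of observations in the using file
        datasignature                    // checksum; expected to match across runs if the file is unchanged

        use base_data.dta, clear
        isid firm_id report_date, sort
        count
        datasignature

        merge 1:1 firm_id report_date using ib_data_prepped.dta
        tabulate _merge                  // matched and unmatched counts for this run
        ```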
