  • #16
    Good morning everyone,

    I have a question.

    I estimated a model of freight flows (Flow_ij) with the ppmlhdfe command and then obtained the predicted flows (pr_Flow_ij) with the predict command, as follows:
    Code:
    * Poisson pseudo-maximum-likelihood estimation of the flow model
    ppmlhdfe Flow_ij lnTEU_it GDP_jt lnDistance_ij d_Rail_ij d_Redsea_i lnBunkerRate_t lnFreightRate_t, vce(cluster ID) d(newvar1)
    * predicted mean flows
    predict pr_Flow_ij, mu
    The freight flows run from 4 ports (i) to 13 provinces (j) over the period 2006-2018. I noticed that the total predicted flow into a province (e.g., the sum of the predicted flows from the 4 ports to a given province in a specific year) differs considerably from the actual total. In other words, none of the estimates sum to the actual province totals, as the example below shows.

    Actual flow (example province)
    year Port A Port B Port C Port D Annual total
    2001 563,400 100 0 563,500
    2002 672,500 0 0 672,500
    2003 913,400 0 0 913,400
    2004 926,200 200 0 926,400
    2005 940,700 0 0 940,700
    2006 1,003,100 0 0 1,003,100
    2007 1,045,300 0 0 1,045,300
    2008 1,076,800 0 0 1,076,800
    2009 1,075,700 34,200 0 0 1,109,900
    2010 1,269,900 59,800 0 0 1,329,700
    2011 1,288,700 80,600 0 0 1,369,300
    2012 932,100 84,900 0 0 1,017,000
    2013 990,600 110,000 0 0 1,100,600
    Total, all years: 13,068,200
    Predicted flow (same province)
    year Port A Port B Port C Port D Annual total
    2001 884,352 27,833 4,546 916,731
    2002 912,977 29,815 8,401 951,192
    2003 1,023,809 34,223 10,654 1,068,687
    2004 931,028 31,854 11,039 973,921
    2005 1,100,573 35,674 13,220 1,149,468
    2006 1,177,734 39,208 15,681 1,232,623
    2007 1,376,118 40,863 20,048 1,437,030
    2008 1,325,111 38,929 20,002 1,384,042
    2009 1,301,724 77,291 45,346 24,225 1,448,586
    2010 1,261,080 111,610 46,129 24,117 1,442,936
    2011 1,246,122 112,726 43,460 26,113 1,428,421
    2012 1,203,509 121,631 41,309 28,658 1,395,106
    2013 1,297,077 146,315 43,332 31,280 1,518,005
    Total, all years: 16,346,749
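A quick check of the two tables, in Python rather than Stata so it runs on its own, using the annual totals exactly as posted:

```python
# Annual totals for the example province, copied from the two tables above.
years = list(range(2001, 2014))
actual = [563_500, 672_500, 913_400, 926_400, 940_700, 1_003_100, 1_045_300,
          1_076_800, 1_109_900, 1_329_700, 1_369_300, 1_017_000, 1_100_600]
predicted = [916_731, 951_192, 1_068_687, 973_921, 1_149_468, 1_232_623, 1_437_030,
             1_384_042, 1_448_586, 1_442_936, 1_428_421, 1_395_106, 1_518_005]

for yr, a, p in zip(years, actual, predicted):
    # Percentage by which the predicted annual total exceeds the actual one.
    print(yr, a, p, f"{100 * (p - a) / a:+.1f}%")

total_actual, total_pred = sum(actual), sum(predicted)
print("total", total_actual, total_pred, f"ratio {total_pred / total_actual:.3f}")
```

Every year is over-predicted, by between about 4% (2011) and 63% (2001), and the total over all years by about 25%. The annual predicted totals sum to 16,346,748, one less than the posted 16,346,749, presumably because the underlying predictions were rounded.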

    I would like the predicted totals to match, or come very close to, the actual totals.
    My question is: how can I constrain the flow estimates so that they add up to the known (actual) totals? I know that I could include province fixed effects, but if I do so, I won't be able to use the explanatory variable GDP_jt, which captures a province attribute.

    Thank you,
    Hussain Sulaimani
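A minimal sketch of the property behind this question, written in Python rather than Stata so it runs on its own (the toy numbers are hypothetical, and the alternating solver is a simple illustration, not the algorithm ppmlhdfe uses): at the Poisson PML solution, the first-order condition for each included dummy says that the residuals y - mu of that group sum to zero, so the fitted values reproduce the observed total of every group that has its own dummy. With only a constant, just the grand total is reproduced.

```python
import math


def group_effects(groups, x, y, beta):
    """Closed-form Poisson group effects for a given slope:
    exp(alpha_g) = sum of y in g / sum of exp(beta * x) in g."""
    alpha = {}
    for g in set(groups):
        num = sum(yi for gi, yi in zip(groups, y) if gi == g)
        den = sum(math.exp(beta * xi) for gi, xi in zip(groups, x) if gi == g)
        alpha[g] = math.log(num / den)  # needs at least one positive y in g
    return alpha


def fit_ppml(groups, x, y, tol=1e-10, max_iter=10_000):
    """Poisson PML with one regressor and one dummy per group label.

    Alternates the closed-form group effects with a Newton step for beta.
    """
    beta = 0.0
    for _ in range(max_iter):
        alpha = group_effects(groups, x, y, beta)
        mu = [math.exp(alpha[g] + beta * xi) for g, xi in zip(groups, x)]
        score = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))  # dlogL/dbeta
        if abs(score) < tol:
            break
        beta += score / sum(xi * xi * mi for xi, mi in zip(x, mu))  # Newton step
    # Recompute the group effects so the returned mu matches the final beta.
    alpha = group_effects(groups, x, y, beta)
    mu = [math.exp(alpha[g] + beta * xi) for g, xi in zip(groups, x)]
    return beta, mu


def totals(groups, values):
    """Sum values within each group label."""
    out = {}
    for g, v in zip(groups, values):
        out[g] = out.get(g, 0.0) + v
    return out


# Hypothetical toy data: two provinces with three flows each.
province = ["P1", "P1", "P1", "P2", "P2", "P2"]
x = [0.0, 1.0, 2.0, 0.5, 1.5, 2.5]
y = [2.0, 3.0, 7.0, 1.0, 4.0, 4.0]

_, mu_const = fit_ppml(["all"] * len(y), x, y)  # constant only
_, mu_fe = fit_ppml(province, x, y)              # one dummy per province

print("actual:       ", totals(province, y))
print("constant only:", totals(province, mu_const))
print("province FE:  ", totals(province, mu_fe))
```

In terms of the question: a dummy for every province-year cell would make each province-year total match, but it would also be collinear with GDP_jt; time-invariant province dummies force only each province's total over the whole sample period to match.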



    • #17
      Dear Tom Zylkin

      May I also ask you for some advice? I estimated the following gravity equation with ppmlhdfe:

      Code:
      local gravity_sectorlevel lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest lngdp_o_naics2_4 lngdp_o_naics2_5 lngdp_o_naics2_6 lngdp_o_naics2_7 lngdp_o_naics2_8 lngdp_o_naics2_9 lngdp_o_naics2_11 lngdp_o_naics2_13 lngdp_o_naics2_14 lngdp_o_naics2_15 lngdp_o_naics2_16 lngdp_o_naics2_19 lngdp_o_naics2_2 lngdp_o_naics2_20  lngdp_o_naics2_21 lngdp_o_naics2_22 lngdp_o_naics2_23   lngdp_d_naics2_4  lngdp_d_naics2_5 lngdp_d_naics2_6 lngdp_d_naics2_7 lngdp_d_naics2_8  lngdp_d_naics2_9  lngdp_d_naics2_11  lngdp_d_naics2_13 lngdp_d_naics2_14 lngdp_d_naics2_15 lngdp_d_naics2_16  lngdp_d_naics2_19 lngdp_d_naics2_2 lngdp_d_naics2_20  lngdp_d_naics2_21 lngdp_d_naics2_22 lngdp_d_naics2_23   lndistw_naics2_4 lndistw_naics2_5 lndistw_naics2_6 lndistw_naics2_7 lndistw_naics2_8 lndistw_naics2_9 lndistw_naics2_11 lndistw_naics2_13 lndistw_naics2_14 lndistw_naics2_15 lndistw_naics2_16 lndistw_naics2_19     lndistw_naics2_2 lndistw_naics2_20   lndistw_naics2_21  lndistw_naics2_22 lndistw_naics2_23  lnsumgdp_naics2_4 lnsumgdp_naics2_5 lnsumgdp_naics2_6 lnsumgdp_naics2_7 lnsumgdp_naics2_8 lnsumgdp_naics2_9 lnsumgdp_naics2_11 lnsumgdp_naics2_13 lnsumgdp_naics2_14 lnsumgdp_naics2_15 lnsumgdp_naics2_16 lnsumgdp_naics2_19  lnsumgdp_naics2_2   lnsumgdp_naics2_20    lnsumgdp_naics2_21     lnsumgdp_naics2_22 lnsumgdp_naics2_23 lnsmp_dest_naics2_4 lnsmp_dest_naics2_5 lnsmp_dest_naics2_6 lnsmp_dest_naics2_7 lnsmp_dest_naics2_8 lnsmp_dest_naics2_9 lnsmp_dest_naics2_11 lnsmp_dest_naics2_13 lnsmp_dest_naics2_14 lnsmp_dest_naics2_15 lnsmp_dest_naics2_16 lnsmp_dest_naics2_19 lnsmp_dest_naics2_2  lnsmp_dest_naics2_20   lnsmp_dest_naics2_21    lnsmp_dest_naics2_22 lnsmp_dest_naics2_23
      ppmlhdfe OperatingrevenueTurnover `gravity_sectorlevel', absorb(year country_origin_sector_encode country_dest_sector_encode, savefe) d cluster(country_pair_encode)
      I want to plot the residuals against the predicted FDI values. Is my code for this correct? In particular, I'm unsure whether including the "d" option in ppmlhdfe is enough for the prediction to account for the fixed effects.

      Code:
      predict fitted_values, mu
      gen residuals = fitted_values - OperatingrevenueTurnover
      twoway scatter residuals  fitted_values, ///
          yline(0, lcolor(red)) ///
          xlabel(, angle(45)) ///
          ylabel(, angle(0)) ///
          title("Residuals vs. Predicted Values") ///
          xtitle("Predicted Values") ///
          ytitle("Residuals")
      I would greatly appreciate your advice.

      Best,
      Noemi



      • #18
        Hi Noemi,
        It looks like you are calculating the residuals correctly, except that I think they should be OperatingrevenueTurnover - fitted_values (i.e., y minus mu-hat). I can confirm that "predict, mu" gives you the predicted value inclusive of the fixed effects. Including "d" is necessary to enable "predict, mu", but including "savefe" should not be necessary. Hope this helps.
        Regards,
        Tom
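One way to sanity-check that the predictions include the fixed effects, sketched in Python with hypothetical numbers (not ppmlhdfe output): at the PPML solution, the raw residuals y minus mu-hat sum to zero (up to the convergence tolerance) within every level of each absorbed factor, because that sum is exactly the score equation for the corresponding fixed effect. Predictions that leave the fixed effects out generally fail this check. To keep the sketch self-contained, the fitted values come from a model with one absorbed factor and no regressors, where the PPML fitted mean is simply each group's average of y.

```python
def residuals(y, mu):
    """Raw residuals with the sign convention from the reply: y minus mu-hat."""
    return [yi - mi for yi, mi in zip(y, mu)]


def group_sums(groups, values):
    """Sum values within each level of an absorbed factor."""
    sums = {}
    for g, v in zip(groups, values):
        sums[g] = sums.get(g, 0.0) + v
    return sums


# Hypothetical data: one absorbed factor (e.g. origin country-sector), no regressors.
groups = ["o1", "o1", "o2", "o2", "o2"]
y = [4.0, 2.0, 0.0, 5.0, 1.0]

# With only group dummies, the PPML fitted mean is each group's average of y.
y_sums = group_sums(groups, y)
n_obs = group_sums(groups, [1.0] * len(y))
mu_with_fe = [y_sums[g] / n_obs[g] for g in groups]
# A prediction that ignores the group effects: the overall mean for everyone.
mu_no_fe = [sum(y) / len(y)] * len(y)

print("with FE:   ", group_sums(groups, residuals(y, mu_with_fe)))
print("without FE:", group_sums(groups, residuals(y, mu_no_fe)))
```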



        • #19
          Hi Tom,

          Thank you very much for your answer. Quite a lot of the residuals are missing, because the variable _ppmlhdfe_d is also missing for those observations. Do you have any idea how this can happen?

          Best,
          Noemi
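Why a missing fixed-effect sum turns into a missing residual can be illustrated with a small Python sketch that uses NaN where Stata would use a missing value (.). It assumes the fitted mean is computed as exp(xb + d), with d the stored sum of fixed effects; any arithmetic on a missing d then yields a missing mu and a missing residual. The numbers are hypothetical.

```python
import math


def fitted_mean(xb, d):
    """Assumed form of the fitted mean: exp(linear prediction + sum of fixed effects)."""
    return math.exp(xb + d)  # math.exp(nan) returns nan, like Stata's missing


nan = float("nan")
# Hypothetical linear predictions, fixed-effect sums and outcomes for three observations.
xb = [0.2, -0.1, 0.4]
d = [1.0, nan, 0.5]  # the second observation has no stored fixed effect
y = [4.0, 2.0, 3.0]

mu = [fitted_mean(a, b) for a, b in zip(xb, d)]
resid = [yi - mi for yi, mi in zip(y, mu)]  # y minus mu-hat
print(mu)
print(resid)
```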



          • #20
            Hi Tom Zylkin,

            One addition: I have scanned through my data and identified two cases in which _ppmlhdfe_d is missing. Besides the year FE, I have FE for each unique combination of origin country and sector and for each unique combination of destination country and sector. The two cases in which _ppmlhdfe_d is missing are (explained here for the destination country-sector FE):

            1) There is only one observation for a destination country-sector combination, and the dependent variable (Operatingrevenue) is 0 in that observation.

            2) It is also missing for destination country-sector combinations in which some of the regressors in my model (e.g., the log of destination/origin GDP, or dummies for common language, colonial relationship, etc.) have missing values.

            Can you tell me whether it makes sense that _ppmlhdfe_d is missing in these two cases? Should I manually drop the observations for which _ppmlhdfe_d is missing?

            I appreciate your advice.

            Best
            Noemi
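Both cases correspond to observations that are outside the estimation sample: rows with missing regressors are excluded as in any Stata estimator, and by default ppmlhdfe also drops singleton observations and observations separated by a fixed effect (a level whose dependent variable is zero throughout), so no fixed effect is stored for them. The Python sketch below flags such rows for two absorbed factors; the field names (orig, dest, y, lngdp) and rows are hypothetical, and it is a simplified illustration, not ppmlhdfe's actual routine, whose separation checks go further. Because excluding one row can leave another alone in its group, the check repeats until nothing changes.

```python
def estimation_sample(rows, fe_keys, y_key, x_keys):
    """Indices of rows kept after excluding (a) rows with a missing outcome or
    regressor, (b) singleton rows in any absorbed factor, and (c) rows in a
    factor level whose outcome is zero throughout. Repeats (b) and (c) until
    nothing changes, because each exclusion can create new singletons."""
    keep = {i for i, r in enumerate(rows)
            if r[y_key] is not None and all(r[k] is not None for k in x_keys)}
    changed = True
    while changed:
        changed = False
        for fe in fe_keys:
            levels = {}
            for i in keep:
                levels.setdefault(rows[i][fe], []).append(i)
            for idx in levels.values():
                if len(idx) == 1 or all(rows[i][y_key] == 0 for i in idx):
                    keep -= set(idx)
                    changed = True
    return sorted(keep)


# Hypothetical rows: origin and destination country-sector codes, outcome, one regressor.
rows = [
    {"orig": "A1", "dest": "D1", "y": 5.0, "lngdp": 1.0},
    {"orig": "A1", "dest": "D2", "y": 3.0, "lngdp": 1.2},
    {"orig": "A2", "dest": "D1", "y": 2.0, "lngdp": 0.8},
    {"orig": "A2", "dest": "D2", "y": 1.0, "lngdp": 0.9},
    {"orig": "A1", "dest": "D3", "y": 0.0, "lngdp": 1.1},   # case 1: lone row for D3, y == 0
    {"orig": "A2", "dest": "D4", "y": 4.0, "lngdp": None},  # case 2: missing regressor
    {"orig": "A1", "dest": "D4", "y": 2.0, "lngdp": 0.5},   # D4 is then left as a singleton
    {"orig": "A1", "dest": "D5", "y": 0.0, "lngdp": 0.6},   # D5: outcome zero throughout
    {"orig": "A2", "dest": "D5", "y": 0.0, "lngdp": 0.7},
]

kept = estimation_sample(rows, ["orig", "dest"], "y", ["lngdp"])
print("kept rows:", kept)
```

Rows flagged this way contribute nothing to the estimates, which is consistent with no fixed effect (and hence no mu) being stored for them.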
