  • Advice on calculating weighted variable by lagged variable

    Dear Statalists,

    I have searched the related discussions and read many papers on this topic, but I could not find a solution, so I would be grateful for your advice. I am working with balanced panel data and running a regression with year and industry fixed effects to test whether Chinese exports to Brazil displaced Argentina's exports to Brazil during the 2001-2017 period.

    I am running two regressions. In the first, the regressor is the ln of China's exports to Brazil (In_Chn_X_Bra). In the second, it is the ln of China's exports to Brazil weighted by the lagged share of Chinese exports in Brazil's imports (In_wChn_X_Bra). The intuition for weighting export growth by the lagged trade share is that China's export growth will only matter if China is a significant supplier.

    I have run xtreg in the first one with no problem:

    Code:
    xtreg In_Arg_X_Bra In_Chn_X_Bra In_Bra_I_Wld i.year, fe
    Note:
    In_Arg_X_Bra: ln of exports of Argentina to Brazil at year t in a given sector h
    In_Chn_X_Bra: ln of exports from China to Brazil at year t in a given sector h
    In_Bra_I_Wld: ln of imports of Brazil from all countries (other than China and Argentina)


    But I am not sure how I should calculate In_wChn_X_Bra for the second regression. This is what I have tried so far, but the coefficient on In_wChn_X_Bra is far from significant (P = 0.548), whereas the coefficient on In_Chn_X_Bra in the first regression is significant (P = 0.000). I might be making a mistake here:

    Code:
    **1.Add new var to the panel (no commands needed here)
    **In_Bra_I_AWld: ln of imports of Brazil from all countries (including China and Argentina)
    
    **2.Generate new var:
    generate Chn_Share = Chn_X_Bra/Bra_I_Wld
    
    **3.Log that new var:
    gen In_Chn_Share=ln(Chn_Share+1)
    
    **4.Lag that new var:
    generate lagChn_Share = In_Chn_Share[_n-1]
    
    **5.Generate new var:
    generate In_wChn_X_Bra = In_Chn_X_Bra*lagChn_Share
    
    **6.Run new regression:
    xtreg In_Arg_X_Bra In_wChn_X_Bra In_Bra_I_Wld i.year, fe
    Should I change the sequence of these steps? For example, instead of generating and then logging Chn_Share, should I divide the already logged values, In_Chn_X_Bra/In_Bra_I_Wld? Also, should I replace all missing values with 0 in any step? I have tried several different ways and would appreciate advice on the right approach.

    I am using Stata/SE 15.0.
    Thanks in advance for your cooperation and apologies if I made any mistake writing, it is my first time posting here and I have carefully read the rules.
    Kind regards,

    Giuliana.
    Last edited by Giuliana Moroni; 28 Jan 2019, 06:48.

  • #2
    I am not entirely clear on what you are trying to do here. But this does not look to me like the way to do a weighted regression. I think what you are looking for is
    Code:
    xtreg In_Arg_X_Bra In_Chn_X_Bra In_Bra_I_Wld i.year [aweight = lagChn_Share], fe
    You asked: "Also, should I replace all missing values with 0 in any step?"
    No. Missing values will appear in the first year of each panel because the lagged value is unknown. The missing value will cause that observation to be omitted from the analysis, which is exactly what should happen.
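
    This holds only when the lag is taken within each panel. As a minimal sketch on a made-up toy panel (the names id, share, lag_row and lag_panel are hypothetical, not from your data), the time-series operator L. after -xtset- respects panel boundaries, whereas [_n-1] simply takes the previous row and can carry the last year of one unit into the first year of the next:

    ```stata
    * Toy panel: two units, three years each (hypothetical values)
    clear
    input byte id int year double share
    1 2001 .10
    1 2002 .20
    1 2003 .30
    2 2001 .50
    2 2002 .60
    2 2003 .70
    end
    xtset id year

    * Row-based lag: ignores panel boundaries
    generate double lag_row = share[_n-1]

    * Panel-aware lag: missing in the first year of every unit
    generate double lag_panel = L.share

    list, sepby(id)
    ```

    In the toy data, lag_row for unit 2 in 2001 picks up unit 1's 2003 value, while lag_panel is missing there, so that observation drops out of the estimation as intended.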

    If this is not helpful, when posting back, please provide some example data so that alternative code can be tested. Be sure to use -dataex- when doing so (available from SSC if you do not already have it).

    As an aside, the widely used ln(X+1) as a way of getting around the impossibility of calculating ln(X) when X = 0 is mathematically bogus. Your results may well depend on having used ln(X+1) instead of ln(X+1000) or ln(X+0.01) or ln(X+some other magic number). First, you may not need to actually transform these variables at all: since the variables on both sides probably exhibit the same wide-ranging scales, the residual distribution may be quite well behaved in any case. If that is not true, consider other ways of rescaling the variables such as cube root or inverse hyperbolic sine (-asinh()-).
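
    To see how these candidate transformations treat zeros and large values, here is a small sketch on made-up values (the variable names are hypothetical, and the cube root is written as x^(1/3), which is only appropriate for non-negative x):

    ```stata
    * Hypothetical values spanning zero to large magnitudes
    clear
    input long x
    0
    1
    100
    10000
    end

    * Two "log plus a constant" variants with different arbitrary constants
    generate double ln_x1  = ln(x + 1)
    generate double ln_x01 = ln(x + .01)

    * Alternatives that are defined at zero without an added constant
    generate double ihs_x  = asinh(x)
    generate double cbrt_x = x^(1/3)

    list
    ```

    The two log variants are far apart at x = 0 but nearly coincide for large x, which is why the choice of added constant can matter a great deal when zeros are common, as they are in these data.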



    • #3
      Hi Clyde, thank you very much for your reply. I have run -xtreg- with -aweight- as you suggested, but Stata returns the error -weight must be constant within unit1-, because my weights vary within each panel. Here is an example:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str5 unit int year long(Arg_X_Bra Chn_X_Bra Bra_I_Wld Bra_I_AWld)
      "'0201" 2001 11158 0  31641  42799
      "'0201" 2002 12646 0  33565  46211
      "'0201" 2003 10005 0  37549  47554
      "'0201" 2004 13648 0  43308  56956
      "'0201" 2005 17192 0  43718  60910
      "'0201" 2006 21175 0  28916  50091
      "'0201" 2007 35209 0  42606  77815
      "'0201" 2008 33358 0  52447  85805
      "'0201" 2009 29673 0  40828  70501
      "'0201" 2010 31625 0  64277  95902
      "'0201" 2011 33183 0  84590 117773
      "'0201" 2012 31733 0 144279 176012
      "'0201" 2013 20573 0 143280 163853
      "'0201" 2014 24245 0 217445 241690
      "'0201" 2015 16522 0 136968 153490
      "'0201" 2016 18361 0 141411 159772
      "'0201" 2017 22049 0 128215 150264
      "'0202" 2001  8687 0   6508  15195
      "'0202" 2002  9952 0  12061  22013
      "'0202" 2003  7317 0   5101  12418
      "'0202" 2004  9365 0   5602  14967
      "'0202" 2005 13329 0   5426  18755
      "'0202" 2006 11140 0   4553  15693
      "'0202" 2007 11190 0   5691  16881
      "'0202" 2008 23936 0  11983  35919
      "'0202" 2009 25879 0  21841  47720
      "'0202" 2010 38459 0  26368  64827
      "'0202" 2011 62618 0  52091 114709
      "'0202" 2012 57317 0  59311 116628
      "'0202" 2013 48309 0  64540 112849
      "'0202" 2014 52072 0  94930 147002
      "'0202" 2015 33956 0  68983 102939
      "'0202" 2016 30206 0  54171  84377
      "'0202" 2017 44695 0  67593 112288
      "'0203" 2001     0 0     99     99
      "'0203" 2002     0 0     63     63
      "'0203" 2003     0 0     46     46
      "'0203" 2004     0 0     50     50
      "'0203" 2005     0 0     48     48
      "'0203" 2006     0 0    332    332
      "'0203" 2007     0 0    160    160
      "'0203" 2008     0 0     88     88
      "'0203" 2009     2 0     41     43
      "'0203" 2010     7 0     84     91
      "'0203" 2011     0 0     36     36
      "'0203" 2012     0 0     96     96
      "'0203" 2013     0 0    205    205
      "'0203" 2014     0 0    185    185
      "'0203" 2015     0 0    108    108
      "'0203" 2016     0 0     73     73
      "'0203" 2017     0 0    335    335
      "'0204" 2001     0 0   6470   6470
      "'0204" 2002   113 0   5558   5671
      "'0204" 2003   331 0   5774   6105
      "'0204" 2004    41 0   6031   6072
      "'0204" 2005     0 0  11059  11059
      "'0204" 2006     0 0  14928  14928
      "'0204" 2007   326 0  17158  17484
      "'0204" 2008   699 0  22737  23436
      "'0204" 2009   682 0  20788  21470
      "'0204" 2010  1239 0  33375  34614
      "'0204" 2011   496 0  33372  33868
      "'0204" 2012  1239 0  34014  35253
      "'0204" 2013  1127 0  44769  45896
      "'0204" 2014  2212 0  54625  56837
      "'0204" 2015  1461 0  44282  45743
      "'0204" 2016  1456 0  36088  37544
      "'0204" 2017  1925 0  38154  40079
      "'0205" 2001     0 0      0      0
      "'0205" 2002     0 0      0      0
      "'0205" 2003     0 0      0      0
      "'0205" 2004     0 0      0      0
      "'0205" 2005     0 0      0      0
      "'0205" 2006     0 0      0      0
      "'0205" 2007     0 0      0      0
      "'0205" 2008     0 0      0      0
      "'0205" 2009     0 0      0      0
      "'0205" 2010     0 0      0      0
      "'0205" 2011     0 0      0      0
      "'0205" 2012     0 0      0      0
      "'0205" 2013     0 0      0      0
      "'0205" 2014     0 0      0      0
      "'0205" 2015     0 0      0      0
      "'0205" 2016     0 0      0      0
      "'0205" 2017     0 0      0      0
      "'0206" 2001  1520 0    630   2150
      "'0206" 2002  2894 0    979   3873
      "'0206" 2003  3770 0   1014   4784
      "'0206" 2004  3533 0    803   4336
      "'0206" 2005  4083 0   1206   5289
      "'0206" 2006   345 0   1073   1418
      "'0206" 2007  1844 0    799   2643
      "'0206" 2008  3477 0    455   3932
      "'0206" 2009  5585 0    441   6026
      "'0206" 2010  5394 0    878   6272
      "'0206" 2011  9374 0   1122  10496
      "'0206" 2012  2836 0    669   3505
      "'0206" 2013  2062 0    983   3045
      "'0206" 2014  1227 0    460   1687
      "'0206" 2015  2104 0   1498   3602
      end
      format %ty year
      The commands I have run are, in order:

      Code:
      gen In_Arg_X_Bra=ln(Arg_X_Bra+1)
      gen In_Chn_X_Bra=ln(Chn_X_Bra+1)
      gen In_Bra_I_Wld=ln(Bra_I_Wld+1)
      
      encode unit, generate(unit1)
      tabulate unit1
      xtset unit1 year, yearly
      
      generate Chn_Share = Chn_X_Bra/Bra_I_AWld
      generate lagChn_Share = Chn_Share[_n-1]
      *(1,113 missing values generated)*
      
      xtreg In_Arg_X_Bra In_Chn_X_Bra In_Bra_I_Wld i.year [aweight = lagChn_Share], fe
      *weight must be constant within unit1*

      Regarding the widely used ln(X+1): my master's supervisor suggested it. Without it, I could not run -xtreg y x1 x2 i.year, fe-, since many observations were omitted because of collinearity. But I fully understand your point and appreciate the observation. I have suggested to my supervisor that we run a second set of regressions with the variables rescaled as cube roots, to see whether the results differ materially.
      Last edited by Giuliana Moroni; 03 Feb 2019, 22:31.



      • #4
        I see. So, I think you need to emulate -xtreg, fe- with -regress-:

        Code:
        regress In_Arg_X_Bra In_Chn_X_Bra In_Bra_I_Wld i.year i.unit1 [aweight = lagChn_Share]
        If you have a very large number of units in your data you may need to reset your matrix size to a large number in order for this to run.
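
        As a rough check of that emulation, the sketch below uses simulated data (all names and values are hypothetical) to confirm that -regress- with unit indicators reproduces the -xtreg, fe- slope, and that -regress- accepts an analytic weight that varies within unit:

        ```stata
        * Simulated panel: 6 units observed for 10 years each
        clear
        set seed 12345
        set obs 60
        generate id = ceil(_n/10)
        bysort id: generate year = 2000 + _n
        generate double x = rnormal()
        generate double y = 2*x + id + rnormal()
        * Hypothetical weight that changes from year to year within a unit
        generate double w = runiform()
        xtset id year

        * Within (fixed-effects) estimator
        xtreg y x i.year, fe
        scalar b_fe = _b[x]

        * Same model with explicit unit indicators
        regress y x i.year i.id
        scalar b_lsdv = _b[x]
        display "difference in slopes: " b_fe - b_lsdv

        * Weighted version, which -xtreg, fe- refuses for time-varying weights
        regress y x i.year i.id [aweight = w]
        ```

        The unweighted slopes should agree up to rounding. This says nothing about whether the weighted standard errors are trustworthy, which is the concern raised next.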

        This kind of weighting is unusual, because the weight is itself closely related to one of the covariates. I have never seen this done before, and I cannot even begin to grasp what it does to the standard errors, confidence intervals, t-tests and p-values. I would not trust any of those unless somebody with a deeper understanding of it than mine says it is OK or advises you how to adjust them.



        • #5
          I will check other alternatives.

          Thank you for your feedback, much appreciated.
