Panel data: clustering with countries and years

Alexander Sukhanov

#1


20 Jul 2018, 12:13

Dear Statalist users,

I am writing my master's thesis on the effects of culture on Foreign Direct Investment (FDI). I use panel data covering 1995-2013 and 35 OECD members. I measure culture via three cultural dimensions (my explanatory variables): trust, individualism and hierarchy, using data from the World Values Survey. These cultural dimensions are measured in percentages. I have a list of control variables, including time-invariant variables such as common border, distance, common language etc. My dependent variable is bilateral FDI flows, meaning each FDI value belongs to two countries (host and source). To specify the panel, I generated a variable that groups the host and source countries using the egen command and called it CountryI_J, so the panel is declared with xtset CountryI_J Year. Please note that CountryI_J takes 1156 unique country-pair values.
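
A minimal sketch of that setup, assuming the host and source identifiers are stored in variables named CountryI and CountryJ:

```stata
* Sketch only: CountryI and CountryJ are assumed to hold the host and source identifiers
egen CountryI_J = group(CountryI CountryJ)   // one integer ID per host-source pair
xtset CountryI_J Year                        // pair is the panel unit, Year the time variable
```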

After I discussed my work with my supervisor, she told me to use two models:
1) Panel data with two-dimensional clustering (Year, Country)
2) Fama-MacBeth regression with Newey-West standard errors using 3 lags.

My questions mostly concern model 1). Firstly, I am still not entirely sure which estimator to use for the panel data when I need to cluster on two dimensions (Year, Country) and account for year and country fixed effects. So far I have excluded xtreg, fe because I have 5 time-invariant control variables. After some research, I have landed on reghdfe, which allows two-dimensional (or higher) clustering and accounts for fixed effects via the absorb() option, which I understand to be equivalent to including i.variable for the variable of interest [correct me if I am wrong]. Secondly, I am wondering whether I should cluster by the generated country-pair variable (CountryI_J) or by the host (CountryI) and source (CountryJ) separately.

Lastly, I have also used institutional quality as a control variable, which I obtained as the first principal component of the Worldwide Governance Indicators (WGI). The WGI data are available for 1996-2014, except for 1997, 1999 and 2001. When I tried to run the Fama-MacBeth regression, Stata reported that Year is not regularly spaced because of the gaps in the institutional quality measure. I tried to find out how research papers deal with gaps in years, but they simply report that data are available for 1996-2014. I am wondering how they filled in the missing years: by carrying over the value from the most recent period, or by taking an average? What would be the most appropriate way in your opinion?

Please excuse me if I wasn't clear enough; it is my first time posting here, my deadline is in 4 days, and I am desperate for your help. Thank you in advance!
Phil Bromiley

#2

26 Jul 2018, 12:47

You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output and sample data using dataex.

It is standard to use xtreg, fe and add i.Year to allow for year effects. reghdfe should give you similar results. Whether to cluster by pair or by the two countries as separate dimensions is a substantive question. You'll be allowing different intercepts in the different approaches. I would guess that clustering by pair is going to result in more panels, which is probably more conservative than clustering by host and source countries.
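
As a rough sketch of the reghdfe route, and not a recommendation of one clustering choice over the other: the variable names fdi, trust, individualism, hierarchy, distance and common_border are hypothetical placeholders, and CountryI, CountryJ, CountryI_J and Year are assumed to be numeric. reghdfe is community-contributed and has to be installed from SSC first.

```stata
* Sketch only: fdi, trust, individualism, hierarchy, distance, common_border are assumed names
* ssc install ftools
* ssc install reghdfe

* Host, source and year fixed effects; two-way clustering by country pair and year.
* Pair-level time-invariant controls stay identified because no pair effect is absorbed.
reghdfe fdi trust individualism hierarchy distance common_border, ///
    absorb(CountryI CountryJ Year) vce(cluster CountryI_J Year)

* Alternative: cluster by host and source countries as separate dimensions, plus year.
reghdfe fdi trust individualism hierarchy distance common_border, ///
    absorb(CountryI CountryJ Year) vce(cluster CountryI CountryJ Year)
```

Comparing the standard errors from the two calls shows how much the clustering choice matters in your data. Keep in mind that the Year dimension has only 19 clusters here.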

At least in some implementations, Fama-MacBeth assumes a lack of serial correlation in the errors. It was originally used with stock returns, where that assumption is more plausible than it is for FDI. There are many ways to handle the missing data. Many on this list favor multiple imputation. Alternatively, you might just impute from the observed values. If you're just missing single years, I'd be tempted to average the previous and subsequent years.
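
One way to do that averaging, sketched under assumptions: instq is a hypothetical name for the institutional quality variable, and the data are assumed to contain a row for each gap year (1997, 1999, 2001) with instq missing. For a single missing year between two observed years, linear interpolation gives exactly the average of the two neighbours.

```stata
* Sketch only: instq is an assumed variable name
* Linear interpolation of instq on Year within each country pair.
* Without the epolate option, years outside the observed range (e.g. 1995) stay missing.
bysort CountryI_J (Year): ipolate instq Year, generate(instq_ip)
```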