Dear Statalist users,
I am writing my master's thesis on the effects of culture on foreign direct investment (FDI). I use panel data covering 1995-2013 and 35 OECD members. I measure culture through three cultural dimensions, which are my explanatory variables: trust, individualism and hierarchy, using data from the World Values Survey. These dimensions are measured in percentages. I also have a list of control variables, including time-invariant ones such as common border, distance and common language. My dependent variable is bilateral FDI flows, so each FDI value belongs to two countries (host and source). To set up the panel, I generated a country-pair identifier called CountryI_J by grouping the host and source countries with the egen command, and declared the panel with xtset CountryI_J Year. Note that CountryI_J takes 1156 unique country-pair values.
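A minimal sketch of this panel setup, assuming the host and source identifiers are stored in variables named CountryI and CountryJ (the names I use further down):

```stata
* Sketch only: build a country-pair identifier and declare the panel.
* CountryI (host) and CountryJ (source) are assumed to be the country identifiers.
egen CountryI_J = group(CountryI CountryJ)

* Declare the panel: country pair as the unit, Year as the time variable.
xtset CountryI_J Year

* Show the pattern of years observed per pair, to check the panel structure.
xtdescribe
```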
After I discussed this with my supervisor, she told me to use two models:
1) Panel data with two-dimensional clustering (Year, Country)
2) Fama-MacBeth regression with Newey-West standard errors using 3 lags.
My questions mostly concern model 1). First, I am still not sure which estimator to use when I need to cluster on two dimensions (Year, Country) and account for year and country fixed effects. So far I have ruled out xtreg, fe because I have 5 time-invariant control variables. After some research I landed on reghdfe, which allows clustering on two or more dimensions and handles fixed effects through the absorb() option, which as far as I understand is equivalent to including i.variable for each absorbed variable [correct me if I am wrong]. Second, I am wondering whether I should cluster on the generated country-pair variable (CountryI_J) or on the host (CountryI) and source (CountryJ) separately.
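To make the two clustering options concrete, here is a rough sketch of the specifications I am choosing between. The variable names fdi, trust, individualism, hierarchy, border, distance and language are placeholders for illustration, not my actual variable names, and absorbing host, source and year effects is only my current reading of "country and year fixed effects".

```stata
* Sketch only; reghdfe is user-written (ssc install ftools, then ssc install reghdfe).
* All variable names except CountryI, CountryJ, CountryI_J and Year are placeholders.

* Option A: two-way clustering on country pair and year
reghdfe fdi trust individualism hierarchy border distance language, ///
    absorb(CountryI CountryJ Year) vce(cluster CountryI_J Year)

* Option B: clustering on host, source and year separately
reghdfe fdi trust individualism hierarchy border distance language, ///
    absorb(CountryI CountryJ Year) vce(cluster CountryI CountryJ Year)

* Note: any regressor that does not vary within an absorbed group
* (e.g. a host-country variable that is constant over time) is dropped as collinear.
```

Pair-level controls such as distance stay identified here because the absorbed effects are at the host, source and year level rather than at the country-pair level.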
Lastly, I use institutional quality as a control variable, obtained as the first principal component of the Worldwide Governance Indicators (WGI). WGI data are available for 1996-2014 except for 1997, 1999 and 2001. When I try to run the Fama-MacBeth regression, Stata reports that Year is not regularly spaced because of the gaps in the institutional quality measure. I looked at how research papers deal with these gaps, but they simply report that the data are available for 1996-2014. How do they fill in the missing years: by taking the value from the most recent available year, or by averaging? Which would be the most appropriate in your opinion?
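The two ways of filling the gaps that I have in mind could be sketched as follows. The name inst_quality is a placeholder for my WGI-based control, and the sketch assumes the rows for 1997, 1999 and 2001 exist in the data with inst_quality missing.

```stata
* Sketch only; inst_quality is a placeholder name for the WGI-based control.

* (a) Carry the most recent available value forward within each country pair.
bysort CountryI_J (Year): gen iq_carry = inst_quality
bysort CountryI_J (Year): replace iq_carry = iq_carry[_n-1] if missing(iq_carry)

* (b) Linear interpolation over Year within each country pair.
*     For a single missing year this equals the average of the two adjacent years.
bysort CountryI_J (Year): ipolate inst_quality Year, gen(iq_interp)

* Years before the first available value stay missing under both approaches.
```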
Please excuse me if I wasn't clear enough; this is my first time posting here. My deadline is in 4 days and I would be very grateful for your help. Thank you in advance!