Hello everyone,

I am new to this forum and to Stata and have a question regarding my research project.

I want to analyse a set of variables and their influence on the location decisions of European multinational companies, i.e. find out whether factors like labor costs and infrastructure in a country increase the likelihood that European firms invest in that country.

I look at five different (target) countries, a large number of firms and a time span of 10 years (panel data set). I want to do the analysis using a count data regression. This means my dependent variable is a count variable: the number of subsidiaries of firm i in country j in year t, which gives me T*I*J observations. From related papers I know that a zero-inflated negative binomial regression model is most suitable. I also believe that I have understood the basic idea of the negative binomial model and the reason for the zero-inflation. I was also able to create the count variables in Stata.
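To make the layout concrete, here is a minimal sketch of how the data are structured, with one row per firm-country-year. The variable names (`firm_id`, `country_id`, `year`), the three placeholder firms and the start year are made up for illustration and are not my actual data.

```stata
* Hypothetical skeleton: one row per firm-country-year combination
clear
set obs 3                                   // 3 placeholder firms (I)
generate firm_id = _n
expand 5                                    // 5 target countries (J)
bysort firm_id: generate country_id = _n
expand 10                                   // 10 years (T)
bysort firm_id country_id: generate year = 2000 + _n
count                                       // number of rows = T*I*J
```

The count variable (number of subsidiaries) would then be one additional column in this layout.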

However, I don't know how to actually get the model to work from there. From papers on similar research as well as statistics books, I gather that it is not possible to use the zero-inflated negative binomial model on panel data. Instead, these papers ignore the panel structure and pool the data, i.e. they use pooled estimation techniques. I believe I understand what that means. However, many problems seem to arise from this procedure, like "correlation of standard errors". The solutions seem to be year, country and/or firm fixed effects as well as "clustering standard errors". Unfortunately, this is where the papers don't go into more detail, and I have no idea what is meant by that.

I have spent the last few days reading statistics books and haven't really made any progress. That's why I am asking for help here. Could anyone please explain what I have to take care of when "pooling panel data" in my specific case (it is special, since I look at different countries AND different firms, right?). What kind of fixed effects do I have to consider and why (and, if possible, how can I do this in Stata)? What is clustering standard errors, why is it needed, and how can I do it?

Thanks a lot in advance! Any advice, literature references and explanations would be highly appreciated.

Best regards,

Anton
