Industry-level data analysis

Michal Barinka

Join Date: May 2019

Posts: 6
#1

12 May 2019, 15:41

Dear Statalist community,

I'm new here; this is my first post, and I would like to kindly ask for your help.

I've created a panel dataset in long format. I have collected data for 10 years, in 36 countries, for 9 industries. This means I have 3240 country-industry-year observations (including missing values). I'm working on a study that examines the effect of institutional quality (IQ) on foreign direct investment (FDI) by industry, and the moderating effect of unit labor costs (ULC).

My IV, institutional quality, is available by country and varies across years. It is essentially an index with values between -2.5 and 2.5.
My DV, foreign direct investment, is available by country and industry and varies across years. Values are absolute amounts (dollars).
My MV, unit labor costs, is available by country and industry and varies across years. Values are annual percentage growth/change.

I'd like to see the effect of IQ on FDI, and its moderation by ULC, in each of the 9 industries so that I can describe the differences across industries. I'm not interested in describing the effect from a country-level perspective; I'd like to see what the effect is for each industry across those 36 countries.

I expect IQ to have a positive effect on FDI, and ULC to moderate that relationship: when ULC growth is higher, the relationship weakens, and when ULC growth is lower or negative, the relationship is stronger. I'm struggling to find out how to run the tests I need. I'd like to see not only the overall effect but also the effect for individual industries.

Here's a sample of my dataset:

Code:

clear
input str16 Country int Year str51 industry double(fdiperindustry iqaverage ulc) byte id_industry
"Denmark" 2008 "B_Mining" 10.236 1.7379027009010315 16.480524 1
"Denmark" 2008 "C_Manufacturing" 5397.156 1.7379027009010315 3.503547 2
"Denmark" 2008 "D_Electricity" -80.424 1.7379027009010315 16.480524 3
"Denmark" 2008 "F_Construction" -40.943 1.7379027009010315 1.165331 4
"Denmark" 2008 "G_Wholesale and retail trade" -639.002 1.7379027009010315 11.399856 5
"Denmark" 2008 "H_Transportation and storage" 842.255 1.7379027009010315 11.399856 6
"Denmark" 2008 "J_Information and communication" 538.107 1.7379027009010315 -7.490213 7
"Denmark" 2008 "K_Financial and insurance activities" -5326.968 1.7379027009010315 -6.895259 8
"Denmark" 2008 "M_Professional, scientific and technical activities" -134.527 1.7379027009010315 1.804702 9
"Estonia" 2008 "B_Mining" 2.44 1.071056495110194 35.034043 1
"Estonia" 2008 "C_Manufacturing" 13.288 1.071056495110194 9.124084 2
"Estonia" 2008 "D_Electricity" 54.738 1.071056495110194 35.034043 3
"Estonia" 2008 "F_Construction" -88.879 1.071056495110194 -1.08078 4
"Estonia" 2008 "G_Wholesale and retail trade" 66.889 1.071056495110194 23.727642 5
"Estonia" 2008 "H_Transportation and storage" 218.485 1.071056495110194 23.727642 6
"Estonia" 2008 "J_Information and communication" 69.262 1.071056495110194 8.235225 7
"Estonia" 2008 "K_Financial and insurance activities" 1402.091 1.071056495110194 -3.535888 8
"Estonia" 2008 "M_Professional, scientific and technical activities" -65.039 1.071056495110194 21.825679 9
end

Based on the Hausman test, the random-effects model is more suitable.

Here are the commands I use:

Code:

* Panel unit: one id per country-industry pair
egen pan_id = group(Country industry)
xtset pan_id Year

* Random-effects models without moderation
xtreg fdiperindustry iqaverage, re
xtreg fdiperindustry iqaverage i.id_industry, re

* Random-effects models with ULC as moderator
xtreg fdiperindustry c.iqaverage##c.ulc, re
xtreg fdiperindustry c.iqaverage##c.ulc i.id_industry, re

This is one of the results: [screenshot not preserved]

And moderation: [screenshot not preserved]

1) Would you please advise me on how to correctly interpret the effect per industry, both with and without moderation?
Is there any guidance on which industry should be left out as the reference category (i.e., not coded as a dummy)? I've read that it is usually the largest group, but in my case the number of observations per industry is more or less the same (taking missing values into account).

2) I'm struggling with interpreting the results. In the output above, industries 2-9 are compared with industry 1. Is it possible to describe the effect in each industry without comparing it with a reference industry? That is, I would like to see the effect of IQ on FDI by industry: the effect in industry 1 is x, the effect in industry 2 is y, etc.

3) Should I run any other commands to see the desired relationship between IQ and FDI, including a moderator, by industry?

I hope I properly stated my issues and also provided you with clear explanation and examples. I will be happy to provide more information if needed.
Thank you very much in advance, I really appreciate your help.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#2

13 May 2019, 00:08

Michal:
I would say something different.
Looking at your model #1, the interaction does not seem to add any information, as its 95% CI is pretty wide.
A model without the interaction is probably the way to go.
For the future, please post shorter queries (as this will increase your chances of getting helpful replies) and use CODE delimiters (not screenshots) to share what you typed and what Stata gave you back. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Michal Barinka

Join Date: May 2019

Posts: 6
#3

13 May 2019, 12:04

Dear Carlo,
thank you for your answer. I assume that by interaction you mean the moderation effect. When you say that it doesn't seem to add any information, do you mean that the P values for ulc (0.711) and for c.iqaverage##c.ulc (0.550) are too high? What if the P value for ulc were, e.g., 0.330 and for c.iqaverage##c.ulc 0.055? Would that indicate a better result?

When comparing industries, which one should I not code as a dummy?
Also, I understand that with dummies, the regression compares industries against the reference industry (in this case i1), but is it also possible to see individual effects by using a different command/method?

Many thanks in advance.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#4

14 May 2019, 01:37

Michal:
1) actually, I meant that the interaction is probably redundant in your model; conversely, I would keep -c.iqaverage- and -c.ulc- as predictors;
2) as you have laudably used -fvvarlist- notation, you can choose which industry to set as the reference category (see the -ib#.- operator);
3) the individual effect is already reported in the related coefficient (adjusted for the remaining predictors).
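
Points 2) and 3) can be sketched as follows, assuming the variable names from #1 and data already -xtset- as there. Choosing industry 2 as the base is arbitrary and only for illustration. The second model, which lets the -iqaverage- slope differ by industry, is an assumption about what is being asked in #1 (one IQ effect per industry) and goes beyond the models posted so far.

```stata
* Sketch only: assumes the dataset from #1 is in memory and xtset pan_id Year

* ib2. sets industry 2 as the reference category instead of the default (1);
* the choice of 2 here is purely illustrative
xtreg fdiperindustry c.iqaverage c.ulc ib2.id_industry, re

* Assumption: allow the iqaverage slope to differ across industries
xtreg fdiperindustry i.id_industry##c.iqaverage c.ulc, re

* One iqaverage slope per industry, rather than contrasts with the base
margins id_industry, dydx(iqaverage)
```

Changing the reference category only changes what the industry coefficients are compared with; the fitted model is the same.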

As an aside, I would test via -xttest0- whether your regression model is suitable for -xtreg, re-.
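
A minimal sketch of that check, again assuming the variable names and -xtset- from #1: -xttest0- is run right after a random-effects fit and reports the Breusch and Pagan Lagrangian multiplier test for random effects.

```stata
* Sketch only: assumes the dataset from #1 is in memory and xtset pan_id Year
xtreg fdiperindustry c.iqaverage c.ulc i.id_industry, re

* Null hypothesis: the variance of the panel-level effect is zero
xttest0
```

A small p-value is evidence against the null, i.e. in favour of a panel-level random effect over pooled OLS.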

Kind regards,
Carlo
(Stata 19.0)

Yasemin Ozbal

Join Date: Jun 2022
Posts: 2

20 Nov 2023, 10:30

Dear Statalist community, I have a question similar to the one above. I cannot work with the dataset from home because it is not publicly available, so I do my research at the statistical institute, and they send the regression results back to me.

I ran an OLS regression where the dependent variable is the natural log of labor productivity, with some ICT indicators as independent variables as well as industry dummies. I have four dummy variables, one each for the manufacturing, construction, service, and trade sectors.

Taking manufacturing as the base, I ran this regression:

regress loglaborprod ERP website share_internet_labor service trade construction

How can I interpret the following results? I cannot rerun the regression at home; this is the output I received from the institute after they checked it.

                        (1)
VARIABLES               loglaborprod

ERP                     0.440***
                        (0.065)
website                 0.316***
                        (0.038)
share_internet_labor    0.002
                        (0.001)
service                 0.080
                        (0.050)
trade                   0.193***
                        (0.044)
construction            -0.038
                        (0.057)
Constant                9.579***
                        (0.035)

Observations            5,418
R-squared               0.053

Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

Thank you and best regards,
Yasemin
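
A hedged sketch of how the sector dummies in such a log-outcome model are usually read, using only the coefficients in the table above. With manufacturing as the omitted base, a dummy coefficient b corresponds to an approximate percentage difference of 100*(exp(b) - 1) in labor productivity relative to manufacturing, holding the other regressors fixed.

```stata
* Percentage difference in labor productivity relative to manufacturing (base)
display 100*(exp(0.193) - 1)    // trade
display 100*(exp(0.080) - 1)    // service
display 100*(exp(-0.038) - 1)   // construction
```

For trade this is roughly 21 percent higher than manufacturing. The service and construction coefficients carry no significance stars in the table, so those differences from manufacturing are not distinguishable from zero at the reported levels.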
