Hi, new to Stata!
I am trying to find out, after controlling for industry, country, and year, the effect that internet usage rates have had on exports, and how this effect differs according to how technology-intensive the industry is. In particular, I want to control for country-year, year-industry, and industry-country fixed effects.
I have export data for every country over 5 years, broken down by industry (99 industries), and for each industry I also have a corresponding R&D intensity variable (1-4). I also have data on the % of internet users by country for each year.
Sample (country_code 4: Afghanistan; country_code 8: Albania):
year | country_code | industry_code | intensity | exports_usd | internet_users |
1998 | 4 | 19 | 2 | 209823 | .15 |
1998 | 4 | 20 | 4 | 23423 | .15 |
1998 | 4 | 21 | 3 | 988474 | .15 |
1998 | 4 | 22 | 2 | 3344 | .15 |
1998 | 4 | 23 | 1 | 134523 | .15 |
1998 | 8 | 19 | 2 | 46578435 | .22 |
1998 | 8 | 20 | 4 | 555675 | .22 |
1998 | 8 | 21 | 3 | 3837 | .22 |
1998 | 8 | 22 | 2 | 863522 | .22 |
1998 | 8 | 23 | 1 | 43355 | .22 |
2002 | 4 | 19 | 2 | 435246 | .18 |
2002 | 4 | 20 | 4 | 445554 | .18 |
YTCI = alphaCT + betaIT + gammaCI + (Dintensity * deltausers_CT)
where:
YTCI is log exports in year T for country C and industry I (log_exports)
alphaCT is the term for country-year fixed effects, which I generated using egen c_y = group(country_code year), label
betaIT is the term for industry-year fixed effects, which I generated using egen i_y = group(industry_code year), label
gammaCI is the term for country-industry fixed effects, which I generated using egen c_i = group(country_code industry_code), label
Dintensity is a dummy for industry intensity
deltausers_CT is country-year investment in IT (measured by internet_users)
I am trying to run it using reghdfe, absorbing the country-industry, year-industry, and year-country effects. Could the problem be that these fixed effects use up too many degrees of freedom? I have generated the c_i2, y_i2 and y_c2 variables using
egen c_i2 = group(country_code industry_code)
then
reghdfe log_exports c.internet_users#1.intensity, absorb(c_i2 y_i2 y_c2)
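For reference, a minimal sketch of the full setup as I understand it. Only the c_i2 line is shown above, so the definitions of y_i2 and y_c2 here are assumed by analogy. The last command is an alternative that, as far as I can tell from the reghdfe help file, lets absorb() take the interactions directly and skips the egen step. Using i.intensity instead of 1.intensity is also just an assumption, to get one slope per intensity level.

```stata
* Assumed by analogy with the c_i2 line above (not shown in my original code)
egen y_i2 = group(year industry_code)
egen y_c2 = group(year country_code)

* Same three sets of pairwise fixed effects, written as factor
* interactions inside absorb() instead of pre-built group ids
reghdfe log_exports c.internet_users#1.intensity, ///
    absorb(country_code#industry_code year#industry_code year#country_code)

* One slope per intensity level. internet_users only varies by
* country-year, so I would expect one level to be dropped as
* collinear with the year#country_code effects and the remaining
* slopes to be read relative to that level.
reghdfe log_exports c.internet_users#i.intensity, ///
    absorb(country_code#industry_code year#industry_code year#country_code)
```

If that reading is right, the coefficients in the fully saturated model are differences relative to the omitted intensity level, so a negative sign would not necessarily mean internet usage lowers exports.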
The results of the regression are listed below. They don't make sense, as the coefficient values should all be positive. The issue, if I am not mistaken, is that absorbed c_y fixed effects runs
However, if I run the regression using
reghdfe log_exports c.internet_users#1.intensity, absorb(year country_code industry_code)
I get results that do make sense.
Question: can someone either show me a better way of including c-i, i-y and y-c fixed effects? Or can someone comment on how much difference running it with just year, country and industry fixed effects would make to the validity of the results?
Could it even be acceptable for me to run it with year-industry and then just country fixed effects [absorb(y_i2 country_code)]?
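To make the comparison concrete, here is a rough sketch of how I would put the three specifications side by side. It assumes c_i2, y_i2 and y_c2 already exist as described above; the stored-estimate names (fe_simple, fe_mixed, fe_full) are just labels I made up.

```stata
* Specification 1: separate year, country and industry effects
reghdfe log_exports c.internet_users#1.intensity, ///
    absorb(year country_code industry_code)
estimates store fe_simple

* Specification 2: year-industry effects plus country effects
reghdfe log_exports c.internet_users#1.intensity, ///
    absorb(y_i2 country_code)
estimates store fe_mixed

* Specification 3: all three pairwise sets of effects
reghdfe log_exports c.internet_users#1.intensity, ///
    absorb(c_i2 y_i2 y_c2)
estimates store fe_full

* Coefficients, standard errors and sample sizes in one table
estimates table fe_simple fe_mixed fe_full, b se stats(N)
```

Seeing how the coefficient and the number of observations move across the columns should at least show how sensitive the result is to the choice of fixed effects.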
Thanks so much in advance!