Running Regressions based on terciles?

alex lee

Join Date: Apr 2018

Posts: 1
#1


23 Apr 2018, 14:31

The question I'm trying to answer: How does the proportion of immigrants in each US state affect the national market share, within their industry, of companies based in that state from 2002 to 2015? In other words, do companies located in states with a higher proportion of immigrants have a larger share of their industry's market?

Description of my data: panel data with one row per company per fiscal year; the immigrant proportion is measured at the state level
Column 1: ID– a company's unique ID code (integer)
Column 2: fyear– fiscal year, 2002-2015 for each company; a company keeps the same ID code across years (integer)
Column 3: state– state the company is located in (integer)
Column 4: sic– a two digit code that represents company's industry (integer)
Column 5: propImmigrants– that state population's proportion of immigrants in that year (float)
Column 6: marketShare– a company's market share in its industry in a specific year (float)

My Current Regression:
Right now, I first create indicator (dummy) variables for fyear, state and sic to serve as fixed effects. Then, I run:

Code:

reg marketShare propImmigrants _I*

(this regresses a company's market share on its state's proportion of immigrants, with _I* standing for all the fixed-effect indicators for fyear, state and sic)
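For readers who have not built the _I* indicators, here is a minimal sketch of an equivalent specification using Stata's factor-variable notation. It assumes that fyear, state and sic are stored as integers, as described above; it is an alternative way of writing the same model, not necessarily how the indicators in this post were created.

```stata
* Assumption: fyear, state and sic are numeric (integer) variables.
* The i. prefix expands each variable into a set of indicators on the fly,
* so no _I* dummies have to be generated beforehand.
regress marketShare propImmigrants i.fyear i.state i.sic
```

Because propImmigrants varies only by state and year, its coefficient in this specification is identified from changes within a state over time.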

MY QUESTION:
In my current regression, I only see the overall effect of the proportion of immigrants on a firm's market share. However, I want to see whether the proportion of immigrants disproportionately affects large firms or small firms. So, is there a way to run the regression three times separately: once for "small" firms, once for "medium" firms and once for "large" firms? I'm thinking of defining small/medium/large by how big the companies were in the initial year, 2002. So, something like "if a firm's market share is below the 33rd percentile for its industry in 2002, run the above regression on all of that firm's observations from 2002-2015" (which I assume means matching on the unique ID code somehow). I would want to do the same for firms with a market share between the 34th and 66th percentile in 2002 and for firms with a market share above the 66th percentile in 2002.
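A minimal sketch of the grouping described above, offered only to illustrate the mechanics and not as an endorsement of the design. It assumes each firm stays in a single sic for the whole period and has at most one observation per fiscal year. The names share2002, p33, p66 and sizeTercile are hypothetical helper variables, not part of the original data.

```stata
* Carry each firm's 2002 market share to all of that firm's years
bysort ID: egen share2002 = max(cond(fyear == 2002, marketShare, .))

* 33rd and 66th percentiles of 2002 market share within each industry,
* computed from 2002 observations only and spread to every row of the industry
bysort sic: egen p33 = pctile(cond(fyear == 2002, marketShare, .)), p(33)
bysort sic: egen p66 = pctile(cond(fyear == 2002, marketShare, .)), p(66)

* 1 = small, 2 = medium, 3 = large; missing if the firm has no 2002 value
gen byte sizeTercile = 1 + (share2002 > p33) + (share2002 > p66) ///
    if !missing(share2002, p33, p66)

* Run the same regression separately within each baseline size group
forvalues t = 1/3 {
    regress marketShare propImmigrants i.fyear i.state i.sic if sizeTercile == `t'
}
```

Firms that do not appear in 2002 get a missing sizeTercile and drop out of all three regressions.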

Thanks and let me know if something isn't clear enough.

Last edited by alex lee; 23 Apr 2018, 14:36.
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#2

23 Apr 2018, 15:11

It sounds to me as if you are ignoring the nesting of observations in your data. Do you have observations for the same companies in different years? It sounds like you have yearly observations nested in companies, which are in turn nested in industries. If that is the case, then you are forcing a three-level data structure into a single-level model.
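One way to respect that nesting is a multilevel model with random intercepts for industries and for companies within industries. The following is only a sketch under the assumption that each company (ID) belongs to exactly one industry (sic); the thread does not settle on a specific model, so treat the specification as illustrative.

```stata
* Three-level random-intercept model:
* yearly observations nested in companies (ID), nested in industries (sic).
* Higher levels are listed first after the fixed-effects part.
mixed marketShare propImmigrants i.fyear || sic: || ID:
```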

There is also a fundamental conceptual problem with asking whether the proportion of immigrants disproportionately affects large firms or small firms. If firm size were one of the predictor variables of the model, this would be OK. But you are defining firm size by tercile of market share, which is your outcome variable. It is not valid to build a model that is conditional on the very outcome you are trying to predict. You could, in fact, partition your data into terciles of market share and estimate separate regression coefficients of marketShare on propImmigrants. But if you were then given data on a new company with unknown market share, your model would be useless, because you would not know which of the three regression equations to apply to it! So I think you need to rethink this question and either come up with a definition of size that does not depend on your outcome variable, or ask a different question.

Added: In this case there are some conceptual issues that need to be nailed down before progress can be made. But in the general case, if you want help with code, it is very important to provide example data. To show example data, please use the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
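As a sketch of what that looks like for the variables described in #1 (the choice of the first 20 observations is arbitrary):

```stata
* Needed only if dataex is not already part of your installation
ssc install dataex

* Produce a copy-and-paste data example for the variables in question
dataex ID fyear state sic propImmigrants marketShare in 1/20
```

The output can be pasted directly into a post between code delimiters.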

Last edited by Clyde Schechter; 23 Apr 2018, 15:14.