Comparison of two independent variables in multiple linear regression OLS

robert johnson

Join Date: Sep 2021

Posts: 3
#1

Comparison of two independent variables in multiple linear regression OLS

17 Sep 2021, 15:39

Hello All,

I'm working on a research project that compares the benefits of an additional year of military service with those of an additional year of education for individuals who have completed active-duty military service and are now veterans. The question would be posed as "How does an additional year of military service affect later income (once a veteran and civilian) as compared to an additional year of education?" My econometrics class requires multiple linear regression (OLS) as the analysis tool, so that part is decided. The dataset I've compiled is cross-sectional. I was hoping for some help with my thinking on model construction and the econometric analysis. I'm looking for a ceteris paribus relationship for each variable, not just an association.

My thought was to construct one multiple linear regression model in which years of military service is swapped in for years of education, with other control variables included to remove as much bias from each estimate as possible. My concern is that if I hold either of the variables I want to compare (education or years of military service) constant in the model, I can no longer compare them. So my plan would be as follows:

Model: wage income = b0 + b1 (years of military service) OR (years of education) + b2 age + b3 sex + b4 race + b5 marriage status + potentially other controls

Using the same specification, I'd estimate b1 for an additional year of military service and its effect on income, then do the same for an additional year of education. I could then compare the two estimates and discuss the results and their limitations.

My data would come from the U.S. Census Bureau's 1990 census, from which I could draw a random sample of individuals with their years of military service (numeric, 0-20), years of education (numeric, 0-20), age (numeric), sex (binary: 0 for M, 1 for F), race (I'd construct a set of binary dummy variables for this), marital status (binary: single or married, 0 or 1), and other data if further controls would be valuable. The data are cross-sectional, all from one time period.

I'd plan on filtering the data to only include:
Veterans (nobody should be active duty or have never served)
Ages 25-65 (to allow for full potential educational attainment)
Employed persons only (nobody without a job)
Wage income > 0
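
The sample restrictions listed above could be applied along the following lines. This is only a sketch: the variable names (veteran, age, employed, wageinc) are hypothetical and would need to be replaced with whatever the census extract actually uses.

```stata
* Hypothetical variable names; replace with those in your census extract.
*   veteran  : 1 = has served and is no longer on active duty
*   employed : 1 = currently employed
*   wageinc  : annual wage income
keep if veteran == 1                      // veterans only
keep if inrange(age, 25, 65)              // ages 25-65
keep if employed == 1                     // employed persons only
keep if wageinc > 0 & !missing(wageinc)   // positive, non-missing wage income
count                                     // remaining sample size
```

The explicit !missing() check matters because Stata treats numeric missing values as larger than any number, so wageinc > 0 alone would keep observations with missing income.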

Is this the correct approach to answer the question I'm targeting with the specified methods? Suggestions are appreciated if I have incorrect thought processes.

Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

17 Sep 2021, 17:23

I don't think this is a good approach. When you enter just education or just years of service into the model, without the other, you are overlooking the fact that these variables are probably not independent of each other. Consequently, the regressions you propose will yield biased estimates (omitted variable bias) of the effects of education and service, respectively. There is no entirely satisfactory way to go about this, but a simple solution that has some face validity is to run a single regression that contains both the education and service variables. That way the result you get for each is adjusted for the effect of the other. You will have, in effect, separated out their confounded effects. Then, to see how different those effects are, you can use -lincom- after the regression. (See -help lincom- if you are not familiar with the -lincom- command.)
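
A minimal sketch of that workflow, assuming hypothetical variable names (wageinc, milyears, educyears, age, female, race, married) and a race variable coded as categories; the robust standard errors are an optional choice here, not something the thread prescribes:

```stata
* Hypothetical variable names; adjust to the actual dataset.
* One regression containing both variables of interest plus controls.
* i. asks Stata to create indicator (dummy) variables for each category.
regress wageinc milyears educyears age i.female i.race i.married, vce(robust)

* Estimated difference between the per-year coefficients (service minus
* education), with its standard error and confidence interval.
lincom milyears - educyears

* Wald test of the null hypothesis that the two coefficients are equal.
test milyears = educyears
```

With a single linear restriction like this one, -lincom- and -test- lead to the same conclusion; -lincom- additionally reports the size of the difference.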
robert johnson

Join Date: Sep 2021

Posts: 3
#3

17 Sep 2021, 21:20

So this was originally my way of thinking as well. I had originally proposed a model of:

wage income = b0 + b1 military yrs + b2 educ + b3 age + b4 race + b5 sex + b6 marriage status

And my plan was to do as you described: compare the effects of each when both are in the same model, to control for the bias. However, when I discussed this with my professor, he said that if I proposed such a model, holding age and education constant within it, then I must be comparing an additional year of military service to something else (like another year of working), implying that I can't really use this model to answer the question. This frankly confused me a bit because, as you said, the bias is an issue if I use the one-variable-at-a-time approach from my first post. Is there a reason why I couldn't compare b1 and b2 within this model in order to speak about the differing effects of military service vs. education on income, as my professor seemed to imply?
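
One possible way to write out what b1, b2, and their difference mean in this model, offered as a sketch and not as a statement of what the professor intended. Here mil, educ, and x (the remaining controls) are shorthand introduced for illustration:

```latex
% Each coefficient is a partial effect with the other regressors held fixed:
\[
E[\text{wage} \mid \text{mil}+1,\ \text{educ},\ \text{age},\ x]
 - E[\text{wage} \mid \text{mil},\ \text{educ},\ \text{age},\ x] = b_1
\]
\[
E[\text{wage} \mid \text{mil},\ \text{educ}+1,\ \text{age},\ x]
 - E[\text{wage} \mid \text{mil},\ \text{educ},\ \text{age},\ x] = b_2
\]
% Subtracting the second from the first gives the direct comparison:
\[
E[\text{wage} \mid \text{mil}+1,\ \text{educ},\ \text{age},\ x]
 - E[\text{wage} \mid \text{mil},\ \text{educ}+1,\ \text{age},\ x] = b_1 - b_2
\]
```

Because age is held fixed, an extra year of either activity has to displace a year of something else (for example, civilian work), so b1 and b2 are each measured against that residual activity. Their difference, b1 - b2, compares two people of the same age and other characteristics, one of whom spent a given year in service and the other in education.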
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17713
#4

18 Sep 2021, 02:30

Robert:
as an aside to Clyde's excellent hints, there is, in my humble opinion, a more substantive issue hanging over your model specification: endogeneity.
Usually, when the regressand is some measure of income (gross or disposable) and education is included among the predictors, the researcher should keep in mind that a latent variable, such as individual ability, can affect both the regressand and education. Other things being equal, smarter people reach higher levels of education and negotiate better incomes.
This issue is comprehensively covered in Chapter 6 of the valuable textbook https://www.stata.com/bookstore/micr...metrics-stata/ and in the wonderful slide presentation on IV regression given by Kit Baum some years ago (see slide 17):
http://repec.org/usug2007/baumUKSUG2007.pdf.
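
To see how an unobserved variable such as ability would affect the comparison, here is the standard omitted-variable-bias algebra in simplified form. The symbols a (ability), \gamma, and \delta are introduced only for this illustration and do not come from the thread; other controls are suppressed for brevity.

```latex
% Suppose the true model includes unobserved ability a:
\[
\text{wage} = b_0 + b_1\,\text{mil} + b_2\,\text{educ} + \gamma\, a + u
\]
% and the linear projection of ability on the included regressors is
\[
a = \delta_0 + \delta_1\,\text{mil} + \delta_2\,\text{educ} + r .
\]
% OLS that omits a then converges to
\[
\operatorname{plim}\hat b_1 = b_1 + \gamma\,\delta_1, \qquad
\operatorname{plim}\hat b_2 = b_2 + \gamma\,\delta_2 ,
\]
% so the estimated difference converges to
\[
\operatorname{plim}(\hat b_1 - \hat b_2) = (b_1 - b_2) + \gamma\,(\delta_1 - \delta_2).
\]
```

Under these assumptions, the comparison of the two coefficients is consistent only if ability does not affect wages (\gamma = 0) or is related to service and education equally (\delta_1 = \delta_2), neither of which can be checked with OLS alone.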

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#5

18 Sep 2021, 11:06

robert johnson I have to say I don't grasp what your professor is getting at here. That said, Carlo makes an excellent point in #4 that is, I think, more important than any of the other things we have been considering here.