How does Stata know which variable is the main variable?

Jun Park

Join Date: Apr 2022

Posts: 48
#1

07 Apr 2023, 23:24

I'm starting to get a little confused about how R knows which variable is the main variable and which others are simply controls. For example, let's say there are three variables Y, X1, and X2, and I'm mainly interested in X1. I regress Y = constant + X1 + X2 + error. The function of X2 is to block any indirect effect that could confound the true relationship between Y and X1. However, the coefficients and standard errors are probably going to be exactly the same if we run Y = constant + X2 + X1 + error. So does that mean the coefficients that Stata calculates for X1 and X2 already take into account that both could influence each other, regardless of which one is the main variable?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17743
#2

08 Apr 2023, 02:25

Jun:
Stata cannot read our minds.
Therefore, it treats each predictor simply as something plugged into the right-hand side of the regression equation.
In a multiple regression, the contribution of each predictor to variations in the regressand is adjusted for the other predictors (all in all, you do not control for anything, except in randomized empirical studies and, possibly, via not-that-easy causal inference procedures).
The whole interpretation is up to us.
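
A minimal sketch of this point, assuming the auto dataset that ships with Stata (the dataset and the choice of price, weight, and mpg are illustrative and not part of the original question). It checks that listing the predictors in a different order leaves the estimates unchanged, and that "adjusted for the other predictor" can be read as partialling that predictor out of both sides:

```stata
* Illustrative data only: auto.dta and these variables are an assumption
sysuse auto, clear

* Same model, predictors listed in two different orders
regress price weight mpg
scalar b_weight_1  = _b[weight]
scalar se_weight_1 = _se[weight]

regress price mpg weight
scalar b_weight_2  = _b[weight]
scalar se_weight_2 = _se[weight]

* Both differences should be zero up to rounding error
display b_weight_1 - b_weight_2
display se_weight_1 - se_weight_2

* "Adjusted for mpg": remove mpg from price and from weight,
* then regress residuals on residuals
regress price mpg
predict double e_price, residuals
regress weight mpg
predict double e_weight, residuals
regress e_price e_weight

* The coefficient on e_weight should match b_weight_1; its standard
* error differs slightly because the degrees of freedom differ
display _b[e_weight] - b_weight_1
```

Nothing in either regress call marks weight as the variable of interest; the same partialling-out logic applies to mpg if the roles are swapped.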

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 35795
#3

08 Apr 2023, 02:52

The question asks about how R knows which is the main variable. Presumably just a slip, and I am no kind of expert on R, but I believe the answer to be exactly the same as for Stata. In plain regression, there is no main variable except in the researcher's mind.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17743
#4

08 Apr 2023, 03:11

The very same question was (cross-)posted 6 days ago on https://stats.stackexchange.com/ques...le-of-interest.
As expected, the main answer was that R does not know either.
It seems to me that this topic is very similar to Ramsey's test: it would be great if a test could tell me "Carlo, you're missing that predictor" (and I hoped it was for real while taking my very first steps in the OLS realm), but things do not work out this way.
Unfortunately, it's up to researchers to know the data generating process and take full responsibility for model specification and interpretation of the results.

Kind regards,
Carlo
(Stata 19.0)
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#5

08 Apr 2023, 05:41

To be kind of dry about it, Stata knows only what we tell it. If I have two variables x and y, and I do

Code:

reg y x

Then because of under-the-hood ado details that I don't wanna get into, Stata knows the first thing after reg must be the outcome and anything after that the predictors. Beyond that, as others here have said, and as Alberto Abadie once said in a YouTube lecture that I can't find, "You can't talk to the computer you know and say *whispers* 'tell me the causal effect of the soda tax on consumption', you have to be diligent in data collection".

I found the way he whispered it to be quite humorous, but either way he's right! The software isn't us; it can't know what we want beyond what we tell it.
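
A small sketch of the "first variable is the outcome" point, again assuming the auto dataset shipped with Stata (my choice for illustration, not something from the thread). After each fit, regress stores the name of the outcome in e(depvar):

```stata
* Illustrative data only: auto.dta is an assumption
sysuse auto, clear

* price comes first, so it is the outcome
regress price weight
display "`e(depvar)'"

* Swapping the order fits a different model: weight is now the outcome
regress weight price
display "`e(depvar)'"
```

Position in the variable list is the only thing that distinguishes the outcome from the predictors; among the predictors themselves, position carries no meaning.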
