I have data on firm decision making. A firm can make two decisions, X or Y, or both, or neither. (Both decisions are measured continuously, from 0 to 1.) My hypothesis is that firms will choose either X or Y rather than both or neither, because they have a reason to want at least one, but not both. Hence, X and Y should be negatively correlated. As a simple illustration using an example dataset, assume X and Y are the price and mpg of a car.
The command corr mpg price (run on the auto dataset) would give me the correlation coefficient, but I'm not sure this is sufficient for my needs, because I cannot include any other variable to control for. Suppose, for example, that mpg and price are negatively correlated only in long cars; how do I bring length into that command? My best guess is to split the sample into cars with above-median and below-median length and compute the correlation in each subsample. With more control variables, this quickly becomes unwieldy, though.
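To make the median-split idea concrete, here is a minimal sketch of what I have in mind. The variable name longcar and its value label are placeholders I made up; everything else comes from the auto dataset.

```stata
sysuse auto, clear

* Median of length; summarize with the detail option stores it in r(p50)
quietly summarize length, detail

* Hypothetical indicator: 1 if the car is longer than the median, 0 otherwise
generate byte longcar = length > r(p50) if !missing(length)
label define longcar_lbl 0 "At or below median length" 1 "Above median length"
label values longcar longcar_lbl

* Correlation between mpg and price within each half of the sample
bysort longcar: correlate mpg price
```

Each additional control would need its own split, which is why I doubt this scales beyond one or two variables.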
Alternatively, I considered the regression reg mpg price length, but the problem I see is that this suggests that price has an effect on mpg, whereas the two variables are, so to speak, both dependent and of "equal" standing. That is not well represented when one is the dependent and the other an independent variable in a regression.
A colleague suggested running a regression with no intercept on standardized X and Y (mpg and price). Would this help?
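As far as I understand the suggestion, it would look roughly like the sketch below. The names z_mpg and z_price are my own placeholders, and I am assuming that "standardized" means mean 0 and standard deviation 1.

```stata
sysuse auto, clear

* Standardize both variables (mean 0, standard deviation 1)
egen double z_mpg = std(mpg)
egen double z_price = std(price)

* No-intercept regression on the standardized variables
regress z_mpg z_price, noconstant

* Swapping the roles of the two variables should give the same slope
regress z_price z_mpg, noconstant

* For comparison: both slopes should equal this correlation coefficient
correlate mpg price
```

With one standardized regressor and no intercept, the slope is just the Pearson correlation, so the result is the same whichever variable is on the left-hand side. That addresses the "equal standing" concern, but on its own it does not control for length.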
Code:
sysuse auto, clear
corr mpg price
Code:
reg mpg price length