Hello!
I am working on research that explores what factors contribute to differences in travel time (TT) between cars and transit across some of the most populated cities in the United States. For example, with an increase in population density, does the gap in TT between cars and transit increase or decrease? I planned to use linear regression to model this relationship. My independent variables include demographic and socioeconomic characteristics, and my dependent variable is the difference in TT between cars and transit.
Y = TT transit – TT car
Because I am using the difference, Y is positive in areas where the car TT is shorter than the transit TT and negative in areas where the car TT is longer. From my understanding, a dependent variable with both positive and negative values should be fine in a linear regression, and the interpretation would be straightforward: a positive coefficient suggests that as X increases, the difference TT transit – TT car increases, and a negative coefficient suggests that it decreases.
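To make the setup concrete, here is a minimal sketch of what I have in mind, using made-up data rather than my real dataset. The variable names (`density`, `tt_diff`), the units, and the numbers used to generate the data are all assumptions for illustration only.

```python
import numpy as np

def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(0)

# Hypothetical predictor: population density (invented units and range).
density = rng.uniform(1, 20, size=200)

# Hypothetical outcome: Y = TT transit - TT car, in minutes.
# Generated so that Y takes both positive and negative values.
tt_diff = 15.0 - 1.5 * density + rng.normal(0, 2, size=200)

intercept, slope = fit_ols(density, tt_diff)

# A negative slope here means Y = TT transit - TT car falls as density rises.
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print("share of areas with Y > 0:", np.mean(tt_diff > 0))
```

The fit itself does not care about the sign of Y; the sign only matters when I interpret what a change in Y means for transit relative to cars.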
However, a single increase or decrease in Y does not distinguish between areas where transit is the slower mode (Y > 0) and areas where the car is the slower mode (Y < 0). To capture this, would it be better to log transform the dependent variable, or to split the data in two (one subset for positive values and one for negative values) and fit two separate regression models?
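The two options above can be sketched roughly as follows, again on made-up data (the sample size, coefficients, and variable names are assumptions, not my real data). One thing the sketch makes visible is that a plain log is not defined for zero or negative values of Y, so it cannot be applied directly to a signed difference.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: x is a predictor, y = TT transit - TT car (minutes).
x = rng.uniform(1, 20, size=300)
y = 15.0 - 1.5 * x + rng.normal(0, 2, size=300)

# Option 1: log transform the dependent variable.
# np.log returns nan for negative inputs and -inf for zero,
# so every area with y <= 0 drops out.
with np.errstate(invalid="ignore", divide="ignore"):
    log_y = np.log(y)
n_undefined = int(np.sum(~np.isfinite(log_y)))
print("areas where log(Y) is undefined:", n_undefined)

# Option 2: split by the sign of Y and fit one line per subset.
pos = y > 0
slope_pos = np.polyfit(x[pos], y[pos], 1)[0]    # areas where transit is slower
slope_neg = np.polyfit(x[~pos], y[~pos], 1)[0]  # areas where the car is slower
print("slope (Y > 0 subset):", round(slope_pos, 2))
print("slope (Y <= 0 subset):", round(slope_neg, 2))
```

Note that the split in option 2 is made on the dependent variable itself, which is part of why I am unsure whether it is a sound approach.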
I was leaning more towards splitting the data by the sign of Y, so that I can show separately the areas where transit TT is longer and the areas where car TT is longer, but I was not sure if this would be strange. Any input would be super helpful! Thank you so much for the help in advance!