Hello everyone,
I have a question about interpreting OLS results from a longitudinal dataset. The issue may be subtle, but I am afraid it matters more than I realize.
The dataset covers a set of industries for the years 2008-2019. The variables I am interested in for now are Percentage_changes_per_industry (the percentage of people who become entrepreneurs) and age_median (the median age of the workers).
The dataset looks something like this:
| caem2 | year | Percentages_changes | Age_median |
| 1 | 2008 | 0.10029345 | 43 |
| 1 | 2009 | 0.16616431 | 43 |
| 1 | 2010 | 0.62419285 | 43 |
| 1 | 2011 | 0.60629515 | 43 |
| 1 | 2012 | 0.57011572 | 43 |
| 1 | 2013 | 0.62000761 | 43 |
| 1 | 2014 | 0.52445023 | 43 |
| 1 | 2015 | 0.6367258 | 42 |
| 1 | 2016 | 0.65820404 | 42 |
| 1 | 2017 | 0.5906995 | 42 |
| 1 | 2018 | 0.56186026 | 42 |
| 1 | 2019 | 0.50835528 | 41 |
| 2 | 2008 | 0.4870546 | 39 |
| 2 | 2009 | 0.34400635 | 40 |
| 2 | 2010 | 0.76704545 | 40 |
| 2 | 2011 | 1.1684783 | 41 |
| 2 | 2012 | 0.65547981 | 41 |
| 2 | 2013 | 1.1118997 | 41 |
| 2 | 2014 | 1.1496571 | 41 |
| 2 | 2015 | 1.027984 | 42 |
| 2 | 2016 | 0.83689459 | 42 |
| 2 | 2017 | 1.3143872 | 43 |
| 2 | 2018 | 0.95934959 | 43 |
| 2 | 2019 | 1.1663697 | 43 |
| 3 | 2008 | 0.63722259 | 45 |
The regression and results I have now are the following:
Code:
reg Percentage_changes_per_industry age_median
      Source |       SS           df       MS      Number of obs   =       892
-------------+----------------------------------   F(1, 890)       =     90.46
       Model |  15.6591594         1  15.6591594   Prob > F        =    0.0000
    Residual |  154.065501       890  .173107305   R-squared       =    0.0923
-------------+----------------------------------   Adj R-squared   =    0.0912
       Total |  169.724661       891  .190487835   Root MSE        =    .41606
------------------------------------------------------------------------------
Percentage~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_median |  -.0373358   .0039255    -9.51   0.000    -.0450402   -.0296314
       _cons |   1.908756   .1557425    12.26   0.000     1.603091    2.214422
------------------------------------------------------------------------------
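To make the size of the slope concrete, the fitted values implied by the reported estimates can be computed by hand at two median ages one year apart. This is only arithmetic on the coefficients shown in the output; the ages 42 and 43 are picked for illustration because they appear in the data excerpt.

```stata
* Fitted value = _cons + coefficient * age_median, using the reported estimates
* (42 and 43 are illustrative ages taken from the data excerpt)
display 1.908756 - .0373358*42
display 1.908756 - .0373358*43
```

The two fitted values are roughly 0.341 and 0.303, and their difference is the coefficient itself, about -0.037.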
My question is: how exactly should the effect of age_median on Percentage_changes_per_industry be interpreted? Is it:
- The coefficient on Median Age is -0.037, which indicates that when the Median Age goes up by one year, the expected Rate of New Entrepreneurs decreases by 0.037 percentage points, on average, everything else held constant.
or
- The coefficient on Median Age is -0.037, which indicates that when the Median Age of a given industry, in a given year, goes up by one year, the expected Rate of New Entrepreneurs decreases by 0.037 percentage points, on average, everything else held constant.
Basically, when interpreting results from longitudinal datasets in which the data are grouped (in my case by industry and by year), do we have to take that grouping into account in the interpretation, or not?
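As a rough sketch of what taking the grouping into account could look like in Stata, the pooled regression can be compared with a within-industry (fixed-effects) regression. The pooled slope mixes differences between industries with changes within an industry over time, while the fixed-effects slope uses only the within-industry changes. This assumes that caem2 is the industry identifier and that the variables are named as in the regress command above. Whether fixed effects or year dummies are the right specification for this data is an assumption, not something established here.

```stata
* Declare the panel structure (assumption: caem2 identifies the industry)
xtset caem2 year

* Pooled OLS as in the post, with standard errors clustered by industry
regress Percentage_changes_per_industry age_median, vce(cluster caem2)

* Within-industry (fixed-effects) estimator: uses only changes over time
* inside each industry
xtreg Percentage_changes_per_industry age_median, fe vce(cluster caem2)

* Same, adding year dummies to absorb shocks common to all industries
xtreg Percentage_changes_per_industry age_median i.year, fe vce(cluster caem2)
```

If the pooled and within-industry slopes differ noticeably, that would suggest the two interpretations are not interchangeable for this data.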
Thank you,
Rui
