Dear all,
I am writing to ask for your thoughts and advice. I am struggling to address a potential outlier problem in the relationships I found. My dependent variable is firm action and my main independent variable is the firm's relative position, which is split in two depending on whether the focal firm's performance is above or below the industry average. I used a spline function to create two variables, positive relative performance and negative relative performance. All quadratic terms, as well as their lower-order terms, are statistically significant, and each vertex lies comfortably within the observed range of the x-axis.
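To make the setup concrete, here is a minimal Python sketch of the spline split and of the vertex of a fitted quadratic. It is only an illustration under stated assumptions: the function names, the industry average of 6.0, and the coefficients 2.2 and -0.1 are hypothetical and were chosen so that the turning point lands near 11; they are not estimates from the actual model.

```python
import numpy as np

def split_relative_performance(performance, industry_average):
    """Split performance relative to the industry average into two spline pieces.

    The positive piece is zero for firms at or below the average;
    the negative piece is zero for firms at or above it.
    """
    relative = np.asarray(performance, dtype=float) - industry_average
    positive = np.where(relative > 0, relative, 0.0)
    negative = np.where(relative < 0, relative, 0.0)
    return positive, negative

def quadratic_vertex(b_linear, b_squared):
    """Turning point of y = b0 + b_linear * x + b_squared * x**2."""
    return -b_linear / (2.0 * b_squared)

# Hypothetical firm performance values and industry average.
performance = np.array([2.0, 5.0, 9.0, 20.0])
positive, negative = split_relative_performance(performance, industry_average=6.0)

# Hypothetical coefficients on positive performance and its square.
vertex = quadratic_vertex(b_linear=2.2, b_squared=-0.1)
```

With these made-up coefficients the turning point is at roughly 11, so the question becomes how much of the sample actually lies near or beyond that value.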
I plotted a graph for positive performance.

The vertex is at about 11 (in million USD). However, this value is extreme compared with the variable's mean.

The above shows summary statistics for positive performance. When the focal firm has negative performance, the positive performance variable is coded as zero. The large number of zeros reflects the fact that, given the volatile nature of this industry, most firms are suffering from poor performance.
The mean is only .68, while the vertex is close to the 99th percentile and far above the mean. Following suggestions from colleagues, I winsorized the variable at the 1st and 99th percentiles and, separately, at the 5th and 95th percentiles. In both cases the curvilinear relationship was gone because the turning point disappeared. Log-transforming the variable also makes the quadratic term insignificant.
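A minimal Python sketch of why winsorizing can remove the turning point is given below. It uses synthetic data, not the actual sample: the array is built by hand to have many zeros and a thin upper tail, and the vertex of 11 is taken from the description above. The helper names are hypothetical.

```python
import numpy as np

def winsorize(values, lower_pct, upper_pct):
    """Clip values to the given lower and upper percentiles."""
    values = np.asarray(values, dtype=float)
    low, high = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, low, high)

def vertex_in_range(vertex, values):
    """True if the turning point lies inside the observed range of the variable."""
    return bool(values.min() <= vertex <= values.max())

# Synthetic positive-performance variable: 900 zeros, 90 moderate values,
# and 10 extreme values (all numbers are made up for illustration).
raw = np.concatenate([
    np.zeros(900),
    np.linspace(0.5, 5.0, 90),
    np.linspace(10.0, 30.0, 10),
])

winsorized = winsorize(raw, 1, 99)

vertex = 11.0  # turning point reported in the post, in million USD
in_range_before = vertex_in_range(vertex, raw)
in_range_after = vertex_in_range(vertex, winsorized)
```

In this synthetic example the vertex lies inside the raw range but above the winsorized maximum, so after winsorizing no observations remain on the far side of the turning point and the quadratic term has nothing left to identify it.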
However, one of the distinctive properties of this industry is the dominance of extreme events: due to the 'winner-take-all' principle, fewer than 10% of firms take about 80% of the gross in this industry. For this reason, it seems to me that I should not treat the extreme values as outliers and should keep them in my model regardless of their statistical impact on the average.
I am not sure whether I should address the potential outlier problem and report that there is no curvilinear relationship for positive performance. I would greatly appreciate it if anyone can give me advice or suggestions.
Thank you in advance for your help.
Best,
Anna