
  • Extreme values that may drive the curvilinear relationships

    Dear all,

    I am writing to ask for your thoughts and advice. I am struggling to address potential outlier problems in the relationships I found. My dependent variable is firm action and my main independent variable is the firm's relative position, which is split in two depending on whether the focal firm's performance is above or below the industry average. I used a spline function to create two variables, positive relative performance and negative relative performance. All quadratic terms, including the lower-order terms, are statistically significant, and each vertex lies comfortably within the range of the X-axis.
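
    The spline split and the turning point described above can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis. It assumes a single knot at zero and that the negative piece keeps its sign; the function names and the sample values are hypothetical.

```python
def split_relative_performance(rel_perf):
    """Split relative performance into two spline variables with a knot at zero.

    Assumption: the negative piece keeps its sign (some studies use the
    absolute value instead).
    """
    positive = [max(x, 0.0) for x in rel_perf]  # zero whenever performance is below the reference
    negative = [min(x, 0.0) for x in rel_perf]  # zero whenever performance is above the reference
    return positive, negative


def quadratic_vertex(b_linear, b_squared):
    """Turning point of b_linear * x + b_squared * x**2, holding other terms fixed."""
    if b_squared == 0:
        raise ValueError("no turning point: the squared-term coefficient is zero")
    return -b_linear / (2.0 * b_squared)


# Hypothetical relative performance values (focal firm minus reference point).
rel_perf = [-3.0, -0.5, 0.0, 0.2, 4.0]
positive, negative = split_relative_performance(rel_perf)
print(positive)
print(negative)
```

    Given estimated coefficients, `quadratic_vertex` returns the value of X at which the fitted curve turns, which can then be compared with the observed range of the variable.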

    I plotted a graph for positive performance.
    [Image: Capture.JPG, graph for positive performance]


    The vertex is at about 11 (in million USD). However, that value is extreme compared to the variable's mean.

    [Image: Capture.JPG, summary statistics of positive performance]

    The above shows summary statistics for positive performance. When the focal firm has negative performance, the positive performance variable is coded as zero. Too many zeros indicate that by the nature of this volatile industry, most of firms are suffering from poor performance.

    The mean is only .68, while the vertex sits at almost the 99th percentile, far above the mean. Following suggestions from colleagues, I winsorized the variable at either 1% & 99% or 5% & 95%, and the curvilinear relationship was gone because the turning point disappeared. Log-transforming the variable also makes the quadratic term insignificant.
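
    To make the winsorizing step concrete, here is a minimal Python sketch. It is not the procedure actually run for this analysis: the linear-interpolation percentile rule is an assumption (statistical packages differ in how they define percentiles), and the sample data are invented.

```python
def percentile(sorted_values, pct):
    """Linear-interpolation percentile of an already sorted list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Clamp values below/above the given percentiles to those percentiles."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Invented data: mostly zeros, some small values, one very large value.
values = [0.0] * 65 + [1.0] * 34 + [50.0]
print(max(winsorize(values, 1.0, 99.0)))
print(max(winsorize(values, 5.0, 95.0)))
```

    When most values are zero, the lower cutoff is itself zero, so only the upper tail is pulled in. If the estimated turning point lies above the upper cutoff, no winsorized observation can reach it.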

    However, one of the distinctive properties of this industry is the dominance of extreme events: due to the 'winner-take-all' principle, fewer than 10% of firms take about 80% of the gross in this industry. For this reason, it seems to me that I should not treat the extreme values as outliers and should keep them in my model regardless of their statistical impact on the average.

    I am not sure whether I should address the potential outlier problem and report that there is no curvilinear relationship for positive performance. I would greatly appreciate it if anyone can give me advice or suggestions.


    Thank you in advance for your help.


    Best,


    Anna





  • #2
    You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    I work with similar models. "Too many zeros indicate that by the nature of this volatile industry, most of firms are suffering from poor performance" - this doesn't make sense if you're using industry-year as the reference point. Industries with low performance will have low reference points.

    The literature on outliers is quite problematic. Some disciplines like winsorizing, others prefer diagnostics like Cook's D, some use complicated procedures, and others don't like dropping observations at all. I would be a little concerned that a small number of observations is driving your results. One thing you can do is turn relative performance into a series of dummies (one dummy for 0-.2, another for .2-1, etc.) and see what you get. You might look at a forthcoming article in Strategy Science by Myles Shaver.
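
    The dummy-variable idea above can be sketched as follows. This is a minimal Python illustration under stated assumptions: only the 0-.2 and .2-1 ranges come from the suggestion, the remaining cutpoints are hypothetical, bins are closed on the right, values at or below the first cutpoint form the reference group, and the last bin is open-ended.

```python
import bisect


def bin_index(value, cutpoints):
    """Return the bin number for value, or None for the reference group.

    Bins are (cutpoints[i], cutpoints[i + 1]], and the last bin is
    (cutpoints[-1], infinity). Values <= cutpoints[0] are the reference group.
    """
    if value <= cutpoints[0]:
        return None
    # bisect_left counts the cutpoints strictly below value.
    return bisect.bisect_left(cutpoints, value) - 1


def bin_dummies(values, cutpoints):
    """Build one 0/1 indicator per bin for every value."""
    rows = []
    for v in values:
        row = [0] * len(cutpoints)
        idx = bin_index(v, cutpoints)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


# Hypothetical cutpoints: bins (0, .2], (.2, 1], (1, 5], and above 5.
cutpoints = [0.0, 0.2, 1.0, 5.0]
for value, row in zip([0.0, 0.1, 0.5, 3.0, 11.0], bin_dummies([0.0, 0.1, 0.5, 3.0, 11.0], cutpoints)):
    print(value, row)
```

    Entering these indicators in the regression in place of the linear and squared terms lets the shape of the relationship show up bin by bin, without imposing a quadratic form that a few large values could drive.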

    You talk about performance, but don't say what your measure of performance is. I'm not sure it makes sense to use sales as the performance measure in most models - Tesla doesn't feel bad because it has lower sales than Toyota.



    • #3
      Dear Prof. Bromiley,

      Thank you so much for your suggestions and comments.

      As for your concerns, about 65% of the relative performance values [the focal firm's performance - its peers'] are negative and only 35% are positive. I split this variable in two using spline functions. As a result, 65% of the positive relative performance variable is zeros, whereas 35% of the negative relative performance variable is zeros. So what I meant was that many firms are suffering from poor performance 'relative to their peers'.

      Sales is a widely accepted measure of performance, at least in this industry setting. I have read several articles and magazines and found that CEOs and other managerial decision makers actually care about their relative performance position compared to their peers / competitors. Although I mentioned that I used the industry average as a reference point, more precisely I used a measure based on firm similarity (i.e. size and market coverage) to define 'peers', given that big companies are less likely to care about the revenues of unknown small companies unless those companies exceptionally outperform in the markets the big firms are focusing on. Tesla is a somewhat exceptional case given the breadth of its market, but in general managers would ask why their firm is suffering while other firms are doing well.

      Obviously, ROA is a better measure of performance in general, but due to data limitations I cannot calculate it: many observations in my data have no information on cost.

      As for the outlier problem, I will definitely try what you suggested. The extreme values in positive relative performance make up about 6% of its non-zero values. That is small, but not negligible. I checked whether the vertex point falls within the variable's range in each year, and it does. I also found that the maximum and the 99th percentile increase year by year. The values are extreme, but they are there in my data on the industry population, which is why I still believe that they contain valuable information that should not be overlooked.
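
      The two checks mentioned above (whether each year's range includes the vertex, and what share of non-zero values is extreme) can be sketched like this. It is a minimal Python illustration with invented records; the cutoff that defines "extreme" is not stated above, so the `threshold` argument is an assumption.

```python
from collections import defaultdict


def yearly_vertex_coverage(records, vertex):
    """For each year, report the maximum and whether min <= vertex <= max.

    records is an iterable of (year, positive_performance) pairs.
    """
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    summary = {}
    for year in sorted(by_year):
        values = by_year[year]
        summary[year] = {
            "max": max(values),
            "covers_vertex": min(values) <= vertex <= max(values),
        }
    return summary


def extreme_share(values, threshold):
    """Share of non-zero values at or above threshold (assumed cutoff for 'extreme')."""
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        return 0.0
    return sum(1 for v in nonzero if v >= threshold) / len(nonzero)


# Invented records: (year, positive relative performance).
records = [(2015, 0.0), (2015, 12.0), (2016, 0.5), (2016, 9.0)]
print(yearly_vertex_coverage(records, vertex=11.0))
print(extreme_share([value for _, value in records], threshold=10.0))
```

      A year whose maximum falls below the vertex contributes nothing to identifying the downward-sloping side of the curve.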

      Thank you again for your advice!
      (I have been greatly inspired by your work, and it's great to have your comments.)


      Best,

      Anna
      Last edited by Anna Pak; 08 Nov 2018, 00:39.
