
  • Discrete Dependent Variable with Limited Possible Values

    I am running a regression for the outcomes of English Premier League football matches, measured as the number of points achieved (3 for a win, 1 for a draw, 0 for a loss).

    Each match contributes two observations (one for the home team and one for the away team).

    I am really struggling to determine which type of regression to use - I have considered the following:
    1. Ordered logistic regression - the problem with this is that it treats the gap between win and draw as equal to the gap between draw and loss (it does not account for the cardinal nature of the dependent variable).
    2. Negative binomial/Poisson regression - the problem with this is that it assigns positive probability to outcomes such as 2 or 4 points, which are impossible in reality.
    3. Multinomial logistic regression - this seems to be the only valid approach, but it is more difficult to interpret because it results in two sets of coefficients (draw vs loss and win vs loss).
    I am wondering if there is any regression which will result in just a single set of coefficients which would still be valid given the very specific nature of my dependent variable?

    Any thoughts at all would be greatly appreciated.
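
    To put a rough number on the concern in option 2, here is a minimal Python sketch of how much probability a plain Poisson distribution places on point totals that cannot occur. The mean of 1.4 points per match is a made-up value chosen only for illustration, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a Poisson variable with the given mean equals k."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def impossible_mass(mean: float) -> float:
    """Probability a Poisson model places outside the feasible points {0, 1, 3}."""
    feasible = sum(poisson_pmf(k, mean) for k in (0, 1, 3))
    return 1.0 - feasible


# Hypothetical mean points per match (an assumption, not taken from any data).
mean_points = 1.4

print(f"P(exactly 2 points)      = {poisson_pmf(2, mean_points):.3f}")
print(f"P(points not in 0, 1, 3) = {impossible_mass(mean_points):.3f}")
```

    Whether this much misplaced probability matters depends on what the model is used for; the sketch only shows the size of the mismatch for one assumed mean.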

  • #2
    You have an additional problem in that the two observations from any game are not independent. Indeed, given the result for one team, the other is completely specified.

    It would be much easier if you worked with the difference in goals scored - you wouldn't have the "no 2's" problem and you'd have one observation per game. What teams actually produce is a difference in goals scored; the points achieved are a non-linear function of that difference. If you're using team variables to explain these outcomes, difference in goals scored makes more sense than points achieved. This would let you use negative binomial or Poisson.
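
    As a small illustration of both points (the non-linear mapping and the dependence between the two observations), here is a Python sketch. The function names are hypothetical; the 3/1/0 scoring is the one described in the question.

```python
def points_from_goal_difference(goal_difference: int) -> int:
    """Map a team's goal difference in one match to league points (3/1/0)."""
    if goal_difference > 0:
        return 3
    if goal_difference == 0:
        return 1
    return 0


def match_points(home_goal_difference: int) -> tuple[int, int]:
    """Return (home points, away points); the away side sees the negated difference."""
    return (
        points_from_goal_difference(home_goal_difference),
        points_from_goal_difference(-home_goal_difference),
    )


# Points jump only where the goal difference crosses zero, so equal steps in
# goal difference do not translate into equal steps in points.
for gd in range(-3, 4):
    home, away = match_points(gd)
    print(f"home goal difference {gd:+d}: home {home} pts, away {away} pts")
```

    Because the away team's points follow mechanically from the home team's goal difference, the second observation per match adds no independent information.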

    Sorry I can't be of more assistance - I don't know of a good answer to your question as posed.





    • #3
      Many thanks for taking the time to reply, I really appreciate it.

      I am hoping that by using clustered standard errors on MatchId (takes the same value for home and away teams) I will account for the lack of independence of the two observations.

      The reason I am hesitant to use difference in goals scored is very similar to my reservation about using ordered logistic regression. Moving from a goal difference of 0 to 1 is the difference between a draw and a win (this is the outcome that really matters to the team), but the goal difference model treats this as no more important than moving from a goal difference of 2 to 3. In reality the former is vastly more important than the latter, so I am not convinced that this model is any better.



      • #4
        Jay, can you explain the purpose of the analysis? Is it to predict the actual result of the next game, or to test the significance of a certain variable of interest?

        I ask because if it's to predict the result of the next game you could do the estimation per club (or national team for that matter), which would get rid of the problem of dependence between the two sides of the match.

        A comment about your explanations on the logits. You say that ordered logit won't account for the cardinality of the explained variable. The multinomial logit won't either. If you decide to go the logit way, the ordered logit model is the one that applies here: there is a natural order in the sense that one result is better than another. You also say that the multinomial logit's results are difficult to interpret because they are relative to one outcome (loss in your example). This is not quite true. You can calculate the marginal effects (either AMEs or MEMs) using margins. It will give you the effect on the probability of each outcome. Since the probabilities of all outcomes must sum to 1, the net effect on the total probability has to be zero, so the marginal effects of each variable on the three possible outcomes must sum to zero. This means that if you get the marginal effects on two of the three outcomes, the marginal effect on the third has to equal the negative of their sum.
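
        The sum-to-zero property can be checked with a short Python sketch. This is not the margins command; it only reproduces the arithmetic for a three-outcome multinomial logit with a single covariate. The coefficients and the evaluation point are invented for illustration and do not come from any fitted model.

```python
import math

# Hypothetical (intercept, slope) pairs for one covariate x; loss is the base outcome.
COEFS = {"loss": (0.0, 0.0), "draw": (0.2, 0.3), "win": (-0.1, 0.8)}


def probabilities(x: float) -> dict[str, float]:
    """Multinomial logit outcome probabilities at covariate value x."""
    scores = {o: math.exp(a + b * x) for o, (a, b) in COEFS.items()}
    total = sum(scores.values())
    return {o: s / total for o, s in scores.items()}


def marginal_effects(x: float) -> dict[str, float]:
    """dP(outcome)/dx for each outcome: P_j * (b_j - sum_k P_k * b_k)."""
    p = probabilities(x)
    average_slope = sum(p[o] * COEFS[o][1] for o in COEFS)
    return {o: p[o] * (COEFS[o][1] - average_slope) for o in COEFS}


effects = marginal_effects(0.5)  # evaluation point chosen arbitrarily
for outcome, effect in effects.items():
    print(f"marginal effect on P({outcome}): {effect:+.4f}")
print(f"sum of marginal effects: {sum(effects.values()):+.1e}")
```

        Given the effects on win and draw, the effect on loss is the negative of their sum, so one marginal effect per variable is always redundant.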

        I hope my comments help.
        Alfonso Sanchez-Penalver



        • #5
          A zero-truncated Poisson, maybe.



          • #6
            Sorry, I meant Poisson regression.



            • #7
              Dear Jay,

              This sounds like an interesting problem and, as suggested above, it would be good if you could explain what is the purpose of the model.

              Best wishes,

              Joao



              • #8
                A Poisson may or may not be appropriate. As Jay points out, it gives probability to more than three points. It may be a good approximation if you're interested in predicting the points. I personally don't understand why the cardinality of the points is important: if I predict 1.5 points, what does that mean? Half a win? Closer to a draw than to a win? However, as I pointed out before and Joao Santos Silva did later, unless Jay describes the purpose of the analysis and the data he's thinking of using for it, it's all a guessing game really.
                Last edited by Alfonso Sánchez-Peñalver; 10 Sep 2016, 16:14.
                Alfonso Sanchez-Penalver
