  • Football, or soccer, Dataset

    Hi everyone,

    I'm trying to build a statistical football model for fun, focussing on one football club. My dependent variable is the result of the match (a win, draw or loss), and my independent variables are amount of possession, shots on target (for) and shots on target (conceded).

    I have 85 observations. A sample of my dataset in Stata looks like this:
    Result Possession SOT (for) SOT (against)
    W 62.9 5 3
    D 42.2 3 4
    L 58.1 1 4

    I suppose the goal of this model is to see the influence of the three independent variables on the result of the match.

    My dependent variable is obviously a string variable. What is the best way to convert it?

    Are there any other suggestions that you guys would make on how to run this model?


    Thanks

    M

  • #2
    Hi Mike,

    So you want your explained variable to be categorical. I would do something like:
    Code:
    gen outcome = 1 * (Result == "W") + 2 * (Result == "D") + 3 * (Result == "L")
    label var outcome "Game outcome"
    label def out   ///
        1 "Win"    ///
        2 "Draw" ///
        3 "Loss"
    Alternatively, you can use the encode command; see help encode.

    I would use the ordered logit estimator to estimate this model. See help ologit.
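
    A minimal sketch of that estimation, assuming the three predictors are stored as numeric variables named possession, sot_for and sot_against. These names are hypothetical; the column headers in the sample above are not valid Stata variable names as written.
    Code:
    * outcome is coded 1 = Win, 2 = Draw, 3 = Loss, so a positive
    * coefficient shifts probability toward the worse results
    ologit outcome possession sot_for sot_against
    * same model, reported as odds ratios instead of log-odds
    ologit outcome possession sot_for sot_against, or
    If you would rather have higher values mean better results, reverse the coding (1 = Loss, 2 = Draw, 3 = Win) before estimating.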
    Alfonso Sanchez-Penalver


    • #3
      Let me add to Alfonso's advice that using encode without sufficient caution can lead to unexpected results. Consider the example below. In the first use of encode, the outcome string is encoded in alphabetical order (D=1, L=2, W=3), which is not the order you'd like for ordered logit estimation. By first defining a value label, which I uncreatively called wdl, and passing it to encode through the label() option, the desired mapping is obtained, as in the second use of encode.

      Code:
      . input str1 outcome_s
      
           outcome_s
        1. W
        2. L
        3. W
        4. D
        5. end
      
      . encode outcome_s, generate(outcome1)
      
      . list, clean nolabel
      
             outcom~s   outcome1  
        1.          W          3  
        2.          L          2  
        3.          W          3  
        4.          D          1  
      
      . label def wdl   ///
      >     1 "W"    ///
      >     2 "D" ///
      >     3 "L"
      
      . encode outcome_s, generate(outcome2) label(wdl)
      
      . list, clean nolabel
      
             outcom~s   outcome1   outcome2  
        1.          W          3          1  
        2.          L          2          3  
        3.          W          3          1  
        4.          D          1          2  
      
      . list, clean
      
             outcom~s   outcome1   outcome2  
        1.          W          W          W  
        2.          L          L          L  
        3.          W          W          W  
        4.          D          D          D  
      
      .
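
      A related caution: by default, encode adds any string it does not find in the named value label to that label under a new code. A hedged variation on the second call above, using encode's noextend option so that the variable is not encoded if outcome_s contains something other than W, D or L (outcome3 is just an illustrative name):
      Code:
      * refuse to encode if outcome_s holds values not defined in wdl
      encode outcome_s, generate(outcome3) label(wdl) noextend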


      • #4
        William's point is right on, which is why I often use the first approach I suggested, particularly when there aren't many categories (with many categories it gets tedious, because you have to add one condition term per alternative). I also forgot to include the command that attaches the value label to the variable I created. So, to finish the code:
        Code:
        gen outcome = 1 * (Result == "W") + 2 * (Result == "D") + 3 * (Result == "L")
        label var outcome "Game outcome"
        label def out ///
           1 "Win" ///
           2 "Draw" ///
           3 "Loss"
        label val outcome out
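
        One thing to watch with this approach: any observation whose Result is not exactly "W", "D" or "L" (blank, lower case, trailing space) ends up with outcome equal to 0 rather than missing. A quick check, as a sketch, assuming the string variable is named Result as in the code above:
        Code:
        * cross-tabulate the original strings against the new codes
        tabulate Result outcome, missing
        * stop with an error if any observation fell outside the three codes
        assert inlist(outcome, 1, 2, 3)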
        Alfonso Sanchez-Penalver
