Football, or soccer, Dataset

Mike Forest

Join Date: Sep 2016

Posts: 1
#1


17 Sep 2016, 06:56

Hi everyone,

I'm trying to build a statistical football model for fun, focussing on one football club. My dependent variable is the result of the match (a win, draw or loss), and my independent variables are the amount of possession, shots on target (for) and shots on target (conceded).

I have 85 observations. A sample of my dataset in Stata looks like this:

Result   Possession   SOT (for)   SOT (against)
W        62.9         5           3
D        42.2         3           4
L        58.1         1           4

I suppose the goal of this model is to see the influence of the three independent variables on the result of the match.

My dependent variable is obviously a string variable. What is the best way to convert it?

Are there any other suggestions that you guys would make on how to run this model?

Thanks

M
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#2

17 Sep 2016, 07:29

Hi Mike,

So you want your dependent variable to be categorical. I would do something like

Code:

gen outcome = 1 * (Result == "W") + 2 * (Result == "D") + 3 * (Result == "L")
label var outcome "Game outcome"
label def out    ///
    1 "Win"      ///
    2 "Draw"     ///
    3 "Loss"

Or you can use the encode command; see help encode.

I would use the ordered logit estimator to estimate this model. See help ologit.
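As a rough sketch of what that could look like, assuming the numeric outcome variable has been created as above (1 = Win, 2 = Draw, 3 = Loss) and that the three predictors are stored under the hypothetical names possession, sot_for and sot_against (the actual variable names in the dataset are not given in the thread):

```stata
* Sketch only: possession, sot_for and sot_against are assumed variable names.
* With outcome coded 1 = Win, 2 = Draw, 3 = Loss, a positive coefficient
* means higher values of that predictor shift results toward a loss.
ologit outcome possession sot_for sot_against

* Average marginal effect of each predictor on the probability of a win
* (outcome == 1).
margins, dydx(*) predict(outcome(1))
```

With only 85 matches the estimates may be imprecise, so the confidence intervals deserve as much attention as the point estimates.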

Alfonso Sanchez-Penalver

William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

17 Sep 2016, 09:53

Let me add to Alfonso's advice that using encode without sufficient caution can lead to unexpected results. Consider the example below. In the initial use of encode, the outcome string is encoded in alphabetical order (D=1, L=2, W=3), which is not the order you'd like for ordered logit estimation. By first creating a value label, which I uncreatively called wdl, and passing it to encode through the label() option, the desired mapping (W=1, D=2, L=3) is obtained, as in the second use of encode.

Code:

. input str1 outcome_s

     outcome_s
  1. W
  2. L
  3. W
  4. D
  5. end

. encode outcome_s, generate(outcome1)

. list, clean nolabel

       outcom~s   outcome1  
  1.          W          3  
  2.          L          2  
  3.          W          3  
  4.          D          1  

. label def wdl   ///
>     1 "W"    ///
>     2 "D" ///
>     3 "L"

. encode outcome_s, generate(outcome2) label(wdl)

. list, clean nolabel

       outcom~s   outcome1   outcome2  
  1.          W          3          1  
  2.          L          2          3  
  3.          W          3          1  
  4.          D          1          2  

. list, clean

       outcom~s   outcome1   outcome2  
  1.          W          W          W  
  2.          L          L          L  
  3.          W          W          W  
  4.          D          D          D  

.


Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#4

17 Sep 2016, 10:17

William's point is right on, which is why I often use the first approach I suggested. It works best when there aren't many categories, because you have to add one multiplied condition for each alternative. I forgot to add the command that attaches the label to the variable I created after defining it. So, to finish the code:

Code:

gen outcome = 1 * (Result == "W") + 2 * (Result == "D") + 3 * (Result == "L")
label var outcome "Game outcome"
label def out    ///
    1 "Win"      ///
    2 "Draw"     ///
    3 "Loss"
label val outcome out
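One thing to watch with this approach: any value of Result that is not exactly "W", "D" or "L" (lowercase, trailing blanks, empty strings) ends up as 0 rather than missing. A quick check along these lines may help; the toy data, including the deliberately lowercase "w", are made up for illustration:

```stata
* Toy data, invented for illustration; the last row is deliberately lowercase.
clear
input str1 Result
W
D
L
w
end

gen outcome = 1 * (Result == "W") + 2 * (Result == "D") + 3 * (Result == "L")

* Cross-tabulate the string against the numeric coding to inspect the mapping.
tabulate Result outcome, missing

* Count observations that did not match any of the three codes (coded 0).
count if !inlist(outcome, 1, 2, 3)
```

If the count is not zero on the real data, the string variable probably needs cleaning first, for example with strtrim() and upper().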

Alfonso Sanchez-Penalver