  • Multilevel Linear Regression: how to create a model with my data in Stata

    Hello,

    I am working with Stata for the first time, and statistics has not been my strongest subject so far, so I really need your help. I am analyzing (among other things) how team experience and substitutions affect the performance of soccer teams. For this, I analyze the games of 22 teams over the last 10 Bundesliga seasons, so for each team there are several observations from different seasons. My tutor told me that I should look into fixed-effect / mixed-effect modeling because I have to account for the influence of the factors season and team. I have now read a lot on the subject and think that I probably have to work with a multilevel linear regression model, but I do not really know what I have to enter in SPSS so that it works, because I have not yet found an example that uses data similar to mine. Can you help me here? I have attached an example file and hope that you can help me. exampleData.xlsx

    Thanks in advance!
    Joshua

  • #2
    For help with SPSS, you should go to an SPSS user forum. The activity on this site is about Stata (or general questions about statistics.)

    Also, for future reference when asking Stata questions here, providing example data in a spreadsheet is not helpful. First, many of the people who respond here will not download anything from a stranger. Even then, to make use of the example data, they would have to import it into Stata--and who knows what kind of subsequent cleaning it would require before it was usable. If you don't already have your data in Stata, then it is premature to ask for help with code. If you do, then you should show the example data using Stata's -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
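
    A minimal sketch of the -dataex- workflow described above. The variable list and the observation range are only illustrative assumptions (the names are those that appear in the example later in this thread); substitute your own.

    Code:
    * Needed only if -help dataex- finds nothing (older Stata versions)
    ssc install dataex
    * Read the instructions
    help dataex
    * Show the first 20 observations of the chosen variables (illustrative varlist)
    dataex team season matchDay sharedExperience_all subs_min85 in 1/20

    The output in the Results window can then be copied and pasted into a post as is.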



    • #3
      Hello, sorry for the poor preparation of my post. I made a mistake with SPSS, that's what I used at first, but I've now switched to Stata. I am still very new with everything, please bear with me. I have attached an example of my data here with -dataex-.

      I hope you could help me with the question of whether a multilevel linear regression is suitable here if I want to include the factors of team and season in my regression and how I can best implement this with Stata.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str19 team int season byte matchDay double(sharedExperience_all subs_min85)
      "Augsburg" 2017  4 136.9230769 .666666667
      "Augsburg" 2017 20 151.2307692 .333333333
      "Augsburg" 2017  9       120.5 .666666667
      "Augsburg" 2016 21 192.6428571          1
      "Augsburg" 2016 12 101.3571429 .666666667
      "Augsburg" 2015 25 134.5714286          1
      "Augsburg" 2015  6 99.07142857          1
      "Dortmund" 2017  7      120.75          0
      "Dortmund" 2017 25 91.92857143 .666666667
      "Dortmund" 2016 24 158.0714286          1
      "Dortmund" 2016 19 143.3846154 .666666667
      "Dortmund" 2015 22 140.5384615 .333333333
      "Dortmund" 2015 28 167.8571429 .666666667
      "Freiburg" 2017  9         115          1
      "Freiburg" 2017  3 123.7692308 .666666667
      "Freiburg" 2016  1         148          1
      "Freiburg" 2016 15 200.5714286 .666666667
      "Freiburg" 2016 20         129 .666666667
      "Freiburg" 2015 19         148 .666666667
      "Freiburg" 2015 14 150.2857143          1
      end

      Best, Joshua



      • #4
        In Stata, type this into the Command panel at the bottom and submit it:
        Code:
        help mixed
        Then a help page should pop up. At about the second row, click on the link to the PDF documentation (it contains more than the help file, including use cases). This opens the ME manual, which is ~600 pages on how to run mixed-effects models in Stata. Given that your dependent variable seems to be continuous, try reading the section on -mixed- first. The Stata PDF manuals are very comprehensive and the best place to start.

        If you need more context, I can't recommend this two-volume set enough: https://www.stata.com/bookstore/mult...odeling-stata/. Check whether you can get access through your school library.



        • #5
          Ken Chui gives excellent advice, and I, too, endorse that book.

          Looking at your data and the description, I wonder if a multi-level model is your best bet for incorporating team and year effects. My concern is the limited numbers of both teams and years. When you are using random effects in multi-level models you are, in effect, sampling the universes of those effects to estimate outcome variance at that level. While 22 teams is possibly a (barely) adequate sample of team-space to estimate team-level variation, 10 years is probably not. I would be more inclined to do a one-level model and just include i.team and i.year among the explanatory variables. At most, I might go with team as a level for a random effect, but keep i.year on the bottom level of the model. There is also the question of the total sample size. As I don't follow sports, I have no idea how many games a soccer team typically plays in a year. But you don't want your number of explanatory variables to outrun your sample size and leave you overfitting the noise. Different people have different rules of thumb, but if you aren't going to have a minimum of 10 games per team per year, you are not in good shape to model all of those effects and you might have to rethink your approach. (For example, you might look for continuous time trends rather than yearly idiosyncrasies, or group time into somewhat longer intervals than one year.)
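
          The two specifications described above could be sketched roughly as follows. This is only an illustration under assumptions: performance is a hypothetical name for the outcome variable (the posted example contains no outcome), season stands in for year, and team_id is a new numeric variable created here because team is a string in the example and factor-variable notation needs a numeric variable.

          Code:
          * Numeric copy of the string team variable (team_id is a hypothetical name)
          encode team, generate(team_id)

          * One-level model with team and season indicators
          * (performance is a hypothetical outcome variable)
          regress performance sharedExperience_all subs_min85 i.team_id i.season

          * Alternative: random intercept for team, season indicators kept in the fixed part
          mixed performance sharedExperience_all subs_min85 i.season || team_id: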



          • #6
            Thank you both for your answers. I got the book from the library and have thought a bit about my data: actually, nothing should change over the years/seasons, so I don't need the influence of the seasons (I would test that, of course, but it shouldn't change anything). This would mean that it would be sufficient to include i.team in the normal -regress- command. Isn't that then the same as when I do -xtset team- beforehand and use -xtreg ....,fe -? Here I could include a continuous time variable with a count of the 340 match days (from 1 to 340, as I have 10 years with 34 matchdays each). But I don't know whether that makes sense or is necessary.
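
            One possible way to build such a running match-day counter, as a sketch only. It assumes that season holds the year as an integer (as in the example above), that the seasons are consecutive, and that every season has exactly 34 match days; matchCount is a hypothetical variable name.

            Code:
            * Store the earliest season in r(min)
            summarize season, meanonly
            * First season gives 1-34, second gives 35-68, ..., tenth gives 307-340
            generate int matchCount = (season - r(min)) * 34 + matchDay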


            In general, all my variables are time-variant. I also have a ranking value for each team as a control variable, which is updated after each match. I wondered whether I need all this at all, since the strength of a team is adjusted over time. Isn't a normal -regress- without dummy variables for the teams then actually sufficient as a regression?

            Thanks again!



            • #7
              This would mean that it would be sufficient to include i.team in the normal -regress- command. Isn't that then the same as when I do -xtset team- beforehand and use -xtreg ....,fe -?
              Yes, it is.
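
              A minimal sketch of the two equivalent commands, for illustration only. It assumes a hypothetical outcome variable named performance and a numeric team identifier team_id created with -encode-, since -xtset- and i. notation both need a numeric variable and team is a string in the example.

              Code:
              * Numeric team identifier (skip if it already exists)
              encode team, generate(team_id)

              * Fixed-effects (within) estimator with team as the panel variable
              xtset team_id
              xtreg performance sharedExperience_all subs_min85, fe

              * Same model with explicit team indicators
              regress performance sharedExperience_all subs_min85 i.team_id

              The coefficients on sharedExperience_all and subs_min85 should agree between the two commands. The reported constant differs, because -regress- reports the intercept of the base team while -xtreg, fe- reports an average over teams.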

              I won't try to respond to the other questions you raise because they depend on a knowledge of the substance here, and my ignorance in the area of sports is profound.

