  • Panel data: unbalanced data or balanced data? (fixed effect)

    Hello,

    Currently I am writing my master's thesis about the impact of financial variables on soccer performance. My dataset consists of 23 clubs that played in the highest Dutch soccer division between 2004-2005 and 2013-2014 (ten seasons). Eighteen clubs participate in each season. In a perfect world the dataset would consist of 180 observations per variable; this dataset consists of 170 observations. In the ten-season period investigated, twelve clubs participated in the division continuously, and my data are complete for these twelve clubs.
    My financial variables are net revenue, net profit, salary expenditure and transfer results. My sport performance variables are win percentage, log league points and log standings.

    My question is which dataset is most appropriate for the analysis: unbalanced or balanced panel data?

    With unbalanced data, the panel suffers from attrition due to the relegation of football clubs. With balanced data (12 clubs), it suffers from sample selection bias. I use Stata 12.

    Kind regards,

    Jordi van Dijk

  • #2
    Jordi:
    I would prefer unbalanced panel data, unless you can justify in the Methods section of your thesis that attrition is not informative (and so you can defend the 12-club option).
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you Carlo.

      Is there a Stata command that can deal with attrition, or does xtreg, fe handle the attrition?


      • #4
        Jordi:
        Stata takes care of unbalanced panels without further request from the researcher.
        However, if your question concerns tools such as inverse probability weighting (IPW) to correct for units lost at follow-up, no, they are not included in the Stata -xt- command suite.
        As pointed out before, the issue seems more methodological than Stata-related: is the attrition informative or not (by the way, IPW would imply that attrition is not informative)?
        Last edited by Carlo Lazzaro; 14 Sep 2015, 03:15.
        Kind regards,
        Carlo
        (Stata 19.0)
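
        A minimal sketch of the point above, assuming a long-format dataset with one row per club-season. The variable names (club, season, winpct, revenue, salary) are hypothetical and do not come from the thread. Once the panel is declared, -xtdescribe- shows the participation pattern and -xtreg, fe- uses whatever club-seasons are present:

        ```stata
        * Hypothetical variable names; club is assumed to be a numeric identifier
        * and season a numeric time variable (e.g. 2004 for 2004-2005)
        xtset club season

        * Show which clubs are observed in which seasons
        xtdescribe

        * Fixed-effects regression on the unbalanced panel,
        * with standard errors clustered by club
        xtreg winpct revenue salary, fe vce(cluster club)
        ```

        This only shows that estimation runs on an unbalanced panel; it does nothing to correct for informative attrition.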


        • #5
          Ok, I get your point, Carlo. In my view the attrition is not informative, because the worst performing clubs are relegated to a lower division. In addition, the missing variables are usually from low-ranked clubs. Some clubs do re-enter the highest division through promotion a couple of years after being relegated. I read Wooldridge's "Econometric Analysis of Cross Section and Panel Data", and if I understand it correctly, I have general attrition. With this in mind, is it still better to go for unbalanced data?

          Thanks for your input.


          • #6
            Do you have information on when they are relegated to the lower division? If so, you could create a dummy variable that takes the value of 1 when the club is in that lower division. That would then allow you to have a balanced panel. Just a thought.
            Alfonso Sanchez-Penalver


            • #7
              Jordi:
              if you can retrieve the information suggested by Alfonso, I would go unbalanced.
              Just a closing-out remark: the fact that, as you wrote,
              "missing variables are usually from low rank clubs"
              suggests that this kind of attrition may be informative.
              Kind regards,
              Carlo
              (Stata 19.0)


              • #8
                My first intention was to investigate both divisions to handle the overlap between relegations and promotions. Unfortunately, I would be missing many financial variables (more than half). That is why I focused only on the highest division. Thank you for your input.


                • #9
                  Jordi:
                  as often happens with statistics, the key issue is that you can defend your approach against discussant(s). I would provide the reader with a sound explanation of your choice, which falls, in all likelihood, in the widest folder labelled "more research is needed" (to investigate the overlap between relegations and promotions).
                  All the best for you and your dissertation.
                  Kind regards,
                  Carlo
                  (Stata 19.0)


                  • #10
                    Thanks Carlo, have a nice day.

                    Kind regards,

                    Jordi


                    • #11
                      Hello everyone,
                      I have a similar question to Jordi's. In fact, reading these posts already helped me, as I am also writing my master's thesis using a fixed-effects model on panel data. I have a large dataset: about 40,000 individual observations and about 8,000 households. I do two analyses:
                      The first one is at the household level, where I analyse how changes in income affect the consumption of junk food/empty calories.
                      The second one is at the individual level, where I analyse how changes in consumption affect health indicators.

                      The dataset has three waves and about 10% attrition each year.
                      Now I am unsure whether I should work with a balanced or an unbalanced panel. As far as I understand by now, I could work with an unbalanced panel if I have observations for at least two waves.
                      How can I determine if the attrition is informative? It would only be informative if unobserved factors that change over time and are correlated with my outcome variable and my error term systematically lead to the attrition, right? But how can I check that, since the factors are unobservable?
                      If I just do a comparison of baseline characteristics, this should not be very informative, since these characteristics are controlled for by the fixed effects, right?

                      Last question: do I have to drop units observed in only one wave, or does Stata account for that and exclude them automatically from the analysis?

                      Thank you so much for your help in anticipation.

                      Best regards,
                      Anna-Lena


                      • #12
                        Anna-Lena:
                        welcome to the list.
                        In the future, please start a new thread. Thanks.
                        - no need to drop observations with missing data, as Stata does it automatically;
                        - no need to work with a "made-up" balanced panel dataset, as Stata can handle both balanced and unbalanced panel datasets without any additional effort on your part;
                        - 10% attrition is not, at face value, that worrisome (but can affect inference notwithstanding);
                        - as far as investigating the underlying missingness mechanism is concerned, you may suspect informativeness if (say) a monotonic missing pattern exists for some time-varying variables;
                        - as an aside, if individuals are nested within households, you may want to consider -mixed- (that favours a random-effects specification, though).
                        Kind regards,
                        Carlo
                        (Stata 19.0)
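
                        A rough sketch of how the attrition question might be explored, under stated assumptions. The variable names (id, wave, y, x) are hypothetical, and waves are assumed to be coded as consecutive integers. The first part counts how many waves each unit contributes; units observed in a single wave add no within-unit variation to the fixed-effects slopes. The second part is a variable-addition check of the kind discussed in Wooldridge's panel data textbook: an indicator for being observed in the next wave is added to the fixed-effects regression and tested. This is an illustration, not something proposed in the thread.

                        ```stata
                        * Hypothetical names: id (unit), wave (1, 2, 3), y (outcome), x (regressor)
                        xtset id wave

                        * Number of waves observed per unit (data stay sorted by id and wave)
                        bysort id (wave): generate n_waves = _N
                        * Note: this tabulates observations, not units
                        tabulate n_waves

                        * Indicator: is the unit still observed in the next wave?
                        * Left missing in the last wave, where no next wave exists.
                        summarize wave, meanonly
                        local lastwave = r(max)
                        generate byte in_next = !missing(F.y) if wave < `lastwave'

                        * Fixed-effects regression with the lead selection indicator added
                        xtreg y x in_next, fe vce(cluster id)
                        test in_next
                        ```

                        A significant coefficient on in_next would be a warning sign that attrition is related to the idiosyncratic error. A non-significant one does not prove that attrition is ignorable, since the check only has power against some forms of selection.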


                        • #13
                          Thank you so much Carlo for the information, it was very helpful! I will consider starting a new thread next time! Best regards, Anna-Lena


                          • #14
                            Hello everyone,
                            I have a similar question.
                            I am applying OLS regression with an interaction term to identify the impact of a particular act. I already have data for 2 years before the act and 4 years after it, but I am a little confused: should I use the unbalanced data, or should I restrict the analysis to 2 years before and 2 years after, which gives a strongly balanced structure? Most prior studies used 2 years before and 2 years after, but in that case I would lose the data for the last 2 years, and the results and inferences would be different. So I was wondering which one is preferred. I am using Stata 14.
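
                            One way to approach this is to estimate both versions and report them side by side. The sketch below assumes hypothetical variable names (y, treated, year, firm) and a placeholder act year of 2010, with the act year counted as the first "after" year; none of these details come from the post.

                            ```stata
                            * Hypothetical names: y (outcome), treated (0/1 group), year, firm (cluster id)
                            * 2010 is a placeholder for the year the act took effect
                            local act_year 2010
                            generate byte post = year >= `act_year'

                            * Full window: 2 years before, 4 years after
                            regress y i.treated##i.post, vce(cluster firm)
                            estimates store full

                            * Symmetric window: 2 years before, 2 years after
                            regress y i.treated##i.post if inrange(year, `act_year' - 2, `act_year' + 1), vce(cluster firm)
                            estimates store sym

                            * Coefficients and standard errors from both windows, side by side
                            estimates table full sym, se
                            ```

                            If the interaction coefficient is similar in both windows, the choice matters little. If it differs, that difference is itself worth reporting and discussing.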
