  • Data with a gap

    Dear Teachers and Researchers,

    I have several questions about my data. I am currently at the progress-report stage of my dissertation, in which I am investigating the institutional effects on emigration in developing countries.

    In my dataset, the dependent variable is observed at 5-year intervals between 1990 and 2010, while my predictor variables are yearly.
    For instance, for 1990 I take the average of each predictor over the four preceding years: ( sum(1986;1987;1988;1989) / 4 ). As some of my friends suggested, I applied the same method to the other observation years, so that the independent variables precede, and thus predict, the dependent variable.
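    A minimal sketch of the two averaging windows being compared, assuming one country's yearly predictor is stored as a dict from year to value. The helper `window_mean` and the numbers are hypothetical, made up for illustration only.

```python
from statistics import mean


# Hypothetical helper (not from any library).
def window_mean(series, year, include_current=False, width=4):
    """Average a yearly predictor over a window ending at (or just before) `year`.

    include_current=False: years year-width .. year-1 (1986-1989 for 1990)
    include_current=True:  years year-width .. year   (1986-1990 for 1990)
    `series` maps year -> value; a missing year raises KeyError on purpose,
    so gaps in the predictor are not silently averaged over.
    """
    last = year if include_current else year - 1
    return mean(series[y] for y in range(year - width, last + 1))


# Made-up yearly predictor for one country: 1986 -> 1.0, ..., 2010 -> 25.0.
x = {yr: float(yr - 1985) for yr in range(1986, 2011)}

for t in (1990, 1995, 2000, 2005, 2010):
    print(t, window_mean(x, t), window_mean(x, t, include_current=True))
```

    Note that with the four-year window the observation years themselves (1990, 1995, ..., 2010) never enter any average, whereas with the five-year window every year from 1986 to 2010 is used exactly once.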

    So my questions are:

    1) Is it correct to run the regression using this method, or do I have to include all 5 years and take the average ( sum(1986;1987;1988;1989;1990) / 5 )?
    2) My other question is about sample selection bias. As mentioned above, I am focusing on developing countries. Emigration-rate data are available for 128 developing countries, while data for the explanatory variable of primary interest are available for only 93 countries.
    Do I still need to keep the 35 countries for which this variable is unavailable, or is it acceptable to drop them?
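    One rough way to look at question 2 is to compare the emigration rate of the countries that would be kept with that of the countries that would be dropped. The sketch below assumes the outcome is observed for every country and the key predictor may be missing; the helpers and the six-country data are hypothetical. With the real data, `kept` would hold 93 and `dropped` 35 countries. A clear difference in means suggests the missingness is not random, so dropping those countries could bias the estimates; similar means do not prove the absence of bias.

```python
from statistics import mean


# Hypothetical helpers (not from any library).
def split_by_availability(outcome, predictor):
    """Split countries by whether the key explanatory variable is observed.

    outcome:   dict country -> emigration rate (observed for every country)
    predictor: dict country -> value; absent or None means missing
    Returns (kept, dropped) lists of country names, in `outcome` order.
    """
    kept = [c for c in outcome if predictor.get(c) is not None]
    dropped = [c for c in outcome if predictor.get(c) is None]
    return kept, dropped


def compare_outcome(outcome, kept, dropped):
    """Mean emigration rate in the estimation sample vs. the dropped countries."""
    return mean(outcome[c] for c in kept), mean(outcome[c] for c in dropped)


# Made-up toy data: countries E and F lack the explanatory variable.
emigration = {"A": 2.0, "B": 4.0, "C": 6.0, "D": 8.0, "E": 1.0, "F": 3.0}
institutions = {"A": 0.5, "B": 0.7, "C": 0.2, "D": 0.9, "E": None}

kept, dropped = split_by_availability(emigration, institutions)
print(len(kept), len(dropped), compare_outcome(emigration, kept, dropped))
```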

    Thank you in advance for your answers,

    Wishing you all good health,
    Kind regards

  • #2
    Without knowing exactly your data, it will be difficult to give a good answer. You should also look at the relevant literature to see which methods have been used for this kind of data.

    If I were doing the analysis, I would not average over the years in which my dependent variable has gaps. Instead, I would use lagged values of my independent variables.
    You should use averaging only if you have good theoretical reasons to do so.
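    To illustrate the lagged-values alternative, here is a minimal sketch that pairs each observed outcome with the predictor from `lag` years earlier instead of an average. It assumes panel data keyed by (country, year); the helper name, the one-year default lag, and the values are hypothetical, and in practice one might compare several lag lengths.

```python
# Hypothetical helper (not from any library).
def lagged_rows(y, x, lag=1):
    """Pair each observed outcome with the predictor `lag` years earlier.

    y: dict (country, year) -> outcome, observed only every 5 years
    x: dict (country, year) -> yearly predictor
    Rows whose lagged predictor is missing are skipped rather than imputed.
    """
    rows = []
    for (country, year), value in sorted(y.items()):
        x_lag = x.get((country, year - lag))
        if x_lag is not None:
            rows.append((country, year, value, x_lag))
    return rows


# Made-up example for one country.
y = {("A", 1990): 3.1, ("A", 1995): 3.4}
x = {("A", yr): float(yr - 1980) for yr in range(1985, 2011)}

print(lagged_rows(y, x, lag=1))  # uses predictor values from 1989 and 1994
```

    Each row (country, year, outcome, lagged predictor) could then enter the regression directly.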

    Regarding your second question: you could keep those countries and try to predict the values of the dependent variable for the countries with missing variables. This could serve as a test of how well your model works.
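    Predicting a country's outcome requires its explanatory variables, so one reading of this suggestion is a holdout check: fit the model on some countries and compare predictions with the observed emigration rates of others. The sketch below assumes a single regressor and uses plain least squares; the helpers and the numbers are hypothetical.

```python
from statistics import mean


# Hypothetical helpers (not from any library).
def fit_ols(xs, ys):
    """Intercept and slope of a one-regressor least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def holdout_rmse(train, test):
    """Fit on `train` [(x, y), ...]; return the RMSE of predictions on `test`."""
    intercept, slope = fit_ols([x for x, _ in train], [y for _, y in train])
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in test]
    return mean(squared_errors) ** 0.5


# Made-up (predictor, emigration rate) pairs, one per country.
train = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
test = [(5.0, 10.0)]
print(holdout_rmse(train, test))
```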

    • #3
      Dear Sven-Kristjan,

      First of all, thank you for your reply.

      Regarding the dependent variable: the data are available only for five points in time (1990, 1995, 2000, 2005, 2010). According to the literature I reviewed, all other variables should be averaged over the 5-year interval to maintain consistency.

      So my question was whether I should average as ( sum(1986;1987;1988;1989) / 4 ) or as ( sum(1986;1987;1988;1989;1990) / 5 ).

      Once more, thank you for your reply!

      • #4
        I would need to know your dataset and your planned estimation methods to give you a better answer. If the literature review tells you that you should average, then you should average over all 5 years.
        It all depends on the assumptions you make about when the decision to emigrate is made and executed.
        You could try different weighting schemes to account for the possibility that more recent years matter more for the emigration decision. That's why I would check the autocorrelation of the independent variables. Again, it all depends on your assumptions about the underlying processes.
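        A minimal sketch of both ideas, assuming each predictor is a short yearly list per country. `weighted_window` compares equal weights with a hypothetical linearly increasing scheme; `lag1_autocorr` is the usual sample lag-1 autocorrelation. Helper names, weights, and values are made up. If the autocorrelation is close to 1, yearly values change slowly and the weighting choice should matter little; if it is low, the choice matters more. With only a few years per country, the estimate is imprecise.

```python
from statistics import mean


# Hypothetical helpers (not from any library).
def weighted_window(values, weights):
    """Weighted average of one window of yearly values (oldest year first)."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per year")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a yearly series (oldest year first)."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series[:-1], series[1:]))
    den = sum((a - m) ** 2 for a in series)
    return num / den


# Made-up predictor values for 1986-1990.
window = [1.0, 2.0, 3.0, 4.0, 5.0]
print(weighted_window(window, [1, 1, 1, 1, 1]))  # equal weights: plain average
print(weighted_window(window, [1, 2, 3, 4, 5]))  # hypothetical: recent years count more
print(lag1_autocorr(window))
```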
        At the moment, your averaging scheme assumes that all years have the same impact on the decision to emigrate. You probably also assume that emigration in the years you do not observe follows a linear trend.
        So there are many things that I would check to make sure that your results are robust.
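        The linear-trend assumption about the unobserved years can be made explicit with a small sketch that fills the years between the 5-year observations on straight lines. It adds no information; it only shows the path being assumed, which could then be contrasted with alternative paths as a robustness check. The helper and the emigration rates are hypothetical.

```python
# Hypothetical helper (not from any library).
def interpolate_linear(observed):
    """Fill every year between observed points with a straight line.

    observed: dict year -> value, e.g. emigration in 1990, 1995, ..., 2010
    Returns dict year -> value for all years from the first to the last observation.
    """
    years = sorted(observed)
    filled = {}
    for y0, y1 in zip(years[:-1], years[1:]):
        v0, v1 = observed[y0], observed[y1]
        for yr in range(y0, y1):
            filled[yr] = v0 + (v1 - v0) * (yr - y0) / (y1 - y0)
    filled[years[-1]] = observed[years[-1]]
    return filled


# Made-up emigration rates for one country.
emigration = {1990: 2.0, 1995: 3.0, 2000: 5.0}
path = interpolate_linear(emigration)
print(path[1992], path[1997])
```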
