
  • Unbalanced panel data with non-random missing observations

    Dear all,

    I am working on an unbalanced panel dataset covering 1110 certified organisations over 7 years (2013 to 2019) (see code below). However, I have a lot of missing values because organisations only enter the dataset once they are certified: for organisations certified in 2018, I have two years of data (2018 and 2019), and for organisations certified in 2016, I have four years of data (2016, 2017, 2018, 2019). Therefore, I understand that the missing observations (organisation-years) are non-random.

    Code:
    xtset No Year
           panel variable:  No (unbalanced)
            time variable:  Year, 2013 to 2019, but with gaps
                    delta:  1 unit
    Code:
     xtdes
    
          No:  1, 2, ..., 1110                                   n =       1110
        Year:  2013, 2014, ..., 2019                             T =          7
               Delta(Year) = 1 unit
               Span(Year)  = 7 periods
               (No*Year uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       1       1         2         3       5       7
    
         Freq.  Percent    Cum. |  Pattern
     ---------------------------+---------
          446     40.18   40.18 |  ......1
          262     23.60   63.78 |  .....11
           91      8.20   71.98 |  ....111
           71      6.40   78.38 |  ...1111
           33      2.97   81.35 |  ..11111
           28      2.52   83.87 |  ....1.1
           24      2.16   86.04 |  .111111
           24      2.16   88.20 |  1111111
           15      1.35   89.55 |  ...11.1
          116     10.45  100.00 | (other patterns)
     ---------------------------+---------
         1110    100.00         |  XXXXXXX
    I searched the web and previous Statalist posts thoroughly, and I often see recommendations to impute the missing data. However, my advisors asked me not to attempt to fill in missing values.
    If I understand correctly, I would not need to worry too much if my missing observations were random. However, since they are non-random, what can I do if I cannot impute the data?

    Thank you so much in advance for your help!
    I hope I have followed the posting guidelines correctly.

    I am using Stata 16.1 on Mac.
    Last edited by Jeanne Roche; 23 Sep 2021, 08:58. Reason: I added the output of the xtset command for more clarity on the dataset

  • #2
    Bonjour Jeanne,

    Someone will probably provide a much better solution later. In the interest of time and to help you if ever you're under any time constraint, here is my limited advice.

    What you could try is to generate a dummy that takes the value of one when a given variable is missing. Then run a logit regression of this dummy on a dummy that takes the value of one in the case of certification, plus other explanatory variables if theory suggests there are other determinants of missingness. Is this feasible?

    Then assess the significance of the odds ratios on the explanatory variables. If the odds ratios are significant, this would be problematic, and you may have to mention the non-random missingness in a footnote to your findings as a potential source of bias. If they are not, it is less of a problem.
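
    A minimal Stata sketch of the check described above, offered as an illustration rather than a definitive implementation. Only No and Year come from the original post; y (a variable with missing values), certified, and x1 are hypothetical names. It also assumes the missing organisation-years exist as rows in the data; if they do not, tsfill, full (after xtset) can add them, although the covariates in the added rows would then be missing as well.

    ```stata
    * Sketch only: y, certified, and x1 are hypothetical variable names,
    * not taken from the original post.

    * Indicator equal to 1 when y is missing in that organisation-year
    generate byte miss_y = missing(y)

    * Logit of missingness on candidate determinants.
    * The or option reports odds ratios; standard errors are clustered
    * on the panel identifier No.
    logit miss_y i.certified x1, or vce(cluster No)
    ```

    Clustering on No is a design choice here, made because the same organisation contributes several years; it is not something the original advice specifies.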

    Your situation might somewhat resemble survivorship bias in finance.



    • #3
      Jeanne:
      please note that multiple imputation procedures were actually developed for data missing at random.
      When the missingness mechanism is NMAR or MNAR (missing not at random), other methods are recommended, such as delta-adjustment (see for instance the pivotal paper by Stef van Buuren et al. on this topic at https://pubmed.ncbi.nlm.nih.gov/10204197/).
      That said, it remains to be seen whether performing the analysis as requested by your supervisor makes sense (as panels with gaps are really common) in your research field, or whether other approaches are suggested by previous papers on the topic you're dealing with.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Dear Maxence and Carlo,

        Thank you so much for your prompt responses! I started working on Maxence's point and while I find that the "missingness" doesn't affect some of my (candidate) explanatory variables, it does affect others so I need to find a solution.

        I will have a look at delta-adjustment. However, I understand that this is also a multiple imputation solution, correct? How does it work if the missing values are non-random, given that, as I understood it, imputation was designed for observations missing at random? Or did I misunderstand completely?

        Thanks a lot for your help!
        Best regards,
        Jeanne



        • #5
          Jeanne:
          take a look at the paper and everything will be clearer.
          Kind regards,
          Carlo
          (Stata 19.0)
