Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel data: Check data availability over several previous months - data has jumps and missing values

    Hi guys,

    My problem is the following:

    Data
    • I have panel data sorted by company_id and month_id
    • Let us assume I have only one more variable, var1
    • Example: see table below
    Goal
    • On a monthly basis, I would like to check whether there have been at least 10 non-missing observations of var1 within the past 20 months in order to drop those
    Issue
    • Although most observations follow month on month, there might be a gap of several years between some observations of var for a company_id
    • Var1 can be existing or missing, month_ids might jump by several months/years which indicates that data in between is missing
    company_id month_id var1 data_availability [10 out of 12 past months]
    ... ... ... ...
    1 1988m1 5 yes
    1 1988m2 . yes
    1 1998m3 4 yes
    1 1995m10 6 no
    1 1995m11 7 no
    2 1987m1 5 yes
    2 1987m2 5 yes
    2 1989m5 8 no
    2 1989m6 . no
















    My approaches to build a loop did not yield the expected results, so I would be happy if anybody could give me some creative input.

    Thank you,
    Carlos

  • #2
    Please provide a sample of your data using dataex and the code you tried when building your loop. Read the FAQ (especially #12) for information about dataex and using code delimiters. Thanks!

    Also be very clear. At one point you say 10 out of 20 months, and another you say 10 out of 12. Do you want to drop the year if there are not at least 10 months of data? But just for one company at a time, right? Do the months need to be consecutive? You say within the past 20 months--do you mean that you only want to look at the last 20 months of the series?
    Last edited by Carole J. Wilson; 11 May 2016, 17:45.
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    Comment


    • #3
      Thank you Carole,

      Let's see how it works with the following dataset. As you can see, there are occasions, in which months are skipped - e.g. from month_id 126 to 131 [by the way, how can I format the month_id in %tm in the dataex command? ], as well there are missing values, even if the month is not skipped.
      • For each month_id, I have to check if var1 is available in, let's say 4 out of 5 months [the actual number of months does actually not really matter, can also be 10 out of 12 or 10 out of 20, ... as I have to do it for different availability criteria]
      • This means that at maximum 1 value within the last 5 months for that specific company may be missing or skipped
      • If var1 is available in the past 4 out of 5 months the availability_dummy variable is evaluated TRUE, if not FALSE
      Code:
      clear
      input float(company_id month_id var1 availability_dummy)
      10006 120  46.375  FALSE
      10006 121  48.125  FALSE
      10006 122   49.25  FALSE
      10006 123    45.5  FALSE
      10006 124    45.9  TRUE
      10006 126      38  TRUE
      10006 131  45.625  FALSE
      10006 132  47.625  FALSE
      10006 170      52  FALSE
      10006 171      46  FALSE
      10006 172      42  FALSE
      10006 173          FALSE
      10006 174   36.75  TRUE
      10014 120   9.875  FALSE
      10014 121   9.375  FALSE
      10014 122   7.875  FALSE
      10014 123   5.375  FALSE
      10014 124   8.375  TRUE
      10014 125          TRUE
      10014 126   3.625  TRUE
      10014 127     4.5  TRUE
      10014 128          TRUE
      10014 129   3.875  FALSE
      10014 178   1.125  FALSE
      10014 179          FALSE
      10014 180     1.5  FALSE
      10014 181   1.625  TRUE

      My (failed) approach
      Code:
      xtset company_id month_id
      generate availability_dummy = .
      generate number_consecutive_months =.
      
      levelsof month_id, local(M)
         foreach m of local M {
         replace number_consecutive_months = number_consecutive_months + 1 if month_id = month_id[_n-1]+1
         }
      replace availability_dummy = "TRUE" if number_consecutive_months >= 4
      replace availability_dummy = "FALSE" if number_consecutive_months < 4
      Last edited by Carlos Teigimiz; 13 May 2016, 05:32.

      Comment


      • #4
        See if panelthin (SSC) is what you seek.

        Comment


        • #5
          Hi Nick,

          Thanks for your suggestion, you're a legend among our Stata class!
          I just thought about the possible use of your panelthin command. But actually, I could not come up with an idea on how to use it to count the number of occurrences/non-missing observations in a variable over a specified preceding time period.

          Comment


          • #6
            OK. Your one-sentence summary helps me to understand what you are interested in and to make another suggestion.

            Note that in #3 you seemed to have started with dataex output and then edited it. But your code segment would not run as you have implicit missings and (TRUE, FALSE) as values of a supposedly numeric variable.

            Starting with a version of your example I note that rangestat (SSC) could be used (e.g.) count values in the previous 12 months for each panel. If a variable were sometimes missing, then just use an indicator variable for non-missing values instead.

            Code:
            clear
            input float(company_id month_id var1)
            10006 120  46.375  
            10006 121  48.125  
            10006 122   49.25  
            10006 123    45.5  
            10006 124    45.9  
            10006 126      38  
            10006 131  45.625  
            10006 132  47.625  
            10006 170      52  
            10006 171      46  
            10006 172      42  
            10006 174   36.75  
            10014 120   9.875  
            10014 121   9.375  
            10014 122   7.875  
            10014 123   5.375  
            10014 124   8.375  
            10014 126   3.625  
            10014 127     4.5  
            10014 129   3.875  
            10014 178   1.125  
            10014 180     1.5  
            10014 181   1.625  
            end
            
            rangestat (count) var1, by(company_id) interval(month_id -12 -1)
            
            gen ispresent = !missing(var1)
            
            rangestat (count) nonmiss=ispresent, by(company_id) interval(month_id -12 -1)
            
            list , sepby(company_id)
            
            
            clear
            input float(company_id month_id var1)
            10006 120  46.375  
            10006 121  48.125  
            10006 122   49.25  
            10006 123    45.5  
            10006 124    45.9  
            10006 126      38  
            10006 131  45.625  
            10006 132  47.625  
            10006 170      52  
            10006 171      46  
            10006 172      42  
            10006 174   36.75  
            10014 120   9.875  
            10014 121   9.375  
            10014 122   7.875  
            10014 123   5.375  
            10014 124   8.375  
            10014 126   3.625  
            10014 127     4.5  
            10014 129   3.875  
            10014 178   1.125  
            10014 180     1.5  
            10014 181   1.625  
            end
            
            rangestat (count) var1, by(company_id) interval(month_id -12 -1)
            
            gen ispresent = !missing(var1)
            
            rangestat (count) nonmiss=ispresent, by(company_id) interval(month_id -12 -1)
            
            list , sepby(company_id)
            See e.g.

            http://www.statalist.org/forums/foru...fy-a-benchmark

            http://www.statalist.org/forums/foru...-in-a-by-group

            for very recent mentions of rangestat which in turn cite the original announcement.
            Last edited by Nick Cox; 13 May 2016, 06:32.

            Comment


            • #7
              Thanks, Nick!
              That's an awesome command!

              Comment

              Working...
              X