  • Identify variables in a range

    Hello,

    Can you please help me with the following issue:
    I would like to identify the ids of the companies that have observations over the whole sample period, i.e. from Jan 2005 to Dec 2017.
    Here is an example of my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long date float(mdate id) double ret_eom
    16467 540 14713  .0065657461766821384
    16467 540  8468  -.003093270296692758
    16467 540 17056 -.0016608393468379927
    16467 540 29267  -.001942484988471897
    16467 540 35903   -.01216822099008778
    16467 540 33340 -.0028468637678217734
    16467 540 29323   .007498869742503357
    end
    format %d date
    format %tm mdate
    I tried the following code, but it only counts how many ids satisfy an in-range condition, and when I tried to list one of them, it returned nothing.

    Code:
    egen v1= total(inrange(yr, 2005, 2017)), by (id)
    list if id == 63 & yr == 2006
    Thanks a lot for your help!







  • #2
    Code:
    local n_complete = tm(2017m12) - tm(2005m1) + 1
    
    isid id mdate, sort
    by id: gen byte is_complete = (_N >= `n_complete')
    Note: This code will only work correctly if no id ever has more than one observation in the same month. This condition is verified in the -isid- command, so you will get an error message before anything is calculated if it is inappropriate to proceed.

    Note also that this code is based only on the existence of the observations themselves. It does not check whether the observations have non-missing values for ret_eom.
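A minimal sketch of a variant that also requires a non-missing ret_eom in every month, run on a made-up toy panel. The ids, the returns, and the variable names n_valid and is_complete_nm are hypothetical and do not come from the original data.

```stata
clear
* Hypothetical toy panel: ids 1 and 2, each observed every month 2005m1-2017m12
set obs 2
gen int id = _n
expand 156
bysort id : gen int mdate = tm(2005m1) + _n - 1
format mdate %tm
* Made-up returns; id 2 is missing its return for one month
gen double ret_eom = 0.01
replace ret_eom = . if id == 2 & mdate == tm(2010m6)

local n_complete = tm(2017m12) - tm(2005m1) + 1
isid id mdate, sort
* Count only in-window months that also have a non-missing return
by id: egen n_valid = total(inrange(mdate, tm(2005m1), tm(2017m12)) & !missing(ret_eom))
by id: gen byte is_complete_nm = (n_valid == `n_complete')
```

In this toy panel id 2 has all 156 observations but is flagged 0, because one of its returns is missing.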



    • #3
      inrange(yr, 2005, 2017) is true (1) when yr lies within the range specified, endpoints included, and false (0) otherwise. So inrange(0, -1e9, 1e9) is true because 0 is indeed within the range from -1 billion to 1 billion.

      So egen's total() of that indicator counts, for each id, the observations within that range; it does not by itself tell you whether every month is present. You don't mention yr in your data example, but it's easy to guess what it records. So I don't understand "returned nothing".
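A minimal sketch on made-up data of what the egen line in the question computes, assuming yr is the calendar year of mdate. The toy panel and the names yr, v1, and complete are illustrative only. total() of a 0/1 indicator gives a count per id, which still has to be compared with the number of months in the window to become a yes/no flag.

```stata
* inrange(z, a, b) is 1 if a <= z <= b (bounds included), otherwise 0
display inrange(0, -1e9, 1e9)      // 1
display inrange(2017, 2005, 2017)  // 1
display inrange(2004, 2005, 2017)  // 0

clear
* Hypothetical toy panel: id 1 observed 2004m1-2017m12, id 2 only during 2005
set obs 2
gen int id = _n
gen int nmonths = cond(id == 1, 168, 12)
expand nmonths
bysort id : gen int mdate = cond(id == 1, tm(2004m1), tm(2005m1)) + _n - 1
format mdate %tm
* Assumption: yr is the calendar year of the monthly date
gen int yr = year(dofm(mdate))
* v1 counts in-range observations per id; it is not a yes/no flag
by id : egen v1 = total(inrange(yr, 2005, 2017))
gen byte complete = (v1 == 156)
```

In this toy panel id 1 also has 12 months from 2004, so its _N is 168 while v1 is 156; counting in-range months rather than all observations only matters if the data extend beyond the window.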

      If you mean a complete record then 2005-2017 includes 13 * 12 = 156 months, so

      Code:
      bysort id : gen wanted = _N == 156
      is a direct approach and identifies complete records.
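The 156 can be checked directly, since tm() returns months elapsed since 1960m1. The rest is a minimal sketch on a made-up toy panel (the panel and its missing month are hypothetical) that first asserts what _N == 156 relies on: one observation per id and month, and no observations outside the window.

```stata
* Monthly dates count months since 1960m1
display tm(2005m1)                    // 540
display tm(2017m12)                   // 695
display tm(2017m12) - tm(2005m1) + 1  // 156

clear
* Hypothetical toy panel: id 1 complete, id 2 missing 2012m3
set obs 2
gen int id = _n
expand 156
bysort id : gen int mdate = tm(2005m1) + _n - 1
format mdate %tm
drop if id == 2 & mdate == tm(2012m3)
* _N == 156 means "complete" only if months are unique within id ...
isid id mdate
* ... and every observation lies inside 2005m1-2017m12
assert inrange(mdate, tm(2005m1), tm(2017m12))
bysort id : gen byte wanted = _N == 156
```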

      If you mean that Jan 2005 and Dec 2017 are both in the record, that's a weaker criterion for which code is

      Code:
      bysort id (mdate) : gen wanted2 = mdate[1] == ym(2005, 1) & mdate[_N] == ym(2017, 12)
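A minimal sketch on a made-up toy panel contrasting the two criteria: an id can have both Jan 2005 and Dec 2017 and still have gaps in between. The ids, the gaps, and the names spans and complete are hypothetical. The comparison uses the monthly variable mdate, because ym() returns a monthly date, not a daily one.

```stata
clear
* Hypothetical toy panel: id 1 complete, id 2 has a gap, id 3 starts in 2006m1
set obs 3
gen int id = _n
expand 156
bysort id : gen int mdate = ym(2005, 1) + _n - 1
format mdate %tm
drop if id == 2 & mdate == ym(2010, 6)
drop if id == 3 & mdate < ym(2006, 1)

* Weaker criterion: first and last months are the window endpoints
bysort id (mdate) : gen byte spans = mdate[1] == ym(2005, 1) & mdate[_N] == ym(2017, 12)
* Complete record: all 156 months present (assumes no duplicate months)
by id : gen byte complete = _N == 156
```

Here id 2 satisfies the weaker criterion but not the complete-record one, and id 3 satisfies neither.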



      • #4
        Thank you so much, Clyde, it worked!

        Thank you so much, Nick! I meant the first condition, which also worked! I will do my best to be more precise.
