Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dummy for whether individuals in panel have at least one missing value for a variable

    Hello,

    I have a panel of banks, each with their unique ids, from the years 1905 to 1910. The panel is unbalanced, so some banks may only have observations for a few consecutive years between 1905 and 1910 (e.g. 1907-1909).

    Each banks reports losses each year, but some values in the variable "loss" are missing. For example, for Bank A in year 1906 "loss" = 123, but for the same bank, "loss" = . in year 1907.

    I want to count how many banks have missing values for "loss" - this would mean, for each unique id_bank, to count whether there is at least one value of "loss" = .

    So far I have managed to create a dummy ("count") to count all the missing values of "loss", but this doesn't work since some banks may have missing values for more than one year.

    I know how to create a variable that counts how many id_banks there are by counting either the first or the last occurrence of each id_bank, e.g.:

    bysort: Id_Bank: gen n_bank = _n == 1

    which counts each first occurrence of each Id_Bank, but I can't use this together with the dummy "count" to create a new dummy = 1 when a bank has a missing value for "loss" because it will disregards the observations where "loss" = . in an observation that is *not* the first one in each bank.

    So my question is:

    How do I create a dummy that accounts for missing values of "loss" only once for each id_bank?

    Apologies if the question is less than clear, but I am not quite sure how else to express it. Thank you very much for any help with this!

    Best wishes,

    Beatrice

  • #2
    Hi Beatrice,
    as far as I have understood, you need something like this:

    Code:
    bysort: Id_Bank: egen miss_loss = max(loss==.)
    Best,
    Raffaele

    Comment


    • #3
      Edit: I *think* I have found a solution, but it is extremely inelegant - any suggestions on how to improve it would be greatly appreciated.

      I have created a dummy for each first observation of every id_bank

      bysort Id_Bank: gen n_bank = _n == 1

      and I have created a dummy "count" for each missing value of "loss"

      gen count = 1 if loss == .
      replace count = 0 if count == .

      Then I have replaced each "1" in n_banks by the sum of "count" within that id_bank

      bysort Id_Bank: replace n_bank = sum(count)


      if this is right, this dummy should have a nonzero value for every bank that has "loss" = 1 at least once.

      So if I then create another dummy = 1 for every value of n_bank that is nonzero, then I should have the result I want:

      gen l_dummy = 1 if n_bank != 0

      Is this right? Is there a better way of doing this?

      Thank you,

      Best,

      Beatrice

      Comment


      • #4
        Originally posted by Raffaele Grotti View Post
        Hi Beatrice,
        as far as I have understood, you need something like this:

        Code:
        bysort: Id_Bank: egen miss_loss = max(loss==.)
        Best,
        Raffaele
        Hi Raffaele,

        Thank you for this, but I think the code

        Code:
        bysort: Id_Bank: egen miss_loss = max(loss==.)
        means that miss_loss = 1 for every year each Id_Bank with missing values for loss is observed, even if "loss" = . only for one of the x years the bank is observed:

        EG

        Id_B Year Loss Miss_Loss

        24 1905 . 1
        24 1906 13 1
        24 1907 . 1

        Whereas I would like:

        Id_B Year Loss Miss_Loss

        24 1905 . 1
        24 1906 13 0
        24 1907 . 0

        The last line being the important one - I would like the miss_loss variable not to count a missing value per Id_B more than once.

        Sorry if that was unclear and thanks for your help in any case!

        Best,

        Beatrice





        Comment


        • #5
          Code:
          bysort Id_Bank (loss) : gen miss_loss = missing(loss[_N])
          egen tag = tag(Id_Bank) 
          count if miss_loss & tag

          Comment


          • #6
            Originally posted by Nick Cox View Post
            Code:
            bysort Id_Bank (loss) : gen miss_loss = missing(loss[_N])
            egen tag = tag(Id_Bank)
            count if miss_loss & tag
            Thank you very much! That worked.

            Best,

            Beatrice

            Comment

            Working...
            X