Dummy for whether individuals in panel have at least one missing value for a variable

Beatrice Faleri

Join Date: Jan 2020

Posts: 7
#1

Dummy for whether individuals in panel have at least one missing value for a variable

16 Jan 2020, 07:32

Hello,

I have a panel of banks, each with their unique ids, from the years 1905 to 1910. The panel is unbalanced, so some banks may only have observations for a few consecutive years between 1905 and 1910 (e.g. 1907-1909).

Each banks reports losses each year, but some values in the variable "loss" are missing. For example, for Bank A in year 1906 "loss" = 123, but for the same bank, "loss" = . in year 1907.

I want to count how many banks have missing values for "loss" - this would mean, for each unique id_bank, to count whether there is at least one value of "loss" = .

So far I have managed to create a dummy ("count") to count all the missing values of "loss", but this doesn't work since some banks may have missing values for more than one year.

I know how to create a variable that counts how many id_banks there are by counting either the first or the last occurrence of each id_bank, e.g.:

bysort: Id_Bank: gen n_bank = _n == 1

which counts each first occurrence of each Id_Bank, but I can't use this together with the dummy "count" to create a new dummy = 1 when a bank has a missing value for "loss" because it will disregards the observations where "loss" = . in an observation that is *not* the first one in each bank.

So my question is:

How do I create a dummy that accounts for missing values of "loss" only once for each id_bank?

Apologies if the question is less than clear, but I am not quite sure how else to express it. Thank you very much for any help with this!

Best wishes,

Beatrice
Tags: None
Raffaele Grotti

Join Date: Feb 2015

Posts: 54
#2

16 Jan 2020, 07:50

Hi Beatrice,
as far as I have understood, you need something like this:

Code:

bysort: Id_Bank: egen miss_loss = max(loss==.)

Best,
Raffaele
Comment
Beatrice Faleri

Join Date: Jan 2020

Posts: 7
#3

16 Jan 2020, 07:56

Edit: I *think* I have found a solution, but it is extremely inelegant - any suggestions on how to improve it would be greatly appreciated.

I have created a dummy for each first observation of every id_bank

bysort Id_Bank: gen n_bank = _n == 1

and I have created a dummy "count" for each missing value of "loss"

gen count = 1 if loss == .
replace count = 0 if count == .

Then I have replaced each "1" in n_banks by the sum of "count" within that id_bank

bysort Id_Bank: replace n_bank = sum(count)

if this is right, this dummy should have a nonzero value for every bank that has "loss" = 1 at least once.

So if I then create another dummy = 1 for every value of n_bank that is nonzero, then I should have the result I want:

gen l_dummy = 1 if n_bank != 0

Is this right? Is there a better way of doing this?

Thank you,

Best,

Beatrice
Comment
Beatrice Faleri

Join Date: Jan 2020

Posts: 7
#4

16 Jan 2020, 08:47

Originally posted by Raffaele Grotti View Post

Hi Beatrice,
as far as I have understood, you need something like this:

Code:

bysort: Id_Bank: egen miss_loss = max(loss==.)

Best,
Raffaele

Hi Raffaele,

Thank you for this, but I think the code

Code:

bysort: Id_Bank: egen miss_loss = max(loss==.)

means that miss_loss = 1 for every year each Id_Bank with missing values for loss is observed, even if "loss" = . only for one of the x years the bank is observed:

EG

Id_B Year Loss Miss_Loss

24 1905 . 1
24 1906 13 1
24 1907 . 1

Whereas I would like:

Id_B Year Loss Miss_Loss

24 1905 . 1
24 1906 13 0
24 1907 . 0

The last line being the important one - I would like the miss_loss variable not to count a missing value per Id_B more than once.

Sorry if that was unclear and thanks for your help in any case!

Best,

Beatrice
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35697

16 Jan 2020, 09:30

Code:

bysort Id_Bank (loss) : gen miss_loss = missing(loss[_N])
egen tag = tag(Id_Bank) 
count if miss_loss & tag

Comment

Beatrice Faleri

Join Date: Jan 2020

Posts: 7
#6

17 Jan 2020, 02:30

Originally posted by Nick Cox View Post

Code:

bysort Id_Bank (loss) : gen miss_loss = missing(loss[_N]) egen tag = tag(Id_Bank) count if miss_loss & tag

Thank you very much! That worked.

Best,

Beatrice
Comment

Announcement

Dummy for whether individuals in panel have at least one missing value for a variable

Comment

Comment

Comment

Comment

Comment