Hello,
I have a panel of banks, each with their unique ids, from the years 1905 to 1910. The panel is unbalanced, so some banks may only have observations for a few consecutive years between 1905 and 1910 (e.g. 1907-1909).
Each bank reports losses each year, but some values in the variable "loss" are missing. For example, for Bank A in year 1906 "loss" = 123, but for the same bank, "loss" = . in year 1907.
I want to count how many banks have missing values for "loss" - this would mean, for each unique id_bank, to count whether there is at least one value of "loss" = .
So far I have managed to create a dummy ("count") to count all the missing values of "loss", but this doesn't work since some banks may have missing values for more than one year.
I know how to create a variable that counts how many id_banks there are by counting either the first or the last occurrence of each id_bank, e.g.:
bysort Id_Bank: gen n_bank = _n == 1
which flags the first occurrence of each Id_Bank, but I can't use this together with the dummy "count" to create a new dummy = 1 when a bank has a missing value for "loss", because it will disregard any case where "loss" = . in an observation that is *not* the first one for that bank.
So my question is:
How do I create a dummy that accounts for missing values of "loss" only once for each id_bank?
Apologies if the question is less than clear, but I am not quite sure how else to express it. Thank you very much for any help with this!
Best wishes,
Beatrice
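One possible approach, sketched here on a small made-up dataset: first flag every observation of a bank that has at least one missing "loss", then tag a single observation per bank and combine the two. The toy values and the variable names any_miss, tag_bank and miss_once are assumptions for illustration only; Id_Bank and loss are taken from the question.

```stata
* Toy data (hypothetical values) mimicking the unbalanced panel described above
clear
input Id_Bank year loss
1 1905 100
1 1906 123
1 1907 .
1 1908 .
2 1907 50
2 1908 60
2 1909 70
3 1906 .
3 1907 80
end

* 1 on every observation of a bank with at least one missing loss, else 0
bysort Id_Bank: egen any_miss = max(missing(loss))

* 1 on exactly one observation per bank, 0 on the others
egen tag_bank = tag(Id_Bank)

* Dummy equal to 1 only once per bank that has any missing loss
gen byte miss_once = tag_bank & any_miss

* Number of banks with at least one missing loss (2 in the toy data)
count if miss_once
```

Because any_miss is constant within each bank, it does not matter which observation gets tagged, so a missing value in a later year is still picked up. Summing miss_once, or counting as above, then gives the number of banks with at least one missing "loss".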