Deleting observations in panel data.

Thushani Guruge

Join Date: Jan 2022

Posts: 7
#16

24 Feb 2022, 20:28

Hi there,

I have a dataset of patients with their medication history. Below, listed a sample dataset of 1084200 observations in my dataset. I have 20+ variables, but below listed are only four variables.

"medicine_code" defines the type of medicine and depends on the strength (ex: X medicine with 20 mg, 40mg strength); there may be several codes for one medicine. However, I'm only interested in a couple of "medicine_codes" (actually, there are 190 "medicine_codes" that I'm interested in), and I want to keep only the patients who have been on those interested medicines and drop the rest of the observations.
ex: if medicine_code=="61332"|medicine_code=="71293"

Highly appreciate if you can help me out to find a suitable command to get a subset of data fulfilling my condition.

Thank you in advance!
Thushani

input str11 Patient_id str1 gender str7 year_month_birth long medicine_code
"1" "M" "AUG1955" 73655
"1" "M" "AUG1955" 61332
"1" "M" "AUG1955" 71293
"1" "M" "AUG1955" 71586
"2" "M" "OCT1920" 61332
"2" "M" "OCT1920" 73655
"2" "M" "OCT1920" 73655
"3" "M" "DEC1928" 61332
"3" "M" "DEC1928" 66400
"3" "M" "DEC1928" 73422
"3" "M" "DEC1928" 67117
"4" "F" "JAN1929" 71293
"4" "F" "JAN1929" 73518
"4" "F" "JAN1929" 73787
"4" "F" "JAN1929" 73605
"5" "M" "MAY1956" 61839
"6" "M" "AUG1930" 72661
"6" "M" "AUG1930" 61332
"6" "M" "AUG1930" 61333
"6" "M" "AUG1930" 61335
"7" "F" "FEB1951" 71293
"7" "F" "FEB1951" 68544
"8" "M" "JUN1938" 71293
"8" "M" "JUN1938" 73033
"8" "M" "JUN1938" 61332
"8" "M" "JUN1938" 73775

Last edited by Thushani Guruge; 24 Feb 2022, 20:31.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30063

#17

24 Feb 2022, 22:18

I'm not certain I've interpreted your question correctly. I understand that you want to keep all the observations of a given patient_id if any of those observations contains a medicine code that is in your list of 190 codes of interest, and drop all observations of a given patient_id if none of those observations contains a medicine code of interest.

With 190 codes that you are interested in, it is not practical to code this with -if- conditions. You need to create a Stata data set that contains medicine_code as the only variable and is a list of the 190 codes of interest, and only those. Then you can use -merge- to find the observations that have those codes. It will look like this:

Code:

clear
input str11 Patient_id str1 gender str7 year_month_birth long medicine_code
"1" "M" "AUG1955" 73655
"1" "M" "AUG1955" 61332
"1" "M" "AUG1955" 71293
"1" "M" "AUG1955" 71586
"2" "M" "OCT1920" 61332
"2" "M" "OCT1920" 73655
"2" "M" "OCT1920" 73655
"3" "M" "DEC1928" 61332
"3" "M" "DEC1928" 66400
"3" "M" "DEC1928" 73422
"3" "M" "DEC1928" 67117
"4" "F" "JAN1929" 71293
"4" "F" "JAN1929" 73518
"4" "F" "JAN1929" 73787
"4" "F" "JAN1929" 73605
"5" "M" "MAY1956" 61839
"6" "M" "AUG1930" 72661
"6" "M" "AUG1930" 61332
"6" "M" "AUG1930" 61333
"6" "M" "AUG1930" 61335
"7" "F" "FEB1951" 71293
"7" "F" "FEB1951" 68544
"8" "M" "JUN1938" 71293
"8" "M" "JUN1938" 73033
"8" "M" "JUN1938" 61332
"8" "M" "JUN1938" 73775
end
tempfile patient_data
save `patient_data'

clear
input long medicine_code
61332
71293
end
tempfile codes_of_interest
save `codes_of_interest'

use `patient_data', clear
merge m:1 medicine_code using `codes_of_interest', keep(master match)
by Patient_id (_merge), sort: keep if _merge[_N] == 3

Comment

Thushani Guruge

Join Date: Jan 2022

Posts: 7
#18

24 Apr 2022, 18:23

Hi Clyde,

Thank you so much for your instructions! It really worked out and I was able to get my results!
Comment
Shahla Akram

Join Date: May 2018

Posts: 24
#19

21 Jul 2022, 10:12

ok

Last edited by Shahla Akram; 21 Jul 2022, 10:42.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#20

21 Jul 2022, 10:14

The question posed in #19 is wildly off topic in this thread. Please repost as a New Topic. Also, to enhance your probability of getting a specific and helpful response to your question, I recommend that you show example data using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment
Pieter Bonni

Join Date: Aug 2022

Posts: 2
#21

09 Aug 2022, 05:39

Hi there,
for my Masterthesis I study panel data from Belgium firms, but I asked myself how I can easily remove my missing values. My firms have data over five years and I want to delete the whole firm if a value of them is noted as n.a.( this was exported from excel). e.g. you can observe that the row 17 contains a lot of n.b. (missing values) the other 3 observations for company 42517 are then all "bad" how can I delete within Stata these firms, note that I also have created a firm specific number which isn't showed on the screenshot (maybe this can be a start for the solution)
Thanks in advance

Last edited by Pieter Bonni; 09 Aug 2022, 06:12.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#22

09 Aug 2022, 09:00

Your post is confusing. You refer to removing firms when data are "n.a." But your screenshot shows no such values. There are some that are "n.b." Is that what you mean? And do you really mean to delete an entire firm's data if just a single value of one variable is "n.a." I'm not saying that's unreasonable to do, but just trying to clarify here exactly what is wanted, as in what is shown typically a whole range of variables contains n.a. in that observation, not just one.

In any case, before responding to my questions, please read the Forum FAQ, with emphasis on #12, for excellent advice on how to best show example data. Note, in particular, that screenshots are not helpful. It is impossible to tell what data storage types you have, and that affects the code needed. Also, your screenshot doesn't even show the full names of the variables. Finally, screenshots do not allow importation of the data into Stata to test code. The helpful way to show example data is with the -dataex- command. Please use that in your response, and in all posts going forward. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
1 like
Comment
Pieter Bonni

Join Date: Aug 2022

Posts: 2
#23

09 Aug 2022, 13:08

Thank you mr.Schechter to provide me with an answer. I take knowledge of the -dataex- command
the n.b. are indeed these n.a. what I ment in the beginning of my post. I want to exclude these firms out of my dataset because later on I will use the propensity matching method and I really need al the data of al the 6 years that where observed in the original database from my university. My goal is to exclude the firm EURELEC with has the specific corporate_ID 3 and I want to remain with firms who have no "n.b." on the variabelen ebitda, wagecost, dividends, bonus, totalassets, hours.
Thanks in advance
my code:
clear
Input Corporate_ID Name ebitda wagecost dividends bonus totassets hours
"3" "EURELEC" "108" "299" "0" "0" "955" "280"
"3" "EURELEC" "106" "288" "0" "0" "953" "250"
"3" "EURELEC" "102" "250" "4" "10" "960" "260"
"3" "EURELEC" "n.b." "n.b" "n.b." "n.b" "n.b." "n.b."
"3" "EURELEC" "n.b." "n.b" "n.b." "n.b" "n.b." "n.b."
"3" "EURELEC" "100" "253" "0" "1" "894" " "290"
"4" "FORTIS" "108" "299" "0" "0" "955" "280"
"4" "FORTIS" "106" "288" "0" "0" "953" "250"
"4" "FORTIS" "102" "250" "4" "10" "960" "260"
"4" "FORTIS" "400" "256" "6" "3" "705" "345"
"4" "FORTIS" "300" "240" "10" "4" "604" "524."
"4" "FORTIS" "100" "253" "0" "1" "894" " "290"
End
tempfile employees_data
save "employees_data"
Comment

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment