Panel data with missing values

Emily Huang

Join Date: Nov 2020

Posts: 34
#1

19 Nov 2020, 06:56

Hi everyone, I am wondering if anyone can suggest a method to analyze data in the following format:
q1 and q2 are Yes/No responses to questions, and AdhPC is adherence to medication.
The outcome (dependent) variable is continuous and the predictor (independent) variables are binary (0/1).

I tried xtreg but got the error "insufficient observations" and "note: q2 omitted because of collinearity"

Code:

subject   dateday     q1   q2   AdhPC
2         08 Jul 15   0    1    .
2         15 Apr 15   1    0    .
2         19 May 15   1    1    .
2         12 Jun 15   0    0    .
2         27 May 15   .    .    .9333333333
2         23 Jul 15   .    .    .8965517241
2         24 Jun 15   .    .    1.035714286
2         27 Apr 15   .    .    .9393939394

I think what's tripping me up is the missing values, i.e. that q1/q2 and AdhPC are not recorded on the same rows.

Last edited by Emily Huang; 19 Nov 2020, 07:15.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

19 Nov 2020, 07:07

Emily:
none of the observations in the example you listed will be part of e(sample), because each of them has at least one missing value.
That said, you should consider dealing with the missing values first.

Kind regards,
Carlo
(Stata 19.0)
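A quick way to confirm this on your own data is to count the complete cases before fitting anything. This is only a sketch: it assumes the variable names from the listing in #1 (AdhPC, q1, q2) and uses nothing beyond the built-in missing() function and misstable.

```stata
* How many observations have all three variables non-missing?
* missing() returns 1 if any of its arguments is missing.
count if !missing(AdhPC, q1, q2)

* Show which variables tend to be missing together
misstable patterns AdhPC q1 q2
```

For the eight rows listed in #1 the count is 0, since every row is missing either AdhPC or both q1 and q2.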
Emily Huang

Join Date: Nov 2020

Posts: 34
#3

19 Nov 2020, 07:30

I was thinking of using something like collapse... is there a way to align the values without losing time information?

Last edited by Emily Huang; 19 Nov 2020, 07:39.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#4

19 Nov 2020, 08:08

Emily:
-collapse- will not preserve the panel structure of your dataset, though.

Kind regards,
Carlo
(Stata 19.0)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

19 Nov 2020, 08:17

Along with the advice from Carlo, I note that in your panel your four observations of AdhPC occur on different days than your four observations of q1 and q2. How would you propose to match observations of AdhPC to the observations with data for q1 and q2?
Max Coleman

Join Date: Mar 2017

Posts: 24
#6

19 Nov 2020, 08:21

It sounds like you just want to collapse each month into one entry rather than two (e.g., by treating the May 19 and May 27 entries as the same), thereby ensuring that every row has complete data. If that's the case, it shouldn't be too hard. You'd just need to type something like:

Code:

sort dateday // I don't know how these dates will appear in Stata when sorted, but maybe you can try it out by typing "browse" after you do this command
replace AdhPC = [_n+1]AdhPC if AdhPC==. // here, I'm assuming that "dateday" is sorted such that the entry with AdhPC is always right below the entry with q1 and q2

This code isn't quite right, but you can experiment with it (or someone else can correct me). The point is that you can simply "move up" the AdhPC data to complete the rows with missing data by copying the cells from subsequent lines.
Emily Huang

Join Date: Nov 2020

Posts: 34
#7

19 Nov 2020, 09:04

I think it's this:

Code:

replace AdhPC = AdhPC[_n+1] if AdhPC==.
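
A slightly more defensive version of the same idea, as a sketch only. It assumes that dateday is a numeric Stata date rather than a string, that within each subject every q1/q2 row is immediately followed in date order by its AdhPC row (true for the eight rows in #1, unverified beyond that), and that the AdhPC-only rows can be dropped once their value has been copied up.

```stata
* Sort within subject by date so that _n+1 never reaches into another subject's rows
bysort subject (dateday): replace AdhPC = AdhPC[_n+1] if missing(AdhPC)

* Drop the rows that carried only AdhPC, now that the value sits on the q1/q2 row
drop if missing(q1) & missing(q2)
```

Note that each remaining row keeps the date of the q1/q2 response, not the date on which AdhPC was recorded.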
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

19 Nov 2020, 09:30

If you actually have monthly data that is recorded on different days in the month, then you could create a monthly date and collapse by the monthly date.

Code:

generate datemon = mofd(dateday)
format %tm datemon
collapse (firstnm) q1 q2 AdhPC, by(subject datemon)
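
To see what this does, here is a self-contained sketch that re-enters the eight rows from #1 and applies the same steps. The string variable datestr and the conversion with date() are assumptions made for the illustration; if dateday is already a numeric daily date in the real data, that step is not needed. The final xtset is likewise an assumed next step, not part of the code above.

```stata
clear
input byte subject str9 datestr byte q1 byte q2 double AdhPC
2 "08 Jul 15" 0 1 .
2 "15 Apr 15" 1 0 .
2 "19 May 15" 1 1 .
2 "12 Jun 15" 0 0 .
2 "27 May 15" . . .9333333333
2 "23 Jul 15" . . .8965517241
2 "24 Jun 15" . . 1.035714286
2 "27 Apr 15" . . .9393939394
end

* Convert the string to a daily date; the "DM20Y" mask reads 2-digit years as 20xx
generate dateday = date(datestr, "DM20Y")
format %td dateday

* Monthly date, then one row per subject-month holding the first non-missing values
generate datemon = mofd(dateday)
format %tm datemon
collapse (firstnm) q1 q2 AdhPC, by(subject datemon)

* Declare the panel so that xt commands use the monthly time variable
xtset subject datemon
list, noobs
```

Each month in this example has exactly one row with q1/q2 and one row with AdhPC, so the collapsed data contain four complete rows, April through July 2015.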
Emily Huang

Join Date: Nov 2020

Posts: 34
#9

19 Nov 2020, 09:42

I tried #8, but I think I would lose some of the values in q1 and q2 by collapsing? Last time, I think I used (mean) to avoid losing values, by averaging them instead.

Last edited by Emily Huang; 19 Nov 2020, 09:47.
Emily Huang

Join Date: Nov 2020

Posts: 34
#10

19 Nov 2020, 11:05

Actually, using collapse (sum) works
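
One caveat that may be worth checking, offered as a sketch rather than a correction: collapse (sum) returns 0, not missing, for a subject-month in which every value is missing, and it adds values together when a month has more than one. The block below assumes datemon was created as in #8; the counter names n_q1, n_q2 and n_adh are hypothetical.

```stata
* Sum as in #10, but also count the non-missing values behind each sum
collapse (sum) q1 q2 AdhPC (count) n_q1=q1 n_q2=q2 n_adh=AdhPC, by(subject datemon)

* A sum built from zero observations should be missing rather than 0
replace q1    = . if n_q1  == 0
replace q2    = . if n_q2  == 0
replace AdhPC = . if n_adh == 0

* Any count above 1 means several values were added together in that month
list subject datemon n_q1 n_q2 n_adh if n_q1 > 1 | n_q2 > 1 | n_adh > 1
```

If the list comes back empty and no values were set to missing, (sum) and (firstnm) give the same result on these data.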