  • Panel data with missing values

    Hi everyone, I am wondering if anyone can suggest a method to analyze data in the following format:
    q1 and q2 are Yes/No responses to questions, and AdhPC is adherence to medication.
    The outcome/dependent variable (AdhPC) is continuous and the independent/predictor variables (q1, q2) are binary (0/1).

    I tried xtreg but got the error "insufficient observations" and "note: q2 omitted because of collinearity"

    Code:
    subject    dateday      q1    q2    AdhPC
    2          08 Jul 15    0     1     .
    2          15 Apr 15    1     0     .
    2          19 May 15    1     1     .
    2          12 Jun 15    0     0     .
    2          27 May 15    .     .     .9333333333
    2          23 Jul 15    .     .     .8965517241
    2          24 Jun 15    .     .     1.035714286
    2          27 Apr 15    .     .     .9393939394
    I think what's tripping me up is the missing values, i.e., that the q1/q2 values and the AdhPC values are never recorded on the same row.
    Last edited by Emily Huang; 19 Nov 2020, 07:15.

  • #2
    Emily:
    none of the observations in the example you posted will be part of e(sample), as each of them has at least one missing value.
    That said, you should consider dealing with missing values first.
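    To confirm this on the full dataset, a quick check along these lines may help (a sketch, assuming the variable names shown in #1):
    Code:
    * number of rows with q1, q2 and AdhPC all nonmissing (zero in the posted excerpt)
    count if !missing(q1, q2, AdhPC)
    * which combinations of the three variables are missing together
    misstable patterns q1 q2 AdhPC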
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      I was thinking of using something like collapse... is there a way to align the values without losing time information?
      Last edited by Emily Huang; 19 Nov 2020, 07:39.

      • #4
        Emily:
        -collapse- will not preserve the panel structure of your dataset, though.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Along with the advice from Carlo, I note that in your panel your four observations of AdhPC occur on different days than your four observations of q1 and q2. How would you propose to match observations of AdhPC to the observations with data for q1 and q2?

          • #6
            It sounds like you just want to collapse each month into one entry rather than two (e.g., by treating the May 19 and May 27 entries as the same), so that every row has complete data. If that's the case, it shouldn't be too hard; you'd just need to type something like:

            Code:
            sort dateday // I don't know how these dates will appear in Stata when sorted, but maybe you can try it out by typing "browse" after you do this command
            replace AdhPC = [_n+1]AdhPC if AdhPC==. // here, I'm assuming that "dateday" is sorted such that the entry with AdhPC is always right below the entry with q1 and q2
            This code isn't quite right, but you can experiment with it (or someone else can correct me). The point is that you can simply "move up" the AdhPC data to complete the rows with missing data by copying the cells from subsequent lines.

            • #7
              I think it's this:
              Code:
              replace AdhPC = AdhPC[_n+1] if AdhPC==.
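              One caveat: as written, this line assumes the data are sorted so that each AdhPC row directly follows its q1/q2 row, and it can pull a value across subjects at a panel boundary. A by-group version might look like the sketch below; it assumes dateday is a numeric Stata daily date (a string such as "08 Jul 15" would need converting first, e.g. with date(dateday, "DM20Y")):
              Code:
              * within each subject, in date order, copy the next row's AdhPC into rows where it is missing
              bysort subject (dateday): replace AdhPC = AdhPC[_n+1] if missing(AdhPC)
              * drop the now-redundant rows that carry no q1/q2 answers
              drop if missing(q1, q2)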

              • #8
                If you actually have monthly data that is recorded on different days in the month, then you could create a monthly date and collapse by the monthly date.
                Code:
                generate datemon = mofd(dateday)
                format %tm datemon
                collapse (firstnm) q1 q2 AdhPC, by(subject datemon)

                • #9
                  I tried #8, but I think I would lose some of the values in q1 and q2 by collapsing? Last time, I think I used (mean) to avoid this loss of values by averaging them.
                  Last edited by Emily Huang; 19 Nov 2020, 09:47.

                  • #10
                    Actually, using collapse (sum) works
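                    One thing to watch with (sum): it returns 0 rather than missing when every value in a subject-month is missing, and it can exceed 1 for q1 or q2 if a question was answered more than once in a month. A possible safeguard, sketched here with hypothetical helper names n_q and n_adh and assuming datemon was created as in #8:
                    Code:
                    * keep counts of nonmissing values alongside the sums
                    collapse (sum) q1 q2 AdhPC (count) n_q=q1 n_adh=AdhPC, by(subject datemon)
                    * drop subject-months where either the answers or the adherence value were never recorded
                    drop if n_q == 0 | n_adh == 0
                    * declare the panel and fit the model (fixed effects chosen only for illustration)
                    xtset subject datemon
                    xtreg AdhPC q1 q2, fe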
