  • Missing data in a longitudinal dataset - filling in the gaps

    Hi,

    I have the following categorical variable which has 3 categories denoting the kind of treatment a patient receives. As this is a longitudinal dataset, the treatment type may change over time.

    There is a significant amount of missing data, which can be replaced with already available data under certain conditions. This is where I need some help in tweaking my Stata syntax.

    Example of the categorical treatment variable, where 'n' indicates measurement occasion:

    Code:
    ID treat n
    1 injec 1
    1  .    2
    1 injec 3
    1 .     4
    1 injec 5
    2 diet  1
    2  .    2
    2 diet  3
    3 .     1
    3 .     2
    4 injec 1
    4  .    2
    4 injec 3
    4 diet  4
    4 diet  5
    I have two scenarios to deal with:

    1. ID 1 & ID 2 each have only one treatment category assigned to them. I can safely assume that the missing treatment data can be filled in with the existing information:

    Code:
    ID  treat n
    1 injec   1
    1 injec   2
    1 injec   3
    1 injec   4
    1 injec   5
    2 diet    1
    2 diet    2
    2 diet    3
    2. ID 4 changes from injections to diet, but the missing value at occasion 2 can be filled in as injection:

    Code:
    4 injec      1
    4 injec      2
    4 injec      3
    4 diet       4
    4 diet       5
    I'm trying to further develop the code below to address the above issues, but without much success. My code replaces all or most missing data for available individuals. How do I make it specific to each scenario above?

    Code:
    by id (n), sort: replace treat=treat[_n-1] if treat==.
    Thanks for any help in advance!

    Regards
    /Amal
    Last edited by Amal Khanolkar; 03 Oct 2017, 07:34.
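
For illustration, here is a minimal sketch of why a one-line carry-forward fills more than intended. It assumes treat is a string variable with "" for missing, as in the later posts; id 5 and the variable filled are hypothetical and not part of the original data.

```stata
clear
input byte id str5 treat byte n
5 "injec" 1
5 ""      2
5 "diet"  3
5 ""      4
end

* Work on a copy so the original values stay visible
gen filled = treat

* -replace- runs through observations in order, so each empty value
* takes whatever now sits in the row above, with no look-ahead
by id (n), sort: replace filled = filled[_n-1] if missing(filled)

list, sepby(id)
```

Here n == 2 is set to injec although the next recorded value is diet, and n == 4 is set to diet although nothing is recorded after it. Neither case matches the two scenarios described above.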

  • #2
    So, if I understand your question correctly, what you want to do is replace the unspecified values of treat with whatever non-missing value of treat most recently precedes and follows that observation, if those are the same. If those are different, then treat is to be left unspecified.

    I have simplified your data a bit by using Stata's actual missing value for string variables, "", instead of your "." This makes the code considerably simpler.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id str5 treat byte n
    1 "injec" 1
    1 ""      2
    1 "injec" 3
    1 ""      4
    1 "injec" 5
    2 "diet"  1
    2 ""      2
    2 "diet"  3
    3 ""      1
    3 ""      2
    4 "injec" 1
    4 ""      2
    4 "injec" 3
    4 "diet"  4
    4 "diet"  5
    end
    
    sort id n
    by id (n): gen previous = cond(missing(treat), treat[_n-1], treat)
    gsort id -n
    by id: gen next = cond(missing(treat), treat[_n-1], treat)
    sort id n
    replace treat = previous if missing(treat) & previous == next
    In the future, please use the -dataex- command, as I have done here, to present example data. To install the command, run -ssc install dataex-. Then read -help dataex- to see the simple instructions for using it. When you use -dataex-, you enable those who want to help you to create a complete and faithful replica of your Stata example with a simple copy/paste operation. No other method of exhibiting Stata examples is as satisfactory.

    • #3
      Hi Clyde

      Thanks for providing the above syntax. It didn't work entirely as I hoped. I want to reply with a specific example, but unfortunately we currently only have Stata stored on our university's highly secure server where I'm unable to manually download and install commands. Is there any better way to input the data?

      Thanks
      /Amal

      • #4
        If you are not allowed to install -dataex-, then I guess the next best thing is to do what you did previously. It was somewhat inconvenient, but under the circumstances probably better than any other available alternative. You might want to ask your IT department if they would install -dataex-. They can look at the source code and they will easily see that it cannot possibly harm their system.

        • #5
          Installation difficulties aside, two key aspects of Statalist are that questions can interest many people -- and that most questions have been asked before.

          To illustrate both, a search for mentions of stripolate (SSC) here will turn up several related threads (there are more, but that's a start).

          Here's stripolate in action on Clyde's coded example. There are other options than forward.


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte id str5 treat byte n
          1 "injec" 1
          1 ""      2
          1 "injec" 3
          1 ""      4
          1 "injec" 5
          2 "diet"  1
          2 ""      2
          2 "diet"  3
          3 ""      1
          3 ""      2
          4 "injec" 1
          4 ""      2
          4 "injec" 3
          4 "diet"  4
          4 "diet"  5
          end
          
          stripolate treat n, gen(treat2) forward by(id)
          
          list, sepby(id)
          
          
               +-------------------------+
               | id   treat   n   treat2 |
               |-------------------------|
            1. |  1   injec   1    injec |
            2. |  1           2    injec |
            3. |  1   injec   3    injec |
            4. |  1           4    injec |
            5. |  1   injec   5    injec |
               |-------------------------|
            6. |  2    diet   1     diet |
            7. |  2           2     diet |
            8. |  2    diet   3     diet |
               |-------------------------|
            9. |  3           1          |
           10. |  3           2          |
               |-------------------------|
           11. |  4   injec   1    injec |
           12. |  4           2    injec |
           13. |  4   injec   3    injec |
           14. |  4    diet   4     diet |
           15. |  4    diet   5     diet |
               +-------------------------+
          Last edited by Nick Cox; 04 Oct 2017, 09:12.

          • #6
            Hi Again

            I thought a screen grab might help show what happens to the variables when running the code below (treat22 is a copy of the original treat variable, to enable comparisons):

            Code:
            sort id n
            by id (n): gen prev = cond(missing(treat22), treat22[_n-1], treat22)
            gsort id -n
            by id: gen next = cond(missing(treat22), treat22[_n-1], treat22)
            sort id n
            replace treat22 = prev if missing(treat22) & prev==next
            [Image: treat 1.png]

            [Image: treat 2.png]

            I tried to modify the syntax as shown below, but it didn't help:

            Code:
            by id (n): gen prev = cond(missing(treat22), treat22[_n-1] | treat22[_n+1], treat22)


            Thanks
            /Amal

            • #7
              At the risk of sounding like a broken record,

              1. The lack of a data example here is a severe constraint on experimentation. Even without dataex you can use a text editor (you do have at least one, namely Stata's doedit) to try to replicate the example output shown by Clyde in #2. Use copy and paste into the forum software. Screenshots are not nearly so much help.

              2. The error reports here ("didn't help", "didn't work entirely as I hoped", "without much success") are frankly too vague to be of any use. Some role reversal here may help you see this: if I tell you that my washing machine isn't working well, I am telling you very little. (Don't worry, it's fine.)
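
On point 1, a data example can be typed by hand using only official commands. A minimal sketch, reusing the rows for ID 4 from the first post and assuming treat is a string variable with "" for missing:

```stata
clear
input byte id str5 treat byte n
4 "injec" 1
4 ""      2
4 "injec" 3
4 "diet"  4
4 "diet"  5
end

* Show the data as others will see it after copy and paste
list, sepby(id)
```

Pasting the lines from clear to end into a post lets anyone rebuild the same dataset in a do-file.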

              FAQ Advice, especially #12, explains all of this!

              Which code did you try? What happened exactly? Show any error messages you got. If the code "worked" but the results were not as desired, what was wrong?

              But if treat is a string variable, as Clyde and I have been assuming without contradiction from you, you can't feed its values to the logical or operator.

              "cat" | "dog" is a good question in everyday life but Stata can't make sense of "string" | "string".

              Other way round, what do you think happens with

              Code:
               
               treat22[_n-1] | treat22[_n+1]
              as even if treat22 is numeric, the result will be 1 or 0 and is unlikely to be what you want to interpolate.

              • #8
                I agree with Nick that -stripolate- would be a good tool for the job (though I didn't know about it until seeing that post). But I suppose that if Amal cannot install -dataex-, he won't be able to install -stripolate- either.

                My earlier code does not work as advertised. Here is a correction that does:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int id str14 treat byte n
                571 "diet+exer"   1
                571 ""            2
                571 ""            3
                571 ""            4
                571 "diet+exer"   5
                571 "diet+exer"   6
                571 "diet+exer"   7
                571 "diet+exer"   8
                571 "diet+exer"   9
                571 ""           10
                877 "oral hypo"   3
                877 "oral hypo"   4
                877 ""            5
                877 ""            6
                877 "oral hypo"   7
                879 "injections"  1
                879 ""            2
                879 "injections"  3
                879 "injections"  4
                end
                
                
                sort id n
                by id (n): gen previous = treat if _n == 1
                by id (n): replace previous = cond(missing(treat), previous[_n-1], treat)
                gsort id -n
                by id: gen next = treat if _n == 1
                by id: replace next = cond(missing(treat), next[_n-1], treat)
                sort id n
                replace treat = previous if missing(treat) & previous == next
                Amal, in my previous post I said that the next best alternative to -dataex- was what you had done earlier. What I neglected to mention is that, of all the possible ways to show data, screenshots are the least helpful. Suppose you were in my shoes and needed to use this example data to test the code. How would you import that data into your Stata setup? In this case I was strongly motivated, so I did just type your data into the Data Editor. But in most situations I would not have taken the time and trouble to do that, and I think most of the people who respond here would not either. I'm sure you meant well: screenshots can show the data, but they don't make the data available. Please, don't ever use a screenshot to show data again.
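
The difference between the code in #2 and the correction above comes down to which variable is looked up in the previous row. A minimal sketch, assuming a string treat with "" for missing; the example rows and the names onestep and cascade are hypothetical:

```stata
clear
input byte id str5 treat byte n
1 "injec" 1
1 ""      2
1 ""      3
1 "injec" 4
end

sort id n

* One-step lookup: treat[_n-1] is the original neighbour, which is
* itself empty at n == 3, so the second gap in a run stays empty
by id (n): gen onestep = cond(missing(treat), treat[_n-1], treat)

* Cascading lookup: -replace- works through observations in order,
* so cascade[_n-1] already holds the carried-forward value
by id (n): gen cascade = treat if _n == 1
by id (n): replace cascade = cond(missing(treat), cascade[_n-1], treat)

list, sepby(id)
```

In this sketch onestep is still empty at n == 3, while cascade holds injec there. That is why runs of two or more consecutive gaps need the second form.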

                Now, a comment on your attempted fix to the code. I think you assumed that Stata would interpret it as something like "if treat22 is missing, set prev equal to either treat22[_n-1] or to treat22[_n+1]." But Stata does not speak English, it speaks Stata syntax. What you actually told Stata is this: "if treat22 is missing, look at treat22[_n-1] and look at treat22[_n+1]--if either of those is non-zero then use 1 as the result of this calculation, but if both are zero use 0 as the result." The underlying principle of Stata syntax that governs this is: logical operators (&, !, |) interpret their arguments as boolean expressions, i.e. expressions that evaluate to true (1) or false (0). When they are given a numerical expression as an argument, they re-interpret the numerical values according to the following rule: 0 is false and anything else (including missing value) is true.
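
The rule for numeric arguments can be checked directly with display. A small sketch; the values are arbitrary:

```stata
* Any non-zero number counts as true, so the result is 1, not 2
display 2 | 0

* Both arguments are zero (false), so the result is 0
display 0 | 0

* Numeric missing is non-zero and therefore true, so the result is 1
display . | 0
```

In each case the operator returns only 1 or 0, never one of the original values, so it cannot pick out a neighbouring value to interpolate.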

                Added: Crossed with #7. I failed to recall in writing my final paragraph that treat is a string variable! So applying the boolean operator | to it would just lead to an error (a type mismatch). Nevertheless, I probably guessed correctly what you intended by the code, and my comments about why that interpretation is incorrect still stand.

                Last edited by Clyde Schechter; 04 Oct 2017, 09:43.

                • #9
                  Hi Both

                  Thanks for the comments and suggestions. Duly noted! Unfortunately, it can take up to a month to get a user-written command downloaded and installed! But I will try to get -stripolate- installed. It seems to work wonders. Until then, the longer method is my 'go to', which I will try in a short while.

                  Clyde - apologies. And yes I agree with your comments on the need to include an example dataset.

                  Will get back to let you know if it worked well.

                  Thanks!
                  /Amal
