
  • Create new rows per id if observations are found in multiple variables

    Hi!
    I would like to create a separate row for each non-missing value in the diagnosis variables (one diagnosis per row), within each id and date.

    My data looks like this:
    id date diagnosis1 diagnosis2 diagnosis3
    1 2023-01-01 A . .
    1 2023-01-02 B . .
    2 2023-02-03 F G .
    2 2023-03-03 A F C
    2 2023-03-04 A . .
    (The real dataset contains 21 diagnosis variables and more than 1,000 unique ids.)

    I want it to turn out like this:
    id date newvar
    1 2023-01-01 A
    1 2023-01-02 B
    2 2023-02-03 F
    2 2023-02-03 G
    2 2023-03-03 A
    2 2023-03-03 F
    2 2023-03-03 C
    2 2023-03-04 A

    I would really appreciate your help.

    Thank you





  • #2
    It took me a few minutes to massage your data into a readable form before I could run anything. Please see -dataex- and the forum rules before copying and pasting raw data. On a different note, I am not sure what diseases these are, but depending on the context, I wonder whether you need the earliest instance of each diagnosis for each patient.



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id str14 date str2(diagnosis1 diagnosis2 diagnosis3)
    1 "2023-01-01" "A" "." "."
    1 "2023-01-02" "B" "." "."
    2 "2023-02-03" "F" "G" "."
    2 "2023-03-03" "A" "F" "C"
    2 "2023-03-04" "A" "." "."
    end
    This gets you what you want:
    Code:
    reshape long diagnosis@, i(id date) string
    drop _j
    drop if diagnosis == "."
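
    On the side question about the earliest instance of each diagnosis per patient, here is a minimal sketch, assuming the data are already in long form with one diagnosis per row. The variable name ddate and the example rows are made up for illustration; the string date is converted to a Stata daily date so that sorting is chronological.

    ```stata
    * Hypothetical long-form data: one row per id, date and diagnosis
    clear
    input long id str10 date str2 diagnosis
    1 "2023-01-01" "A"
    2 "2023-03-03" "A"
    2 "2023-02-03" "F"
    2 "2023-03-03" "F"
    2 "2023-03-04" "A"
    end

    * Convert the string date to a numeric daily date (ddate is an assumed name)
    generate ddate = date(date, "YMD")
    format ddate %td

    * Within each patient and diagnosis, sort by date and keep the first row
    bysort id diagnosis (ddate): keep if _n == 1
    list, sepby(id)
    ```

    Whether this is wanted depends on the analysis; it discards every later record of a diagnosis the patient already has.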



    • #3
      It does not affect Girish Venkataraman's strategy, which is bang on target, but as a detail, note that Stata does not require or give any special interpretation to the string "." in string variables. To Stata, empty strings "" and missing string values are one and the same.

      While I am adding nuance, I want to underline that using dataex is a request, not a rule. I am as energetic as anyone else in reminding people of the requests we make when paying attention to them would turn a difficult or impossible question into an easier one, but our wording is different.
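
      The point about "." can be checked directly. The following is a small sketch with made-up rows, assuming a string variable named diagnosis in long form: missing() flags only the empty string, so a literal "." has to be recoded (or tested for explicitly) before dropping.

      ```stata
      * Hypothetical rows: one real value, one literal ".", one empty string
      clear
      input long id str10 date str2 diagnosis
      1 "2023-01-01" "A"
      1 "2023-01-01" "."
      2 "2023-02-03" ""
      2 "2023-02-03" "G"
      end

      * missing() treats "" as missing, but "." is an ordinary string
      count if missing(diagnosis)

      * Recode the placeholder to a true missing string, then drop
      replace diagnosis = "" if trim(diagnosis) == "."
      drop if missing(diagnosis)
      ```

      If the real data hold empty strings rather than ".", the drop in #2 would leave those rows in place, which is why testing with missing() may be the safer habit.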



      • #4
        Girish Venkataraman Thank you so much for your help, and sorry for the inconvenience with my example data. As you probably understand, I am new to Stata. I promise to try to use the correct format next time.

        Unfortunately the code doesn't work. I get this error message:

        "variable id does not uniquely identify the observations
        Your data are currently wide. You are performing a reshape long. You specified I(id date) and j(_j). In the current wide
        form, variable id date should uniquely identify the observations."

        This is true, as some individuals received multiple different diagnoses on the same date. For an observation to be unique, id and date aren't enough; diagnosis needs to be included as well. Is that possible?

        Thank you!



        • #5
          Thank you Nick Cox for the clarification! And thank you for all your answers in this forum; they have helped me a lot as a newcomer to Stata.



          • #6
            Originally posted by Hanna Larsson
            Girish Venkataraman Thank you so much for your help, and sorry for the inconvenience with my example data. As you probably understand, I am new to Stata. I promise to try to use the correct format next time.

            Unfortunately the code doesn't work. I get this error message:

            "variable id does not uniquely identify the observations
            Your data are currently wide. You are performing a reshape long. You specified I(id date) and j(_j). In the current wide
            form, variable id date should uniquely identify the observations."

            This is true, as some individuals received multiple different diagnoses on the same date. For an observation to be unique, id and date aren't enough; diagnosis needs to be included as well. Is that possible?

            Thank you!

            Hmm... I am guessing that you have two rows with the same date for the same patient further down in your original data. Is this the case? My code above assumes that each date appears only once within a patient. It is hard to go further without a sample of the original data via -dataex- or knowing your end goal with the reshape.
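
            One way to check that guess without sharing any data is to look for repeated id and date combinations before reshaping. This is a sketch on made-up rows; dup is an assumed variable name.

            ```stata
            * Hypothetical wide data in which patient 2 has two rows on one date
            clear
            input long id str10 date str2(diagnosis1 diagnosis2)
            1 "2023-01-01" "A" ""
            2 "2023-03-03" "A" "F"
            2 "2023-03-03" "C" ""
            end

            * Summarize how many id-date combinations occur more than once
            duplicates report id date

            * Flag and inspect the offending rows (dup counts the surplus copies)
            duplicates tag id date, generate(dup)
            list if dup > 0, sepby(id date)

            * isid exits with an error if id and date do not identify the rows
            capture isid id date
            display _rc
            ```

            A nonzero return code from isid means reshape long with i(id date) will fail in the same way as reported above.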



            • #7
              Girish Venkataraman Yes, that is the case! Some patients have received multiple different diagnoses on the same date, but they are registered on different rows (probably due to different physicians in the same healthcare facility, but that information is unfortunately not included in my data). Maybe I can create a variable that indicates when that is the case and include it as an identifier in the reshape of the data?

              I'm unfortunately not allowed to share a sample of the original data (even if I remodel it) due to very strict rules at my workplace.



              • #8
                Girish Venkataraman I tried what I suggested above and it worked. Once again, thank you. I wish you a happy weekend.
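
                The thread does not show the code that was actually run, so the following is only one way the idea from #7 could look. It assumes a hypothetical counter seq that numbers the rows sharing an id and date, and a hypothetical name slot for the reshape index; the example rows use empty strings for missing diagnoses.

                ```stata
                * Hypothetical wide data with a repeated id-date combination
                clear
                input long id str10 date str2(diagnosis1 diagnosis2 diagnosis3)
                1 "2023-01-01" "A" ""  ""
                2 "2023-03-03" "A" "F" ""
                2 "2023-03-03" "C" ""  ""
                end

                * Number the rows within each id and date so that
                * id, date and seq together identify every observation
                bysort id date: generate seq = _n

                * Reshape with the counter included in i()
                reshape long diagnosis, i(id date seq) j(slot)

                * Keep only real diagnoses and drop the helper variables
                drop if missing(diagnosis)
                drop seq slot

                * Optional: remove a diagnosis recorded twice on the same id and date
                duplicates drop
                list, sepby(id date)
                ```

                Which rows get which value of seq within a date is arbitrary here, which does not matter because the counter is dropped after the reshape.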
