
  • Making a Variable constant with ID

    Hello Experts,

    I have a panel dataset with patient IDs and time in months (25 months). Each patient has a single condition. There are some inconsistencies between patient ID and condition. For example, consider the patient with ID 1, who has, let's say, Asthma. I have two problems to solve:

    1- There is an error in the recorded conditions. In the example above, the data show that this patient had Asthma for 10 months and COPD in the other months, which is the error. My question: how do I replace COPD with Asthma for this patient and for other patients with a similar problem?

    2- The same problem, but with missing values instead of COPD: how can I replace those missing values with Asthma for the patient with ID 1, and do the same for all other patients throughout the dataset?


    Thank You

  • #2
    Your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data. Explain the rules used to decide which condition is correct. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

    Please create example data in Stata that shows your patient ID 1. Use exactly the same representation for the conditions that you use in your actual data. Is the condition a string variable "COPD" or a numeric variable where 42 is the code for COPD and 3 is the code for Asthma? Use the dataex command to format your example data for presentation in your reply.

    How do you distinguish the "correct" condition from the "incorrect" condition(s) reported for the same patient? In your example, the patient has Asthma in 10 months and (apparently) COPD in the other 15 months of the 25 months total of data for the patient. What makes Asthma correct and COPD incorrect?



    • #3
      Thanks for the reply. Below is the example. Columns one and two represent id and time respectively. Column 3 is the condition, coded as 1 and 2. The correct condition is 1; how do I replace 2 with 1? (Note: there are 8 conditions coded from 1 to 8, and I gave this example just to simplify.) Column 3 is the same with missing values instead of 2; how do I replace those missing values with 1? Please note that this varies between individuals across the dataset. For example, other patients might have conditions 7 and 8, where 7 is the correct condition. How do I tell Stata to correct those contradictions regardless of how I distinguish between correct and incorrect?
      [Attached image: Capture 2.JPG]

      Last edited by Mohammad Mansour; 09 Jun 2021, 11:42.



      • #4
        You could use the mode within each person, that is, the value that occurs most often for that ID:

        Code:
        bysort ID: egen correct_value = mode(original)
        If there are only a few entries with errors this might work.
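
        To make the follow-up step explicit, here is a minimal sketch of how the mode could be used to overwrite both the wrong codes and the missing values. The toy data and the variable names time and inconsistent are made up for illustration; ID, original and correct_value follow the line above. It assumes the most frequent value really is the correct one for each patient, which may not hold if errors are common.

        ```stata
        * Toy data (hypothetical): two patients, numeric condition codes
        clear
        input ID time original
        1 1 1
        1 2 1
        1 3 2
        1 4 .
        1 5 1
        2 1 7
        2 2 8
        2 3 7
        2 4 .
        2 5 7
        end

        * Most frequent nonmissing value per patient (missing values are ignored by default)
        bysort ID: egen correct_value = mode(original)

        * Flag patients whose records disagree or contain missings, to inspect before overwriting
        bysort ID (original): gen byte inconsistent = original[1] != original[_N]

        * Overwrite wrong codes and missing values with the patient's mode;
        * rows are left alone when no single mode exists (correct_value is missing)
        replace original = correct_value if original != correct_value & !missing(correct_value)

        * Check: every patient with a defined mode now has one consistent value
        assert original == correct_value if !missing(correct_value)
        ```

        If two values tie for most frequent within a patient, mode() returns missing unless minmode, maxmode or nummode() is specified, so those patients are left unchanged here and would need a separate rule.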
        Best wishes




        • #5
          Thank you so much Felix. It worked and I highly appreciate your help.

          All the best
