Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to combine multiple line observations into 1 observation with multiple variables

    Hi everyone,

    I am working on my dissertation and need some help. I am working on a FDA Inspection dataset for clinical trials. There is one line per inspection (identified by inspection start/end date & FEI #, which uniquely identifies the firm and address) and FDA inspector. So If an inspection had 3 inspectors, the inspection would show up as 3 observations. I need help combining these inspections into one observation/line with one variable for each inspector.

    Here is an example of an inspection that has two lines for the same inspection, but with two different FDA Employee conducting the inspections, so it shows up twice.

    DISTRICT CENTER INSPNTYPE FORCAUSE INSPNSTART INSPNEND FDA483 CLASSIFICATION EMPLOYEEs FEI FIRMNAME FIRMADDRESS CITY ST ZIP COUNTRY INSPNLENGTH
    DET-DO CDER SPONSOR N 24aug2009 24aug2009 N NAI Olenjack, Dawn C 3004552864 Covance Central Laboratoriy Services Inc. 8211 Scicor Dr Indianapolis IN 46214-2942 United States 1
    DET-DO CDER SPONSOR N 24aug2009 24aug2009 N NAI Austin, Larry K 3004552864 Covance Central Laboratoriy Services Inc. 8211 Scicor Dr Indianapolis IN 46214-2942 United States 1


    Once I combine these, I have a separate database excel sheet from FDA with the observations for each. That I am planning to combine using the 1 to many based on FEI and inspection end date. These are few examples of the variables for the second data set (excel sheet). There. could be no observations, 1, or many based on a given inspection identified by FEI # and inspection end date. Is this the right approach.
    FEI INSPN DATE CFR/FDCA NUM SHORT DESC LONG DESC
    1036465 8/13/2010 21 CFR 812.100 Investigator lack of informed consent Informed consent was not obtained in accordance with the regulations regarding the protection of human subjects.
    1036465 8/13/2010 21 CFR 812.100 Investigator non-compliance with agreement/plan/regulations An investigation was not conducted according to the investigational plan.
    1038833 8/21/2008 21 CFR 812.150(b)(1) Sponsor evaluation rpt not timely, distributed Reports of the results of evaluation of unanticipated adverse device effects were not all submitted to FDA and all reviewing IRBs.
    1038833 8/21/2008 21 CFR 812.25(e) Sponsor's lack of written monitoring procedures There are no written procedures for monitoring an investigational device study.
    1038833 8/21/2008 21 CFR 812.40 Sponsors' general responsibilities For an investigational study, proper monitoring was not ensured.
    1038833 8/21/2008 21 CFR 812.43(c)(5) No financial disclosure info in investigator agreement A signed agreement was not obtained from each participating investigator that includes sufficient accurate financial disclosure information to allow the sponsor to submit a complete and accurate certification or disclosure statement and a commitment to promptly update financial disclosure information if any relevant changes occur during the investigation and for one year following completion of the study.


    Any help would be much appreciated in combining this data. Thank you so much in advance.

    Negin

  • #2
    Please use dataex in the future to present data examples (see FAQ Advice #12).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9(DISTRICT CENTER INSPNTYPE FORCAUSE INSPNSTART INSPNEND FDA483) str29 EMPLOYEEs double FEI
    "DET-DO" "CDER" "SPONSOR" "N" "24aug2009" "24aug2009" "NAI" "Olenjack, Dawn C" 3004552864
    "DET-DO" "CDER" "SPONSOR" "N" "24aug2009" "24aug2009" "NAI" "Austin, Larry K"  3004552864
    end
    
    bys DISTRICT- FDA483 (EMPLOYEEs): gen order=_n
    gen wanted= EMPLOYEEs if order==1
    qui sum order
    forval i=2/ `r(max)'{
       bys DISTRICT- FDA483: replace wanted= wanted[`i'-1]+ "; " + EMPLOYEEs[`i'] if order==`i'
    }
    bys DISTRICT- FDA483 (order): replace wanted= wanted[_N]
    Res.:

    Code:
    . l DISTRICT CENTER INSPNTYPE EMPLOYEEs wanted , notrim
    
           DISTRICT   CENTER   INSPNT~E          EMPLOYEEs                              wanted  
      1.     DET-DO     CDER    SPONSOR    Austin, Larry K   Austin, Larry K; Olenjack, Dawn C  
      2.     DET-DO     CDER    SPONSOR   Olenjack, Dawn C   Austin, Larry K; Olenjack, Dawn C

    Comment


    • #3
      Negin atri ,

      for the first part, see help for Stata's command reshape.
      https://www.stata.com/manuals13/dreshape.pdf
      In your case you reshape from long to wide.

      for the second part, at least in the lines shown, the FEI numbers are very different, so not clear if they are of the same format in each source file you work with. I don't see the need to get the date involved. I would be very surprised to find the same inspection number reused for a different inspection on a different date. So, my expectation is that the FEI is already unique and no need to combine it with the date, but your data may differ from this of course.

      Best, Sergiy

      Comment


      • #4
        Thank you so much Sergiy!

        Comment


        • #5
          Thank you Andrew!

          Comment


          • #6
            Andrew Musau Thank you so much! This did exactly what I was looking for. Is there a way to separate the wanted column (EMPLOYEES) into two separate column rather than it being in one column?

            Thanks,
            Negin

            Comment


            • #7
              Then what you needed in the first place was reshape as advised in #3. However, yes

              Code:
              split wanted, p(;) g(EMPLOYEE)
              drop wanted

              Comment


              • #8
                Hi, I tried to reshape a variable I generated named FDAEmployeeID (a unique ID for each inspector). However, it is not working. I keep getting different errors. Please help!

                I am using stata IC 16.1.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str6 DISTRICT long INSPECTIONID str16 PRODUCTTYPE str21 INSPNTYPE str1 FORCAUSE long INSPNSTART int INSPNEND str1 FDA483 str3 CLASSIFICATION float FDAEmployeeID str125 CombinedEMPOLYEEs
                "DAL-DO" 496556 "Human Drugs" "CLINICAL INVESTIGATOR" "Y" 17483 17549 "N" "NAI"  25 "Annes, Margaret M"                      
                "LOS-DO" 500210 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17519 17540 "Y" "VAI"  28 "Arcy, Gene D"                           
                "SEA-DO" 501868 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17545 17560 "Y" "VAI" 507 "Mattson, Astrida B"                     
                "LOS-DO" 502323 "Human Drugs" "CLINICAL INVESTIGATOR" "Y" 17517 17541 "N" "NAI" 803 "Van Leeuwen, Diane C"                   
                "PHI-DO" 502845 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17541 17547 "N" "NAI" 672 "Rashti, Mike M"                         
                "NYK-DO" 503230 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17547 17549 "N" "NAI" 304 "Hansen, Thomas P"                       
                "NYK-DO" 503235 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17553 17587 "Y" "OAI" 455 "Legaux, Alia"                           
                "LOS-DO" 503295 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17533 17541 "N" "NAI" 100 "Brullo, Raymond W; Kokiatkulkij, Jinnie"
                "LOS-DO" 503295 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17533 17541 "N" "NAI" 419 "Brullo, Raymond W; Kokiatkulkij, Jinnie"
                "DEN-DO" 503297 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17546 17549 "N" "VAI" 578 "Nadeau, Marion W"                       
                "DAL-DO" 503300 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17539 17540 "N" "NAI" 755 "Stone, Patrick D"                       
                "BLT-DO" 503425 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17534 17540 "N" "NAI"   9 "Agrawal, Atul; Shapley, Stephanie L"    
                "BLT-DO" 503425 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17534 17540 "N" "NAI" 716 "Agrawal, Atul; Shapley, Stephanie L"    
                "DAL-DO" 503499 "Biologics"   "CLINICAL INVESTIGATOR" "N" 17546 17548 "N" "NAI" 475 "Lorenz, Robert T"                       
                "CIN-DO" 503618 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17545 17549 "N" "NAI" 406 "Kilker, Stephen J"                      
                "CBER"   503624 "Biologics"   "CLINICAL INVESTIGATOR" "N" 17527 17534 "N" "NAI" 597 "Noe Varga, Michelle M"                  
                "CBER"   503625 "Biologics"   "CLINICAL INVESTIGATOR" "N" 17535 17541 "N" "NAI" 597 "Noe Varga, Michelle M"                  
                "LOS-DO" 503626 "Human Drugs" "CLINICAL INVESTIGATOR" "N" 17545 17555 "Y" "VAI" 648 "Pitkin, Alexandra B"                    
                "ATL-DO" 503865 "Human Drugs" "CLINICAL INVESTIGATOR" "Y" 17553 17553 "N" "VAI" 123 "Chapman, Myla D"                        
                "KAN-DO" 504104 "Biologics"   "CLINICAL INVESTIGATOR" "N" 17554 17554 "N" "NAI" 550 "Montgomery, Carl J"                     
                end
                format %td INSPNSTART
                format %td INSPNEND

                Comment


                • #9
                  I also tried using this code to combine FDAEmployeeID but I got an error as well.

                  sort INSPECTIONID

                  . bys INSPECTIONID (FDAEmployeeID): gen order=_n

                  . gen CombinedIDs = FDAEmployeeID if order==1
                  (1,211 missing values generated)

                  . qui sum order

                  . forval i=2/ `r(max)'{
                  2. bys INSPECTIONID: replace CombinedIDs = CombinedIDs[`i'-1]+ "; " + FDAEmployeeID[`i'] if order==`i'
                  3. }
                  type mismatch
                  r(109);

                  . bys INSPECTIONID (order): replace CombinedIDs = CombinedIDs[_N]
                  (1,038 real changes made, 1,038 to missing)

                  Comment


                  • #10
                    From the fragment you've posted:
                    Code:
                     float FDAEmployeeID
                    Yet you are writing:
                    Code:
                    ... "; " + FDAEmployeeID[`i'] ...
                    This will generate the above mentioned 'type mismatch" error 109. Perhaps use
                    Code:
                    string(FDAEmployeeID[`i'])

                    Comment


                    • #11
                      thanks Sergiy Radyakin!
                      FDAEmployeeID is float.

                      *******************
                      ----------------------------------------------------------------------------------------------------------------------
                      FDAEmployeeID group(EMPLOYEEs)
                      ----------------------------------------------------------------------------------------------------------------------

                      type: numeric (float)

                      range: [1,870] units: 1
                      unique values: 751 missing .: 0/8,792
                      ********************

                      Comment


                      • #12
                        Sergiy Radyakin I am still having trouble. I did want you suggested but get the following errors:

                        . forval i=2/`r(max)'{
                        2. bys INSPECTIONID: replace wanted= wanted[`i'] + "; " + string(FDAEmployeeID[`i'] if order==`i'
                        3. }
                        invalid syntax
                        r(198);

                        Comment


                        • #13
                          Sergiy Radyakin I also reformatted the variables to numeric long and still not working. I would like to combine these so that I can drop the duplicate. Reshape also doesnt work.

                          Thank you so much for your help. I am really stuck.

                          Negin

                          Comment


                          • #14
                            2. bys INSPECTIONID: replace wanted= wanted[`i'] + "; " + string(FDAEmployeeID[`i'] if order==`i'
                            Where is the closing parenthesis [the ")" character]?

                            Reshape also doesnt work.
                            Stata's command reshape does work.
                            It is either 'not helpful in this specific case', or 'the user failed to use it properly'.

                            Comment

                            Working...
                            X