Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cleaning date data from multiple appended datasets

    Hi,
    I have 14 Excel datasets in long format with laboratory data from 904 individuals at multiple time points. I am interested in looking at the date of testing but I am having difficulty formating all date variables the same. The appended stata dataset has 58 date variables each named by the type of test (eg UE) test number (eg T1) and ending in date (eg UET1date). I have tried to import, apend and sort the date data with the code below. I thought it would be easiest to parse the dates and reformat from there but I am getting an error, "too many variables specified r(103);"
    I would appreciate any suggestions on how to do this.
    Many thanks
    Caroline


    Code:
    cd "<file location>"
    save appended_data.dta, empty replaceforeach file of local f {
    import excel "<file location>'", sheet("Data") firstrow clear
    foreach date_var in *date {
    capture: tostring `date_var', replace 
    }
    append using appended_data.dta, force
    save appended_data.dta, replace
    }
    foreach date_var in *date {
    split `date_var', parse(/) //* breaks up the date variable using the / as the separator*/  
    replace `date_var'1="0"+`date_var'1 if strlen(`date_var'1)<2 /*adding leading 0 to the day*/
    replace `date_var'2="0"+`date_var'2 if strlen(`date_var'2)<2 /*adding leading 0 to the month*/
    gen `date_var'_fixed=`date_var'1 + `date_var'2 + `date_var'3
    replace `date_var'=`date_var'_fixed
    drop `date_var'_fixed `date_var'1 `date_var'2 `date_var'3
    }

  • #2
    Probably easiest to provide a dataex of a couple of the unformatted date variables.

    Code:
    dataex datevar1 - datevar4 in 1/20
    where you replace "datevar1" and "datevar4" with the names of your actual date variables.
    Last edited by Andrew Musau; 15 May 2023, 11:07.

    Comment


    • #3
      Thank you. I've included two different dataex samples. I've realised now a lot of the ID numbers were dropped for some of the Excel files as well. They are formattted the same way in Excel so I can't understand why this has happened.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str10(UET1date UET2date UET3date UET4date UET5date)
      "19919" "."     "."     "."     ""
      "20668" "19318" "20215" "18758" ""
      "19509" "."     "."     "."     ""
      "19877" "19905" "."     "."     ""
      "."     "."     "."     "."     ""
      "20199" "20527" "20692" "18906" ""
      "20429" "20069" "18662" "18658" ""
      "19264" "20199" "."     "."     ""
      "20096" "."     "."     "."     ""
      "19332" "19712" "20531" "18632" ""
      "."     "."     "."     "."     ""
      "18877" "18876" "18875" "."     ""
      "20179" "19341" "20670" "19141" ""
      "20165" "19470" "20194" "20683" ""
      "."     "."     "."     "."     ""
      "."     "."     "."     "."     ""
      "."     "."     "."     "."     ""
      "18897" "19096" "19855" "20040" ""
      "19505" "."     "."     "."     ""
      "."     "."     "."     "."     ""
      end
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str10(UET1date UET2date UET3date UET4date UET5date)
      "18780"     "19299"     "19407"     "19528"     "19758"    
      "18772"     "19159"     "19824"     "20404"     "20510"    
      "18592"     "18932"     "19291"     "19668"     "20275"    
      "18572"     "20475"     "."         "."         "."        
      "18907"     "18966"     "19050"     "19740"     "19775"    
      "18700"     "19285"     "19670"     "19998"     "18841"    
      "18458"     "18680"     "19036"     "19379"     "19463"    
      "18966"     "18945"     "."         "."         "."        
      "19096"     "19710"     "."         "."         "."        
      "16feb2016" "21jul2015" "15jul2015" "21apr2015" "20may2014"
      "09sep2016" "03dec2015" "10nov2014" "18aug2014" "16sep2013"
      "24jun2016" "25apr2016" "09mar2016" "30dec2015" "18jun2015"
      "19jun2015" "13feb2015" "28jul2014" "21apr2014" "24oct2013"
      "09mar2012" "27oct2010" ""          ""          ""         
      "10dec2010" ""          ""          ""          ""         
      "29mar2016" "24nov2015" "30jul2015" "12dec2014" "09may2014"
      "13oct2010" "14jan2011" "15apr2011" "13apr2011" "12apr2011"
      "15apr2016" ""          ""          ""          ""         
      "12oct2010" "15jan2013" ""          ""          ""         
      "03feb2016" "10jun2015" "10dec2014" "02dec2014" "27may2014"
      "08mar2016" "09feb2016" "02feb2016" "13oct2015" "25aug2015"
      end

      Comment


      • #4
        At a quick glance, there are four flavours here:

        "."
        ""
        "18780"
        "16feb2016"

        The first two are just missing whatever we do; the third is, we hope, a Stata daily date but as a string; the fourth is a DMY string.

        This script runs on your second example without reporting other flavours.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str10(UET1date UET2date UET3date UET4date UET5date)
        "18780"     "19299"     "19407"     "19528"     "19758"    
        "18772"     "19159"     "19824"     "20404"     "20510"    
        "18592"     "18932"     "19291"     "19668"     "20275"    
        "18572"     "20475"     "."         "."         "."        
        "18907"     "18966"     "19050"     "19740"     "19775"    
        "18700"     "19285"     "19670"     "19998"     "18841"    
        "18458"     "18680"     "19036"     "19379"     "19463"    
        "18966"     "18945"     "."         "."         "."        
        "19096"     "19710"     "."         "."         "."        
        "16feb2016" "21jul2015" "15jul2015" "21apr2015" "20may2014"
        "09sep2016" "03dec2015" "10nov2014" "18aug2014" "16sep2013"
        "24jun2016" "25apr2016" "09mar2016" "30dec2015" "18jun2015"
        "19jun2015" "13feb2015" "28jul2014" "21apr2014" "24oct2013"
        "09mar2012" "27oct2010" ""          ""          ""         
        "10dec2010" ""          ""          ""          ""         
        "29mar2016" "24nov2015" "30jul2015" "12dec2014" "09may2014"
        "13oct2010" "14jan2011" "15apr2011" "13apr2011" "12apr2011"
        "15apr2016" ""          ""          ""          ""         
        "12oct2010" "15jan2013" ""          ""          ""         
        "03feb2016" "10jun2015" "10dec2014" "02dec2014" "27may2014"
        "08mar2016" "09feb2016" "02feb2016" "13oct2015" "25aug2015"
        end
        
        quietly forval j = 1/5 { 
            gen UET`j' = real(UET`j'date)
            replace UET`j' = daily(UET`j'date, "DMY") if missing(UET`j')
            format UET`j' %td 
            
            count if  missing(UET`j') & UET`j'date != "." & UET`j'date != ""  
            if r(N) > 0 { 
                noisily display "problems with UET`j'date:" 
                noisily list UET`j'date if missing(UET`j') & UET`j'date != "." & UET`j'date != ""  
            } 
        }
        
        list UET? 
        
             +-----------------------------------------------------------+
             |      UET1        UET2        UET3        UET4        UET5 |
             |-----------------------------------------------------------|
          1. | 02jun2011   02nov2012   18feb2013   19jun2013   04feb2014 |
          2. | 25may2011   15jun2012   11apr2014   12nov2015   26feb2016 |
          3. | 26nov2010   01nov2011   25oct2012   06nov2013   06jul2015 |
          4. | 06nov2010   22jan2016           .           .           . |
          5. | 07oct2011   05dec2011   27feb2012   17jan2014   21feb2014 |
             |-----------------------------------------------------------|
          6. | 14mar2011   19oct2012   08nov2013   02oct2014   02aug2011 |
          7. | 15jul2010   22feb2011   13feb2012   21jan2013   15apr2013 |
          8. | 05dec2011   14nov2011           .           .           . |
          9. | 13apr2012   18dec2013           .           .           . |
         10. | 16feb2016   21jul2015   15jul2015   21apr2015   20may2014 |
             |-----------------------------------------------------------|
         11. | 09sep2016   03dec2015   10nov2014   18aug2014   16sep2013 |
         12. | 24jun2016   25apr2016   09mar2016   30dec2015   18jun2015 |
         13. | 19jun2015   13feb2015   28jul2014   21apr2014   24oct2013 |
         14. | 09mar2012   27oct2010           .           .           . |
         15. | 10dec2010           .           .           .           . |
             |-----------------------------------------------------------|
         16. | 29mar2016   24nov2015   30jul2015   12dec2014   09may2014 |
         17. | 13oct2010   14jan2011   15apr2011   13apr2011   12apr2011 |
         18. | 15apr2016           .           .           .           . |
         19. | 12oct2010   15jan2013           .           .           . |
         20. | 03feb2016   10jun2015   10dec2014   02dec2014   27may2014 |
             |-----------------------------------------------------------|
         21. | 08mar2016   09feb2016   02feb2016   13oct2015   25aug2015 |
             +-----------------------------------------------------------+

        Comment


        • #5
          Thank you so much Nick. Much appreciated.
          Caroline

          Comment

          Working...
          X