  • Help with basic data management

    Dear all in Statalist,

    I need general-purpose code for managing my data that I can reuse in other studies of the same type (we often have the same variable names in our studies). Such code would be a great help: it would save a lot of time and reduce the mistakes that come from manual data management in Excel.

    In our studies, we often compare two analyzers by analyzing blood samples, each with a unique Id number, on both analyzers. The samples are analyzed in replicate (2 or 3 runs). We then compare the matched analyte measurements generated by the two analyzers. Before analyzing the blood samples, we analyze control blood to make sure that the analyzers work properly. The same controls are analyzed again at the end of the day. We also run blanks (backgrounds) to clean/reset the analyzers.

    This is a sample of my data for one analyzer:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str15 Id int SEQ str15 Sampletype str19 DATE double(T3 MoK) int TyM double RaK
    ""                 11 "Background"      "2018-11-15T09:42:21"    0     .   4    0
    ""                 12 "Background"      "2018-11-15T09:43:37"    0     .   1    0
    ""                 13 "Background"      "2018-11-15T09:45:06"    0     .   1    0
    "2180772+"         14 "Normal 2180772+" "2018-11-15T09:47:27" 3.87    82 242  8.6
    "2180772+"         15 "Normal 2180772+" "2018-11-15T09:49:10" 3.92  81.9 240  8.4
    "2180772+"         16 "Normal 2180772+" "2018-11-15T09:50:11" 3.99  82.5 244  8.4
    "2180711+"         30 "Low 2180711+"    "2018-11-15T11:48:26" 2.18  68.5  92  3.6
    "2180711+"         31 "Low 2180711+"    "2018-11-15T11:50:12" 2.18  68.9  84  3.5
    "2180772+"         32 "Normal 2180772+" "2018-11-15T11:54:58"  3.9    81 243  8.2
    "2180772+"         33 "Normal 2180772+" "2018-11-15T12:00:16" 4.23  84.1 252  9.6
    "2180772+"         34 "Normal 2180772+" "2018-11-15T12:01:27" 3.95  81.7 244  8.3
    "2180713+"         35 "High 2180713+"   "2018-11-15T12:03:05" 4.91  92.8 548   21
    "2180713+"         36 "High 2180713+"   "2018-11-15T12:04:30" 4.92  92.8 550   21
    "Auto-background"  37 "Background"      "2018-11-15T14:13:38"    0     .   3    0
    "30261002"         38 "Blood"           "2018-11-15T14:33:07"  3.5  76.6 247    .
    "30261002"         39 "Blood"           "2018-11-15T14:36:17" 3.55  76.6 251  8.3
    "30261003"         40 "Blood"           "2018-11-15T15:20:57" 4.24    83 231  2.9
    "30261003"         41 "Blood"           "2018-11-15T15:25:47"  4.2    83 232  2.8
    "30261004"         42 "Blood"           "2018-11-15T15:30:28" 3.18  72.2 202  8.4
    "30261004"         43 "Blood"           "2018-11-15T15:34:04" 3.32    72 202  8.4
    "30261005"         44 "Blood"           "2018-11-15T15:41:02" 2.66 102.7  84  3.3
    "30261005"         45 "Blood"           "2018-11-15T15:43:39" 2.67 102.1  88  3.2
    "30261006"         46 "Blood"           "2018-11-15T15:51:19" 6.46  68.3 261 13.9
    "30261006"         47 "Blood"           "2018-11-15T15:53:27" 6.51  68.1 272 13.6
    "30261008"         48 "Blood"           "2018-11-15T16:01:14" 2.71  94.3  96  2.9
    "30261008"         49 "Blood"           "2018-11-15T16:05:11" 2.72  93.8  90  2.9
    "30261007"         50 "Blood"           "2018-11-15T16:08:53" 6.34  79.8 222  8.9
    "30261007"         51 "Blood"           "2018-11-15T16:10:27" 6.37  79.4 232  9.1
    "30261009"         52 "Blood"           "2018-11-15T16:22:28" 3.34  92.1 232  2.2
    "30261009"         53 "Blood"           "2018-11-15T16:24:05"  3.3  91.8 236  2.3
    "30261011"         54 "Blood"           "2018-11-15T16:26:02" 2.89  84.7  62 17.5
    "30261011"         55 "Blood"           "2018-11-15T16:27:36" 2.88  84.9  70 17.4
    "30261010"         56 "Blood"           "2018-11-15T16:29:57" 3.49    89 262   63
    "30261010"         57 "Blood"           "2018-11-15T16:31:28" 3.55  88.5 253 63.3
    ""                 58 "Background"      "2018-11-15T16:33:35"    0     .   1   .1
    "2180711+"         59 "Low 2180711+"    "2018-11-15T16:49:15" 2.17  68.1  91  3.5
    "2180711+"         60 "Low 2180711+"    "2018-11-15T16:50:32" 2.21  68.8  77  3.6
    "2180772+"         61 "Normal 2180772+" "2018-11-15T16:51:59" 3.99  81.3 232  8.3
    "2180772+"         62 "Normal 2180772+" "2018-11-15T16:53:07" 3.97  81.6 242  8.4
    "2180713+"         63 "High 2180713+"   "2018-11-15T16:54:45" 4.96  92.5 525 21.2
    "2180713+"         64 "High 2180713+"   "2018-11-15T16:55:46" 4.99  92.8 534 20.6
    "Auto-background"  65 "Background"      "2018-11-16T07:34:05"    0     .   1    0
    "2180711+"         66 "Low 2180711+"    "2018-11-16T10:30:20"  2.2  68.7  86  3.4
    "2180711+"         67 "Low 2180711+"    "2018-11-16T10:31:44" 2.19  69.1  85  3.3
    "2180772+"         68 "Normal 2180772+" "2018-11-16T10:36:42" 3.94  81.6 247  8.5
    "2180772+"         69 "Normal 2180772+" "2018-11-16T10:38:12" 3.96  81.8 241  8.3
    "2180713+"         70 "High 2180713+"   "2018-11-16T10:39:36" 4.89  92.5 541 20.7
    "2180713+"         71 "High 2180713+"   "2018-11-16T10:44:49" 4.93  92.4 528 20.5
    "665"              72 "Blood"           "2018-11-16T11:28:46" 4.15  79.2 165  3.8
    "665"              73 "Blood"           "2018-11-16T11:32:16" 4.09  79.3 170  3.9
    "666"              74 "Blood"           "2018-11-16T11:36:34" 4.18  89.8 203  4.1
    "666"              75 "Blood"           "2018-11-16T11:40:35" 4.16  89.5 192  4.1
    "667"              76 "Blood"           "2018-11-16T11:46:33" 4.91  82.3 216  5.6
    "667"              77 "Blood"           "2018-11-16T11:48:43" 4.94  82.6 215  5.8
    "668"              78 "Blood"           "2018-11-16T11:52:41" 5.58  87.6 259    8
    "668"              79 "Blood"           "2018-11-16T11:55:15" 5.59  87.6 246  8.1
    "664"              80 "Blood"           "2018-11-16T11:57:54" 4.16  91.7 237  4.8
    "664"              81 "Blood"           "2018-11-16T12:00:54" 4.14  91.8 262    5
    "30261012"         82 "Blood"           "2018-11-16T13:58:51" 3.03  95.4  51  2.4
    "30261012"         83 "Blood"           "2018-11-16T14:03:24" 2.97  94.5  44  2.6
    "30261013"         84 "Blood"           "2018-11-16T14:06:24" 3.89  90.9 315  2.5
    "30261013"         85 "Blood"           "2018-11-16T14:09:59"  3.9  90.5 308  2.5
    "30261014"         86 "Blood"           "2018-11-16T14:13:33" 4.78  86.2  85  3.9
    "30261014"         87 "Blood"           "2018-11-16T14:17:16" 4.79  86.1  80  3.9
    "30261015"         88 "Blood"           "2018-11-16T14:20:48" 2.82  83.5 125  8.6
    "30261015"         89 "Blood"           "2018-11-16T14:22:34" 2.87    83 129  8.9
    "30261016"         90 "Blood"           "2018-11-16T14:28:07" 4.13    90 248  2.6
    "30261016"         91 "Blood"           "2018-11-16T14:31:52" 4.18  89.9 258  2.5
    "30261017"         92 "Blood"           "2018-11-16T14:34:09"  2.6  72.4 403  8.4
    "30261017"         93 "Blood"           "2018-11-16T14:36:30" 2.55  72.5 408  8.5
    "30261018"         94 "Blood"           "2018-11-16T14:43:42" 2.85  85.5 189 13.3
    "30261018"         95 "Blood"           "2018-11-16T14:46:25" 2.93  85.8 177 13.3
    "30261019"         96 "Blood"           "2018-11-16T14:50:22" 3.72  87.8 134  .81
    "30261019"         97 "Blood"           "2018-11-16T14:54:44" 3.81  87.5 148   .9
    "30261020"         98 "Blood"           "2018-11-16T14:58:38" 2.95  95.1  44  2.8
    "30261020"         99 "Blood"           "2018-11-16T15:02:29" 2.88  94.7  37  2.9
    "2180711+"        100 "Low 2180711+"    "2018-11-16T15:25:31" 2.22  68.5  84  3.4
    "2180711+"        101 "Low 2180711+"    "2018-11-16T15:26:37" 2.21  69.1  85  3.5
    "2180772+"        102 "Normal 2180772+" "2018-11-16T15:27:37" 3.98  81.6 251  8.7
    "2180772+"        103 "Normal 2180772+" "2018-11-16T15:28:35" 3.98    82 240  8.4
    "2180713+"        104 "High 2180713+"   "2018-11-16T15:29:56" 4.99  92.7 537 20.6
    "2180713+"        105 "High 2180713+"   "2018-11-16T15:30:56" 5.01    93 545 20.6
    ""                106 "Background"      "2018-11-16T15:32:50"  .01     .   2    0
    "Auto-background" 107 "Background"      "2018-11-19T08:03:30"    0     .   1    0
    ""                108 "Background"      "2018-11-19T08:05:51"    0     .   1    0
    "2180711+"        109 "Low 2180711+"    "2018-11-19T09:56:21" 2.22  69.1  86  3.5
    "2180711+"        110 "Low 2180711+"    "2018-11-19T09:57:32" 2.19  69.4  81  3.6
    "2180772+"        111 "Normal 2180772+" "2018-11-19T09:58:36" 3.98  82.6 253  8.3
    "2180772+"        112 "Normal 2180772+" "2018-11-19T09:59:40" 3.93  82.6 242  8.5
    "2180713+"        113 "High 2180713+"   "2018-11-19T10:00:48" 4.95  94.1 543 21.1
    "2180713+"        114 "High 2180713+"   "2018-11-19T10:01:59" 4.94    94 546 21.2
    "30261021"        115 "Blood"           "2018-11-19T13:31:58" 3.44  85.5 673  8.1
    "30261021"        116 "Blood"           "2018-11-19T13:33:16" 3.37  85.5 676  7.8
    "30261022"        117 "Blood"           "2018-11-19T13:34:37" 2.87  96.9 123  1.4
    "30261022"        118 "Blood"           "2018-11-19T13:35:51" 2.83  96.5 118  1.6
    "30261023"        119 "Blood"           "2018-11-19T13:37:10" 4.09  75.6 656 28.6
    "30261023"        120 "Blood"           "2018-11-19T13:39:06" 4.17  75.4 665 28.4
    "30261024"        121 "Blood"           "2018-11-19T13:42:21" 2.87  82.9 154    9
    "30261024"        122 "Blood"           "2018-11-19T13:44:46" 2.92  83.7 153  8.9
    "30261025"        123 "Blood"           "2018-11-19T13:54:49" 4.56    73  38 18.8
    end
    Id is the unique code for each sample. SEQ is the sequence number. Sampletype is the type of sample: "Blood" marks the patient blood samples that we compare in our analyses, "Background" marks the blanks analyzed to reset the analyzers, and "Low 2180711+", "Normal 2180772+" and "High 2180713+" are the three levels of control blood analyzed before and after the patient blood samples.

    I need help with the following:
    1) Generating a new variable "SampleID" with running/sequential number for patient blood samples (Sampletype Blood) with the unique "Id". The replicates should have the same "SampleID" number. For example, Sample Id "30261002" should have the same "SampleID" (say 1) for both runs.
    2) Creating a new variable "Run" for the replicate runs of the samples (all types). For example, Sample Id "30261002" is analyzed twice (at "2018-11-15T14:33:07" and "2018-11-15T14:36:17") and should have "Run" 1 and 2 depending on the analyzing time (DATE).
    3) Creating a new variable "Day" with running number for the "DATE" variable. For example, 2018-11-15 is "Day" 1, 2018-11-16 is "Day" 2 etc.

    Here is an example of what the data should look like (only one day is shown):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int SampleID byte Run str15 Id int SEQ str15 Sampletype str19 DATE byte Day double(T3 MoK) int TyM double RaK
     . 1 "Auto-background"  65 "Background"      "2018-11-16T07:34:05" 2    0    .   1    0
     . 1 "2180711+"         66 "Low 2180711+"    "2018-11-16T10:30:20" 2  2.2 68.7  86  3.4
     . 2 "2180711+"         67 "Low 2180711+"    "2018-11-16T10:31:44" 2 2.19 69.1  85  3.3
     . 1 "2180772+"         68 "Normal 2180772+" "2018-11-16T10:36:42" 2 3.94 81.6 247  8.5
     . 2 "2180772+"         69 "Normal 2180772+" "2018-11-16T10:38:12" 2 3.96 81.8 241  8.3
     . 1 "2180713+"         70 "High 2180713+"   "2018-11-16T10:39:36" 2 4.89 92.5 541 20.7
     . 2 "2180713+"         71 "High 2180713+"   "2018-11-16T10:44:49" 2 4.93 92.4 528 20.5
    12 1 "665"              72 "Blood"           "2018-11-16T11:28:46" 2 4.15 79.2 165  3.8
    12 2 "665"              73 "Blood"           "2018-11-16T11:32:16" 2 4.09 79.3 170  3.9
    13 1 "666"              74 "Blood"           "2018-11-16T11:36:34" 2 4.18 89.8 203  4.1
    13 2 "666"              75 "Blood"           "2018-11-16T11:40:35" 2 4.16 89.5 192  4.1
    14 1 "667"              76 "Blood"           "2018-11-16T11:46:33" 2 4.91 82.3 216  5.6
    14 2 "667"              77 "Blood"           "2018-11-16T11:48:43" 2 4.94 82.6 215  5.8
    15 1 "668"              78 "Blood"           "2018-11-16T11:52:41" 2 5.58 87.6 259    8
    15 2 "668"              79 "Blood"           "2018-11-16T11:55:15" 2 5.59 87.6 246  8.1
    11 1 "664"              80 "Blood"           "2018-11-16T11:57:54" 2 4.16 91.7 237  4.8
    11 2 "664"              81 "Blood"           "2018-11-16T12:00:54" 2 4.14 91.8 262    5
    16 1 "30261012"         82 "Blood"           "2018-11-16T13:58:51" 2 3.03 95.4  51  2.4
    16 2 "30261012"         83 "Blood"           "2018-11-16T14:03:24" 2 2.97 94.5  44  2.6
    17 1 "30261013"         84 "Blood"           "2018-11-16T14:06:24" 2 3.89 90.9 315  2.5
    17 2 "30261013"         85 "Blood"           "2018-11-16T14:09:59" 2  3.9 90.5 308  2.5
    18 1 "30261014"         86 "Blood"           "2018-11-16T14:13:33" 2 4.78 86.2  85  3.9
    18 2 "30261014"         87 "Blood"           "2018-11-16T14:17:16" 2 4.79 86.1  80  3.9
    19 1 "30261015"         88 "Blood"           "2018-11-16T14:20:48" 2 2.82 83.5 125  8.6
    19 2 "30261015"         89 "Blood"           "2018-11-16T14:22:34" 2 2.87   83 129  8.9
    20 1 "30261016"         90 "Blood"           "2018-11-16T14:28:07" 2 4.13   90 248  2.6
    20 2 "30261016"         91 "Blood"           "2018-11-16T14:31:52" 2 4.18 89.9 258  2.5
    21 1 "30261017"         92 "Blood"           "2018-11-16T14:34:09" 2  2.6 72.4 403  8.4
    21 2 "30261017"         93 "Blood"           "2018-11-16T14:36:30" 2 2.55 72.5 408  8.5
    22 1 "30261018"         94 "Blood"           "2018-11-16T14:43:42" 2 2.85 85.5 189 13.3
    22 2 "30261018"         95 "Blood"           "2018-11-16T14:46:25" 2 2.93 85.8 177 13.3
    23 1 "30261019"         96 "Blood"           "2018-11-16T14:50:22" 2 3.72 87.8 134  .81
    23 2 "30261019"         97 "Blood"           "2018-11-16T14:54:44" 2 3.81 87.5 148   .9
    24 1 "30261020"         98 "Blood"           "2018-11-16T14:58:38" 2 2.95 95.1  44  2.8
    24 2 "30261020"         99 "Blood"           "2018-11-16T15:02:29" 2 2.88 94.7  37  2.9
     . 3 "2180711+"        100 "Low 2180711+"    "2018-11-16T15:25:31" 2 2.22 68.5  84  3.4
     . 4 "2180711+"        101 "Low 2180711+"    "2018-11-16T15:26:37" 2 2.21 69.1  85  3.5
     . 3 "2180772+"        102 "Normal 2180772+" "2018-11-16T15:27:37" 2 3.98 81.6 251  8.7
     . 4 "2180772+"        103 "Normal 2180772+" "2018-11-16T15:28:35" 2 3.98   82 240  8.4
     . 3 "2180713+"        104 "High 2180713+"   "2018-11-16T15:29:56" 2 4.99 92.7 537 20.6
     . 4 "2180713+"        105 "High 2180713+"   "2018-11-16T15:30:56" 2 5.01   93 545 20.6
     . 2 ""                106 "Background"      "2018-11-16T15:32:50" 2  .01    .   2    0
    end
    Please let me know if I am not clear enough.
    Thank you in advance.

  • #2
    Thanks for the data example. You need to convert your string date variable to a Stata date variable first. The rest is just implementing standard Stata commands.


    Code:
    gen datetime= clock(subinstr(DATE, "T"," ", .), "YMDhms")
    format datetime %tc
    sort Id datetime
    egen SID= group(Id) if Sampletype=="Blood"
    bys SID (datetime): gen run= _n if Sampletype=="Blood"
    bys SID (datetime): gen day = cofd(dofc(datetime))- cofd(dofc(datetime[1])) + 1 if Sampletype=="Blood"
    Result:

    Code:
    .
    
    . list datetime SampleID SID Run run day, sepby(SID)
    
         +-------------------------------------------------------+
         |           datetime   SampleID   SID   Run   run   day |
         |-------------------------------------------------------|
      1. | 16nov2018 13:58:51         16     1     1     1     1 |
      2. | 16nov2018 14:03:13         16     1     2     2     1 |
         |-------------------------------------------------------|
      3. | 16nov2018 14:05:24         17     2     1     1     1 |
      4. | 16nov2018 14:09:47         17     2     2     2     1 |
         |-------------------------------------------------------|
      5. | 16nov2018 14:14:09         18     3     1     1     1 |
      6. | 16nov2018 14:16:20         18     3     2     2     1 |
         |-------------------------------------------------------|
      7. | 16nov2018 14:20:42         19     4     1     1     1 |
      8. | 16nov2018 14:22:53         19     4     2     2     1 |
         |-------------------------------------------------------|
      9. | 16nov2018 14:27:15         20     5     1     1     1 |
     10. | 16nov2018 14:31:37         20     5     2     2     1 |
         |-------------------------------------------------------|
     11. | 16nov2018 14:33:48         21     6     1     1     1 |
     12. | 16nov2018 14:35:59         21     6     2     2     1 |
         |-------------------------------------------------------|
     13. | 16nov2018 14:44:44         22     7     1     1     1 |
     14. | 16nov2018 14:46:55         22     7     2     2     1 |
         |-------------------------------------------------------|
     15. | 16nov2018 14:51:17         23     8     1     1     1 |
     16. | 16nov2018 14:55:39         23     8     2     2     1 |
         |-------------------------------------------------------|
     17. | 16nov2018 14:57:50         24     9     1     1     1 |
     18. | 16nov2018 15:02:12         24     9     2     2     1 |
         |-------------------------------------------------------|
     19. | 16nov2018 11:58:42         11    10     1     1     1 |
     20. | 16nov2018 12:00:53         11    10     2     2     1 |
         |-------------------------------------------------------|
     21. | 16nov2018 11:28:07         12    11     1     1     1 |
     22. | 16nov2018 11:32:29         12    11     2     2     1 |
         |-------------------------------------------------------|
     23. | 16nov2018 11:36:51         13    12     1     1     1 |
     24. | 16nov2018 11:41:14         13    12     2     2     1 |
         |-------------------------------------------------------|
     25. | 16nov2018 11:45:36         14    13     1     1     1 |
     26. | 16nov2018 11:47:47         14    13     2     2     1 |
         |-------------------------------------------------------|
     27. | 16nov2018 11:52:09         15    14     1     1     1 |
     28. | 16nov2018 11:54:20         15    14     2     2     1 |
         |-------------------------------------------------------|
     29. | 16nov2018 07:34:22          .     .     1     .     . |
     30. | 16nov2018 10:31:19          .     .     2     .     . |
     31. | 16nov2018 10:31:19          .     .     1     .     . |
     32. | 16nov2018 10:35:41          .     .     1     .     . |
     33. | 16nov2018 10:37:53          .     .     2     .     . |
     34. | 16nov2018 10:40:04          .     .     1     .     . |
     35. | 16nov2018 10:44:26          .     .     2     .     . |
     36. | 16nov2018 15:26:14          .     .     4     .     . |
     37. | 16nov2018 15:26:14          .     .     3     .     . |
     38. | 16nov2018 15:28:25          .     .     4     .     . |
     39. | 16nov2018 15:28:25          .     .     3     .     . |
     40. | 16nov2018 15:30:36          .     .     3     .     . |
     41. | 16nov2018 15:30:36          .     .     4     .     . |
     42. | 16nov2018 15:32:47          .     .     2     .     . |
         +-------------------------------------------------------+
    Last edited by Andrew Musau; 09 Apr 2019, 06:49.



    • #3
      24 hours in milliseconds is

      24 (hours) \(\times\) 60 (minutes) \(\times\) 60 (seconds) \(\times\) 1000 (milliseconds)

      Code:
      . di 24*60*60*1000
      86400000
      Therefore, to get days since first occurrence, we need to divide the difference in the last line of the code in #2 by this value.

      Code:
      bys SID (datetime): gen day = ((cofd(dofc(datetime))- cofd(dofc(datetime[1])))/86400000) + 1 if Sampletype=="Blood"
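
A minimal sketch of the same arithmetic on three datetimes taken from #1. It assumes the elapsed-day count can equally be taken as a difference of daily dates from dofc(), which avoids the millisecond constant. The names day_ms and day_days are illustrative only and do not appear elsewhere in this thread.

```stata
clear
input str19 DATE
"2018-11-15T14:33:07"
"2018-11-16T11:28:46"
"2018-11-19T13:31:58"
end

* datetimes are stored as doubles (milliseconds since 01jan1960)
gen double datetime = clock(subinstr(DATE, "T", " ", .), "YMDhms")
format datetime %tc
sort datetime

* route 1: difference of midnight datetimes, converted from ms to days
gen day_ms = (cofd(dofc(datetime)) - cofd(dofc(datetime[1]))) / 86400000 + 1

* route 2: difference of daily dates, already measured in days
gen day_days = dofc(datetime) - dofc(datetime[1]) + 1

* both routes should agree on every observation
assert day_ms == day_days
list DATE day_ms day_days
```

Both variables count calendar days, so a day without measurements still advances the count.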



      • #4
        #2 is excellent except that datetimes should always be doubles! See help datetime.
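
A small sketch of why the storage type matters, using one timestamp from #1 (SEQ 83). The variable names dt_float and dt_double are illustrative. A float cannot hold a millisecond count of this size exactly, which would explain why the listing in #2 shows 14:03:13 where the data have 14:03:24.

```stata
clear
input str19 DATE
"2018-11-16T14:03:24"
end

* same conversion, two storage types
gen float  dt_float  = clock(subinstr(DATE, "T", " ", .), "YMDhms")
gen double dt_double = clock(subinstr(DATE, "T", " ", .), "YMDhms")
format dt_float dt_double %tc

* the double reproduces the original timestamp; the float is rounded
assert dt_double == tc(16nov2018 14:03:24)
assert dt_float != dt_double
list
```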



        • #5
          Thank you Andrew. Your code worked perfectly for the SID and Run. However, the "Day" variable is "1" for all dates (15, 16, and 19/11) using the code in #2 and #3. Am I doing something wrong?

          Thank you Nick. I tried the following code (is the code right?) and it worked as well as the one by Andrew.
          Code:
          gen double datetime = clock(subinstr(DATE, "T"," ", .), "YMDhms")
          format %tc datetime
          gen date = dofc(datetime) 
          format %td date
          How can I get the Run and Day numbers for the other sample types at the same time as for the "Blood" group? I cannot figure out how to combine "by(group)" and "if" to make it work.

          Thank you in advance.



          • #6
            Isn't day specific to a particular patient? If you just want to designate 15 Nov. 2018 - day 1, 16 Nov. - day 2, and so on, you have

            Code:
            . di %td 21503
            15nov2018
            implying that the problem is one of simple arithmetic

            Code:
            gen date = dofc(datetime)
            *15 Nov. 2018= 1
            gen day= date-21502
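
A sketch of the same subtraction without the hard-coded 21502, assuming the earliest date in the data should count as day 1. It reads the minimum from r(min) after summarize, so it should carry over to studies that start on a different date. The three timestamps are taken from #1 and are deliberately left unsorted.

```stata
clear
input str19 DATE
"2018-11-19T13:31:58"
"2018-11-15T14:33:07"
"2018-11-16T11:28:46"
end

gen double datetime = clock(subinstr(DATE, "T", " ", .), "YMDhms")
gen date = dofc(datetime)
format date %td

* earliest date in the data becomes day 1
summarize date, meanonly
gen day = date - r(min) + 1
list DATE date day
```

This still counts calendar days, so dates separated by a gap get non-consecutive day numbers.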
            Quote (from #5):
            How can I get the Run and Day numbers for the other sample types at the same time as for the "Blood" group? I cannot figure out how to combine "by(group)" and "if" to make it work.

            You need to be specific, as in #1; otherwise I cannot follow what is required.



            • #7
              Thank you Andrew. I get the right day number for the 15th and 16th November. 19th November got day number 5. I want that day to have day number 3 of the study. Can you help me with that? I am a new user of Stata and it takes me ages to figure out codes. Every single tip is appreciated.

              Regarding the second request, your code helps me to create runs for the "Blood" sample type only. I also want to create run numbers for the other sample types (Background, Low ..., Normal..., High...). See request "2)" in #1.

              Best



              • #8
                Just for clarification, SID is needed for the "Blood" sample type only. Run and Day are needed for all sample types.



                • #9
                  Quote (from #7):
                  Thank you Andrew. I get the right day number for the 15th and 16th November. 19th November got day number 5. I want that day to have day number 3 of the study. Can you help me with that? I am a new user of Stata and it takes me ages to figure out codes. Every single tip is appreciated.

                  After you create the date variable,

                  Code:
                  egen Date= group(date)
                  will give you a continuous ordering.
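
A short sketch of what egen's group() does with dates that have a gap, using the three study dates from #1. The string variable d is illustrative only.

```stata
clear
input str10 d
"2018-11-15"
"2018-11-16"
"2018-11-16"
"2018-11-19"
end

gen date = date(d, "YMD")
format date %td

* group() numbers the distinct values 1, 2, 3, ... in sorted order,
* so the missing calendar days do not leave holes in the numbering
egen Day = group(date)
list
```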


                  Quote (from #7):
                  Regarding the second request, your code helps me to create runs for the "Blood" sample type only. I also want to create run numbers for the other sample types (Background, Low ..., Normal..., High...). See request "2)" in #1.

                  Code:
                  * one run counter per background/control sample type
                  local vars background low normal high
                  foreach var of local vars {
                      * number the runs within each Id in time order
                      bys Id (datetime): gen run`var' = _n if strpos(lower(Sampletype), "`var'")
                  }



                  • #10
                    Thank you Andrew, and sorry for the delayed response. I tried to modify your code to better fit my needs but did not succeed. I want the loop in your last code to restart _n for each day as well. Take the "Background" sample type, for instance: it was analyzed 5 times on Day 1 and twice each on Days 2 and 3. So the "Run" numbers should be 1-5, 1-2, and 1-2, respectively, for the 3 days I posted in the dataex. The same should apply to the other sample types (except "Blood", since you already fixed it). I appreciate your help a lot.



                    • #11
                      One issue that you have is that "Id" is not constant if sample type is "Background". However, from your counts, it appears that you consider it to be constant.

                      Code:
                      . list Id Sampletype if strpos(lower(Sampletype), "background"), clean
                      
                                          Id   Sampletype  
                        1.                     Background  
                        2.   Auto-background   Background  
                        3.                     Background  
                        4.                     Background  
                        5.                     Background  
                       42.                     Background  
                       43.   Auto-background   Background  
                       84.   Auto-background   Background  
                       85.                     Background
                      You can start with, for example,

                      Code:
                      replace Id= "Auto-background" if strpos(lower(Sampletype), "background")
                      Then, you just need to sort by Id and Date

                      Code:
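                      * same local as in #9; drop the run* variables created there before rerunning
                      local vars background low normal high
                      foreach var of local vars {
                          bys Id Date (datetime): gen run`var' = _n if strpos(lower(Sampletype), "`var'")
                      }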

                      Result:

                      Code:
                      
                      . list Id Sampletype datetime Date runbackground if !missing(runbackground), sep
                      > by(Date)
                      
                           +---------------------------------------------------------------------+
                           |              Id   Sampletype             datetime   Date   runbac~d |
                           |---------------------------------------------------------------------|
                       92. | Auto-background   Background   15nov2018 09:42:21      1          1 |
                       93. | Auto-background   Background   15nov2018 09:43:37      1          2 |
                       94. | Auto-background   Background   15nov2018 09:45:06      1          3 |
                       95. | Auto-background   Background   15nov2018 14:13:38      1          4 |
                       96. | Auto-background   Background   15nov2018 16:33:35      1          5 |
                           |---------------------------------------------------------------------|
                       97. | Auto-background   Background   16nov2018 07:34:05      2          1 |
                       98. | Auto-background   Background   16nov2018 15:32:50      2          2 |
                           |---------------------------------------------------------------------|
                       99. | Auto-background   Background   19nov2018 08:03:30      3          1 |
                      100. | Auto-background   Background   19nov2018 08:05:51      3          2 |
                           +---------------------------------------------------------------------+
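
For reference, a hedged sketch that pulls the pieces of this thread together into the three variables requested in #1, with a single Run variable instead of one per sample type. It runs on ten rows taken from #1. The helper variable key is an assumption of this sketch: it treats all backgrounds as one series, as in the replace above, and numbers runs within each day for every sample type, including "Blood".

```stata
clear
input str15 Id str15 Sampletype str19 DATE
""                "Background"   "2018-11-15T09:42:21"
"2180711+"        "Low 2180711+" "2018-11-15T11:48:26"
"2180711+"        "Low 2180711+" "2018-11-15T11:50:12"
"Auto-background" "Background"   "2018-11-15T14:13:38"
"30261002"        "Blood"        "2018-11-15T14:33:07"
"30261002"        "Blood"        "2018-11-15T14:36:17"
"2180711+"        "Low 2180711+" "2018-11-15T16:49:15"
"Auto-background" "Background"   "2018-11-16T07:34:05"
"665"             "Blood"        "2018-11-16T11:28:46"
"665"             "Blood"        "2018-11-16T11:32:16"
end

* string timestamp -> double datetime -> daily date
gen double datetime = clock(subinstr(DATE, "T", " ", .), "YMDhms")
format datetime %tc
gen date = dofc(datetime)
format date %td

* consecutive study-day number, ignoring gaps between dates
egen Day = group(date)

* running number for patient samples only (follows the sort order of Id)
egen SampleID = group(Id) if Sampletype == "Blood"

* key (hypothetical helper): one series for all backgrounds, Id otherwise
gen key = cond(Sampletype == "Background", "Background", Id)
bysort key Day (datetime): gen Run = _n
drop key

sort datetime
list Id Sampletype Day SampleID Run, sepby(Day)
```

If the same blood Id could be analyzed on more than one day and Run should keep counting across days, the by-group for "Blood" would need to drop Day, as in #2.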



                      • #12
                        Thank you Andrew. You saved me a lot of time with data management for all my studies. Thank you again for your help.

