  • First occurrence

    Hi all,

    I have a dataset with the launch date (data_lancio) of each molecule, and each molecule appears many times in the sample. I would like to create a variable called "age_molecule", and to do that I need to find the first occurrence of each molecule with respect to the launch date. My dataset looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int data_lancio str60 molecule
    1924 "ALOE BARBADENSIS"                 
    1929 "PURGATIVE/LAXATIVE"               
    1929 "ACETYLSALICYLIC ACID"             
    1929 "DEXTROMETHORPHAN"                 
    1929 "HYDROGEN PEROXIDE UREA DERIVATIVE"
    1929 "DIPHENHYDRAMINE"                  
    1929 "DIPHENHYDRAMINE"                  
    1929 "MECLOZINE"                        
    1929 "INULIN"                           
    1929 "OXYMETAZOLINE"                    
    1929 "DEXTROMETHORPHAN"                 
    1929 "UREA"                             
    1929 "PARACETAMOL"                      
    1942 "INFANT MILKS"                     
    1942 "TOLTERODINE"                      
    1942 "METHYLPHENIDATE"                  
    1942 "EPLERENONE"                       
    1942 "INFANT MILKS"                     
    1942 "POLYCARBOPHIL"                    
    1942 "INFANT MILKS"                     
    1942 "PHENOBARBITAL"                    
    1942 "IRON FERROUS"                     
    1942 "ASCORBIC ACID"                    
    1942 "BENZOCAINE"                       
    1942 "INFANT MILKS"                     
    1942 "FEEDING BOTTLES"                  
    1942 "MORPHINE"                         
    1942 "NUTRITIONAL SUPPLEMENTS"          
    1942 "INFANT MILKS"                     
    1942 "CALCIUM"                          
    1942 "CALCIUM"                          
    1942 "DEXTROMETHORPHAN"                 
    1942 "KETOPROFEN"                       
    1942 "NUTRITIONAL SUPPLEMENTS"          
    1942 "INFANT MILKS"                     
    1942 "INFANT MILKS"                     
    1942 "INFANT MILKS"                     
    1949 "INFANT MILKS"                     
    1949 "IRINOTECAN"                       
    1949 "GLIPIZIDE"                        
    1949 "DICLOFENAC"                       
    1949 "CLOBETASOL"                       
    1949 "ASCORBIC ACID"                    
    1949 "FLUOCINONIDE"                     
    1949 "FERUMOXIDES"                      
    1949 "PSEUDOEPHEDRINE"                  
    1949 "AMANTADINE"                       
    1949 "DISPOSABLE MEDICAL DEVICES"       
    1949 "ERGOCALCIFEROL"                   
    1949 "NARATRIPTAN"                      
    1949 "BETAXOLOL"                        
    1949 "FOLIC ACID"                       
    1949 "COMPOSITION UNKNOWN"              
    1949 "PSEUDOEPHEDRINE"                  
    1949 "SALICYLIC ACID"                   
    1949 "NAPROXEN"                         
    1949 "GLUCOSE, URINE TESTS"             
    1949 "PROTEIN TESTS"                    
    1949 "GENERAL NUTRIENTS"                
    1949 "ESTRADIOL"                        
    1949 "TESTOSTERONE"                     
    1949 "PROCHLORPERAZINE"                 
    1949 "COMPOSITION UNKNOWN"              
    1949 "NUTRITIONAL SUPPLEMENTS"          
    1949 "APRACLONIDINE"                    
    1953 "ACETAZOLAMIDE"                    
    1953 "NYSTATIN"                         
    1953 "MONTELUKAST"                      
    1953 "GABAPENTIN"                       
    1953 "NOREPINEPHRINE"                   
    1953 "NAPROXEN"                         
    1953 "NYSTATIN"                         
    1953 "PSEUDOEPHEDRINE"                  
    1953 "ROCURONIUM BROMIDE"               
    1953 "COMPOSITION UNKNOWN"              
    1953 "VORICONAZOLE"                     
    1953 "ETOPOSIDE"                        
    1954 "ALFUZOSIN"                        
    1954 "NON-DISPOSABLE MEDICAL DEVICE"    
    1954 "BUSPIRONE"                        
    1954 "DIGOXIN"                          
    1954 "METFORMIN"                        
    1954 "ATROPINE"                         
    1954 "PARAFFIN OIL"                     
    1954 "BUSPIRONE"                        
    1954 "ZONISAMIDE"                       
    1954 "ZONISAMIDE"                       
    1954 "ATROPINE"                         
    1954 "8-QUINOLINOL"                     
    1954 "PRAVASTATIN"                      
    1954 "OTHER CLEANSING AGENTS"           
    1954 "BELATACEPT"                       
    1954 "TENIPOSIDE"                       
    1954 "MISCELLANEOUS URINE TESTS"        
    1954 "CLONIDINE"                        
    1954 "PREDNISOLONE"                     
    1954 "ATROPINE"                         
    1954 "CLONIDINE"                        
    1954 "ALFUZOSIN"                        
    1954 "ATROPINE"                         
    end
    where molecule is not unique: the same molecule can appear several times, and under different launch dates. I would like to take the minimum launch date for each molecule. For instance, CALCIUM appears here only for 1942, but in the complete sample it also appears for 2004, 2005, 2012 and 2015. I would like to take 1942 as the launch date for CALCIUM and do the same for all the other molecules.

    Many thanks!

  • #2
    Code:
    table molecule, c(min data_lancio)
    or if you want to modify your data:

    Code:
    collapse (min) data_lancio, by(molecule)
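
    A note for readers on newer releases, offered as a sketch rather than a tested recipe: the c() option of -table- is the syntax of Stata 16 and earlier, and from Stata 17 on the same summary is requested with statistic(). Also, because -collapse- replaces the data in memory, wrapping it in -preserve- and -restore- lets you look at the result without losing the original observations.

    Code:
    * Stata 17 or newer: one row per molecule with its earliest launch year
    table molecule, statistic(min data_lancio)

    * Inspect the collapsed data, then return to the full dataset
    preserve
    collapse (min) data_lancio, by(molecule)
    list, noobs
    restore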
    Last edited by Sergiy Radyakin; 07 Jan 2019, 14:53. Reason: Added code for replacing data in memory with collapsed data.

    • #3
      Code:
      by molecule, sort: egen age_molecule = min(data_lancio)
      May I suggest a different name for your variable? What you are calculating is not an age but a date, so I would call it something like first_launch_date.

      Added: Crossed with #2, which gives you a way of displaying each molecule and the corresponding first launch date in the Results window and your log file. The code here instead creates a new variable.
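
      A minimal sketch of the same idea under the suggested name, assuming the example data from #1 are in memory. The -bysort- alternative and the variables first_launch_check and one_per_molecule are illustrations added here, not part of the original answer.

      Code:
      * Earliest launch year per molecule, copied to every observation of that molecule
      bysort molecule: egen first_launch_date = min(data_lancio)

      * Equivalent without -egen-: sort by year within molecule and take the first value
      * (missing values sort last in Stata, so both approaches agree)
      bysort molecule (data_lancio): gen first_launch_check = data_lancio[1]
      assert first_launch_check == first_launch_date

      * Show one line per molecule to inspect the result
      egen one_per_molecule = tag(molecule)
      list molecule first_launch_date if one_per_molecule, noobs

      The -assert- stops with an error if the two variables ever differ, which is a cheap way to confirm that the by-group logic does what you expect before dropping the check variable.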

      • #4
        Crossed with #3. Note that Clyde's solution keeps the original number of observations, whereas mine retains only one observation per unique molecule. It's not clear from post #1 above which is desired.

        • #5
          Thank you for your suggestions. I intended to retain the original number of observations; sorry for not having been clear about that.
          And yes, I will definitely change the variable name and create age_molecule later on.
          Thank you very much,

          Federico
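
          For later readers, one way that last step might look, as a sketch only. It assumes age is measured in years relative to a single reference year; the value 2019 and the local macro ref_year are placeholders chosen here, not something stated in the thread.

          Code:
          * Hypothetical reference year; replace with the year variable of your data if you have one
          local ref_year 2019

          * First launch year per molecule, keeping all observations
          bysort molecule: egen first_launch_date = min(data_lancio)

          * Age of the molecule in years at the reference year
          gen age_molecule = `ref_year' - first_launch_date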
