Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Grouping repetitive dates and graphing


    Dear colleagues,
    I am quite new to stata and looking for advise on how to group repeated use of bonds on the same day[Date] with different value dates [ValueDate] in a graph. The [Date] variable should be grouped on the x axis -for instance all days in 02/10/2014 should be in one group but linked to different dates [ValueDate] on the Y axis. suggestions on which types of graphs in stata that could best represent this scenario would be useful.
    please see a sample of the dataset below. I tried egen but I received an error message - variable not found.
    the data is confidential so I cannot provide it in dataex as previously advised. Thanks again for the help and support.

    Date ISIN asset val group id ValueDate BookValue
    02/10/2014 xx1 15 Jum PT1 04/11/2014 145
    02/10/2014 xx2 20 Pfand ex5 11/12/2014 218
    02/10/2014 xx3 25 loans ex10 26/11/2014 16
    02/10/2014 xx4 30 shares ex15 27/10/2014 18
    02/10/2014 xx4 35 securities ex20 16/12/2014 57
    02/10/2014 xx5 40 ABS ex21 22/12/2014 58
    07/10/2014 xx1 48 Jum PT1 06/11/2014 118
    07/10/2014 xx21 36 Pfand ex5 27/10/2014 227
    07/10/2014 xx33 10 loans ex10 07/11/2014 316
    07/10/2014 xx40 18 shares ex15 04/11/2014 211
    07/10/2014 xx42 12 securities ex20 12/11/2014 108
    07/10/2014 xx51 10 ABS ex21 11/12/2014 421
    07/10/2014 xx51 5 jum f12 18/11/2014 14
    14/10/2014 xx1 15 Jum PT1 27/11/2014 145
    14/10/2014 xx2 20 Pfand ex5 19/11/2014 218
    14/10/2014 xx3 25 loans ex10 27/11/2014 16
    14/10/2014 xx4 30 shares ex15 21/11/2014 18
    14/10/2014 xx1 35 securities ex20 19/11/2014 57
    14/10/2014 xx21 40 ABS ex21 23/10/2014 58
    14/10/2014 xx33 8 jum n34 16/12/2014 60

  • #2
    I am afraid that your sample data is difficult to understand - columns run together - and impossible to easily import into Stata to use in preparing an answer.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. In particular, please read FAQ #12 and use dataex when posting sample data to Statalist.

    To install dataex, type ssc install dataex and when that is complete type help dataex to read the simple instructions for using it. Using dataex will enable those who want to help you to quickly and easily create a 100% faithful replica of your situation to test their ideas and code on.

    You can import the sample data you show above into Stata and provide it via dataex. Or you can take your confidential data in Stata and change enough of it so it is like your sample data and has no confidential information.
    Last edited by William Lisowski; 07 Jun 2017, 08:37.

    Comment


    • #3
      Dear William,
      thank you for the reply, and sorry if this was not clear. I only want to group the "Date" variable. For example i want to group in such a way that all dates 02/10/2014 = 1; [in this sample there are 6 days).
      07/10/2014 = 2; [here there are 7 days].

      I tried egen but got an error message "variable not sorted". Then I sorted, but got the same message. Grateful for any help. thanks in advance
      Date

      ISIN asset val group id ValueDate BookValue
      02/10/2014 xx1 15 Jum PT1 04/11/2014 145
      02/10/2014 xx2 20 Pfand ex5 11/12/2014 218
      02/10/2014 xx3 25 loans ex10 26/11/2014 16
      02/10/2014 xx4 30 shares ex15 27/10/2014 18
      02/10/2014 xx4 35 securities ex20 16/12/2014 57
      02/10/2014 xx5 40 ABS ex21 22/12/2014 58
      07/10/2014 xx1 48 Jum PT1 06/11/2014 118
      07/10/2014 xx21 36 Pfand ex5 27/10/2014 227
      07/10/2014 xx33 10 loans ex10 07/11/2014 316
      07/10/2014 xx40 18 shares ex15 04/11/2014 211
      07/10/2014 xx42 12 securities ex20 12/11/2014 108
      07/10/2014 xx51 10 ABS ex21 11/12/2014 421
      07/10/2014 xx51 5 jum f12 18/11/2014 14
      14/10/2014 xx1 15 Jum PT1 27/11/2014 145
      14/10/2014 xx2 20 Pfand ex5 19/11/2014 218
      14/10/2014 xx3 25 loans ex10 27/11/2014 16
      14/10/2014 xx4 30 shares ex15 21/11/2014 18
      14/10/2014 xx1 35 securities ex20 19/11/2014 57
      14/10/2014 xx21 40 ABS ex21 23/10/2014 58
      14/10/2014 xx33 8 jum n34 16/12/2014 60

      Comment


      • #4
        I have a dataset with two identical variables which I want to compare and display the difference as another variable.

        Variable 1 replicates patient (id) a number of times because of several visits to the doctor each year.
        Variable 2 replicates patient (id) a number of times but to a lesser extent - less observations on each date.
        Both variables are string variables.

        I have tried a number of combinations of the "foreach" and for "forevalue" loops but this does not seem to work.

        I am grateful for any urgent help on this matter.
        thanks in advance.

        Comment


        • #5
          Can you provide a data example showing the two variables and what you expect the third variable to look like?

          Comment


          • #6
            thanks for the reply. I want to compare Var_code 1 date 1 with Var_app date_app and list the difference if Var_code 1 date 1 cannot be found in var_app Date_app. thank you

            Under two circumstances.
            1. all variables are in the same database
            2. if variable are in two separate databases
            Var_code1 date1 date_app Var_app
            ES0305085005 28-Dec-17 11/25/2014 XS0528006090
            ES0305085005 28-Sep-17 11/25/2014 ES0374273003
            ES0305085005 29-Jun-17 11/25/2014 IT0004790918
            ES0305085005 30-Mar-17 11/25/2014 IT0004790918
            ES0305085005 24-Sep-15 11/25/2014 IT0004790918
            ES0305085005 29-Sep-16 11/25/2014 IT0004790918
            ES0305085005 31-Dec-15 11/26/2014 ES0374273003
            ES0305085005 31-Mar-16 11/27/2014 XS1135366240
            ES0305085005 30-Jun-16 11/27/2014 XS1135365515
            ES0305085005 29-Dec-16 11/27/2014 XS1135365788
            XS1314233732 29-Sep-16 11/28/2014 IT0004790918
            XS1314233732 29-Jun-17 11/28/2014 IT0004790918
            XS1314233732 29-Dec-16 11/28/2014 IT0004790918
            XS1314233732 28-Dec-17 12/01/2014 XS0572338936
            XS1314233732 30-Mar-17 12/01/2014 XS0572336997
            XS1314233732 30-Jun-16 12/01/2014 XS0572338936

            Comment


            • #7
              Look at the FAQs on how to use dataex in the future. I will assume that all your variables are strings but had you used dataex, I would not have had to make this assumption. I have changed your example slightly so that I include some matches. Regarding 2, I suggest that you combine the datasets either by merging or appending and thereafter, you can build on the technique below.


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str12 var_code1 str9 date1 str10 date_app str12 var_app
              "ES0305085005" "28-Dec-17" "11/25/2014" "XS0528006090"
              "ES0305085005" "28-Sep-17" "11/25/2014" "ES0374273003"
              "ES0305085005" "29-Jun-17" "11/25/2014" "IT0004790918"
              "ES0305085005" "30-Mar-17" "11/25/2014" "IT0004790918"
              "ES0305085005" "24-Sep-15" "11/25/2014" "IT0004790918"
              "ES0305085005" "29-Sep-16" "11/25/2014" "IT0004790918"
              "ES0305085005" "31-Dec-15" "11/26/2014" "ES0374273003"
              "ES0305085005" "31-Mar-16" "11/27/2014" "XS1135366240"
              "ES0305085005" "30-Jun-16" "11/27/2014" "XS1135365515"
              "ES0305085005" "29-Dec-16" "11/27/2014" "XS1135365788"
              "XS1314233732" "29-Sep-16" "11/28/2014" "IT0004790918"
              "XS1314233732" "29-Jun-17" "11/28/2014" "IT0004790918"
              "XS1314233732" "29-Dec-16" "12/31/2015" "ES0305085005"
              "XS1314233732" "28-Dec-17" "12/01/2014" "XS0572338936"
              "XS1314233732" "30-Mar-17" "12/01/2014" "XS0572336997"
              "XS1314233732" "30-Jun-16" "12/28/2017" "ES0305085005"
              end
              
              
              *MAKE DATES CONSISTENT
              gen Date1 = date(date1, "DM20Y")
              format Date1 %td
              gen Date_app = date( date_app , "MDY")
              format Date_app %td
              
              *GENERATE COMBINATIONS OF VARIABLES AND DATES
              gen V1D1= var_code1+string( Date1 )
              gen VADA= var_app +string( Date_app )
              
              *STORE ALL COMBINATIONS OF VAR_APP AND DATE_APP
              levelsof VADA, local(V)
              *LOOK FOR MATCHES
              foreach v in `V'{
              gen `v' = regexm(V1D1, "`v'")
              }
              
              *HERE YOU HAVE TO REFER TO VARIABLE COMBINATIONS FROM DATASET (SPECIFIC TO YOUR DATASET)
              rename (ES030508500520453- XS113536624020054) (match#), addnumber(1)
              egen found= rowtotal(match*)
              drop match*
              list var_code1 Date1 var_app Date_app found, clean noob

              The indicator which I name "found" counts the number of matches of Var_code1 date1 in Var_app date_app and equals 0 if there is no match.

              Code:
              . list var_code1 Date1 var_app Date_app found, clean noob
              
                     var_code1       Date1        var_app    Date_app   found  
                  ES0305085005   28dec2017   XS0528006090   25nov2014       1  
                  ES0305085005   28sep2017   ES0374273003   25nov2014       0  
                  ES0305085005   29jun2017   IT0004790918   25nov2014       0  
                  ES0305085005   30mar2017   IT0004790918   25nov2014       0  
                  ES0305085005   24sep2015   IT0004790918   25nov2014       0  
                  ES0305085005   29sep2016   IT0004790918   25nov2014       0  
                  ES0305085005   31dec2015   ES0374273003   26nov2014       1  
                  ES0305085005   31mar2016   XS1135366240   27nov2014       0  
                  ES0305085005   30jun2016   XS1135365515   27nov2014       0  
                  ES0305085005   29dec2016   XS1135365788   27nov2014       0  
                  XS1314233732   29sep2016   IT0004790918   28nov2014       0  
                  XS1314233732   29jun2017   IT0004790918   28nov2014       0  
                  XS1314233732   29dec2016   ES0305085005   31dec2015       0  
                  XS1314233732   28dec2017   XS0572338936   01dec2014       0  
                  XS1314233732   30mar2017   XS0572336997   01dec2014       0  
                  XS1314233732   30jun2016   ES0305085005   28dec2017       0

              Comment


              • #8
                There ia a little bit of reinventing of the wheel in #7. Thinking about it some more, a simple merge will do and will handle both your scenarios 1 and 2 in #6


                Code:
                *MAKE DATES CONSISTENT
                gen Date1 = date(date1, "DM20Y")
                format Date1 %td
                gen Date_app = date( date_app , "MDY")
                format Date_app %td
                
                
                *GENERATE COMBINATIONS OF VARIABLES AND DATES
                gen V1D1= var_code1+string( Date1 )
                gen VADA= var_app +string( Date_app )
                
                
                *GENERATE SEPARATE DATASET WITH ALTERNATE VARIABLE, RENAME AND MERGE BACK
                preserve
                keep VADA
                rename VADA V1D1
                contract V1D1
                sort V1D1
                tempfile VADA
                save `VADA'
                restore
                merge 1:m V1D1 using `VADA'
                list var_code1 Date1 _freq if _merge==3

                Code:
                . list var_code1 Date1 _freq if _merge==3
                
                     +----------------------------------+
                     |    var_code1       Date1   _freq |
                     |----------------------------------|
                  2. | ES0305085005   31dec2015       1 |
                 10. | ES0305085005   28dec2017       1 |
                     +----------------------------------+
                Last edited by Andrew Musau; 28 Apr 2018, 16:06.

                Comment


                • #9
                  Dear Andrew,
                  thank you for taking the time to provide some advise. unfortunately I still have some problems. I tried the merge option and the error message was
                  variable bothapp does not uniquely identify observations in the master data. Variable bothapp is a combination of
                  Valuedate isin using the following command:
                  egen bothapp = concat(Valuedate isin), decode p(" ").

                  This works
                  *STORE ALL COMBINATIONS OF VAR_APP AND DATE_APP
                  levelsof VADA, local(V)

                  *LOOK FOR MATCHES
                  foreach v in `V'{
                  gen `v' = regexm(V1D1, "`v'")
                  }

                  how do l list the matches? the database is huge therefore looking manually is very tedious.
                  thanks again
                  Liz

                  Comment


                  • #10
                    Search the forum as well as Stata's online documentation on merge to see how to resolve the error. #8 is the easiest way to address your problem. If you follow #7, you can just browse the dataset, constraining the displayed results to those where the variable -found- is positive to view matches. I will automate the generation of the -found- variable so that you do not need to look at the first and last elements in the local macro.


                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str12 var_code1 str9 date1 str10 date_app str12 var_app
                    "ES0305085005" "28-Dec-17" "11/25/2014" "XS0528006090"
                    "ES0305085005" "28-Sep-17" "11/25/2014" "ES0374273003"
                    "ES0305085005" "29-Jun-17" "11/25/2014" "IT0004790918"
                    "ES0305085005" "30-Mar-17" "11/25/2014" "IT0004790918"
                    "ES0305085005" "24-Sep-15" "11/25/2014" "IT0004790918"
                    "ES0305085005" "29-Sep-16" "11/25/2014" "IT0004790918"
                    "ES0305085005" "31-Dec-15" "11/26/2014" "ES0374273003"
                    "ES0305085005" "31-Mar-16" "11/27/2014" "XS1135366240"
                    "ES0305085005" "30-Jun-16" "11/27/2014" "XS1135365515"
                    "ES0305085005" "29-Dec-16" "11/27/2014" "XS1135365788"
                    "XS1314233732" "29-Sep-16" "11/28/2014" "IT0004790918"
                    "XS1314233732" "29-Jun-17" "11/28/2014" "IT0004790918"
                    "XS1314233732" "29-Dec-16" "12/31/2015" "ES0305085005"
                    "XS1314233732" "28-Dec-17" "12/01/2014" "XS0572338936"
                    "XS1314233732" "30-Mar-17" "12/01/2014" "XS0572336997"
                    "XS1314233732" "30-Jun-16" "12/28/2017" "ES0305085005"
                    end
                    
                    
                    *MAKE DATES CONSISTENT
                    gen Date1 = date(date1, "DM20Y")
                    format Date1 %td
                    gen Date_app = date( date_app , "MDY")
                    format Date_app %td
                    
                    *GENERATE COMBINATIONS OF VARIABLES AND DATES
                    gen V1D1= var_code1+string( Date1 )
                    gen VADA= var_app +string( Date_app )
                    
                    *STORE ALL COMBINATIONS OF VAR_APP AND DATE_APP
                    levelsof VADA, local(V)
                    local n: word count `V'
                    local first: word 1 of `V'
                    local last: word `n' of `V'
                    *LOOK FOR MATCHES
                    foreach v in `V'{
                    gen `v' = regexm(V1D1, "`v'")
                    }
                    
                    rename (`first' - `last') (match#), addnumber(1)
                    egen found= rowtotal(match*)
                    drop match*
                    *VIEW MATCHES
                    browse if found>0

                    Comment


                    • #11
                      Dear Andrew,
                      Thank you very much. I am sorry I still cannot solve the problem with the merge command so I want to use the last instructions that you kindly provided. but when I run the following command:

                      local last: word `n' of `V'

                      I receive the following stata error message - invalid syntax r(198).
                      grateful if you can please help me. thanks a lot.

                      Comment


                      • #12
                        Can you show the full commands that you enter up to and including the error. You need to maintain the full sequence

                        Code:
                        clear
                        input str17 VADA
                        "XS052800609020052"
                        "ES037427300320052"
                        "IT000479091820052"
                        "IT000479091820052"
                        "IT000479091820052"
                        "IT000479091820052"
                        "ES037427300320053"
                        "XS113536624020054"
                        "XS113536551520054"
                        "XS113536578820054"
                        "IT000479091820055"
                        "IT000479091820055"
                        "ES030508500520453"
                        "XS057233893620058"
                        "XS057233699720058"
                        "ES030508500521181"
                        end
                        
                        
                        levelsof VADA, local(V)
                        local n: word count `V'
                        local first: word 1 of `V'
                        local last: word `n' of `V'
                        di "`first'" "     " "`last'"

                        Code:
                         local n: word count `V'
                        
                        . 
                        . local first: word 1 of `V'
                        
                        . 
                        . local last: word `n' of `V'
                        
                        . 
                        . di "`first'" "     " "`last'"
                        ES030508500520453     XS113536624020054
                        
                        .

                        Comment


                        • #13
                          Dear Andrew,
                          thanks very much again for your time. this part works:


                          *****
                          levelsof VADA, local(V)
                          local n: word count `V'
                          local first: word 1 of `V'
                          local last: word `n' of `V'
                          di "`first'" " " "`last'"

                          foreach v in `V'{
                          gen `v' = regexm(V1D1, "`v'")
                          }

                          *******
                          this part does not work - to list of differences between VADA and V1D1.
                          rename (`first' - `last') (match#), addnumber(1)
                          egen found= rowtotal(match*)
                          drop match*
                          *VIEW MATCHES
                          browse if found>0

                          I am grateful for the help.
                          Liz

                          Comment


                          • #14
                            sorry the error message is the following:

                            egen found= rowtotal(match*)
                            variable match* not found

                            thanks

                            Comment


                            • #15
                              It is even unnecessary to rename the variables now that we have identified the first and last elements. Try

                              Code:
                              levelsof VADA, local(V)
                              local n: word count `V'
                              local first: word 1 of `V'
                              local last: word `n' of `V'
                              di "`first'" " " "`last'"
                              
                              foreach v in `V'{
                              gen `v' = regexm(V1D1, "`v'")
                              }
                              egen found= rowtotal(`first' - `last')
                              drop `first' - `last'
                              browse if found>0

                              Comment

                              Working...
                              X