  • #16
    Dear Andrew,
    thanks again for your support and guidance; I really need this to work. I tried the following, but I do not understand why it now returns a syntax error:

    **************
    . local n: word count `V'
    . local first: word 1 of `V'
    . local last: word `n' of `V'
    . di "`first'" " " "`last'"
    18087 DE000EH094Y1 21272 IT0005283491 [
    . foreach v in `V'{
    2. gen `v' = regexm(V1D1, "`v'")
    3. }
    18087 invalid name
    r(198);
    end of do-file
    r(198);

    The "18087 invalid name" error comes from combining dates and ISINs: 18087 is a date converted from numeric to string so that it can be matched with the ISIN, which is also a string. I combined them with:

    egen VADA = concat(Valuedate isin), decode p(" ")
    Now the data looks like this:

    V1D1                  VADA
    AT0000136213 17897    18087 XS0250267647
    AT0000136288 17897    18087 DE000EH094Y1
    AT0000136312 17897    18087 ES0413770019
    AT0000137088 17520    18087 XS0250267647
    AT0000137088 17129    18088 ES0414840274
    AT0000137088 17402    18088 XS0286031777
    AT0000137088 17647    18088 ES0347858005
    AT0000137088 17804    18088 ES0413790009
    AT0000137088 17897    18091 ES0347859003
    AT0000137088 16765    18091 XS0250267647
    AT0000137203 17897    18091 FR0010770529


    • #17
      You do not want spaces in the combined variable. Doesn't this work?

      Code:
      gen VADA= isin+string(Valuedate)
      Also make sure V1D1 is built the same way, with the string ID first and the date second.
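A quick way to see why the space-separated key from #16 breaks the loop while the concatenated key from #17 does not: a Stata variable name must start with a letter or an underscore and cannot contain spaces. The sketch below, assuming a recent version of Stata, shows this two ways: `strtoname()` always returns a legal name (a key that is already legal comes back unchanged), and `capture generate` records the return code of trying to use each key as a variable name. The two keys are copied from the thread; the local macro names are illustrative.

```stata
* Two keys copied from the thread: #16 style (date, space, ISIN) and #17 style (ISIN + date)
clear
set obs 1
local bad  "18087 DE000EH094Y1"
local good "DE000EH094Y118087"

* strtoname() always returns a legal name; a key that is already legal comes back unchanged
display strtoname("`bad'")
display strtoname("`good'")

* The loop in #16 effectively runs "generate <key> = ...", which fails for the first key
capture generate `bad' = 1
local rc_bad = _rc
capture generate `good' = 1
local rc_good = _rc
display "return code, space-separated key: `rc_bad'"
display "return code, concatenated key:    `rc_good'"
```

A nonzero return code for the first key corresponds to the "invalid name" error in #16; putting the ISIN first and dropping the space gives a key that can be used as a variable name as it stands.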


      • #18
        Thanks Andrew, it seems to work now. However, I receive the following error message, so perhaps the structure of the output needs to be revised? I have already used set maxvar, so there is no more room. Thanks again.
        ***
        error message
        no room to add more variables
        Up to 5,000 variables are currently allowed, although you could reset the maximum using set maxvar; see help memory.


        • #19
          That is the drawback of this approach: you are creating an indicator variable for each element in the macro. This is why using the merge command is more efficient. How many elements do you have in total? That is the value of `n' in the following:

          Code:
          local n: word count `V'
          local first: word 1 of `V'
          local last: word `n' of `V'
          di `n'
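One way to apply the merge idea, sketched under the assumption that VADA and V1D1 sit in the same dataset as in the excerpt posted in #28: save one observation per distinct VADA key to a temporary file, rename it to V1D1, and merge it back on V1D1. Only one indicator variable (`found`, a hypothetical name) is created, however many keys there are. The nine rows are copied from #28; on them nothing matches, which is consistent with the zero reported in #26.

```stata
* Rows copied from the excerpt in #28; both keys are ISIN + date with no spaces
clear
input str17 VADA str17 V1D1
"XS025026764718087" "AT000013621317897"
"DE000EH094Y118087" "AT000013628817897"
"ES041377001918087" "AT000013631217897"
"XS025026764718087" "AT000013708817520"
"ES041484027418088" "AT000013708817129"
"XS028603177718088" "AT000013708817402"
"ES034785800518088" "AT000013708817647"
"ES041379000918088" "AT000013708817804"
"ES034785900318091" "AT000013708817897"
end

* Lookup file: one observation per distinct VADA key, renamed so it lines up with V1D1
preserve
keep VADA
duplicates drop
rename VADA V1D1
tempfile lookup
save `lookup'
restore

* A single merge flags every observation whose V1D1 occurs somewhere in VADA
merge m:1 V1D1 using `lookup', keep(master match)
generate byte found = (_merge == 3)
drop _merge
count if found
```

Because `keep(master match)` retains every original observation, the dataset keeps its 42,000 rows and gains a single 0/1 flag instead of 25,831 new variables.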


          • #20
            Thanks again Andrew. I have about 42,000 observations, but the current match is on two variables. The code you provided returns the following error message:
            ******
            local n: word count `V'
            local first: word 1 of `V'
            local last: word `n' of `V'
            di `n'
            invalid syntax
            r(198);


            • #21
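              Run the entire code at once, adding only this one line. Local macros such as `V' and `n' exist only within the do-file, or the selection of lines run from the Do-file Editor, that defines them, so running these lines on their own leaves them empty: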

              Code:
              levelsof VADA, local(V)
              local n: word count `V'
              di `n'


              • #22
                Thanks very much. Please see the output:
                Successful

                . di `n'
                25831
                Thanks


                • #23
                  Thanks. Add this as the first line, before opening the dataset containing your variables:

                  Code:
                  set maxvar 27000


                  • #24
                    set maxvar 27000

                    error message:
                    no; data in memory would be lost

                    I have saved all the data to disk. Thanks again, Andrew.


                    • #25
                      This has to be done before you load any data. Save your commands, keep the original datasets, and start with:

                      Code:
                      clear
                      set maxvar 27000
                      * ...and then the rest of the commands
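Besides raising the limit, it can help to check that there is room before re-running the loop from #26. The loop adds one variable per distinct VADA value, so it needs `c(k)` + `n` variables in total, where `c(k)` is the number of variables already in memory and `c(maxvar)` is the current limit. The sketch below uses the VADA column from the #28 excerpt; the message text and the choice to stop with return code 900 are illustrative. Note that `set maxvar 27000` itself assumes Stata/SE or Stata/MP, since the limit is fixed at 2,048 in the smallest flavor.

```stata
* VADA column from the excerpt in #28 (one key appears twice)
clear
input str17 VADA
"XS025026764718087"
"DE000EH094Y118087"
"ES041377001918087"
"XS025026764718087"
"ES041484027418088"
"XS028603177718088"
"ES034785800518088"
"ES041379000918088"
"ES034785900318091"
end

* The loop in #26 adds one indicator per distinct VADA value
quietly levelsof VADA, local(V)
local n: word count `V'

* c(k) = variables already in memory, c(maxvar) = current limit
local needed = c(k) + `n'
display "distinct keys: `n'   variables needed: `needed'   maxvar: " c(maxvar)

if `needed' > c(maxvar) {
    display as error "not enough room for one variable per key; raise maxvar or use merge"
    exit 900
}
```

With the full data, where `n` is 25,831, the same check shows directly whether 27,000 leaves enough headroom for the variables already in the dataset.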


                      • #26
                        Thank you very much Andrew. The command works very well, with no syntax errors, but it finds zero observations, which is not correct.
                        Some V1D1 values should be found in VADA. I would be grateful for any advice, thanks.


                        ****
                        levelsof VADA, local(V)
                        local n: word count `V'
                        local first: word 1 of `V'
                        local last: word `n' of `V'
                        di "`first'" " " "`last'"
                        foreach v in `V'{
                        gen `v' = regexm(V1D1, "`v'")
                        }
                        egen found= rowtotal(`first' - `last')
                        drop `first' - `last'
                        browse if found>0

                        end


                        • #27
                          From #16, it appears that your combinations are not consistent:

                          Code:
                          V1D1                  VADA
                          AT0000136213 17897    18087 XS0250267647
                          AT0000136288 17897    18087 DE000EH094Y1
                          AT0000136312 17897    18087 ES0413770019
                          AT0000137088 17520    18087 XS0250267647
                          AT0000137088 17129    18088 ES0414840274
                          AT0000137088 17402    18088 XS0286031777
                          AT0000137088 17647    18088 ES0347858005
                          AT0000137088 17804    18088 ES0413790009
                          AT0000137088 17897    18091 ES0347859003
                          AT0000137088 16765    18091 XS0250267647
                          AT0000137203 17897    18091 FR0010770529
                          For V1D1 you have ID-DATE, and for VADA you have DATE-ID. This could be the issue. Look at how the example I provided is set up and check whether there are inconsistencies in your setup.
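A quick layout check can catch this kind of inconsistency before any matching is attempted. The sketch below assumes, as in the rows posted in this thread, that every ID is a 12-character ISIN starting with two letters and every date is a five-digit Stata daily date, so a well-formed ID-DATE key is 17 characters long, starts with two letters and contains no space. The three keys are copied from #16 and #28; `nbad` is a hypothetical counter.

```stata
* Keys copied from the thread: one in the #16 layout, two in the #28 layout
local keys `" "18087 XS0250267647" "XS025026764718087" "AT000013621317897" "'
local nbad = 0

foreach k of local keys {
    * Assumed layout: 12-character ISIN (two letters first) + 5-digit date, no spaces
    local ok = regexm("`k'", "^[A-Z][A-Z]") & strpos("`k'", " ") == 0 & strlen("`k'") == 17
    display "`k'" _col(22) cond(`ok', "ID-DATE, looks consistent", "check the layout")
    local nbad = `nbad' + !`ok'
}
display "keys with an unexpected layout: `nbad'"
```

On a full dataset the same three conditions can be applied to both key variables with `count if` to list any rows that do not follow the ID-DATE pattern.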


                          • #28
                            Dear Andrew,
                            thanks for the help again. I checked the order of the variables and they are consistent. However, it is strange that no V1D1 value can be found in VADA, since V1D1 is a subset of VADA.
                            Thanks again for all your support and help. I am surprised that it is so complicated to match variables in Stata...
                            Best, Liz
                            ******

                            VADA                 V1D1
                            XS025026764718087    AT000013621317897
                            DE000EH094Y118087    AT000013628817897
                            ES041377001918087    AT000013631217897
                            XS025026764718087    AT000013708817520
                            ES041484027418088    AT000013708817129
                            XS028603177718088    AT000013708817402
                            ES034785800518088    AT000013708817647
                            ES041379000918088    AT000013708817804
                            ES034785900318091    AT000013708817897


                            • #29
                              This routine will pick up matches in terms of both ID and date. See the example data below:

                              Code:
                              clear
                              input str12 var_code1 str9 date1 str10 date_app str12 var_app
                              "ES0305085005" "28-Dec-17" "11/25/2014" "XS0528006090"
                              "ES0305085005" "28-Sep-17" "11/25/2014" "ES0374273003"
                              "ES0305085005" "29-Jun-17" "11/25/2014" "IT0004790918"
                              "ES0305085005" "30-Mar-17" "11/25/2014" "IT0004790918"
                              "ES0305085005" "24-Sep-15" "11/25/2014" "IT0004790918"
                              "ES0305085005" "29-Sep-16" "11/25/2014" "IT0004790918"
                              "ES0305085005" "31-Dec-15" "11/26/2014" "ES0374273003"
                              "ES0305085005" "31-Mar-16" "11/27/2014" "XS1135366240"
                              "ES0305085005" "30-Jun-16" "11/27/2014" "XS1135365515"
                              "ES0305085005" "29-Dec-16" "11/27/2014" "XS1135365788"
                              "XS1314233732" "29-Sep-16" "11/28/2014" "IT0004790918"
                              "XS1314233732" "29-Jun-17" "11/28/2014" "IT0004790918"
                              "XS1314233732" "29-Dec-16" "12/31/2015" "ES0305085005"
                              "XS1314233732" "28-Dec-17" "12/01/2014" "XS0572338936"
                              "XS1314233732" "30-Mar-17" "12/01/2014" "XS0572336997"
                              "XS1314233732" "30-Jun-16" "12/28/2017" "ES0305085005"
                              end
                              For a match to be found, the ID and date of one combination must be in the same observation, and the matching ID and date of the other combination must also be together in one observation. Maybe you have matches in terms of ID but not in terms of both ID and date. Can you manually pick out an observation for one combination that matches an observation for the other combination? If you need matches only in terms of ID, then you don't need to create a variable that combines ID and date.
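In the example data above the two dates are stored as text in different formats (28-Dec-17 versus 11/25/2014), so an ID + date key can only match once both dates are on the same scale. A minimal sketch, assuming the combinations should match on ISIN and calendar day and that two-digit years belong to the 2000s: convert both dates to Stata daily dates with `date()`, build the keys the same way for both combinations, and flag each combination-1 observation whose key occurs among the combination-2 keys. The variable names `d1`, `d2`, `key1`, `key2` and `found` are hypothetical. Flagging with `replace` inside a loop adds a single variable, so it does not run into maxvar, although with tens of thousands of keys a merge on the keys is likely to be faster.

```stata
clear
input str12 var_code1 str9 date1 str10 date_app str12 var_app
"ES0305085005" "28-Dec-17" "11/25/2014" "XS0528006090"
"ES0305085005" "28-Sep-17" "11/25/2014" "ES0374273003"
"ES0305085005" "29-Jun-17" "11/25/2014" "IT0004790918"
"ES0305085005" "30-Mar-17" "11/25/2014" "IT0004790918"
"ES0305085005" "24-Sep-15" "11/25/2014" "IT0004790918"
"ES0305085005" "29-Sep-16" "11/25/2014" "IT0004790918"
"ES0305085005" "31-Dec-15" "11/26/2014" "ES0374273003"
"ES0305085005" "31-Mar-16" "11/27/2014" "XS1135366240"
"ES0305085005" "30-Jun-16" "11/27/2014" "XS1135365515"
"ES0305085005" "29-Dec-16" "11/27/2014" "XS1135365788"
"XS1314233732" "29-Sep-16" "11/28/2014" "IT0004790918"
"XS1314233732" "29-Jun-17" "11/28/2014" "IT0004790918"
"XS1314233732" "29-Dec-16" "12/31/2015" "ES0305085005"
"XS1314233732" "28-Dec-17" "12/01/2014" "XS0572338936"
"XS1314233732" "30-Mar-17" "12/01/2014" "XS0572336997"
"XS1314233732" "30-Jun-16" "12/28/2017" "ES0305085005"
end

* Put both dates on the same scale; 2050 as topyear reads two-digit years as 20xx
generate long d1 = date(date1, "DMY", 2050)
generate long d2 = date(date_app, "MDY")
format d1 d2 %td

* ID + date keys, built identically for both combinations
generate key1 = var_code1 + string(d1)
generate key2 = var_app + string(d2)

* Flag combination-1 observations whose key appears anywhere among the key2 values
quietly levelsof key2, local(K)
generate byte found = 0
foreach k of local K {
    quietly replace found = 1 if key1 == "`k'"
}
list var_code1 date1 if found, noobs
```

On this data only two observations are flagged, ES0305085005 on 28-Dec-17 and on 31-Dec-15; the other ES0305085005 dates and every XS1314233732 row have no partner with the same ISIN on the same day.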
                              Last edited by Andrew Musau; 30 Apr 2018, 12:38.
