  • Merge Dataset

    Hello everyone,
    Please I downloaded a dataset in excel format from BankFocus' website for 1455 banks within 19 EU countries and used it to estimate competition using Lerner index and Boone indicator and I've saved the result (Lerner and Boone) in dta format.

    Moreover, I have another dataset (macroeconomic data) in excel format from world bank's website for the same 19 EU countries but there are no banks or firms attached to this new dataset.

    Now I want to merge or join these two datasets to help me estimate the impact of the competition measures (Lerner and Boone) on one of the macroeconomic variables (GDP) using regression. So how do I merge or combine or join these two datasets?

    Or differently put, how do I add the competition measures (Lerner and Boone) to the macroeconomic dataset? Which stata function can help me to merge these two datasets?
    I would be grateful for your assistance, thanks.




  • #2
    It would have simplified matters greatly had you used -dataex- and shown example data for both data sets. As best I can tell from your description, the World Bank data set, which contains the macroeconomic data, is uniquely identified by country, whereas the BankFocus data set is uniquely identified by country and bank.

    So, it would be:
    Code:
    use lerner_and_boone, clear
    merge m:1 country using world_bank_dataset
    That said, a study based on observations of just 19 countries with no time dimension is unusually small, which makes me wonder whether your description of the data sets left out a time dimension. Maybe the World Bank data set is really identified by country and year? And maybe the Lerner and Boone data also have observations over years? In that case, the -merge- command would be -merge m:1 country year using world_bank_dataset-.
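
    A minimal sketch of checking the identifying variables before a country-year merge, run as a do-file on made-up data. The variable names bank_id, lerner, and gdp, the country codes, and all values are hypothetical stand-ins, not taken from the poster's files:

    ```stata
    * Toy stand-in for the World Bank file (hypothetical names and values).
    clear
    input str2 country int year double gdp
    "AT" 2015 1.0
    "AT" 2016 2.0
    "BE" 2015 1.5
    "BE" 2016 0.9
    end
    * isid stops with an error if country-year does not uniquely identify rows.
    isid country year
    tempfile world_bank
    save `world_bank'

    * Toy stand-in for the bank-level Lerner data (hypothetical names and values).
    clear
    input str2 country int year int bank_id double lerner
    "AT" 2015 1 0.21
    "AT" 2015 2 0.34
    "AT" 2016 1 0.25
    "BE" 2015 3 0.18
    "BE" 2016 3 0.19
    end
    * One row per bank per country-year in the master data.
    isid country year bank_id

    * Many bank rows per country-year in memory, one row per country-year in the using file.
    merge m:1 country year using `world_bank'
    * _merge: 1 = master only, 2 = using only, 3 = matched in both.
    tab _merge
    ```

    If -isid- fails on the using data, the m:1 merge would fail too, so the check points to the problem earlier. Tabulating _merge afterwards shows whether any country-year combinations failed to match, for example because country names are spelled differently in the two files.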

    If none of this works out, I would say that it is not possible to answer your question without a better explanation and, even more important, example data from both data sets.

    If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      Hello Clyde Schechter,
      You were right about the description of both datasets: there is a time variable in each. I merged them with your code -merge m:1 country year using world_bank_dataset- and it worked on my data. I'm very grateful, thanks so much.



      • #4
        One more question: most of the variables in the World Bank dataset that should be numeric, such as GDP, inflation, and interest, are stored as strings. How do I change their type from string to numeric?
        Should I destring or encode them? Kindly help me with the code, thanks.



        • #5
          Variables that are supposed to be numbers, but are stored as strings that look like numbers, can be converted to numeric variables with the -destring- command. -encode- should only be used to create numeric values to associate with string variables that name discrete categories like dog/cat/bird/fish or red/blue/green/yellow or the like. There is a third situation to be aware of: date variables that are stored as strings. For these you need to actually create a new variable using the Stata date function that is appropriate to the type of date (daily(), monthly(), quarterly(), halfyearly(), yearly()).
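
          As a rough illustration of the last two cases, here is a sketch on made-up data. The variable names pet, datestr, pet_code, and date and all values are hypothetical, and the "YMD" mask assumes the strings are written year-month-day:

          ```stata
          * Build a tiny example data set (hypothetical names and values).
          clear
          set obs 3
          gen str10 pet = "dog"
          replace pet = "cat" in 2
          replace pet = "bird" in 3
          gen str10 datestr = "2015-01-31"
          replace datestr = "2016-06-30" in 2
          replace datestr = "2017-12-31" in 3

          * Category names: encode assigns integer codes in alphabetical order
          * and attaches the original text as value labels.
          encode pet, generate(pet_code)

          * Daily dates stored as text: create a new numeric date variable,
          * then give it a daily display format.
          gen long date = daily(datestr, "YMD")
          format date %td
          list
          ```

          The codes from -encode- are arbitrary labels for categories, which is why it is the wrong tool for strings that merely look like numbers.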

          For variables like gdp, interest, and inflation your code would be based on:
          Code:
          foreach v of varlist gdp interest inflation {
              destring `v', replace
          }
          However, without seeing example data, I cannot be sure the above will work for you. For example, if the GDP variable includes a currency symbol, or commas setting off groups of 3 digits, or in some cases is unavailable and marked "N/A", things like this can get in the way. So I would say try the above code. But if you get messages from Stata that the replacement was not done, then you might have to use the -ignore()- option, specified with the offending characters, to make it work. Do read -help destring-.
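
          A small sketch of that fallback, on made-up values. The contents of gdp here are hypothetical, and the "N/A" marker is just the example mentioned above, so check what your own file actually uses for unavailable values:

          ```stata
          * Build a string variable with the kinds of problems described above.
          clear
          set obs 3
          gen str12 gdp = "1,234.5"
          replace gdp = "N/A" in 2
          replace gdp = "987" in 3

          * Turn the text placeholder into an empty string so it becomes missing.
          replace gdp = "" if gdp == "N/A"

          * Remove the thousands separators while converting in place.
          destring gdp, replace ignore(",")
          list
          ```

          Blanking out the placeholder explicitly, rather than listing its characters in -ignore()-, avoids stripping those characters from other values by accident.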



          • #6
            Hello Clyde Schechter,
            Thanks so much for the clarification, it's very helpful to me. The macroeconomic variables in my dataset do not include a currency symbol, therefore your destring code above worked for me, thank you once again, I'm very grateful.
