  • How to compare variables of two different data sets

    Hi.

    I am working with two datasets and trying to merge them. I first tried SPSS, but some variables would not merge, so I switched to Stata and ran into the same problem. (I manually checked the variables that would not merge and noticed data-entry errors in both datasets.) How can I compare the variables in the two datasets in order to identify the mismatches? Also, is there another way to merge the datasets that avoids this issue?

    Thanks
    Kola

  • #2
    Kola:
    Welcome to the list.
    You may want to take a look at -assert- and -_merge==- in -help merge-, and at the -merge- entry in the Stata .pdf manual.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Kola,

      Try Stata's -cf- command.
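
As a rough illustration of what -cf- does, here is a minimal sketch on toy data. The variable names and values are made up for the example, and the second dataset is just a temporary copy; with real data you would name your own file after -using-. Note that -cf- expects both datasets to have the same number of observations in the same order.

```stata
* Toy data (hypothetical): three observations, saved as the "second" dataset.
clear
input id ira
1 10
2 20
3 30
end
tempfile second
save `second'

* Introduce a data-entry error in the copy held in memory.
replace ira = 25 in 2

* cf compares the listed variables observation by observation.
* With -verbose- it reports which observations differ.
capture noisily cf _all using `second', verbose
display _rc                // nonzero when any variable differs
```

Because -cf- exits with an error when it finds a difference, the sketch wraps it in -capture noisily- so the rest of a do-file can keep running.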


      • #4
        It is unclear what you mean by "won't merge." It would be better to show the -merge- command you gave and the output Stata gave you. If by "won't merge" you mean that you expected every observation in dataset 1 to have a match in dataset 2 and vice versa, but that turned out not to be the case, then you can identify the problem cases by looking at the _merge variable that the -merge- command creates. If after the -merge- you -drop if _merge == 3-, you will be left with only those observations that did not find a match. Those with _merge == 1 were in the data originally in memory but found no match in the -using- dataset. Those with _merge == 2 were the other way around.
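
A minimal sketch of that workflow, on toy data: the key variable id, the other variable names, and the one-to-one match are all assumptions made for the example, not something known about the actual datasets.

```stata
* Toy master data (hypothetical): ids 1-3.
clear
input id ira
1 10
2 20
3 30
end
tempfile master
save `master'

* Toy using data (hypothetical): ids 2-4.
clear
input id idemo
2 1
3 0
4 1
end
tempfile using_data
save `using_data'

* Merge one-to-one on id and tabulate the match results.
use `master', clear
merge 1:1 id using `using_data'
tabulate _merge

* Keep only the observations that did not find a match.
drop if _merge == 3
list id _merge
```

Here id 1 is left with _merge == 1 (in memory only) and id 4 with _merge == 2 (using only), which is the kind of listing that points to the problem cases.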

        If that doesn't solve your problem, please post back with more specific information. In particular, show your command, the exact output, and some examples of your problematic data sets. (Use the -dataex- command to post the example data. Install -dataex- with -ssc install dataex-. Then -help dataex- provides the simple instructions for using it.)


        • #5
          Carlo Lazzaro thanks for the welcome message.

          Francisco Mejia, Carlo Lazzaro: thanks for the responses to my post. I don't think I spelled out my question well. I am trying to compare the variable names in both datasets; see the example below. Also, I am a beginner with Stata and open to learning more.

          id
          idate
          ira
          ilocation
          idemo
          idemo1

          Thanks
          Kola
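
One possible way to compare variable names across two datasets is sketched below. It uses the names from the list above, but which names sit in which dataset, and the values, are invented for the example. With real files you would -use- each file in turn instead of typing data with -input-.

```stata
* Toy dataset 1 (hypothetical split of the variable names).
clear
input id idate ira ilocation
1 20856 10 1
end
ds
local vars1 `r(varlist)'

* Toy dataset 2 (hypothetical): same key, partly different variables.
clear
input id idate idemo idemo1
1 20856 0 1
end
ds
local vars2 `r(varlist)'

* Macro list operators give the names found in one dataset but not the other.
local only1 : list vars1 - vars2
local only2 : list vars2 - vars1
local both  : list vars1 & vars2
display "Only in dataset 1: `only1'"
display "Only in dataset 2: `only2'"
display "In both:           `both'"
```

This only compares names. Two variables with the same name can still differ in storage type, which -describe- will show.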


          • #6
            Clyde Schechter thank you for your post. I will do that right away.

            thanks
            Kola


            • #7
              Clyde Schechter, see below the message Stata produced.
              Thanks

              variable itlfbdate is long in master but str24
              in using data
              You could specify merge's force option to
              ignore this numeric/string mismatch. The
              using variable would then be treated as if
              it contained numeric missing value.

              I am not too sure how to do a merge force as indicated in the message.


              • #8
                "I am not too sure how to do a merge force as indicated in the message."
                Whew! That's a good thing--because you shouldn't do it!

                The message means exactly what it says. Your variable itlfbdate is stored differently in the two data sets. In the data in memory it is a numeric long variable. (I'm guessing from its name that it's a date variable.) But in the using data set it's a string variable. Using a -force- merge will not solve this problem; it will sweep it under the rug, and in the most destructive way possible: all of the values of itlfbdate coming from the using data set would be clobbered and replaced by missing values! If that's OK, then go ahead, but that is rarely helpful. [More generally, -force- options in Stata should be avoided unless you have carefully considered their usually quite destructive implications and thoughtfully decided that they are the least bad way to proceed.]

                What you need to do is fix your data sets so this variable is harmonized. String dates are rarely useful for anything: they don't sort correctly and they can't be used for computations. So it is probably more sensible to make them both numeric.
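
A sketch of what harmonizing might look like for the string version, assuming purely for illustration that the strings look like "6 Feb 2017". The real format is unknown here, and the mask passed to date() has to match it; if the strings also carry a time of day, clock() would be the function to look at instead.

```stata
* Toy using data (hypothetical values): the date arrives as a str24 string.
clear
input id str24 itlfbdate
1 "6 Feb 2017"
2 "15 Mar 2016"
end

* Convert to a Stata internal-format daily date.
generate long itlfbdate_num = date(itlfbdate, "DMY")
format itlfbdate_num %td

* Stop here if any string failed to convert.
assert !missing(itlfbdate_num)

* Swap the numeric version in under the original name.
drop itlfbdate
rename itlfbdate_num itlfbdate
describe itlfbdate
list
```

The -assert- line is a guard: date() returns missing for strings that do not fit the mask, so checking before dropping the original avoids losing information silently.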

                Now, a numeric variable that represents a date can be tricky. The best way to do this is to have Stata internal-format dates (where, for example, today's date, 6 Feb 2017, is represented by 20856). But sometimes you have a date variable that takes on numeric values like 20170206 for 6 February 2017. These are numeric, but they are not Stata internal-format dates and are not very useful either.
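
To make the distinction concrete, here is a small sketch. The variable names rawdate and realdate are hypothetical, and the conversion shown is one common idiom rather than the only way to do it.

```stata
* Stata counts daily dates from 01jan1960, so 6 Feb 2017 is stored as 20856.
display mdy(2, 6, 2017)
display %td 20856

* Toy variable (hypothetical) holding dates as plain numbers like 20170206.
clear
input long rawdate
20170206
20160315
end

* Route through a string so date() can parse year, month, and day.
generate long realdate = date(string(rawdate, "%8.0f"), "YMD")
format realdate %td
list
```

After this, realdate is a genuine Stata date that sorts correctly and supports date arithmetic, while rawdate is just a number that happens to look like a date.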

                At the level of general advice, you need to read the Stata help file and manual section on datetime variables. There is a lot to swallow there, and it can be confusing the first several times you use them. Even experienced users often have to refer back to them for a refresher on some of the details. So if after reading that you still have questions feel free to post back. But if you do so, you absolutely must post examples from your data set in order to get useful advice. Description will inevitably fail to convey the necessary level of detail. The way to get specific advice is to use -dataex-, as explained in #4.
                Last edited by Clyde Schechter; 06 Feb 2017, 12:40.


                • #9
                  Clyde Schechter Thank you so much. I just followed your instruction and I was able to complete the merge. I worked on the date.

                  Best
                  Kola
