  • How can I check if two different variables in the same dataset contain the same values?

    Hi everyone,

    I just have a very short question: how can I check whether two variables have the same values?
    In my case, I have two variables: -power_p1- and -power_p2-.

    I want to check if they have the same values. In fact, they should be the same. For example, if -power_p1- is 3500 for one observation, -power_p2- should be 3500 as well.

    I have noticed that they sometimes do not match, but I have 1,500,000 observations, so checking "by hand" is cumbersome. For example, some values contain typos, e.g. -power_p1- equals 2000 and -power_p2- equals 2.

    Thanks for your feedback.
    Best,

    Michael

  • #2
    If you just want a warning when this requirement is not met, you can use assert power_p1 == power_p2. This gives an error message when the statement is false for at least one observation, and returns nothing when it is true for all of them.

    If you want to inspect cases that are different, you can do something like this: list power_p1 power_p2 if power_p1 != power_p2
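
    With 1,500,000 observations it may help to know how many cases disagree before listing them. The following is a minimal sketch, assuming both variables are numeric; the variable name mismatch is made up for illustration.

    ```stata
    * count the observations where the two variables disagree
    count if power_p1 != power_p2

    * summarize how the two numeric variables differ (less than, equal, greater than)
    compare power_p1 power_p2

    * flag disagreements in an indicator variable for later inspection
    generate byte mismatch = power_p1 != power_p2
    tabulate mismatch
    ```

    Note that Stata treats two missing values as equal, so observations where both variables are missing count as matches, while observations where only one is missing count as mismatches.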
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Hi Maarten Buis,

      Thank you for the answer. Very useful. Thanks to the -list- command I've been able to identify some typos.

      Is there a way to delete typos (in my case, -power_p1- is 100,000 and -power_p2- is 100, or -power_p1- is 250,000 and -power_p2- is 250) directly using -list-?

      Thanks.

      Michael



      • #4
        No; list is only a reporting tool. You are best advised to use replace and document changes.



        • #5
          There is no way to do so in the list command. The command list is just there to list the content of variables in the output window. Moreover, how would list know which one is the typo? You have to decide which one is the typo. Stata can't do that for you.

          You could work with the observation number, but that relies on knowing for certain that the sort order in the data does not change when you rerun your do file. That is not guaranteed, so I would not do that. Instead I would do something like this:

          Code:
          list power_p1 power_p2 if power_p1 != power_p2
          replace power_p2 = 100000 if power_p1 == 100000 & power_p2 == 100
          replace power_p2 = 250000 if power_p1 == 250000 & power_p2 == 250
          Since I don't trust myself when counting more than two 0s, I would probably type this as follows:

          Code:
          list power_p1 power_p2 if power_p1 != power_p2
          replace power_p2 = 100e3 if power_p1 == 100e3 & power_p2 == 100
          replace power_p2 = 250e3 if power_p1 == 250e3 & power_p2 == 250
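
          If many typos follow the same pattern as the two above (power_p2 missing its last three zeros), the individual replace statements could perhaps be collapsed into one rule. This is only a sketch: it assumes that power_p1 holds the correct value whenever the two differ by exactly a factor of 1000 and that the values are whole numbers, both of which should be verified in the data first. The name power_p2_orig is made up for a backup copy.

          ```stata
          * keep an untouched copy (same storage type) so every change stays documented
          clonevar power_p2_orig = power_p2

          * check how many observations fit the assumed pattern before changing anything
          count if power_p1 == 1000 * power_p2 & power_p1 != power_p2

          * apply the correction; Stata reports the number of real changes made
          replace power_p2 = power_p1 if power_p1 == 1000 * power_p2 & power_p1 != power_p2

          * count the mismatches that remain and need a decision by hand
          count if power_p1 != power_p2
          ```

          The extra condition power_p1 != power_p2 keeps observations where both values are 0 or both are missing out of the rule. Any mismatch that does not fit the pattern is left untouched, so the decision about which variable is wrong stays with you.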
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Hi Nick Cox:

            Thank you for your feedback.

            Have a beautiful day.
            Best regards,

            Michael



            • #7
              Hi Maarten Buis:

              Your help is really appreciated!

              I didn't know that I could write 100,000 or any other number in scientific notation. I've learned something new, thank you.

              I wish you a beautiful day as well.
              Best wishes,

              Michael
