  • Comparing values holding other variables constant

    Hi all,
    I have the following kind of data:

    Code:
    Owner_id      Name                   Value
    "00091701"    "ADFC-NewsApp-Mono"    3
    "00091701"    "ADFC-NewsApp-Mono"    1
    "01org"       "cloudeebus"           3
    "01org"       "cloudeebus"           4
    "01org"       "cloudeebus"           2
    I need to compare the values among observations that share the same Owner_id and Name and keep only the one with the highest Value (the others can be dropped). For example, the result should look like this:
    Code:
    Owner_id      Name                   Value
    "00091701"    "ADFC-NewsApp-Mono"    3
    "01org"       "cloudeebus"           4
    The dataset is very large, so doing all the comparisons by hand is impossible. The only step I have taken so far is sorting the dataset by Owner_id and Name, and I don't know what to do next. Please help.
    Last edited by Ivan Dmitrovic; 06 Mar 2016, 20:53.

  • #2
    Let's start by creating your dataset.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 Owner_id str17 Name byte Value
    "00091701" "ADFC-NewsApp-Mono" 3
    "00091701" "ADFC-NewsApp-Mono" 1
    "01org"    "cloudeebus"        3
    "01org"    "cloudeebus"        4
    "01org"    "cloudeebus"        2
    end
    To keep the observations with the highest value, run these commands:
    Code:
    gsort Owner_id Name -Value
    by Owner_id Name: keep if _n==1
    This is the result:
    Code:
    . list
    
         +--------------------------------------+
         | Owner_id                Name   Value |
         |--------------------------------------|
      1. | 00091701   ADFC-NewsApp-Mono       3 |
      2. |    01org          cloudeebus       4 |
         +--------------------------------------+
    You should never have to do anything manually in Stata.
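
    For comparison, here is a different technique that is not used in this thread: compute the group maximum with -egen- and keep the observations that match it. This is only a sketch. The variable name maxval is made up for illustration, and unlike the -gsort- approach above, it keeps every observation tied for the maximum rather than exactly one per group.

    ```stata
    * Recreate the example data from the first code block
    clear
    input str8 Owner_id str17 Name byte Value
    "00091701" "ADFC-NewsApp-Mono" 3
    "00091701" "ADFC-NewsApp-Mono" 1
    "01org"    "cloudeebus"        3
    "01org"    "cloudeebus"        4
    "01org"    "cloudeebus"        2
    end

    * maxval is a hypothetical helper variable holding each group's largest Value
    bysort Owner_id Name: egen maxval = max(Value)

    * Keep the observations whose Value equals the group maximum
    keep if Value == maxval

    list
    ```

    If ties are possible and only one observation per group is wanted, the -gsort- and -by:- commands above are the simpler choice.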


    • #3
      Code:
      by Owner_id Name (Value), sort: keep if _n == _N
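      Note: This assumes, as appears to be the case here, that Value is a numeric variable, not a string. It also assumes that Value is never missing: Stata sorts numeric missing values after all nonmissing values, so a missing Value would end up last in its group and be the observation that is kept. (If there are observations where Value is missing, run -drop if missing(Value)- before the code above.)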
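
      To make the missing-value caveat concrete, here is a minimal sketch. The fourth observation, with a missing Value, is invented for illustration and is not part of the original poster's data.

      ```stata
      * Illustrative data: the last observation (missing Value) is made up
      clear
      input str8 Owner_id str17 Name byte Value
      "01org" "cloudeebus" 3
      "01org" "cloudeebus" 4
      "01org" "cloudeebus" 2
      "01org" "cloudeebus" .
      end

      * Numeric missing sorts after every nonmissing value, so remove it first
      drop if missing(Value)

      * Within each Owner_id-Name group, sorted by Value, keep the last observation
      by Owner_id Name (Value), sort: keep if _n == _N

      list
      ```

      Without the -drop-, the observation kept for this group would be the one with the missing Value rather than the one with Value 4.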

      You should learn and master the use of Stata's -by:- prefix and how _n and _N work with it. Many data management problems are solved using these mechanisms. Do read the -by:- chapter of the [D] volume of the User's manual.

      In future posts, when providing example data, please use the -dataex- program. That way, those who would work on your problem would not have to make assumptions about things like whether Value is numeric or string, because we would be able to easily and immediately re-create your data set in faithful detail. If you have not already installed -dataex-, run -ssc install dataex-. Then read -help dataex- to see the simple instructions for using it.


      • #4
        It seems to be working. Could you briefly explain what exactly those commands do?


        • #5
          Originally posted by Clyde Schechter (#3):
          In future posts, when providing example data, please use the -dataex- program.

          Sorry about that. I did install the package, ran the command, and copied the text that appeared. Apparently I did something wrong.


          • #6
            Re #5: the first code block in #2 shows what it looks like when -dataex- is used. My best guess is that you ran -dataex- and instead of copying and pasting what Stata responded with, you then went on to -list- the data and copied the results of the -list-. Anyway, next time you use -dataex-, look for something like what Friedrich Huebler posted.

            You will find a fuller and more general explanation of my code in the -by:- chapter of the [D] User's manual. Briefly, the command begins by sorting the data on Owner_id and Name, and within those by Value. So all the observations with a given Owner_id and Name are grouped together, with the one having the smallest Value first, the one having the largest Value last, and the rest in increasing order in between. When used with -by:-, _n refers to the position of an observation within its group, and _N refers to the number of observations in the group, which is also the position of the last one. So the -keep if _n == _N- part says: if the current observation (observation _n) is the last observation in its group (observation _N), keep it. Because of the way the data were sorted, that last observation, the one we keep, is the one with the largest Value. That is how this command accomplishes your goal.
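
            One way to see this for yourself is to store _n and _N in variables before dropping anything. This is only a sketch on the example data from #2; the variable names pos and grpsize are made up for illustration.

            ```stata
            * Recreate the example data from #2
            clear
            input str8 Owner_id str17 Name byte Value
            "00091701" "ADFC-NewsApp-Mono" 3
            "00091701" "ADFC-NewsApp-Mono" 1
            "01org"    "cloudeebus"        3
            "01org"    "cloudeebus"        4
            "01org"    "cloudeebus"        2
            end

            * pos = position within the Owner_id-Name group after sorting by Value
            by Owner_id Name (Value), sort: generate pos = _n

            * grpsize = number of observations in the group (the last position)
            by Owner_id Name (Value): generate grpsize = _N

            * Observations with pos == grpsize are the ones -keep if _n == _N- retains
            list, sepby(Owner_id Name)
            ```

            In each group, the row where pos equals grpsize is the one with the largest Value.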

            If you are going to be using Stata more than once in a blue moon, you will need to do things like this very often. I cannot over-emphasize how important it is to learn the use of _n and _N with -by:-. It will take some time and effort, but that investment will have a huge return.


            • #7
              See also http://www.stata-journal.com/sjpdf.h...iclenum=pr0004
