
  • Setting up a complex counter

    Hi all,

    Please consider the following table for my question. The first two variables are given in my data set, and I would like to add the third column to my data set. The values of variable C are determined as follows: within each value of variable A, variable C counts how many times the current value of variable B has occurred so far.
    This means that:
    - Variable C is always 1 for the first entry of a new ID in variable A (and for the first occurrence of each value of B within that ID)
    - If a value of B occurs several times within the same ID of variable A, then variable C counts those occurrences in order
    I hope the table makes clear what I am trying to do. Does anyone know how to set this up in Stata? I only know how to do this in Excel and am currently struggling to do the same in Stata.
    A B C
    1 Tom 1
    1 Tom 2
    1 Martin 1
    1 Tom 3
    1 Chris 1
    1 Martin 2
    1 Tom 4
    2 Tom 1
    2 Sophie 1
    2 Tom 2
    3 Martin 1
    4 Tom 1
    4 Tom 2
    5 Tom 1
    5 Martin 1
    5 Martin 2
    5 Martin 3
    Thank you for your help!


  • #2
    Code:
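    sort A B, stable     // -stable- keeps the original order within each A-B pair
    by A B: gen C = _n   // _n numbers the observations within each A-B pair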
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
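As a rough check, here is a minimal sketch that types the table from the question in with -input- (in the general style of -dataex- output, though not literal -dataex- output) and runs the -sort-/-by- code from this post. The variable name wanted, holding the hand-computed C from the table, is my own assumption for illustration. If the code reproduces the table, -assert- runs silently; otherwise it stops with an error.

```stata
* Sketch: the question's table entered with -input-.
* -wanted- is a hypothetical name for the hand-computed C column.
clear
input byte A str6 B byte wanted
1 "Tom"    1
1 "Tom"    2
1 "Martin" 1
1 "Tom"    3
1 "Chris"  1
1 "Martin" 2
1 "Tom"    4
2 "Tom"    1
2 "Sophie" 1
2 "Tom"    2
3 "Martin" 1
4 "Tom"    1
4 "Tom"    2
5 "Tom"    1
5 "Martin" 1
5 "Martin" 2
5 "Martin" 3
end

sort A B, stable
by A B: gen C = _n
assert C == wanted            // silent if every row matches the table
list A B C, sepby(A) noobs
```

Note that afterwards the data are sorted on A and B rather than left in the original order; if the original order matters later, save it first (for example with gen long obs_no = _n) and re-sort on it.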



    • #3
      Great, thank you! I have one follow-up question. It is a similar scenario but slightly different. Consider the following table (I tried to use dataex; is this a correct use of it?):

      input int(A B)
      1 1 1
      1 2 2
      1 2 2
      1 3 3
      1 3 3
      1 3 3
      1 4 4
      1 4 4
      2 5 1
      2 5 1
      2 6 2
      2 5 2
      2 7 3

      The first two variables are as given in my dataset. The third variable was added manually by me and is the variable I would like to create. It is supposed to count the distinct values of variable B observed for a given value of variable A. For example, variable A takes the value 1 in the first eight observations. For these observations I would like a counter that starts at 1 and increases by 1 for each new distinct value of variable B. The same process starts again with the next value of variable A, 2: again the counter starts at 1 and increases by 1 for each new distinct value of variable B.

      I would appreciate your input. Thank you so much.



      • #4
        Concerning your use of -dataex-, thank you for trying. It is not quite correct. You need to include the -clear- command that is near the beginning of the -dataex- output, and the -end- that is at the end. Also, you can't edit in a new variable once you have run -dataex- (or, if you do, you also need to edit a name for that variable into the -input- statement). Here's what it would look like:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(A B wanted)
        1 1 1
        1 2 2
        1 2 2
        1 3 3
        1 3 3
        1 3 3
        1 4 4
        1 4 4
        2 5 1
        2 5 1
        2 6 2
        2 5 2
        2 7 3
        end
        Now, there appears to be an error in this input. Notice the second-to-last observation, where A = 2 and B = 5. First, it is out of order (the rest of the data are sorted on A, and on B within A). Next, there are two other observations with A = 2 and B = 5, so I would expect the desired value of your variable to be 1, not 2. If I have this right, the code you want is:

        Code:
        by A B, sort: gen C = (_n == 1)   // 1 on the first observation of each A-B pair
        by A (B): replace C = sum(C)      // running sum within A = distinct B values so far
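To see the effect on the disputed observation, here is a minimal sketch that re-enters only the A == 2 rows from the example and applies the two lines above. The hand-typed values are kept in a variable called wanted, a name assumed here for illustration. Under this reading, after sorting on B the three B = 5 rows all get C = 1, B = 6 gets 2 and B = 7 gets 3, so the only row that disagrees with wanted is the B = 5 row that was typed as 2.

```stata
* Sketch: only the A == 2 rows of the example.
* -wanted- is a hypothetical name for the values typed in #3.
clear
input int(A B wanted)
2 5 1
2 5 1
2 6 2
2 5 2
2 7 3
end

by A B, sort: gen C = (_n == 1)
by A (B): replace C = sum(C)
list A B wanted C, noobs      // C should read 1 1 1 2 3
```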
        Alternatively, perhaps what you want is different. Perhaps the given order of the observations is important and you want C to increment each time B changes (even if the same value of B has previously occurred with this A). In that case, I would expect the wanted value of C here to be 3, and that in the final observation to be 4.

        Code:
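        gen long obs_no = _n                          // record the current order
        by A (obs_no), sort: gen C = (B != B[_n-1])   // 1 at the start of each A and whenever B changes
        by A (obs_no): replace C = sum(C)             // running count of those changes within A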
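As a quick illustration of this second reading, here is a minimal sketch applying the three lines above to the same five A == 2 rows, kept in their original order. Because B[_n-1] is missing on the first observation of each A group, that observation always starts a new run. The result should be C = 1, 1, 2, 3, 4, which matches the 3 and 4 described above.

```stata
* Sketch: the A == 2 rows of the example, in their original order.
clear
input int(A B)
2 5
2 5
2 6
2 5
2 7
end

gen long obs_no = _n
by A (obs_no), sort: gen C = (B != B[_n-1])
by A (obs_no): replace C = sum(C)
list A B C, noobs             // C should read 1 1 2 3 4
```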
        If neither of these is what you want, please clarify how you arrived at your results and I will try again.





        • #5
          Clyde Schechter

          Filipp asked the same question twice, once here and once as a separate thread. Really not a good idea! See the thread https://www.statalist.org/forums/for...econd-variable

          Fortunately the answers converge, modulo the uncertainty about the precise rules.



          • #6
            Oh, I didn't see this one before I responded to the other one.



            • #7
              It's Filipp's responsibility (a) not to post duplicate questions or (b) to tell us if he does.
