  • Add some kind of identifier to observations when some of them have the same value

    How could I add an identifier to observations when some of them have the same value?

    For example,

    idhh rel
    11 Jefe(a)
    11 Conyuge
    11 Hijo
    11 Hijo
    11 Nieto(a)
    11 Nieto(a)

    Here, idhh = household ID and rel = relationship. I need to add a suffix or tag to the observations that are the same, for example Hijo01, Hijo02, or Nieto(a)01, Nieto(a)02. I can't do it one by one, because my dataset contains 83,000 observations and many households have the same characteristics.

  • #2
    Enrique:
    welcome to this forum.
    Do you mean something along the following lines?
    Code:
    . bysort idhl: gen wanted=1 if rel=="Hijo" | rel=="Nieto" | rel=="Nieto(a)"
    
    
    . list
    
         +-------------------------+
         | idhl       rel   wanted |
         |-------------------------|
      1. |   11      Jefe        . |
      2. |   11   Conyuge        . |
      3. |   11      Hijo        1 |
      4. |   11      Hijo        1 |
      5. |   11     Nieto        1 |
         |-------------------------|
      6. |   11     Nieto        1 |
         +-------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      See also

      Code:
      help duplicates
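A minimal sketch of what -duplicates- offers here, assuming the data look like the example in #1 (typed in with -input- for illustration; the new variable name dup is hypothetical). -duplicates tag- records, for each observation, how many other observations share the same idhh and rel, so repeated relationships can be flagged without listing each value by hand:

```stata
* Hypothetical data typed in to mirror the example in #1
clear
input idhh str10 rel
11 "Jefe(a)"
11 "Conyuge"
11 "Hijo"
11 "Hijo"
11 "Nieto(a)"
11 "Nieto(a)"
end

* dup = number of OTHER observations with the same idhh and rel (0 if unique)
duplicates tag idhh rel, generate(dup)

* Show only the repeated household-relationship pairs
list idhh rel dup if dup > 0
```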



      • #4
        Thank you Carlo and Nick!! But how could I tell the observations apart when they are the same? For example, how could I get:

        From this:
        idhh rel

        11 Jefe(a)
        11 Conyuge
        11 Hijo
        11 Hijo
        11 Nieto(a)
        11 Nieto(a)


        To this:

        idhh rel
        11 Jefe(a)
        11 Conyuge
        11 Hijo01
        11 Hijo02
        11 Nieto(a)01
        11 Nieto(a)02



        Is there a way to add a suffix such as 01, 02, 03 when observations have the same value?
        Last edited by enrique labrada; 11 Nov 2023, 17:13.



        • #5
          Enrique:
          do you mean something along the following lines?
          Code:
          . bysort idhh: gen wanted=1 if rel=="Hijo" | rel=="Nieto" | rel=="Nieto(a)"
          
          . bysort idhh rel: gen count=sum( wanted)
          
          . egen final=concat( rel count)
          
          . list
          
               +------------------------------------------+
               | idhh      rel   wanted   count     final |
               |------------------------------------------|
            1. |   11   Conyug        .       0   Conyug0 |
            2. |   11     Hijo        1       1     Hijo1 |
            3. |   11     Hijo        1       2     Hijo2 |
            4. |   11     Jefe        .       0     Jefe0 |
            5. |   11    Nieto        1       1    Nieto1 |
               |------------------------------------------|
            6. |   11    Nieto        1       2    Nieto2 |
               +------------------------------------------+
          
          
          drop wanted count
          
          .
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you very much Carlo!! I think with that syntax I can solve my problem!!



            • #7
              Hi!! How could I remove the "." characters from these observations?


              idhh relationship
              11 Conyuge..
              11 Hijo1.
              11 Hijo2.
              11 Jefe(a)..
              11 Nieto.1
              11 Nieto.2

              I could do it manually in the Data Editor, but I have 83,000 observations, so doing it that way would be hard.

              Thank you!!




              • #8
                Because of the way you provided your example (please read about -dataex- in the FAQ), I can't be sure that the following is what you want, but see
                Code:
                help subinstr()
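A minimal sketch of the subinstr() route, assuming relationship is a string variable holding values like those listed in #7 (the data are typed in for illustration). subinstr(s, ".", "", .) replaces every "." with an empty string; the final missing value means "replace all occurrences":

```stata
* Hypothetical data mirroring the listing in #7
clear
input idhh str12 relationship
11 "Conyuge.."
11 "Hijo1."
11 "Hijo2."
11 "Jefe(a).."
11 "Nieto.1"
11 "Nieto.2"
end

* Replace every "." with "" (n = . means all occurrences)
replace relationship = subinstr(relationship, ".", "", .)

list
```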



                • #9
                  Enrique:
                  I echo Rich's wise advice to be clear about your issue(s) in your first post, so that interested listers can help you out with a comprehensive solution (if one exists).
                  Just for fun: in some parts of Italy we say that "things should not be presented in instalments".
                  That said, you may want to try something along the following lines:
                  Code:
                  . split relationship, p(.)
                  
                  . list
                  
                       +----------------------------------------+
                       | idhh   relatio~p   relati~1   relati~2 |
                       |----------------------------------------|
                    1. |   11   Conyuge..    Conyuge            |
                    2. |   11      Hijo1.      Hijo1            |
                    3. |   11      Hijo2.      Hijo2            |
                    4. |   11        Jefe       Jefe            |
                    5. |   11     Nieto.1      Nieto          1 |
                       |----------------------------------------|
                    6. |   11     Nieto.2      Nieto          2 |
                       +----------------------------------------+
                  
                  . drop relationship relationship2
                  
                  . list
                  
                       +-----------------+
                       | idhh   relati~1 |
                       |-----------------|
                    1. |   11    Conyuge |
                    2. |   11      Hijo1 |
                    3. |   11      Hijo2 |
                    4. |   11       Jefe |
                    5. |   11      Nieto |
                       |-----------------|
                    6. |   11      Nieto |
                       +-----------------+
                  
                  .
                  Kind regards,
                  Carlo
                  (Stata 19.0)
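In the last listing both Nieto rows end up as plain "Nieto", because the 1 and 2 land in relationship2, which is then dropped. A minimal sketch that keeps the -split- approach but glues the pieces back together with egen's concat() function, assuming data like the listing in #7 (typed in for illustration; relclean is a hypothetical name):

```stata
* Hypothetical data mirroring the listing in #7
clear
input idhh str12 relationship
11 "Conyuge.."
11 "Hijo1."
11 "Hijo2."
11 "Jefe(a).."
11 "Nieto.1"
11 "Nieto.2"
end

* Break each value at "." into relationship1, relationship2, ...
split relationship, parse(".")

* Rejoin all the pieces; relationship? matches relationship1-relationship9
* but not the original relationship variable
egen relclean = concat(relationship?)

list idhh relationship relclean
```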



                  • #10
                    I agree with Carlo Lazzaro and Rich Goldstein that your existing and desired variables are not very clear. This may help, but if your rel is really a numeric variable with value labels, you will need something else.

                    Indeed #1 and #4 ask for suffixes such as 01 and 02, but #7 asks for 1 and 2.

                    Code:
                    clear
                    input dhh str42 rel
                    11 Jefe(a)
                    11 Conyuge
                    11 Hijo
                    11 Hijo
                    11 Nieto(a)
                    11 Nieto(a)
                    end
                    
                    bysort dhh rel : gen rel2 = cond(_N == 1, rel, rel + strofreal(_n, "%02.0f"))  
                    
                    list, sepby(rel)
                    
                         +-------------------------+
                         | dhh       rel      rel2 |
                         |-------------------------|
                      1. |  11   Conyuge   Conyuge |
                         |-------------------------|
                      2. |  11      Hijo    Hijo01 |
                      3. |  11      Hijo    Hijo02 |
                         |-------------------------|
                      4. |  11      Jefe      Jefe |
                         |-------------------------|
                      5. |  11     Nieto   Nieto01 |
                      6. |  11     Nieto   Nieto02 |
                         +-------------------------+
                    If you don't want the leading zeros, just use strofreal(_n). If you want an extra space, which in my view would look better, use " " + strofreal(_n), and so on.

                    Leading zeros matter for large families, as otherwise Nieto10 would sort before Nieto2, and so on.
                    Last edited by Nick Cox; 13 Nov 2023, 02:14.
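A small sketch of the suffix variants described in #10, assuming a hypothetical household with twelve grandchildren so that the counter reaches two digits (the data and the names rel_pad, rel_nopad and rel_space are illustrative). It builds the padded, unpadded and space-separated versions, then sorts on the unpadded one to show the ordering problem that leading zeros avoid:

```stata
* Hypothetical data: one household with twelve grandchildren
clear
set obs 12
gen idhh = 11
gen str10 rel = "Nieto(a)"

* Padded suffix, as in #10: Nieto(a)01 ... Nieto(a)12
bysort idhh rel: gen rel_pad = cond(_N == 1, rel, rel + strofreal(_n, "%02.0f"))

* Unpadded suffix: Nieto(a)1 ... Nieto(a)12
by idhh rel: gen rel_nopad = cond(_N == 1, rel, rel + strofreal(_n))

* Suffix after a space: Nieto(a) 1 ... Nieto(a) 12
by idhh rel: gen rel_space = cond(_N == 1, rel, rel + " " + strofreal(_n))

* String sorting puts Nieto(a)10 right after Nieto(a)1, before Nieto(a)2
sort rel_nopad
list rel_nopad rel_pad
```

Sorting on rel_pad instead keeps the suffixes in numeric order.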

