  • To count observations based on another variable

    Hello everyone,

    I have been trying to find out how many candidates repeat, and how many do not, in the same place over time. My dataset was built by appending several files, so it has many observations, and the same id can appear in different years.

    I know that egen countnum=count( id ), by( id ) is a correct way to tell whether an id is repeated, but I do not know whether it is a correct way to count whether an id repeats in the same place over time.

    Could anybody lend me a hand?

    Thank you


    [Image: ex.JPG, a picture of the example data]
    Last edited by Harold Rodriguez; 15 Jun 2019, 22:03.
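
A minimal sketch of why the egen count in the post above answers a narrower question than the one asked. The toy values and the variable names place, count_in_place and same_place are assumptions made for illustration; they are not taken from the poster's data. Counting by id alone shows whether an id repeats at all, while counting by id and place together shows how many of those repeats fall in the same place.

```stata
* Toy data, invented for illustration only
clear
input double id str18 place int year
1 "A" 2009
1 "B" 2010
1 "B" 2011
2 "A" 2009
2 "A" 2010
3 "A" 2009
end

* Observations per id: tells us whether the id repeats at all
egen countnum = count(id), by(id)

* Observations per id within each place
egen count_in_place = count(id), by(id place)

* An id repeats in a single place only when both counts agree
generate byte same_place = countnum > 1 & count_in_place == countnum

list, noobs sepby(id)
```

Here id 1 repeats but in two places, id 2 repeats in one place, and id 3 does not repeat, so only id 2 is flagged by same_place.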

  • #2
    I made a mistake. The count I really need is "how many candidates repeat in the same/different place through time".



    • #3
      I have two questions.

      1) It is not clear if the two counts are meant to be exclusive, so they add up to the total number of cases. ID 333 could be treated as "repeats in the same place" (2011 and 2012) or as "repeats in a different place" (2009 and 2010) or as both, so it would be counted twice. Which do you want?

      2) Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

      It's particularly helpful to use the dataex command to provide sample data in a form that can be easily read into Stata, as described in section 12 of the FAQ. I have some code in mind, but I would want to test it, and Stata cannot read your picture of your example data in post #1.



      • #4
        Hi William,

        1. If ID 333 changes place in any year, it is counted as a change.

        2. I will read it. I apologize for overlooking that FAQ.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double id str18 place long cod_dan int year
          43710 "ANTIOQUIA"    5 2012
        1041148 "MEDELLIN"  5001 2012
          37934 "MEDELLIN"  5001 2012
        1017138 "ANTIOQUIA"    5 2012
          54255 "ANTIOQUIA"    5 2012
          98530 "MEDELLIN"  5001 2012
          43588 "ANTIOQUIA"    5 2012
          11708 "ANTIOQUIA"    5 2012
          43262 "MEDELLIN"  5001 2012
        1017156 "MEDELLIN"  5001 2012
          43649 "MEDELLIN"  5001 2012
          11801 "MEDELLIN"  5001 2012
          43564 "ANTIOQUIA"    5 2012
          71339 "ANTIOQUIA"    5 2012
          71791 "MEDELLIN"  5001 2012
          92503 "BELLO"     5088 2012
        1122399 "ENVIGADO"  5266 2012
          43254 "BELLO"     5088 2012
        1048015 "BELLO"     5088 2012
          50872 "BELLO"     5088 2012
          43906 "APARTADO"  5045 2012
        end
        ------------------ copy up to and including the previous line ------------------

        Points to take into account here:
        1. The data cover more than one year, not only 2012, and an id can be repeated across years. Basically, only the ids that are repeated are kept; once an id is repeated across years, what really matters is whether it repeats in the same place or not.
        2. Place is a string variable, but each observation has a numeric identifier for it, which is cod_dan.




        • #5
          In reply to William Lisowski (post #3):

          So, for example... here we have this case:

          Code:
          clear
          input double id str18 place long cod_dan int year
          3154 "BOGOTA"       11001 2004
          3154 "CUNDINAMARCA"    25 2006
          3154 "BOGOTA"       11001 2009
          3154 "BOGOTA"       11001 2012
          end
          Id 3154 changed place between 2004 and 2006. Even though he came back to the same place in the following years, he had already changed, so he should be counted as a change.
          Last edited by Harold Rodriguez; 16 Jun 2019, 13:44.



          • #6
            Here I've expanded on your data from post #5 to present sample code that I think does what you want. In particular, I assume you want each id counted one time.
            Code:
            cls
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double id str18 place long cod_dan int year
            3154 "BOGOTA"       11001 2004
            3154 "CUNDINAMARCA"    25 2006
            3154 "BOGOTA"       11001 2009
            3154 "BOGOTA"       11001 2012
            4001 "BOGOTA"       11001 2004
            4001 "BOGOTA"       11001 2006
            4001 "BOGOTA"       11001 2009
            4001 "BOGOTA"       11001 2012
            5001 "BOGOTA"       11001 2004
            end
            
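            * change: 1 if the id has at least two distinct place codes across its observations
            by id (cod_dan), sort: generate change = cod_dan[1]!=cod_dan[_N]
            * repeat: 1 if the id appears in more than one observation
            by id (year), sort: generate repeat = _N>1
            * year1: marks the first observation of each id, so each id is counted once
            by id (year), sort: generate year1 = _n==1
            label define yesno 1 "Yes" 0 "No"
            label values repeat change yesno
            tab change repeat if year1
            list, noobs sepby(id)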
            Code:
            . tab change repeat if year1
            
                       |        repeat
                change |        No        Yes |     Total
            -----------+----------------------+----------
                    No |         1          1 |         2 
                   Yes |         0          1 |         1 
            -----------+----------------------+----------
                 Total |         1          2 |         3 
            
            . list, noobs sepby(id)
            
              +----------------------------------------------------------------+
              |   id          place   cod_dan   year   change   repeat   year1 |
              |----------------------------------------------------------------|
              | 3154         BOGOTA     11001   2004      Yes      Yes       1 |
              | 3154   CUNDINAMARCA        25   2006      Yes      Yes       0 |
              | 3154         BOGOTA     11001   2009      Yes      Yes       0 |
              | 3154         BOGOTA     11001   2012      Yes      Yes       0 |
              |----------------------------------------------------------------|
              | 4001         BOGOTA     11001   2004       No      Yes       1 |
              | 4001         BOGOTA     11001   2006       No      Yes       0 |
              | 4001         BOGOTA     11001   2009       No      Yes       0 |
              | 4001         BOGOTA     11001   2012       No      Yes       0 |
              |----------------------------------------------------------------|
              | 5001         BOGOTA     11001   2004       No       No       1 |
              +----------------------------------------------------------------+
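
A short sketch of why the code above sorts by cod_dan within id and not by year, using rows from the example data. The variable names moved_ends and moved_ever are hypothetical. Comparing only the earliest and latest year would miss id 3154, who starts and ends in the same place, whereas comparing the smallest and largest place code within id flags any change at all.

```stata
clear
input double id str18 place long cod_dan int year
3154 "BOGOTA"       11001 2004
3154 "CUNDINAMARCA"    25 2006
3154 "BOGOTA"       11001 2009
3154 "BOGOTA"       11001 2012
4001 "BOGOTA"       11001 2004
4001 "BOGOTA"       11001 2012
end

* Compare the place code in the earliest and latest year only
bysort id (year): generate byte moved_ends = cod_dan[1] != cod_dan[_N]

* Compare the smallest and largest place code within id
bysort id (cod_dan): generate byte moved_ever = cod_dan[1] != cod_dan[_N]

list, noobs sepby(id)
```

For id 3154, moved_ends is 0 because both 2004 and 2012 are in BOGOTA, while moved_ever is 1 because of the 2006 observation in CUNDINAMARCA.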



            • #7
              In reply to William Lisowski (post #6):
              Hey! Thank you so much. I have managed to count those values another way, and the results are no different, so I can say it's perfectly correct.

              Code:
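              * Keep only ids that appear more than once
              bys id: gen x = _N==1
              drop if x ==1
              * Number of distinct place codes per id; nvals() comes from egenmore (ssc install egenmore)
              egen cities = nvals( cod_dan ), by(id)
              * mcity: 1 if the id appears in more than one place, 0 otherwise
              gen mcity = 1 if cities>1
              replace mcity=0 if mcity==.
              * codebook reports the number of unique ids in each group
              codebook id if mcity ==0
              codebook id if mcity ==1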
              All I can say now is thanks!!! :D

              Last edited by Harold Rodriguez; 16 Jun 2019, 20:21.
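
One way the agreement between the two approaches might be checked is sketched below, on the example data from post #6. It is an illustration under assumptions, not the code either poster ran: the distinct-place count is rebuilt with built-in commands in place of nvals(), and the variable name first is hypothetical.

```stata
clear
input double id long cod_dan int year
3154 11001 2004
3154    25 2006
3154 11001 2009
3154 11001 2012
4001 11001 2004
4001 11001 2006
4001 11001 2009
4001 11001 2012
5001 11001 2004
end

* Flag from post #6: smallest and largest place code differ within id
bysort id (cod_dan): generate byte change = cod_dan[1] != cod_dan[_N]

* Distinct place codes per id: tag one observation per id-place pair, then sum
bysort id cod_dan: generate byte first = _n == 1
egen cities = total(first), by(id)
generate byte mcity = cities > 1

* Stops with an error if the two flags ever disagree
assert change == mcity
```

If the assert line runs silently, both flags classify every id the same way on this data.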
