
  • How to screen data with two conditions

    Dear Statalist,

    I have a data set. The first column is the project number, the second is the name of the project leader, the third is the date the project was initiated, and the fourth is the date the project received feedback. The same project may receive more than one comment.

    I would like to filter the data so that only the rows of each leader's first project remain (project number, name, start date and feedback date). For each leader, this may be more than one row.
    For example, for leader A I only want the first four rows, and for B I only want the sixth to eighth rows.

    Looking forward to your reply and thanks a lot!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte project str1 Name int(date1 date2)
    1 "A" 22046 22046
    1 "A" 22046 22046
    1 "A" 22046 22046
    1 "A" 22046 22047
    2 "A" 22049 22049
    3 "B" 20125 20125
    3 "B" 20125 20126
    3 "B" 20125 20127
    4 "B" 20692 20692
    4 "B" 20692 20693
    5 "B" 20801 20801
    6 "C" 21003 21003
    6 "C" 21003 21003
    6 "C" 21003 21004
    7 "C" 21187 21187
    end
    format %tdnn/dd/CCYY date1
    format %tdnn/dd/CCYY date2
    Thanks,
    Iris

  • #2
    I'm confused by your data and request.

    First, in the fourth observation, date2 differs from its value in the first three observations, yet you say you want to keep all of the first four observations. This seems to contradict your criterion that all of them should be the same.

    Second, what do you mean by "first"? Do you mean chronologically earliest? If so, should this be determined by date1 or by date2? What if there is a tie? Or do you mean appearing first in the current sort order of the data set? (Perhaps in your real data set these are the same thing; they are in the example data, but the code should not rely on this without your confirmation.)

    Please clarify.



    • #3
      Hi Clyde,

      Thanks for your reply.

      date2 is the date on which other people gave feedback on the focal project. Although date2 differs in the fourth observation, that row still belongs to leader A's first (earliest) project: its project number is still 1.

      And yes, "first" means chronologically earliest, based on date1, the date the project was initiated. Since date2 is the date of feedback from other people, it should be the same as date1 or later.

      Each project is independent and has different content, although a leader may occasionally post two projects on the same day. That is why I also included the project number, to distinguish between projects.

      Thanks!

      (Please use the updated dataex below)



      • #4
        Sorry about the formatting in my last post. Please use this one.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte project str1 Name int(date1 date2)
        1 "A" 22046 22046
        1 "A" 22046 22046
        1 "A" 22046 22046
        1 "A" 22046 22047
        2 "A" 22049 22049
        3 "B" 20125 20125
        3 "B" 20125 20126
        3 "B" 20125 20127
        4 "B" 20692 20692
        4 "B" 20692 20693
        5 "B" 20801 20801
        6 "C" 21003 21003
        6 "C" 21003 21003
        7 "C" 21003 21004
        8 "C" 21187 21187
        end
        format %tdnn/dd/CCYY date1
        format %tdnn/dd/CCYY date2



        • #5
          Code:
          bys Name (date1): gen wanted = date1 == date1[1]
          list if wanted
          Code:
          . list if wanted
          
               +-------------------------------------------------+
               | project   Name       date1       date2   wanted |
               |-------------------------------------------------|
            1. |       1      A   5/11/2020   5/11/2020        1 |
            2. |       1      A   5/11/2020   5/11/2020        1 |
            3. |       1      A   5/11/2020   5/11/2020        1 |
            4. |       1      A   5/11/2020   5/12/2020        1 |
            6. |       3      B    2/6/2015    2/6/2015        1 |
               |-------------------------------------------------|
            7. |       3      B    2/6/2015    2/7/2015        1 |
            8. |       3      B    2/6/2015    2/8/2015        1 |
           12. |       6      C    7/3/2017    7/3/2017        1 |
           13. |       6      C    7/3/2017    7/3/2017        1 |
               +-------------------------------------------------+
          Last edited by Fei Wang; 17 Nov 2021, 21:45.



          • #6
            Dear Fei,

            Thanks a lot. Your results are indeed what I want.

            However, when I run the same code in my Stata, the results contain one more row, "7 C 7/3/2017 7/4/2017", because the code does not filter out the third observation for leader C. That observation belongs to leader C's second project, so it should be dropped even though it was posted on the same day as C's first (earliest) project.

            Could you please explain how I can filter out this row?

            Thanks,
            Iris



            • #7
              Sorry, Iris. The missing "7 C" row is an error in #5: with my code, that row should appear. The reason is that projects 6 and 7 in the example data of #4 started on the same day, so date1 alone cannot tell which of them was the first project of that day. If a leader's earlier projects are always assigned smaller project numbers, then the solution should be as below.

              Code:
              bys Name (project): gen wanted = project == project[1]
              list if wanted
              Code:
              . list if wanted
              
                   +-------------------------------------------------+
                   | project   Name       date1       date2   wanted |
                   |-------------------------------------------------|
                1. |       1      A   5/11/2020   5/11/2020        1 |
                2. |       1      A   5/11/2020   5/11/2020        1 |
                3. |       1      A   5/11/2020   5/11/2020        1 |
                4. |       1      A   5/11/2020   5/12/2020        1 |
                6. |       3      B    2/6/2015    2/6/2015        1 |
                   |-------------------------------------------------|
                7. |       3      B    2/6/2015    2/7/2015        1 |
                8. |       3      B    2/6/2015    2/8/2015        1 |
               12. |       6      C    7/3/2017    7/3/2017        1 |
               13. |       6      C    7/3/2017    7/3/2017        1 |
                   +-------------------------------------------------+
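A possible variant, offered only as a sketch and not tested against the real data: it sorts each leader's rows by date1 and uses the project number only to break same-day ties, so it does not need project numbers to be in chronological order across different dates. It still assumes, as #7 does, that within a single day a smaller project number means an earlier project. The optional `assert` line checks #7's stronger assumption (project numbers never decrease as date1 increases) and stops with an error if it fails. On the #4 example data both approaches keep the same nine rows.

```stata
* Example data from #4
clear
input byte project str1 Name int(date1 date2)
1 "A" 22046 22046
1 "A" 22046 22046
1 "A" 22046 22046
1 "A" 22046 22047
2 "A" 22049 22049
3 "B" 20125 20125
3 "B" 20125 20126
3 "B" 20125 20127
4 "B" 20692 20692
4 "B" 20692 20693
5 "B" 20801 20801
6 "C" 21003 21003
6 "C" 21003 21003
7 "C" 21003 21004
8 "C" 21187 21187
end
format %tdnn/dd/CCYY date1 date2

* Optional check: within each leader, once rows are ordered by start date,
* project numbers should never go down (the assumption behind #7)
bysort Name (date1 project): assert project >= project[_n-1] if _n > 1

* Order each leader's rows by start date, breaking same-day ties by project
* number, and flag every row of the project that comes first
bysort Name (date1 project): gen byte wanted = project == project[1]

* Keep only the rows of each leader's first project
keep if wanted
drop wanted
list, sepby(Name)
```

If the `assert` fails on the real data, #7's version and this one can disagree, and the date1-first ordering is probably the safer choice, since "first" was defined by date1 in #3.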



              • #8
                Thanks a lot, Fei! I'll try it and see whether it works.
