
  • How to keep IDs (in a bi-monthly panel data) presenting a sequential series of values in binary variables


    I would kindly like to ask for help with a problem that I cannot solve.

    I have a panel data composed of individuals observed in two months of a single year, January and February (i.e., ID_1 (2000m1, 2000m2), ID_2 (2000m1, 2000m2), ..., ID_N (2000m1, 2000m2)).

    Now, for every individual I have one binary variable per month: varA for January and varB for February. Each can be 1 or 0, or it can be missing (in which case the variable = .). Also, varA = . in every observation for February and, vice versa, varB = . in every observation for January.

    My goal is to keep only the individuals for whom the binary variable for January equals 0 (i.e., varA = 0) and the binary variable for February equals 1 (i.e., varB = 1).

    Of course, I cannot write "keep if varA==0 & varB==1", because the two values never sit in the same observation: I need to select only the individuals for whom this happens sequentially, across the two months.

    If someone would kindly help me with this, I would be extremely grateful.

    Thank you very much!

    With best regards,


  • #2
    I gather that, with more than 50 posts, you may have understood the importance of using the best approach to share data/command/output in this forum.

    For the sake of clarity, here is the relevant information from the FAQ on exactly this matter:

    12.2 What to say about your data

    We can understand your dataset only to the extent that you explain it clearly.

    The best way to explain it is to show an example. The community-contributed command dataex makes it easy to give simple example datasets in postings. It was written to support Statalist and its use is strongly recommended. Usually a copy of 20 or so observations from your dataset is enough to show your problem. See help dataex for details.

    As from Stata 15.1 (and 14.2 from 19 December 2017), dataex is included with the official Stata distribution. Users of Stata 15 (or 14) must update to benefit from this.

    Users of earlier versions of Stata must install dataex from SSC before they can use it. Type ssc install dataex in your Stata.

    The merits of dataex are that we see your data as you do in your Stata. We see whether variables are numeric or string, whether you have value labels defined and what is a consequence of a particular display format. This is especially important if you have date variables. We can copy and paste easily into our own Stata to work with your data.

    If your dataset is confidential, then provide a fake example instead.
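
As a minimal sketch of such a fake example: the variable names (ID, date, varA, varB) are assumptions based on the description in #1, the helper variable mon is hypothetical, and every value is invented.

```stata
* Fake two-month panel; all values are made up for illustration
clear
input ID mon varA varB
1 1 0 .
1 2 . 1
2 1 1 .
2 2 . 0
end

* Build a monthly date (2000m1, 2000m2) from the hypothetical helper mon
gen date = ym(2000, mon)
format date %tm
drop mon

* dataex ships with Stata 15.1 (and updated 14.2); earlier versions
* need -ssc install dataex- first
dataex ID date varA varB
```

The output of dataex in the Results window is what gets copied and pasted into the post.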

    The second best way to explain your situation is to use one of Stata's own datasets and adapt it to your problem. Examples are the auto data and the Grunfeld data (a simple panel dataset). That may be more work for you and you may not find an analog of your problem with such a dataset.
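
A short sketch of that second route, assuming an internet connection for webuse. The variables and observations picked here are only illustrative.

```stata
* Grunfeld panel data, downloaded from the Stata Press website
webuse grunfeld, clear
dataex company year invest in 1/6

* auto data, installed with Stata
sysuse auto, clear
dataex make price mpg foreign in 1/5
```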

    The worst way to explain your situation is to describe your data vaguely without a concrete example. Note that it doesn't help us much even to be given your variable names. Often that leaves unclear both your data structure and whether variables are numeric or string or their exact contents. If you explain only vaguely, quick answers to your question, or even any answers at all, are less likely.

    12.3 How to use CODE delimiters

    Stata code (i.e. the exact commands issued) is very much easier to read if presented as such.

    When you are editing an answer you should see a # button in the toolbar above the text area. Click on # to insert [CODE] and [/CODE] mark-up. Write your code between, paying particular attention to linebreaks and indentation.

    If you do not see that button, then click on the “Toggle Advanced Editor” button (an underlined A) in the area above to show the toolbar.

    If you do not have access to the Advanced Editor in your interface, you can just insert those mark-ups manually before, or indeed after, you insert your code. Many people fast at typing do that anyway.

    Examples of your data (or of realistic similar datasets) are also much easier to read if presented as CODE. dataex, explained just above, automatically generates text including CODE delimiters, which can be copied and pasted into Statalist posts.

    What is valuable with presenting code or data as CODE is that other members can easily copy and paste what you post to play with in their Stata installation.
    Last edited by Marcos Almeida; 14 Feb 2018, 04:26.
    Best regards,



    • #3
      Marcos is right; nevertheless it's possible to make suggestions. The answer lies in subscripting.

      bysort ID (date) : keep if _N == 2 & varA[1] == 0 & varB[2] == 1
      The condition if _N == 2 may seem redundant but in a dataset [better wording than "a data"] of any size you might get singletons or duplicates, so

      bysort ID : gen tocheck = _N != 2
      edit if tocheck
      might be a good idea before you do that.
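
To see the subscripting approach on a small made-up dataset, here is a sketch under stated assumptions: the names ID, date, varA and varB are taken from the question, mon is a hypothetical helper, and all values are invented. The February condition is written as varB[2] == 1, following the goal as worded in #1 (February equal to 1); swap the value if the other reading is meant.

```stata
* Invented data: four individuals, one of them (ID 4) a singleton
clear
input ID mon varA varB
1 1 0 .
1 2 . 1
2 1 1 .
2 2 . 1
3 1 0 .
3 2 . 0
4 1 0 .
end
gen date = ym(2000, mon)
format date %tm

* The naive condition matches nothing: within any one observation
* either varA or varB is missing
count if varA == 0 & varB == 1

* Flag IDs that do not have exactly two observations
bysort ID : gen tocheck = _N != 2
list if tocheck

* Within each ID, sorted by date: [1] is January, [2] is February
bysort ID (date) : keep if _N == 2 & varA[1] == 0 & varB[2] == 1
list, sepby(ID)
```

With these invented values only ID 1 survives: ID 2 fails on January, ID 3 fails on February, and ID 4 has a single observation.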
      Last edited by Nick Cox; 14 Feb 2018, 04:47.


      • #4
        Nick.. if Stata was a person, that would be you!!

        the code you provided worked perfectly, thanks a lot.

        As always, thank you for saving my day!




        • #5
          Thanks for the thanks!

          William Gould, who started Stata and remains chief developer and company President, would win most of the Stata gold medals, although like me he would be fairly useless at the several varieties of falling downhill rapidly that are currently on display.


          • #6
            It is true that Mr. Gould is very good, but even if he's the one who started Stata, it doesn't mean that he's the best. To me, you are and will always be the true god of Stata!


            • #7
              Your intentions are flattering and I am flattered, but there's no comparison between the person who created Stata and someone who's just a relatively experienced and visible user.
              Last edited by Nick Cox; 14 Feb 2018, 07:52.


              • #8
                mmm... whatever


                • #9
                  Might I add the observation that without William Gould, Nick would have had to find some other niche in which to become the leading guru. And without Nick to help the users along, it's not clear how Stata would have fared in the marketplace. It does take a village, as they say.


                  • #10
                    that was an acute observation! lol