  • Delete case from panel data if variable has negative value

    Hello,

    I have a question regarding the following:

    My dataset is panel data, which includes the following:

    - FirmID (which is the ID of each firm)

    - Year (which are logically the years...)

    - Prf (which are the profits)

    What I want to do is delete the entire case (all observations for a firm, identified by FirmID) whenever that firm has at least one negative value of Prf.
    My code looks like the following:

    . bysort firmID (Prf) drop if Prf < 0


    But it gives me an error. Is there something wrong with my code? If so, can someone help?
    Last edited by Jonathan Smits; 27 Apr 2017, 12:25.

  • #2
    Welcome to Statalist, Jonathan

    I think you want the following. Within each FirmID it sorts the observations by Prf, so the first observation holds that firm's lowest value of Prf, and it then drops every observation of the firm if that lowest value is negative.
    Code:
    bysort FirmID (Prf) : drop if Prf[1] < 0
    sort FirmID Year
    If you don't tell us the error message, we can't definitively tell you what the cause was. But there are several possibilities.
    1. You omitted the required colon after the bysort clause
    2. You've told us the variable is FirmID, but your code uses firmID
    3. Something else entirely, because there are an untold number of errors possible in Stata, spoken as someone who's made a new one every day, or so it feels
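
    A minimal sketch of the approach on a toy panel. The data values are invented for illustration; only the variable names FirmID, Year, and Prf come from the question. Firms 1 and 3 each have at least one negative profit, so only firm 2 should survive.

```stata
* Toy panel with made-up values (assumption: not the poster's data).
clear
input FirmID Year Prf
1 2014  10
1 2015  -5
1 2016  20
2 2014   3
2 2015   7
2 2016   1
3 2014  -2
3 2015  -8
3 2016  -1
end

* Within each firm, sort by Prf so observation 1 holds the smallest profit,
* then drop the whole firm if that smallest profit is negative.
bysort FirmID (Prf) : drop if Prf[1] < 0

* Restore the usual panel order and inspect what is left.
sort FirmID Year
list, sepby(FirmID)
```

    Stata sorts numeric missing values last, so a missing Prf never lands in position 1 unless every value for the firm is missing; in that case Prf[1] < 0 is false and the firm is kept.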


    • #3
      Originally posted by William Lisowski in #2:
      Thank you, it was the required colon after the bysort clause that I had omitted.

      May I ask what the intuition is behind putting [1] after Prf?


      You are right, I need to be more careful about upper and lower case in variable names. It is FirmID.


      • #4
        Originally posted by William Lisowski in #2:
        I think it needs to be _N instead of 1

        Code:
        bysort FirmID (Prf) : drop if Prf[_N] < 0
        sort FirmID Year
        I must say I do know the meaning of [1], but not really what _N means. The difference is that _N looks at all rows, right?
        Last edited by Jonathan Smits; 27 Apr 2017, 14:07.


        • #5
          William was right.

          If you sort on Prf within panels then lowest values come first and highest values come last within each panel.

          So if any value is negative, the first will be too, and checking Prf[1] is sufficient to know whether to drop a panel.
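
          The same "any negative value" rule can also be written without relying on sort order, by computing each firm's minimum explicitly. This is an alternative sketch, not the code proposed in the thread; the variable name min_prf and the data values are assumptions made for illustration.

```stata
* Toy panel with made-up values (assumption: not the poster's data).
clear
input FirmID Year Prf
1 2014  10
1 2015  -5
2 2014   3
2 2015   7
end

* Lowest profit per firm, stored on every row of that firm.
* min_prf is a hypothetical helper variable.
egen min_prf = min(Prf), by(FirmID)

* Drop every observation of any firm whose lowest profit is negative.
drop if min_prf < 0
drop min_prf

list, sepby(FirmID)
```

          Here firm 1 is dropped because its minimum is -5, and firm 2 is kept.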

          Conversely, the last value Prf[_N] will be negative if and only if all values are negative. Checking Prf[_N] is what you need if the criterion for dropping a panel is that all of its values must be negative.

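          _N is the observation number of the last observation. Under by: it is counted within each group defined by by:, so here Prf[_N] is the last (highest) value within each firm, not the last value in the whole dataset.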
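
          The difference between Prf[1] and Prf[_N] can be seen by flagging firms instead of dropping them. This is a sketch on invented data; the variable names obs_in_firm, firm_size, any_negative, and all_negative are hypothetical.

```stata
* Toy panel with made-up values (assumption: not the poster's data).
clear
input FirmID Year Prf
1 2014  10
1 2015  -5
2 2014   3
2 2015   7
3 2014  -2
3 2015  -8
end

* Under by:, _n and _N are counted within each FirmID group.
bysort FirmID (Prf) : generate obs_in_firm = _n
bysort FirmID (Prf) : generate firm_size   = _N

* After sorting on Prf within firm, [1] is the lowest value and [_N] the highest.
bysort FirmID (Prf) : generate any_negative = Prf[1]  < 0
bysort FirmID (Prf) : generate all_negative = Prf[_N] < 0

sort FirmID Year
list, sepby(FirmID)
```

          Firm 1 is flagged by any_negative only, firm 3 by both flags, and firm 2 by neither. Because Stata treats numeric missing values as larger than any number and sorts them last, a firm with a missing Prf would never be flagged by the Prf[_N] < 0 test.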


          • #6
            Originally posted by Nick Cox in #5:
            Thank you very very much!
