
  • Drop companies which have missing data on a variable for all years

    I have panel data of company-year observations. I want to drop companies for which the information on whether they interacted with a research and technology organisation (RTO) is missing in all years, but keep the missing company-year observations if there is data for that company in other years. My data looks something like this, where id is a company identifier, year is the year, and rto is 1 if the company interacted with an RTO that year, 0 if it did not, and . if there is no information on whether it interacted:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id year rto)
    1 1 0
    1 2 1
    1 3 0
    2 1 1
    2 2 1
    2 3 .
    3 1 .
    3 2 .
    3 3 .
    end
    What I need is a command that will drop company 3 from the dataset but not the observation for company 2 in year 3, because there is data on the company's RTO-interaction for the other years.

    Any help much appreciated!

  • #2
    Emil:
    you may want to try:
    Code:
    bysort id (year): gen wanted=sum(rto)
    drop if wanted==0
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Originally posted by Carlo Lazzaro
      Emil:
      you may want to try:
      Code:
      bysort id (year): gen wanted=sum(rto)
      drop if wanted==0
      In my data, I did not include an example of a company where there was data on the RTO-interaction for all years, but where the company had no RTO interaction in any year:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(id year rto)
      1 1 0
      1 2 1
      1 3 0
      2 1 1
      2 2 1
      2 3 .
      3 1 .
      3 2 .
      3 3 .
      4 1 0
      4 2 0
      4 3 0
      end
      Code:
      bysort id (year): gen wanted=sum(rto)
      drop if wanted==0
      This code will drop company 4 because it had no RTO interaction in any year. However, I see that the problem can be circumvented by creating a new variable:

      Code:
      gen temp = rto!=.
      bysort id (year): gen wanted=sum(temp)
      drop if wanted==0
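
      One caveat with the snippet above: with -generate-, sum() is a running sum, so for a company whose earliest years are missing but whose later years have data, wanted is still 0 in those early rows and they would be dropped. A minimal sketch of a variant that avoids this, assuming a group-level count from -egen- is acceptable. Company 5 is a made-up illustration, not part of the original data, and n_obs is a hypothetical variable name:

      ```stata
      clear
      input byte(id year rto)
      1 1 0
      1 2 1
      1 3 0
      2 1 1
      2 2 1
      2 3 .
      3 1 .
      3 2 .
      3 3 .
      4 1 0
      4 2 0
      4 3 0
      5 1 .
      5 2 1
      5 3 0
      end

      * Company 5 is hypothetical: missing in its first year, observed later.

      * Count non-missing rto values per company. egen's total() puts the same
      * group total on every row of the company, unlike a running sum.
      * missing() also covers extended missing values (.a, .b, ...).
      bysort id: egen n_obs = total(!missing(rto))

      * Drop only companies with no non-missing rto in any year.
      drop if n_obs == 0
      drop n_obs
      ```

      On this data only company 3 should be dropped; the missing rows of companies 2 and 5 are kept.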
      Last edited by Emil Alnor; 24 Feb 2023, 02:46. Reason: formatting


      • #4
        Here is another way to do it that shows that a new variable is not needed.

        Code:
        bysort id (rto) : drop if missing(rto[1]) & missing(rto[_N])

        If after sorting the first value and the last value for an identifier are both missing, then they all are.

        This idea is discussed at greater length in an FAQ https://www.stata.com/support/faqs/d...ions-in-group/
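
        The sort-based approach can be checked on the four-company example from #3. This is only a sketch for verification. It relies on the fact that numeric missing values sort after all non-missing values, so within each id a missing first value of rto means every value is missing:

        ```stata
        clear
        input byte(id year rto)
        1 1 0
        1 2 1
        1 3 0
        2 1 1
        2 2 1
        2 3 .
        3 1 .
        3 2 .
        3 3 .
        4 1 0
        4 2 0
        4 3 0
        end

        * Sort by rto within id; missing values go to the end of each group.
        * If the first and last values are both missing, all values are missing.
        bysort id (rto) : drop if missing(rto[1]) & missing(rto[_N])

        * Restore the panel order, which the sort on rto changed, and inspect.
        sort id year
        list, sepby(id)
        ```

        Only company 3 should be gone afterwards; company 4 (all zeros) and the missing year of company 2 remain.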
