
  • Origin Dest Airline Data

    Dear Stata users, for each year and quarter I have information on the origin and destination of flights by airlines in the US. I want to generate a new variable (frequency) that counts the number of destinations served from a given origin (for example, ATL) by a specific airline in each quarter of each year.

    I was thinking about using the egen command, or using collapse and then merging both datasets.

    Create number of destinations from the airports
    egen numdest = group(dest)

    or:
    collapse numdest, by(origin year quarter)
    merge one to one

    The data (T100 domestic data) look like this:
    Year  Quarter  Origin  Dest  Airline
    2005  Q1       ATL     DWT   Delta
    2007  Q2       ATL     MEM   Continental





  • #2
    If I understand your data organization correctly, the following will do it:
    Code:
    isid Origin Dest Year Quarter Airline
    
    by Airline Year Quarter Origin, sort: gen num_dst = _N
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
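As a minimal sketch of what a -dataex- call looks like, the two rows from post #1 are retyped below and then passed to the command. The storage types in the -input- line are assumptions, since the original table does not show them.

```stata
clear
* Two example rows retyped from post #1; storage types are assumed
input int Year str2 Quarter str3(Origin Dest) str11 Airline
2005 "Q1" "ATL" "DWT" "Delta"
2007 "Q2" "ATL" "MEM" "Continental"
end

* Print a copy-and-paste-ready -input- block for the listed variables
dataex Year Quarter Origin Dest Airline
```

The output of -dataex- is what gets pasted into the forum post, between code delimiters.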



    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      Thank you Clyde.

      input str3(origin dest) str41 unique_carrier_name int departures_performed long passengers int year byte quarter

      "ATL" "BNA" "Regions Air, Inc." 47 218 2005 1
      "ATL" "BNA" "Regions Air, Inc." 47 292 2005 1
      "ATL" "MIA" "American Airlines Inc." 165 13314 2005 1
      "ATL" "ORD" "American Airlines Inc." 102 9202 2005 1
      "ATL" "DFW" "American Airlines Inc." 319 29358 2005 1


      It may seem that ATL-BNA, for example, appears twice for the same airline and year-quarter combination; however, the number of passengers differs between those observations.

      The isid command gives this error: variables origin dest year quarter unique_carrier do not uniquely identify the observations
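One way to see which observations trigger that error is the -duplicates- command. The following is a sketch on the five example rows above, not on the full T100 data; the variable name dup is chosen here for illustration.

```stata
clear
input str3(origin dest) str41 unique_carrier_name int departures_performed long passengers int year byte quarter
"ATL" "BNA" "Regions Air, Inc." 47 218 2005 1
"ATL" "BNA" "Regions Air, Inc." 47 292 2005 1
"ATL" "MIA" "American Airlines Inc." 165 13314 2005 1
"ATL" "ORD" "American Airlines Inc." 102 9202 2005 1
"ATL" "DFW" "American Airlines Inc." 319 29358 2005 1
end

* Summarize how many observations share the same key combination
duplicates report origin dest year quarter unique_carrier_name

* dup (illustrative name) = number of other observations with the same key
duplicates tag origin dest year quarter unique_carrier_name, generate(dup)

* Show only the observations that are not unique on the key
list if dup > 0
```

In this example only the two ATL-BNA rows for Regions Air share a key, so those are the rows listed.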



      • #4
        OK. Then the code I proposed in #2 is not suitable for your data. That's why a data example that fully represents the real data set is so important. The following code will work even though you have multiple observations for the same combination of airline, year, quarter, origin, and destination.

        Code:
        clear
        input str3(origin dest) str41 unique_carrier_name int departures_performed long passengers int year byte quarter
        "ATL" "BNA" "Regions Air, Inc." 47 218 2005 1
        "ATL" "BNA" "Regions Air, Inc." 47 292 2005 1
        "ATL" "MIA" "American Airlines Inc." 165 13314 2005 1
        "ATL" "ORD" "American Airlines Inc." 102 9202 2005 1
        "ATL" "DFW" "American Airlines Inc." 319 29358 2005 1
        end
        
        by unique_carrier_name year quarter origin dest, sort: gen num_dist = 1 if _n == 1
        by unique_carrier_name year quarter origin (dest): replace num_dist = sum(num_dist)
        by unique_carrier_name year quarter origin: replace num_dist = num_dist[_N]
        By the way, it was precisely because I couldn't be sure that airline, year, quarter, origin, and destination would really uniquely identify observations in your data that I included the -isid- command to check that assumption. I knew that the code that followed would give incorrect answers if the assumption failed, which, it turns out, it did. This is, in general, a good coding practice: where your code can produce plausible-looking but incorrect results if some assumptions fail, test those assumptions before the code. Better to abort with an error message than to blunder on with unnoticed errors.
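For comparison, the same count of distinct destinations can probably also be obtained with -egen-, which the original question mentioned. This is an alternative sketch rather than the code proposed above; the variable names first_obs and num_dest are illustrative.

```stata
clear
input str3(origin dest) str41 unique_carrier_name int departures_performed long passengers int year byte quarter
"ATL" "BNA" "Regions Air, Inc." 47 218 2005 1
"ATL" "BNA" "Regions Air, Inc." 47 292 2005 1
"ATL" "MIA" "American Airlines Inc." 165 13314 2005 1
"ATL" "ORD" "American Airlines Inc." 102 9202 2005 1
"ATL" "DFW" "American Airlines Inc." 319 29358 2005 1
end

* Flag exactly one observation per carrier-year-quarter-origin-destination
egen byte first_obs = tag(unique_carrier_name year quarter origin dest)

* Sum the flags within carrier-year-quarter-origin: number of distinct destinations
egen num_dest = total(first_obs), by(unique_carrier_name year quarter origin)

list unique_carrier_name origin dest num_dest, sepby(unique_carrier_name)
```

Because -tag()- marks only one observation per combination, repeated origin-destination rows (such as the two ATL-BNA rows) are counted once.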
        Last edited by Clyde Schechter; 25 May 2019, 10:12.



        • #5
          Thank you Clyde! I think this works.
