Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • collapse with (first) and (max)

    Hello,

    I'm trying to understand a line of code that collapses a dataset I'm working with (I've inherited the do-file that I'm trying to understand). My dataset has US imports data, disaggregated by product (with a corresponding product code called ic), exporter and unit (numbers, pounds, etc). I've given an example below:

    product exporter unit free dut ic value quantity miss_q_raw
    Cattle UK NUMBER 1 1 0010 21506 556 0
    Jute Austria 0 1 3363 54243 . 1


    And so on. free is a dummy that is 1 when there is no import tariff (and 0 otherwise); dut is a dummy that is 1 when there is an import tariff (and 0 otherwise); miss_q_raw is a dummy that is 1 when the quantity is missing, and 0 otherwise. The code I'm trying to understand is:

    Code:
    collapse (sum) value quantity (first) free dut product (max) miss_q_raw, by(ic exporter unit)
    order ic exporter value unit product miss_q_raw
    replace unit = "N/A" if miss_q_raw == 1
    By my understanding, collapse should produce the sum of value and quantity for each combination of exporter-ic-unit, and keep the max value of miss_q_raw, which is 1. But that is not what is produced in the output, which is:

    ic exporter value unit product miss_q_raw quantity free dut
    0010 Canada 57010 NUMBER Cattle 0 4475 1 0
    3351 Austria 1918 Apparel N/A 1 . 0 1


    I obviously don't understand what the collapse command is exactly doing here. What is the code doing?

    Regards,
    Saunok

  • #2
    I apologize for the misalignment when I tried to provide an example of what my data looks like. If there's a way for me to upload a part of my data so that it clarifies my problem, I'd be happy to do so.

    Regards,
    Saunok

    Comment


    • #3
      If there's a way for me to upload a part of my data so that it clarifies my problem, I'd be happy to do so.
      Use the dataex command to do this. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      When asking for help with code, always show example data. When showing example data, always use the dataex command.

      By default dataex will output the first 100 observations, but this can be changed using if and in and the obs() option. The output of dataex will look something like the following.
      Code:
      ----------------------- copy starting from the next line -----------------------
      [CODE]
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(x1 x2 x3) float x4 int x5 byte x6
       4195 24 1   2 10 0
      10371 16 3 3.5 17 0
       4647 28 3   2 11 0
      ...
       5079 24 4 2.5  8 1
       8129 21 4 2.5  8 1
       4296 21 3 2.5 16 1
      end
      label values x6 yesno
      label def yesno 0 "No", modify
      label def yesno 1 "Yes", modify
      [/CODE]
      ------------------ copy up to and including the previous line ------------------
      In your dataex output you will select the lines between, but not including, "copy starting from the next line" and "copy up to and including the previous line" and then paste that into your reply. The result presented in your post will look something like the following.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(x1 x2 x3) float x4 int x5 byte x6
       4195 24 1   2 10 0
      10371 16 3 3.5 17 0
       4647 28 3   2 11 0
      ...
       5079 24 4 2.5  8 1
       8129 21 4 2.5  8 1
       4296 21 3 2.5 16 1
      end
      label values x6 yesno
      label def yesno 0 "No", modify
      label def yesno 1 "Yes", modify

      Comment


      • #4
        By my understanding, collapse should produce the sum of value and quantity for each combination of exporter-ic-unit, and keep the max value of miss_q_raw
        collapse will produce one observation ffor each combination of exporter, ic, and unit. The observation will include the following variables.
        • exporter
        • ic
        • unit
        • value, containing the sum of value for all observations with that combination of exporter, id, and unit
        • quantity, containing the sum of quantity for all observations with that combination of exporter, id, and unit
        • free, containing free found in the first observation with that combination of exporter, id, and unit
        • dut, containing dut found in the first observation with that combination of exporter, id, and unit
        • miss_q_raw, containing the maximum of miss_q_raw for all observations with that combination of exporter, id, and unit
        I do not understand why you assert that the maximum value of miss_q_raw should be 1. Your two observations have two different combinations of exporter, id, and unit - one with miss_q_raw==0 and one with miss_q_raw==1, which is what your results show.

        Comment


        • #5
          William, thank you for your patience and the very detailed explanation. Using dataex, this is what my pre-collapsed data looks like:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str124 product str31 exporter str11 unit str1(free dut) str4 ic float(value quantity miss_q_raw)
          "Cattle" "United Kingdom" "NUMBER" "1" "0" "0010" 215066  556 0
          "Cattle" "Canada"         "NUMBER" "1" "0" "0010" 570101 4475 0
          "Cattle" "Mexico"         "NUMBER" "1" "0" "0010" 165635 2192 0
          end

          And this is what my data looks like, post-collapse (and re-ordering of some variables):

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str4 ic str31 exporter double value str11 unit str124 product float miss_q_raw double quantity str1(free dut)
          "0010" "Canada"         570101 "NUMBER" "Cattle" 0 4475 "1" "0"
          "0010" "Mexico"         165635 "NUMBER" "Cattle" 0 2192 "1" "0"
          "0010" "United Kingdom" 215066 "NUMBER" "Cattle" 0  556 "1" "0"
          end

          I now understand your explanation of collapse and why miss_q_raw can differ between two different combinations. Thank you!

          Regards,
          Saunok

          Comment

          Working...
          X