Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating additional observations within the dataset

    Hello everyone,

    I am having a problem with how to use the fillin command (and whether it is the best command to use) to achieve the following:

    I have a dataset with four main dimensions: Year, Firm id, Exported Product, and Destination Country. Each firm i exports a certain value of product p to destination j in year t. I want that for each Year-Firm-Product combination to generate observations for all destinations countries that ever appeared for this product p in the dataset.

    This an example of how my dataset looks like if we assume that we have 2 firms, 1 product and 3 destinations:
    Year Firm ID Product Destination Exported Value
    2005 1 1 FRA 100
    2005 2 1 FRA 90
    2005 2 1 EGY 70
    2005 2 1 TUN 60

    So, Product 1 is being exported overall to 3 destinations (FRA, EGY, and TUN). So, I want that all of these destinations appear as well for Firm 1 and have zero as exported value as below:
    Year Firm ID Product Destination Exported Value
    2005 1 1 FRA 100
    2005 1 1 EGY 0
    2005 1 1 TUN 0
    2005 2 1 FRA 90
    2005 2 1 EGY 70
    2005 2 1 TUN 60
    Bearing in mind that my Year variable take values from 2005 to 2016, so I want this to be repeated for all years (i.e for each firm-product, I want to have all destinations to which this product was exported).

    I would appreciate any recommendation on how to achieve this, knowing that I have around 6 million observations in my dataset, so using "fillin Year Firm Product Destination" command (and then dropping what's not needed) is not possible to produce on my computer due to memory limits since it goes up to more than 200 million observations in this case.

    Many thanks in advance.

    Nada
    Last edited by Nada Hazem; 14 Apr 2023, 07:14.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year byte(firmid product) str3 destination byte exportedvalue
    2005 1 1 "FRA" 100
    2005 2 1 "FRA"  90
    2005 2 1 "EGY"  70
    2005 2 1 "TUN"  60
    end
    
    . fillin year firmid product destination
    
    . l
    
         +---------------------------------------------------------+
         | year   firmid   product   destin~n   export~e   _fillin |
         |---------------------------------------------------------|
      1. | 2005        1         1        EGY          .         1 |
      2. | 2005        1         1        FRA        100         0 |
      3. | 2005        1         1        TUN          .         1 |
      4. | 2005        2         1        EGY         70         0 |
      5. | 2005        2         1        FRA         90         0 |
         |---------------------------------------------------------|
      6. | 2005        2         1        TUN         60         0 |
         +---------------------------------------------------------+
    
    . replace exportedvalue = 0 if _fillin
    (2 real changes made)
    
    . l
    
         +---------------------------------------------------------+
         | year   firmid   product   destin~n   export~e   _fillin |
         |---------------------------------------------------------|
      1. | 2005        1         1        EGY          0         1 |
      2. | 2005        1         1        FRA        100         0 |
      3. | 2005        1         1        TUN          0         1 |
      4. | 2005        2         1        EGY         70         0 |
      5. | 2005        2         1        FRA         90         0 |
         |---------------------------------------------------------|
      6. | 2005        2         1        TUN         60         0 |
         +---------------------------------------------------------+

    Comment


    • #3
      Many thanks Nick Cox for your reply. The problem is that I have 6 million observations with more than 7000 firms and 1300 products (per year) for 12 years, and hence using "fillin year firmid product destination" in this case generates all possible combinations with these four dimensions and not only destinations for each firm-product-year. Here is a clearer example of my dataset:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int Year double Trader_ID str6 Product_HS6 str5 Cntry_Dest_Code double ExpVal_USD
      2005 2266785080 "010110" "KWT"                 0
      2005 2266785080 "010110" "SAU"                 0
      2006 2266785080 "010110" "KWT"                 0
      2006 2266785080 "010110" "SAU"                 0
      2007 2266785080 "010110" "KWT"                 0
      2007 2266785080 "010110" "SAU"                 0
      2008 2266785080 "010110" "KWT" 13089.39351116235
      2008 2266785080 "010110" "SAU" 114991.3446275556
      2009 2266785080 "010110" "KWT"  71542.8650417093
      2009 2266785080 "010110" "SAU"  228687.312141506
      2010 2266785080 "010110" "KWT"  76940.6785678685
      2010 2266785080 "010110" "SAU"  176044.882570945
      2011 2266785080 "010110" "KWT"   283993.92805548
      2011 2266785080 "010110" "SAU"                 0
      2012 2266785080 "010110" "KWT"                 0
      2012 2266785080 "010110" "SAU"                 0
      2013 2266785080 "010110" "KWT"  52000.0008424655
      2013 2266785080 "010110" "SAU"                 0
      2014 2266785080 "010110" "KWT"                 0
      2014 2266785080 "010110" "SAU"                 0
      2015 2266785080 "010110" "KWT"                 0
      2015 2266785080 "010110" "SAU"                 0
      2016 2266785080 "010110" "KWT"                 0
      2016 2266785080 "010110" "SAU"                 0
      2005 4580194280 "010110" "BEL"                 0
      2005 4580194280 "010110" "JOR"                 0
      2006 4580194280 "010110" "BEL"                 0
      2006 4580194280 "010110" "JOR"                 0
      2007 4580194280 "010110" "BEL"                 0
      2007 4580194280 "010110" "JOR"                 0
      2008 4580194280 "010110" "BEL"  151391.984849334
      2008 4580194280 "010110" "JOR"                 0
      2009 4580194280 "010110" "BEL"  11586.1201338052
      2009 4580194280 "010110" "JOR"                 0
      2010 4580194280 "010110" "BEL"                 0
      2010 4580194280 "010110" "JOR"                 0
      2011 4580194280 "010110" "BEL"                 0
      2011 4580194280 "010110" "JOR"                 0
      2012 4580194280 "010110" "BEL"                 0
      2012 4580194280 "010110" "JOR"                 0
      2013 4580194280 "010110" "BEL"                 0
      2013 4580194280 "010110" "JOR"                 0
      2014 4580194280 "010110" "BEL"                 0
      2014 4580194280 "010110" "JOR"  263000.003616491
      2015 4580194280 "010110" "BEL"                 0
      2015 4580194280 "010110" "JOR"  122000.001082295
      2016 4580194280 "010110" "BEL"                 0
      2016 4580194280 "010110" "JOR"                 0
      2005 4893066696 "010110" "ARE"                 0
      2005 4893066696 "010110" "BEL"                 0
      2005 4893066696 "010110" "JOR"                 0
      2005 4893066696 "010110" "KWT"                 0
      2005 4893066696 "010110" "QAT"                 0
      2005 4893066696 "010110" "SAU"                 0
      2006 4893066696 "010110" "ARE"                 0
      2006 4893066696 "010110" "BEL"                 0
      2006 4893066696 "010110" "JOR"                 0
      2006 4893066696 "010110" "KWT"                 0
      2006 4893066696 "010110" "QAT"                 0
      2006 4893066696 "010110" "SAU"                 0
      2007 4893066696 "010110" "ARE"                 0
      2007 4893066696 "010110" "BEL"                 0
      2007 4893066696 "010110" "JOR"                 0
      2007 4893066696 "010110" "KWT"                 0
      2007 4893066696 "010110" "QAT"                 0
      2007 4893066696 "010110" "SAU"                 0
      2008 4893066696 "010110" "ARE"                 0
      2008 4893066696 "010110" "BEL"                 0
      2008 4893066696 "010110" "JOR"                 0
      2008 4893066696 "010110" "KWT"                 0
      2008 4893066696 "010110" "QAT"                 0
      2008 4893066696 "010110" "SAU"                 0
      2009 4893066696 "010110" "ARE"  434.992164669016
      2009 4893066696 "010110" "BEL"  257550.531914311
      2009 4893066696 "010110" "JOR"  145129.144770675
      2009 4893066696 "010110" "KWT"  25006.8654177825
      2009 4893066696 "010110" "QAT"  55329.3130452888
      2009 4893066696 "010110" "SAU"  110085.782108112
      2010 4893066696 "010110" "ARE"  114927.218597978
      2010 4893066696 "010110" "BEL"  69695.4828076722
      2010 4893066696 "010110" "JOR"                 0
      2010 4893066696 "010110" "KWT"  20.0229479137962
      2010 4893066696 "010110" "QAT"  113832.961763919
      2010 4893066696 "010110" "SAU"  114244.856745985
      2011 4893066696 "010110" "ARE"                 0
      2011 4893066696 "010110" "BEL"                 0
      2011 4893066696 "010110" "JOR"                 0
      2011 4893066696 "010110" "KWT"  422723.340433584
      2011 4893066696 "010110" "QAT"                 0
      2011 4893066696 "010110" "SAU"  369759.174828908
      2012 4893066696 "010110" "ARE"                 0
      2012 4893066696 "010110" "BEL"                 0
      2012 4893066696 "010110" "JOR" 315915.3953798725
      2012 4893066696 "010110" "KWT"  9892.39508508646
      2012 4893066696 "010110" "QAT"                 0
      2012 4893066696 "010110" "SAU"                 0
      2013 4893066696 "010110" "ARE"                 0
      2013 4893066696 "010110" "BEL"                 0
      2013 4893066696 "010110" "JOR"  89999.9971009953
      2013 4893066696 "010110" "KWT"                 0
      end
      Last edited by Nada Hazem; 14 Apr 2023, 08:35.

      Comment


      • #4
        So, you should specify fewer variables.

        Comment

        Working...
        X