Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop for collapse + save file

    I'm trying to write a loop where I have reporter and partner countries and country groupings as separate variables and want to collapse and create new variables. The local is the country groupings and then I've tried to loop but get "var not found". Anyone have advice? Thanks in advance.

    local country_groups EU advanced advanced_no_US_EU advanced_no_Taiwan EMs EMs_no_PRC_RUS EMs_no_PRC developing other_middle_income non_advanced non_advanced_no_PRC middle_income middle_income_no_PRC

    foreach var in local country_groups {
    preserve
    collapse (sum) trade_world trade_PRC trade_US trade_PRC_US total_trade total_trade_HS if x==1, by(HS_chapter year)
    gen reporter="var"
    *China trade share
    gen trade_share_PRC=trade_PRC/trade_world
    *US trade share
    gen trade_share_US=trade_US/trade_world
    *China trade/China+US trade
    gen trade_share_PRC_US=trade_PRC/trade_PRC_US
    save `x'_US_PRC, replace
    }

  • #2
    As you do not show example data, it is not immediately clear what is going on here. I can think of a few possibilities.

    However, in all cases, I can see a couple of problems with the code in #1 which, regardless, you will need to address.

    1. You -preserve- the data before collapsing it at the top of the loop, but you never restore it, so on your second iteration you are trying to collapse the already collapsed data -if x == 1-. But whatever x may have been in the original data, the -collapse- command eliminates it, so Stata will, at that point, complain that it cannot find x. Actually, I have a hunch that even in the original data there is no variable named x, and that what you meant to do is select for those observations where `var' == 1. Doing that would make each iteration of the loop work with different data, which makes sense. As written, with x, the loop, were it to run, would reuse exactly the same data on each iteration, which is pointless.

    2. The command -gen reporter = "var"- creates a new variable and sets its value, in every observation, to the string "var". It does not set it to the value of the iterator var in your loop. While this is perfectly legal, it is hard to imagine this is what you want. It seems rather pointless, at best.

    3. Your -save- command at the bottom of the loop is going to simply overwrite a file named _US_PRC at each iteration. Since that means that everything except the last iteration's worth of data processing will be lost, the loop could be just as effectively accomplished with no loop at all and just running the commands once with local var set to middle_income_no_prc. This is happening because the filename begins with `x', but there is no definition of local macro x, so it is interpreted as an empty string.

    All in all, I think what you want is:

    Code:
    local country_groups EU advanced advanced_no_US_EU advanced_no_Taiwan EMs EMs_no_PRC_RUS EMs_no_PRC developing other_middle_income non_advanced non_advanced_no_PRC middle_income middle_income_no_PRC
    
    preserve
    foreach var in local country_groups {
        collapse (sum) trade_world trade_PRC trade_US trade_PRC_US total_trade total_trade_HS ///
            if `var'==1, by(HS_chapter year)
        gen reporter="`var'"
        *China trade share
        gen trade_share_PRC=trade_PRC/trade_world
        *US trade share
        gen trade_share_US=trade_US/trade_world
        *China trade/China+US trade
        gen trade_share_PRC_US=trade_PRC/trade_PRC_US
        save `var'_US_PRC, replace
        restore, preserve
    }
    If this does not do what you want, please post back and include:

    1. Example data. Use the -dataex- command to do this. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    2. The exact code that you are actually running. Do this by copy/paste from your Results window or log file directly to the Forum editor. Do not edit in any way. There is no such thing as an unimportant change.

    3. A display of the results you are getting, and, unless it would be blatantly obvious to any literate adult, a description of how the results differ from what you want to get.

    Comment


    • #3
      Thanks for your response Clyde. Here's the example data (I excluded the country groups because it was saying it exceeded the linesize limit but not really necessary here - I just grouped countries based on different incomes)

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str32 reporter str44 partner byte commoditycode str253 commodity double tradevalueus str11 HS_chapter
      "Cook Isds" "China" 91 "Clocks and watches and parts thereof"                                                                2197 "Machinery"
      "Nepal"     "China" 58 "Fabrics; special woven fabrics, tufted textile fabrics, lace, tapestries, trimmings, embroidery"  2367884 "Textiles" 
      "Romania"   "China" 68 "Stone, plaster, cement, asbestos, mica, etc articles"                                              249000 "Other"    
      "Israel"    "China" 94 "Furniture; bedding, mattresses, cushions and similar stuffed furnishing"                         31234000 "Machinery"
      "Zambia"    "China" 61 "Apparel and clothing accessories; knitted or crocheted"                                           3128304 "Other"    
      end


      Here's the code I'm running:

      ************************************************** *********
      ///******************** GROSS TRADE ********************///
      ************************************************** *********

      egen all_commodities=total(tradevalueus), by(reporter partner year)

      *Total commodities
      egen total_trade=total(tradevalueus), by(year reporter)
      egen total_trade_HS=total(tradevalueus), by(year reporter commoditycode)

      *Trade w/ World
      egen trade_world=total(tradevalueus) if partner=="World", by(reporter year commoditycode)

      *Trade w/ PRC
      egen trade_PRC=total(tradevalueus) if partner=="China", by(reporter year commoditycode)

      *Trade w/ U.S.
      egen trade_US=total(tradevalueus) if partner=="USA", by(reporter year commoditycode)

      *Trade w/ PRC + U.S.
      egen trade_PRC_US=total(tradevalueus) if partner=="China" | partner=="USA", by(reporter year commoditycode)

      destring commoditycode, replace

      save PRC_US_Gross_Trade_Subset_071922, replace
      use PRC_US_Gross_Trade_Subset_071922, clear

      local country_groups EU advanced advanced_no_US_EU advanced_no_Taiwan EMs EMs_no_PRC_RUS EMs_no_PRC developing other_middle_income non_advanced non_advanced_no_PRC middle_income middle_income_no_PRC

      foreach var in `country_groups' {
      preserve
      collapse (sum) trade_world trade_PRC trade_US trade_PRC_US total_trade total_trade_HS ///
      if `var'==1, by(HS_chapter year)
      gen reporter="`var'"
      *China trade share
      gen trade_share_PRC=trade_PRC/trade_world
      *US trade share
      gen trade_share_US=trade_US/trade_world
      *China trade/China+US trade
      gen trade_share_PRC_US=trade_PRC/trade_PRC_US
      save `var'_US_PRC_gross_trade, replace
      restore
      }


      When I run this, I get:


      . foreach var in `country_groups' {
      2. preserve
      3. collapse (sum) trade_world trade_PRC trade_US trade_PRC_US total_trade total_trade_HS ///
      > if `var'==1, by(HS_chapter year)
      4. gen reporter="`var'"
      5. *China trade share
      . gen trade_share_PRC=trade_PRC/trade_world
      6. *US trade share
      . gen trade_share_US=trade_US/trade_world
      7. *China trade/China+US trade
      . gen trade_share_PRC_US=trade_PRC/trade_PRC_US
      8. save `var'_US_PRC_gross_trade
      9. restore
      10. }

      .
      end of do-file


      But it doesn't seem like any new files are being saved. It seems to me that your corrections should have solved my issues - I think what you corrected for is what I want. I have these different country groupings and instead of collapsing for each group and creating the three variables, I'd like to just run it as a loop and save the subsets as separate .dta files. What I really want to do is to append each subset to my original dataset, but I'm not sure how to do that so I figured I'd just loop and then separately write a loop to append. But if you have ideas on how to combine, I'd be very grateful.

      Thanks again.

      Comment


      • #4
        But it doesn't seem like any new files are being saved.
        The code you say you ran is not consistent with the output you say you got. There is no way that Stata will abruptly remove the -, replace- option in your -save- command when it echoes the loop's commands to the output. Nor will it arbitrarily insert a blank before the final }. So I'm left in the position of having to guess what's going on. I can think of various situations in which the code, which looks otherwise correct, might fail to run and produce the files you seek. But all of those situations would also throw error messages. Perhaps you got error messages and didn't report them here?

        But here is what I think is more likely, because it would not produce any error messages. I think you are not running the entire code in one fell swoop, but are running it line by line or in chunks. In particular, if the command that originally defines local macro country_groups is run separately from the loop, here is what happens. The local macro country_groups gets defined. However, because you are executing only that line, or that line plus some others before the loop is reached, country_groups disappears immediately after that line (those lines) are run. When you then separately run the loop, and Stata sees a reference to country_groups, it is interpreted as an empty string. So Stata sees you saying: do this block of code for each of these variables, except there are no variables! So since there is nothing to do, Stata skips over the entire loop, and you get no computations and no files. If you run the entire code from beginning to end without interruption, you will not have this problem.

        What I really want to do is to append each subset to my original dataset
        That's a really terrible idea. The resulting data set will be a mish-mash of individual level and aggregated observations and will be essentially unusable for any kind of statistical analysis. At best you will have to write complicated and inefficient code to work around that inappropriate organization of the data. At worst you will get garbage out from the garbage going in. I know this kind of data organization is not uncommon in spreadsheets. But Stata is not a spreadsheet, and many things that are useful in spreadsheets (which, after all, are mainly designed for displaying data in ways that are helpful to human visual perception--the first commercial spreadsheet program was called Visi-Calc) are recipes for trouble in Stata. This is one of them.

        Appending all the newly created files to each other, however, is a very good idea. The code to do that is only slightly different from what I showed you in #2:

        Code:
        local country_groups EU advanced advanced_no_US_EU advanced_no_Taiwan EMs EMs_no_PRC_RUS EMs_no_PRC developing other_middle_income non_advanced non_advanced_no_PRC middle_income middle_income_no_PRC
        
        preserve
        clear
        tempfile building
        save `building', emptyok
        
        foreach var of local country_groups {
            restore, preserve
            collapse (sum) trade_world trade_PRC trade_US trade_PRC_US total_trade total_trade_HS ///
                if `var'==1, by(HS_chapter year)
            gen reporter="`var'"
            *China trade share
            gen trade_share_PRC=trade_PRC/trade_world
            *US trade share
            gen trade_share_US=trade_US/trade_world
            *China trade/China+US trade
            gen trade_share_PRC_US=trade_PRC/trade_PRC_US
            append using `building'
            save `"`building'"', replace
        }
        use `building', clear
        save combined_US_PRC_gross_trade, replace
        I notice that rather than -preserve-ing the data before the loop, and then running -restore, preserve- at the bottom of the loop as I did in #2, you are running -preserve- at the top of the loop and -restore- at the bottom. In the end, you will get the same results either way. But you are probably doubling the execution time as a result. The way you are doing it, Stata must write out the entire data set at the top of the loop each time through, and then read it in again at the bottom. So that's one read and one write for each iteration of the loop. If you do it as shown in #2 (and again here), you only write the data set out once, at the very beginning, and then it gets re-read in during each iteration. So that's one read per iteration, but no writes other than one just before the loop. Writing files is the slowest thing you can do in computing. The time spent doing those operations is undoubtedly as must as, and likely more than, the time spent on everything else in the entire do-file.


        Comment


        • #5
          Ah okay, I think you're right - I don't think I was running it all together. This code worked, and thank you for discussing how to append just the subsets together, as well as your detailed response on the processes and how exactly lines of code translate. This has been really helpful, very much appreciate it!

          Comment

          Working...
          X