
  • Replicating a code, spotting an error in a loop

    Hello Stata Users,

    I am trying to reproduce the results of a recent paper: Gendron-Carrier et al. (2022), "Subways and Urban Air Pollution," AEJ: Applied Economics.

    The authors have one main do-file that runs the other do-files, and I have some questions. I am using Stata 17 SE.
    I must admit that I am not a very experienced Stata user and, although I searched thoroughly for answers to these problems, I could not find a satisfying solution.

    1) I am having trouble pinpointing where the errors occur when I run the code, and especially which do-file they are in, since one do-file runs the others. I saw some posts about the command set trace on, but it does not seem to do anything in my case. Would you have any advice?

    2) I get an r(199) error in a loop because of a misplaced "/", but I don't know where it could possibly be.
    Code:
    if 1 == 1
    {;
        global samples "subways";
        foreach sample of global samples
        {;
            ${output}${date}_validation_dataset.dta, clear;
            
            if "`sample'" == "subways"
            {;
                *keep only subway cities;
                keep if sample_subways == 1;
            };
            
            if "`sample'" == "opening"
            {;
                *keep only subway cities;
                keep if sample_opening == 1;
            };
            
            label var aod_mean_${ring}_Terra "AOD";
            label var aod_mean_${ring}_Aqua "AOD";
    
            foreach sat in "Aqua" "Terra"
            {;
                foreach PM in "PM10" "PM25"
                {;
                    preserve;
                    keep if aod_mean_${ring}_`sat' !=. & `PM' !=.;
                    sort urbancode;
                    by urbancode: gen count_cities = _n;
                    replace count_cities = 0 if count_cities >1;
                    gen count_city_years = 1;
                    collapse (mean) year `PM' aod_mean_${ring}_`sat' ${${pre_x_}C2_${ring}} ${${pre_x_}C4} aod_count_${ring}_`sat' (sum) count_city_years count_cities;
                    gen year2 = int(year);
                    drop year;
                    rename year2 year;
                    rename `PM' pm;
                    rename aod_mean_${ring}_`sat' aod;
                    rename aod_count_${ring}_`sat' count;
                    gen PM_str = "`PM'";
                    gen sat_str = "`sat'";
                    gen ring_str = "$ring";
                    order PM_str sat_str ring_str count_cities count_city_years year pm aod count;
                    save temp3/`PM'_`sat'_${ring}.dta, replace;
                    restore;
                };
            };
            foreach sat in "Aqua" "Terra"
            {;
                foreach PM in "PM10" "PM25"
                {;
                    if "`sat'" == "Aqua" & "`PM'" == "PM10"
                    {;
                        use temp3/`PM'_`sat'_${ring}.dta, clear;
                    };
                    else
                    {;
                        append using temp3/`PM'_`sat'_${ring}.dta;
                    };
                };
            };
            egen label = concat(PM_str sat_st ring_str);
            gen table_sort = 0;
            replace table_sort = 1 if label == "PM10Terra${ring}";
            replace table_sort = 2 if label == "PM10Aqua${ring}";
            replace table_sort = 3 if label == "PM25Terra${ring}";
            replace table_sort = 4 if label == "PM25Aqua${ring}";
            sort table_sort;
            order table_sort;
            drop label PM_str sat_st ring_str;
            xpose, clear varname;
            compress;
            set trace on
            outsheet using ${output}/`sample'_${ring}_summary_stats_aod_validation${foot}.xls,replace;
        };
    };
    Thanks for reading, and my apologies if my questions seem trivial to you.

  • #2
    Perhaps
    Code:
            ${output}${date}_validation_dataset.dta, clear;
    should be
    Code:
            use ${output}${date}_validation_dataset.dta, clear;
    in each place that it appears, although because you do not show us how your global macros such as $output and $date are defined, it is not possible to be sure.
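
One way to check this is to display what the line expands to before Stata tries to run it. The sketch below uses made-up values for $output and $date (the real definitions are in the authors' main do-file and are not shown in this thread). If $output expands to a path that begins with a slash, the line without -use- would start with "/", which would be consistent with an r(199) complaint about "/"; that is an assumption, not something the thread confirms.

```stata
* Hypothetical values, for illustration only; the real ones are set elsewhere
global output "/path/to/output/"
global date   "20220101"

* Show the fully expanded file name that the -use- line would receive
display "${output}${date}_validation_dataset.dta"

* List the current contents of both global macros
macro list output date
```

If either macro comes back empty or without the expected trailing slash, the problem is in the definitions rather than in the loop.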



    • #3
      I haven't bothered to understand what the program is trying to do, but there are at least 2 instances where the executed command is not delimited by a semicolon, which I think are the cause of your problems.

      Code:
      set trace on
      Should be

      Code:
      set trace on;
      On a stylistic note, it is considered bad style to change the delimiter for an entire program (-help #delimit-) so that each command ends with a semicolon rather than the default new line. It makes the syntax look more cluttered and, as you have found out, a single missed semicolon can cause cryptic errors that do not make the underlying problem clear. None of the commands you have shown (at a skim) are written over multiple lines, except where braces are placed on their own lines to structure code blocks. To me this makes the semicolon delimiter especially unnecessary. If you need to spread a command over multiple lines to enhance readability (as is often the case for graphing commands), that is perfectly reasonable. In those cases, a better approach is to change the delimiter temporarily or to use -///- to tell Stata to keep reading on the following line (see the technical note in -help delimit-). If it were me, I would go back and reformat your do-files to use the standard delimiter to enhance readability.
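
A minimal sketch of the two alternatives mentioned above, meant to be run from a do-file. It uses the auto dataset shipped with Stata rather than the paper's data, so the variable names are placeholders and not the authors' code.

```stata
* Option 1: keep the default delimiter and continue a long command with ///
sysuse auto, clear
collapse (mean) price mpg weight ///
         (sum)  foreign, by(rep78)

* Option 2: switch to the semicolon delimiter only around the long command
sysuse auto, clear
#delimit ;
collapse (mean) price mpg weight
         (sum)  foreign, by(rep78);
#delimit cr

* Back under the default delimiter, no semicolons are needed
describe
```

Either way, short commands such as -set trace on- stay on one line with no terminator to forget.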



      • #4
        Originally posted by William Lisowski
        Perhaps
        Code:
        ${output}${date}_validation_dataset.dta, clear;
        should be
        Code:
        use ${output}${date}_validation_dataset.dta, clear;
        in each place that it appears, although because you do not show us how your global macros such as $output and $date are defined, it is not possible to be sure.
        Thank you, William Lisowski; indeed, that was part of the problem! The code is pretty large, but the global macros are defined earlier in it.

        Leonardo Guizzetti, I am replicating most of the code, only modifying the regressions and adding a few elements. I did not choose the #delimit settings, as I am working with the authors' code; moreover, I guess they must be useful in the other do-files. Thanks, though. I will look into this; the code would indeed be much more readable.
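
On the first question (finding which do-file an error comes from), one possible approach is to have the main do-file announce each sub-do-file and report the return code when one fails. This is only a sketch: the file names are hypothetical, and it assumes the main do-file can be arranged to call the others from a loop under the default delimiter. Note also that in the posted code -set trace on- sits near the end of the loop body, so if the loop stops at an earlier line it would never be reached.

```stata
* Hypothetical sub-do-file names; replace with the ones the main do-file calls
local dofiles "01_build_data.do 02_validation.do 03_regressions.do"

foreach f of local dofiles {
    display as text _n "==== starting `f' ===="

    * -capture noisily- shows the output and the error message,
    * but hands control back here instead of aborting silently
    capture noisily do "`f'"
    local rc = _rc

    if `rc' != 0 {
        * report which file failed and with which return code, e.g. 199
        display as error "`f' stopped with return code `rc'"
        exit `rc'
    }
}
```

The last "starting" banner printed before the error then identifies the do-file to open.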
