  • Calculating 9 months pollution exposure

    Hi, I am working on the topic of the impact of air pollution on Child Health, combining the data from the Demographic Health Survey (DHS) and NASA satellite data imagery. I have used the location from DHS data to calculate the mean PM2.5 for each month and each cluster.

    I need to construct a trimester pollution exposure by calculating the mean PM2.5 over the three-month period preceding month m, the month of childbirth. I also need to construct a nine-month pollution exposure by calculating the mean PM2.5 over the nine-month period preceding month m.

    How do I calculate the trimester and nine-month pollution exposures?

  • #2
    When asking for help with code, it is almost always necessary to provide example data, because the code is likely to differ depending on details of the data itself and how it is organized. So please use the -dataex- program and post back with example data. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    In addition, please clarify what you mean by "preceding month m childbirth." Childbirth is perhaps not an instant, but it is usually a matter of just a few hours, or at worst a couple of days. So I cannot imagine what "month m childbirth" means. Please enlighten me. Also indicate where it is found in your example data.

    Finally, because I am an epidemiologist and I also have some background in environmental medicine, I know what PM2.5 is. But this is a multi-disciplinary, international forum, and I am confident that most Forum members are not familiar with this term. It is always best to avoid jargon here: always explain your questions in terms that anybody with a college education and a minimal statistics background would understand.



    • #3
      Hi, I am working on the topic of the impact of air pollution on Child Health, combining the data from the Demographic Health Survey (DHS) and NASA satellite data imagery.
      Child health and mother's health characteristics come from the DHS data, whereas the air pollution data come from the NASA satellite imagery.
      I have used the location from the DHS data to calculate the mean air pollution for each month and each cluster. I have the air pollution level for each month from March 2003 to July 2018. I have an Excel file for each month containing the air pollution data of each cluster.

      I need to construct a trimester pollution exposure by calculating the mean air pollution over the three-month period preceding month m, the month of childbirth. I also need to construct a nine-month pollution exposure by calculating the mean air pollution over the nine-month period preceding month m.

      How do I calculate the trimester and nine-month pollution exposures?

      b1 is the month of childbirth and b2 is the year of childbirth. So I need the average pollution level over the months before the month of childbirth. Suppose the child is born in June 2005: for the trimester exposure I need the average air pollution over March 2005, April 2005, and May 2005.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(v001 v002) byte v009 int v010 byte(v012 v025 v131 v133 v136 v151 v701 v702) int(v704 v716) byte(v730 bord b1) int b2 byte(b4 b5 b8) float(agedays_death nm im cm)
      1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 8 10 1994 1 1 12 . 0 0 0
      1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 7 11 1993 1 0  . 0 1 1 1
      1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 6  9 1988 2 1 18 . 0 0 0
      1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 5  8 1986 2 1 20 . 0 0 0
      1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 4  4 1985 2 1 21 . 0 0 0
      1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 3  6 1984 1 0  . 0 1 1 1
      1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 2  6 1983 1 0  . 0 1 1 1
      1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 1  5 1982 2 1 24 . 0 0 0
      1 17  1 1959 48 1 2 16 2 2 3 4 13 13  . 4  9 2001 2 1  5 . 0 0 0
      1 17  1 1959 48 1 2 16 2 2 3 4 13 13  . 3 12 1992 1 0  . 5 1 1 1
      1 17  1 1959 48 1 2 16 2 2 3 4 13 13  . 2 12 1979 1 1 27 . 0 0 0
      1 17  1 1959 48 1 2 16 2 2 3 4 13 13  . 1  3 1978 1 1 28 . 0 0 0
      1 27  5 1971 35 1 2 10 6 1 2 5 39 67 41 2  2 2000 1 1  7 . 0 0 0
      1 27  5 1971 35 1 2 10 6 1 2 5 39 67 41 1  6 1996 2 1 10 . 0 0 0
      1 37 10 1970 36 1 2 10 5 1 3 6 39 67 38 3  7 2005 2 1  1 . 0 0 0
      1 37 10 1970 36 1 2 10 5 1 3 6 39 67 38 2  1 1999 1 1  8 . 0 0 0
      1 37 10 1970 36 1 2 10 5 1 3 6 39 67 38 1  6 1995 1 1 11 . 0 0 0
      1 47  3 1977 29 1 2  0 6 1 0 . 36 67 42 4 11 2004 1 1  2 . 0 0 0
      1 47  3 1977 29 1 2  0 6 1 0 . 36 67 42 3  6 2000 1 1  6 . 0 0 0
      1 47  3 1977 29 1 2  0 6 1 0 . 36 67 42 2  9 1994 2 1 12 . 0 0 0
      end
      label values v025 LABE
      label def LABE 1 "urban", modify
      label values v131 v131
      label def v131 2 "punjabi", modify
      label values v133 v133
      label values v151 LABL
      label values b4 LABL
      label def LABL 1 "male", modify
      label def LABL 2 "female", modify
      label values v701 v701
      label def v701 0 "no education", modify
      label def v701 2 "secondary", modify
      label def v701 3 "higher", modify
      label values v702 LABAV
      label values v704 v704
      label def v704 11 "accountants", modify
      label def v704 13 "teachers (all levels)", modify
      label def v704 36 "transport conductors", modify
      label def v704 39 "clerical and related workers nec", modify
      label values v716 v716
      label def v716 13 "teachers (all levels)", modify
      label def v716 67 "unemployed", modify
      label values v730 v730
      label values b5 LABN
      label def LABN 0 "no", modify
      label def LABN 1 "yes", modify
      Air pollution data for the month of March 2000

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int DHSCLUST double ZonalSt_sh
       1   .349314004182816
       2 .34931400418281555
       3 .34931400418281555
       4 .34931400418281555
       5  .2954939901828766
       6 .34931400418281555
       7  .3211260139942169
       8  .5356400012969971
       9  .5356400012969971
      10  .5356400012969971
      end
      Example data of air pollution for the month of April 2000.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int DHSCLUST double ZonalSt_sh
       1 .34931400418281555
       2 .34931400418281555
       3 .34931400418281555
       4 .34931400418281555
       5  .2954939901828766
       6 .34931400418281555
       7  .3211260139942169
       8  .5356400012969971
       9  .5356400012969971
      10  .5356400012969971
      end



      thanks



      • #4
        With this data organization, you are many steps away from being able to do what you ask.

        First, you need to build up a single data set that contains the air pollution data for each location in each month. It appears that you can do that by appending the individual monthly data sets that you have all together, but you must add a variable to it showing the month-year. Within that data set, it is not hard to calculate averages over 3 and 9 month preceding windows.
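To make "average over the preceding 3 months" concrete, here is a minimal sketch on made-up numbers. The variable names cluster and pm and all values are hypothetical, not real pollution data. It uses Stata's built-in lag operators rather than -rangestat-, which is a different technique from the code later in this post: with lag operators the result is missing unless all three preceding months are present, whereas -rangestat- averages whatever observations fall inside the window.

```stata
* Toy data: one cluster, five consecutive months (values are invented)
clear
input int cluster float pm
1 10
1 20
1 30
1 40
1 50
end
gen mdate = tm(2005m1) + _n - 1     // 2005m1 through 2005m5
format mdate %tm

* Declare the panel structure so that L1., L2., L3. mean "1, 2, 3 months earlier"
xtset cluster mdate

* Mean of the three months preceding each month (current month excluded)
gen lag3_pm = (L1.pm + L2.pm + L3.pm) / 3
list, noobs
```

In this toy example the value for 2005m4 is the mean of January through March, and the first three months are missing because they do not have three preceding months.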

        Second, you need to combine your month and year of birth variables into a single month-year variable. And, crucially, the variable DHSCLUST needs to be in this data set as well. Looking at the example data, I'm guessing that the variable v009 is, in fact, this variable. So in the illustrative code below, I rename it accordingly: the variable must have the same name in both data sets. If v009 isn't the DHSCLUST variable, then you need to rename whichever variable there does indicate the DHSCLUST. If there is no such variable, you have to get it: without the DHSCLUST it is impossible to properly match up the two kinds of data.

        So, the code is going to look something like this:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(v001 v002) byte v009 int v010 byte(v012 v025 v131 v133 v136 v151 v701 v702) int(v704 v716) byte(v730 bord b1) int b2 byte(b4 b5 b8) float(agedays_death nm im cm)
        1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 8 10 1994 1 1 12 . 0 0 0
        1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 7 11 1993 1 0  . 0 1 1 1
        1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 6  9 1988 2 1 18 . 0 0 0
        1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 5  8 1986 2 1 20 . 0 0 0
        1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 4  4 1985 2 1 21 . 0 0 0
        1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 3  6 1984 1 0  . 0 1 1 1
        1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 2  6 1983 1 0  . 0 1 1 1
        1  7  8 1962 44 1 2  5 7 1 3 4 11 67 49 1  5 1982 2 1 24 . 0 0 0
        1 17  1 1959 48 1 2 16 2 2 3 4 13 13  . 4  9 2001 2 1  5 . 0 0 0
        1 17  1 1959 48 1 2 16 2 2 3 4 13 13  . 3 12 1992 1 0  . 5 1 1 1
        1 17  1 1959 48 1 2 16 2 2 3 4 13 13  . 2 12 1979 1 1 27 . 0 0 0
        1 17  1 1959 48 1 2 16 2 2 3 4 13 13  . 1  3 1978 1 1 28 . 0 0 0
        1 27  5 1971 35 1 2 10 6 1 2 5 39 67 41 2  2 2000 1 1  7 . 0 0 0
        1 27  5 1971 35 1 2 10 6 1 2 5 39 67 41 1  6 1996 2 1 10 . 0 0 0
        1 37 10 1970 36 1 2 10 5 1 3 6 39 67 38 3  7 2005 2 1  1 . 0 0 0
        1 37 10 1970 36 1 2 10 5 1 3 6 39 67 38 2  1 1999 1 1  8 . 0 0 0
        1 37 10 1970 36 1 2 10 5 1 3 6 39 67 38 1  6 1995 1 1 11 . 0 0 0
        1 47  3 1977 29 1 2  0 6 1 0 . 36 67 42 4 11 2004 1 1  2 . 0 0 0
        1 47  3 1977 29 1 2  0 6 1 0 . 36 67 42 3  6 2000 1 1  6 . 0 0 0
        1 47  3 1977 29 1 2  0 6 1 0 . 36 67 42 2  9 1994 2 1 12 . 0 0 0
        end
        label values v025 LABE
        label def LABE 1 "urban", modify
        label values v131 v131
        label def v131 2 "punjabi", modify
        label values v133 v133
        label values v151 LABL
        label values b4 LABL
        label def LABL 1 "male", modify
        label def LABL 2 "female", modify
        label values v701 v701
        label def v701 0 "no education", modify
        label def v701 2 "secondary", modify
        label def v701 3 "higher", modify
        label values v702 LABAV
        label values v704 v704
        label def v704 11 "accountants", modify
        label def v704 13 "teachers (all levels)", modify
        label def v704 36 "transport conductors", modify
        label def v704 39 "clerical and related workers nec", modify
        label values v716 v716
        label def v716 13 "teachers (all levels)", modify
        label def v716 67 "unemployed", modify
        label values v730 v730
        label values b5 LABN
        label def LABN 0 "no", modify
        label def LABN 1 "yes", modify
        tempfile birth_data
        save `birth_data'
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int DHSCLUST double ZonalSt_sh
         1   .349314004182816
         2 .34931400418281555
         3 .34931400418281555
         4 .34931400418281555
         5  .2954939901828766
         6 .34931400418281555
         7  .3211260139942169
         8  .5356400012969971
         9  .5356400012969971
        10  .5356400012969971
        end
        tempfile 2000m3 // MARCH 2000
        save `2000m3'
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int DHSCLUST double ZonalSt_sh
         1 .34931400418281555
         2 .34931400418281555
         3 .34931400418281555
         4 .34931400418281555
         5  .2954939901828766
         6 .34931400418281555
         7  .3211260139942169
         8  .5356400012969971
         9  .5356400012969971
        10  .5356400012969971
        end
        tempfile 2000m4 // APRIL 2000
        save `2000m4'
        
        //  ABOVE ARE THE DATA SETS
        //  ACTIVE CODE STARTS HERE
        
        //  COMBINE THE MONTHLY AIR POLLUTION DATA SETS
        clear
        tempfile all_months
        save `all_months', emptyok
        foreach f in 2000m3 2000m4 { // EXPAND TO FULL LIST OF MONTHLY FILES
            use ``f'', clear // IF YOUR DATA SETS ARE NOT TEMPFILES, -use `f'-
            gen mdate = tm(`f')
            append using `all_months'
            save `"`all_months'"', replace
        }
        format mdate %tm
        //  CALCULATE THREE AND NINE MONTH LAGGING AVERAGES
        rangestat (mean) lag3_Zonal = ZonalSt_sh, by(DHSCLUST) interval(mdate -3 -1)
        rangestat (mean) lag9_Zonal = ZonalSt_sh, by(DHSCLUST) interval(mdate -9 -1)
        isid DHSCLUST mdate, sort
        save `"`all_months'"', replace
        
        
        //  PREPARE THE BIRTH DATA SET TO MERGE WITH THE POLLUTION DATA
        use `birth_data', clear
        rename v009 DHSCLUST // THIS IS JUST A GUESS BECAUSE IT'S 1-10
        gen mdate = ym(b2, b1)
        assert missing(mdate) == missing(b2, b1)
        format mdate %tm
        
        //  NOW PUT THE BIRTH DATA SET TOGETHER WITH THE COMBINED POLLUTION DATA
        merge m:1 DHSCLUST mdate using `all_months', keep(master match)
        Notes:

        1. -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
        2. It is likely that your monthly pollution data sets are real data sets, not tempfiles. So in the loop that appends them all together, you will -use `f'-, or if the filenames contain embedded blanks -use `"`f'"'- instead of -use ``f''-.
        3. Similarly if the filenames are not 2000m3, 2000m4, etc. then you will have to have some other code that extracts the actual Stata numeric code for that month from the filename. Since I don't know what the filenames are, I can't help you with that.
        4. The code does not illustrate the solution well in your example data because your example pollution data is only for March and April of 2000, but none of the example birthdates are in those months.
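As an illustration of note 3, here is a minimal sketch of pulling the monthly date out of a file name. The pattern ZS_yyyy_mm used here is only an assumed example, and the character positions passed to substr() depend entirely on that assumption, so they would have to be adjusted for any other naming scheme.

```stata
* Hypothetical file name following an assumed ZS_yyyy_mm pattern
local fname "ZS_2000_03"

* Characters 4-7 hold the year, characters 9-10 hold the month
local yy = real(substr("`fname'", 4, 4))
local mm = real(substr("`fname'", 9, 2))

* ym() converts a year and a month into Stata's internal monthly date
local mdate = ym(`yy', `mm')
display %tm `mdate'
```

Inside the loop, -gen mdate = `mdate'- would then take the place of -gen mdate = tm(`f')-.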

        Although not essential to solving this particular problem, I highly recommend that you rename all the variables in these data sets to something that has mnemonic value. If you spend a few weeks away from this data and come back to it, it is unlikely you will remember what any of these variables are with names like v131 or b5.
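A minimal sketch of that kind of renaming, on a three-row toy copy of the birth variables. The new names and labels are only suggestions, and the reading of b4 as the child's sex is an assumption based on its male/female value labels in the example data.

```stata
* Toy copy of three variables from the birth data
clear
input byte b1 int b2 byte b4
10 1994 1
11 1993 1
 9 1988 2
end

* Rename several variables in one command: old names first, new names second
rename (b1 b2 b4) (birth_month birth_year child_sex)

* Variable labels keep a record of where each variable came from
label variable birth_month "Month of birth (DHS b1)"
label variable birth_year  "Year of birth (DHS b2)"
label variable child_sex   "Sex of child (DHS b4)"
describe
```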



        • #5
          Hi, thanks for your quick reply.

          Please, what is the use of this command?

          tempfile all_months

          Regards



          • #6
            This command tells Stata to create a temporary file and to store its name in the local macro all_months. Thereafter, any reference to `all_months' is a reference to that file. The nice thing about temporary files is that they are, well, temporary. They are very useful for holding configurations of data that are needed for intermediate calculations but are not needed for the long run. Temporary files are automatically deleted after the program that creates them ends, so you don't have to go about hunting them down and erasing them to clear out space on your hard drive.

            In this instance, the purpose was to create a file that would hold all of the data in the various monthly pollution files. In fact, I would guess that you might want to save that result as a permanent file as it may well have other uses later on in your project. But I was writing demonstration code to run on my machine, and I have no reason to save your example data on my computer for the long haul. So I put it into a temporary file: available for the moment and gone as soon as I'm done with it.
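A minimal sketch of the temporary-file round trip, using a three-observation toy data set. The macro name holding and the variable x are hypothetical.

```stata
* Build a tiny data set to store
clear
set obs 3
gen x = _n

* -tempfile- picks a file name and stores it in the local macro holding
tempfile holding
save `holding'           // write the data in memory to the temporary file

* The data can be cleared and brought back within the same do-file
clear
use `holding', clear
count                    // the three observations are back in memory

* To keep the result permanently, save under an ordinary name instead, e.g.
* save "all_months.dta", replace
```

Because the macro is local, the whole sequence has to be run in one go (one do-file or one selection in the do-file editor); run line by line, `holding' would be empty by the time -save- is reached.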



            • #7
              Thanks. My air pollution Excel files are named ZS_2000_03, ZS_2000_04, and so on through ZS_2018_07. Should I rename my Excel files to the 2000m3 format?



              • #8
                You could, but I wouldn't. I'd rather have Stata calculate the file names while it loops over the months from March 2003 through July 2018.

                Also, I wouldn't work with the Excel files here. First get all of them imported to Stata. I have the sense that, in fact, you have already done that. If not, do it now. I'll assume that the Stata files you create are named ZS_2003_03.dta through ZS_2018_07.dta. Then you can use this code:

                Code:
                //  COMBINE THE MONTHLY AIR POLLUTION DATA SETS
                clear
                local first_month = tm(2003m3)
                local last_month = tm(2018m7)
                
                
                tempfile all_months
                save `all_months', emptyok
                forvalues m = `first_month'/`last_month' {
                    local yy = year(dofm(`m'))
                    local mm: display %02.0f =month(dofm(`m'))
                    use ZS_`yy'_`mm', clear
                    gen mdate = `m'
                    append using `all_months'
                    save `"`all_months'"', replace
                }
                format mdate %tm
                // CALCULATE THREE AND NINE MONTH LAGGING AVERAGES
                rangestat (mean) lag3_Zonal = ZonalSt_sh, by(DHSCLUST) interval(mdate -3 -1)
                rangestat (mean) lag9_Zonal = ZonalSt_sh, by(DHSCLUST) interval(mdate -9 -1)
                isid DHSCLUST mdate, sort
                save `"`all_months'"', replace
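                The parts of the code that differ from what was shown in #4 (bold face in the original post) are the two -local- definitions before the loop, the -forvalues- loop header, and the first four lines inside the loop.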



                • #9
                  Thanks


                  file ZS_2003_03.dta not found


                  I am getting this error message. How do I specify the path here, please?


                  my Stata data files are located at E:\Health and air quality\December\Extract2



                  Thanks



                  • #10
                    Hi, it worked once I specified the working directory.
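For anyone hitting the same "file not found" message, here is a minimal sketch of two ways to point Stata at the folder. The folder is the one named in #9; the macro names datadir and target are hypothetical. Forward slashes are used deliberately: they work on Windows, and they avoid the problem that a backslash immediately before a local macro reference prevents the macro from being expanded. The double quotes are needed because the path contains spaces.

```stata
* Option 1 (the one used above): change the working directory once, e.g.
* cd "E:/Health and air quality/December/Extract2"

* Option 2: keep the folder in a local macro and build full file names from it
local datadir "E:/Health and air quality/December/Extract2"
local yy 2003
local mm 03
local target "`datadir'/ZS_`yy'_`mm'.dta"

* Check that the file exists before trying to open it
capture confirm file "`target'"
if _rc == 0 {
    use "`target'", clear
}
else {
    display as error "file not found: `target'"
}
```

With option 2, the -use- line inside the loop from #8 would become -use "`datadir'/ZS_`yy'_`mm'", clear-.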



                    • #11
                      Thanks a lot Clyde Schechter



                      • #12
                        Hi Clyde Schechter,

                        What is the purpose of this command?
                        assert missing(mdate) == missing(b2, b1)



                        • #13
                          The code immediately before that line creates a Stata internal format monthly date variable from your separate month and year variables b1 and b2. The purpose of the assert command is to verify that this was successfully concluded. So, if your data set had an observation with b1 = 13, you have a problem because the month must always be an integer between 1 and 12. When the Stata -ym()- function encounters an invalid month, it returns a missing value. So the -assert- command would notice that in that observation, mdate is missing even though b1 and b2 are not, which can only happen if b1 or b2 in that observation does not define a valid month or year. Since the remainder of the work you will be doing requires a valid monthly date variable, this is a crucial check on the validity of your data.
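A minimal sketch of that check in action, on three invented observations: one valid birth date, one with the impossible month 13, and one where the month is genuinely missing. -capture- is used here only so that the demonstration keeps running after the assertion fails; in real code you would normally let -assert- stop the do-file.

```stata
* Toy birth dates: valid, invalid month, and missing month
clear
input byte b1 int b2
 6 2005
13 2005
 . 2004
end

gen mdate = ym(b2, b1)      // ym() returns missing for month 13
format mdate %tm
list, noobs

* The third row passes (both sides missing); the second row does not
capture assert missing(mdate) == missing(b2, b1)
display "return code from assert: " _rc    // nonzero when the assertion fails

* Find the offending observations
list if missing(mdate) & !missing(b2, b1), noobs
```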



                          • #14
                            Hi sir, if I have my pollution data in one file rather than in a separate file for each month, how would I have to change the commands?



                            ----------------------- copy starting from the next line -----------------------
                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input int ClusterPoints double(ZS20003 ZS20004 ZS20005 ZS20006 ZS20007 ZS20008)
                            1  .2480315 .53149605 .47244096 .37007874 .68503934 .62598425
                            2  .2519685 .42519686 .31102362 .35039371 .46850392 .53543305
                            3  .2480315 .53149605 .47244096 .37007874 .68503934 .62598425
                            4  .2480315 .53149605 .74409449 .64960629 .62204725 .68503934
                            5 .22834645 .55118108 .81889766 .61417323 .64566928 .66929132
                            end
                            ------------------ copy up to and including the previous line ------------------

                            Listed 5 out of 972 observations



                            • #15
                              From what you show it appears that there are more changes than just having everything in one file. You have only yearly, not monthly data, and the data are in wide layout.

                              With only yearly data, it isn't possible to calculate 3 month or 9 month lags.

                              So it is not a matter of changing the data: the problem as originally stated cannot be solved with this data. So think about how you want to change the problem itself and then post back.
