Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Changing the base year, and creating an index from that year in a time series

    Hello,

    I have a similar problem as last time. But slightly different and have been trying for a few hours and cannot seem to get it right.
    The past thread similar to this was this one: https://www.statalist.org/forums/for...n-index-series

    In short what I should get is an index with base year 2007 = 1, and the differences up to the end of the time series for every country.

    How could I do this for this data set?

    And for the record, would it be possible to split the different panels (countries) into different datasets? What i mean is to get, AT from year 2007 to the end of the time series, and put it in another country, and do it for every country?

    Thank you.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 country int year float value
    "AT"   2007  92.5
    "AT"   2008    97
    "AT"   2009  95.3
    "AT"   2010  99.1
    "AT"   2011 103.8
    "AT"   2012 104.8
    "AT"   2013 103.7
    "AT"   2014 102.2
    "AT"   2015   100
    "AT"   2016  97.5
    "AT"   2017  99.4
    "BE"   2007  90.9
    "BE"   2008  99.3
    "BE"   2009  92.2
    "BE"   2010  97.2
    "BE"   2011 105.1
    "BE"   2012 108.8
    "BE"   2013 109.4
    "BE"   2014 104.4
    "BE"   2015   100
    "BE"   2016  98.2
    "BE"   2017 106.9
    "BG"   2007  78.2
    "BG"   2008  88.6
    "BG"   2009  84.8
    "BG"   2010  90.9
    "BG"   2011  98.7
    "BG"   2012   104
    "BG"   2013 102.6
    "BG"   2014 101.7
    "BG"   2015   100
    "BG"   2016  97.1
    "BG"   2017 101.2
    "CH"   2007 102.3
    "CH"   2008 106.6
    "CH"   2009   104
    "CH"   2010 104.6
    "CH"   2011 104.9
    "CH"   2012 104.4
    "CH"   2013 104.2
    "CH"   2014 103.5
    "CH"   2015   100
    "CH"   2016  98.6
    "CH"   2017  98.4
    "CY"   2007  85.7
    "CY"   2008  95.7
    "CY"   2009    94
    "CY"   2010  97.8
    "CY"   2011 103.5
    "CY"   2012 111.9
    "CY"   2013 109.5
    "CY"   2014   106
    "CY"   2015   100
    "CY"   2016  95.3
    "CY"   2017  98.9
    "CZ"   2007  93.5
    "CZ"   2008  97.7
    "CZ"   2009  94.6
    "CZ"   2010  95.8
    "CZ"   2011 101.1
    "CZ"   2012 103.3
    "CZ"   2013 104.1
    "CZ"   2014 103.3
    "CZ"   2015   100
    "CZ"   2016  96.7
    "CZ"   2017  98.5
    "DE"   2007  93.7
    "DE"   2008  98.8
    "DE"   2009  94.8
    "DE"   2010  96.1
    "DE"   2011   101
    "DE"   2012 102.7
    "DE"   2013 102.8
    "DE"   2014 101.8
    "DE"   2015   100
    "DE"   2016  98.4
    "DE"   2017 100.8
    "DK"   2007  86.2
    "DK"   2008  97.6
    "DK"   2009  91.1
    "DK"   2010  96.9
    "DK"   2011 104.4
    "DK"   2012 107.1
    "DK"   2013 109.5
    "DK"   2014 106.9
    "DK"   2015   100
    "DK"   2016  98.5
    "DK"   2017 101.7
    "EA19" 2007  92.8
    "EA19" 2008  98.4
    "EA19" 2009  93.6
    "EA19" 2010  96.2
    "EA19" 2011 101.7
    "EA19" 2012 104.4
    "EA19" 2013 104.2
    "EA19" 2014 102.7
    "EA19" 2015   100
    "EA19" 2016  97.9
    "EA19" 2017 100.8
    "EE"   2007  79.8
    end
    Last edited by Falco Wolf; 09 Dec 2018, 15:43.

  • #2
    So, here is the code to create an inflation index (I'm actually calling it a GDP deflator) with base year == 2007. I would recommend that you keep it all countries in one dataset and then match by country and year, but I will show you how to loop over countries and save them to different files.

    Code:
    bysort country (year): gen def_2007 = value / value[1]  // creating the deflator with base year == 2007
    gen value_base_2007 = def_2007 * 100  // just showing the new index (if needed)
    
    * Just formatting them (to make them easier to read)
    format def_2007 %9.3f
    format value %9.1f
    format value_base_2007 %9.1f
    
    * Just labeling the variables
    label var def_2007 "GDP Deflator with base year = 2007"
    label var value_base_2007 "Index with 2007 == 100 (just def_2007 * 100)"
    
    . list country year value def_2007 value_base_2007 if inlist(country, "AT", "BE", "BG"), sepby(country) noobs
    
      +----------------------------------------------+
      | country   year   value   def_2007   val~2007 |
      |----------------------------------------------|
      |      AT   2007    92.5      1.000      100.0 |
      |      AT   2008    97.0      1.049      104.9 |
      |      AT   2009    95.3      1.030      103.0 |
      |      AT   2010    99.1      1.071      107.1 |
      |      AT   2011   103.8      1.122      112.2 |
      |      AT   2012   104.8      1.133      113.3 |
      |      AT   2013   103.7      1.121      112.1 |
      |      AT   2014   102.2      1.105      110.5 |
      |      AT   2015   100.0      1.081      108.1 |
      |      AT   2016    97.5      1.054      105.4 |
      |      AT   2017    99.4      1.075      107.5 |
      |----------------------------------------------|
      |      BE   2007    90.9      1.000      100.0 |
      |      BE   2008    99.3      1.092      109.2 |
      |      BE   2009    92.2      1.014      101.4 |
      |      BE   2010    97.2      1.069      106.9 |
      |      BE   2011   105.1      1.156      115.6 |
      |      BE   2012   108.8      1.197      119.7 |
      |      BE   2013   109.4      1.204      120.4 |
      |      BE   2014   104.4      1.149      114.9 |
      |      BE   2015   100.0      1.100      110.0 |
      |      BE   2016    98.2      1.080      108.0 |
      |      BE   2017   106.9      1.176      117.6 |
      |----------------------------------------------|
      |      BG   2007    78.2      1.000      100.0 |
      |      BG   2008    88.6      1.133      113.3 |
      |      BG   2009    84.8      1.084      108.4 |
      |      BG   2010    90.9      1.162      116.2 |
      |      BG   2011    98.7      1.262      126.2 |
      |      BG   2012   104.0      1.330      133.0 |
      |      BG   2013   102.6      1.312      131.2 |
      |      BG   2014   101.7      1.301      130.1 |
      |      BG   2015   100.0      1.279      127.9 |
      |      BG   2016    97.1      1.242      124.2 |
      |      BG   2017   101.2      1.294      129.4 |
      +----------------------------------------------+
    Code to loop over each country (assuming original file with all data is name orig_data.dta)
    (Hat tip Nick Cox from Show to split a data set
    Code:
    levelsof country, local(my_countries) 
    foreach i of local my_countries  {
    use orig_data.dta
    keep if country== "`i'"
    save "`i'_inflation_index.dta", replace
    }

    Comment


    • #3
      David,

      Thanks for your help.

      When running the loop i get an error that says:

      "
      . foreach i of local my_countries {
      2. use orig_data.dta
      3. keep if country== "`i'"
      4. save "`i'_inflation_index.dta", replace
      5. }
      no; data in memory would be lost
      r(4);

      end of do-file"

      What is the problem? Why can i not run the loop¿

      Comment


      • #4
        You need the clear option on use.

        But, but, but: why do you want a separate dataset for each country? What advantage will that serve?

        Comment


        • #5
          Nick,

          You mean I will need to do on 2. of the loop: use orig_data.dta, clear?

          My end goal is to merge this newly created index into other files that are country specific. ALl the other files are too big to have all the countries in one file, so we have to do it country specific... which takes longer but I think I don't have other choices...(or maybe I do but I thought I was constrained to do it this way because of the weight of the files).

          Comment


          • #6
            Yes, that's correct. Your loop should be more like

            Code:
            foreach i of local my_countries {
                use orig_data.dta, clear 
                keep if country== "`i'"
                save "`i'_inflation_index.dta", replace
            }

            Comment


            • #7
              Falco,

              Those other country-specific files might be too large to combine into a single file, but this inflation_index file won't be (100 countries * 20 yrs == 2,000 obs). Just add a column to the country-specific file with country. It will be the same for all obs in that dataset (so may seem kind of pointless), but you can then use it to merge data in from inflation_index.dta.

              Code:
              * I assume AT means "Austria" and data has a variable named country
              use Austria_data.dta, clear
              merge m:1 country year using inflation_index.dta, keepusing(def_2007) keep(match master)
              * As long as each dataset had a variable country, could merge in def_2007 using a loop

              Comment


              • #8
                Hey Benson,

                Thanks for the response.

                Would their be a way to create a loop to merge all the countries: m:1 country year using indlation_index.dta
                not to do it automatically? I have around 20 countries.

                For example, say I want to add the inflation column to the estonia, latvia and croatia dataset, from only one do file. Is this possible?

                Comment


                • #9
                  Falco Wolf,

                  Sure. Your code would look something like this (I didn't have the country abbrev for estonia, latvia and croatia handy so I used "AT" "BE" "BG" "CH" )

                  Code:
                  * Using a loop to merge in variable def_2007 from inflation index file
                  local inflation_file "C:\Data\Inflation data\Inflation_index.dta"  // I find it easier to put long filenames in a Stata macro
                  
                  * NOTE: The red below is just to highlight where the `i' is
                  foreach i in "AT" "BE" "BG" "CH"  {
                  use  "C:\Data\Country files\`i'_orig_data.dta", clear
                  merge m:1 country year using "`inflation_file'", keepusing(def_2007) keep(match master) gen(merge_def_2007)
                  * Do other stuff with file (label variables, rename vars, add notes, desc, summarize, etc) here while the file is open
                  save "C:\Data\Country files\`i'_orig_data.dta", replace
                  }
                  NOTE: The above code assumes:
                  1) Country files are named "AT_orig_data.dta", "BE_orig_data.dta", "BG_orig_data.dta", etc

                  2) I kept the filepath's in just to show that it can be done (and the files don't have to be in Stata's default directory or path)

                  3) The variable to merge in is called def_2007
                  Last edited by David Benson; 19 Dec 2018, 13:28.

                  Comment


                  • #10
                    Thanks!

                    However, I'm having a bit of trouble adapting the loop. I get this error: file C:\Falco\Data\CountriesCY_orig_data.dta not found

                    So, I have the inflation_index (with the variables I want to merge) in the same folder file: C:\Falco\Data\Countries

                    The countries i'm trying to loop for now are, CY (cyprus), MT (Malta), and IE (Ireland.
                    The respective files are:
                    CY_orig_data
                    MT_orig_data
                    IE_orig_data
                    they are in the same folder file: C:\Falco\Data\Countries

                    So what I did was:

                    local inflation_file "C:\Falco\Data\Countries"

                    foreach i in "CY" "IE" "MT" {
                    use "C:\Falco\Data\Countries`i'_orig_data", clear
                    merge m:1 country year using "`inflation_file'", keepusing(def_2007 ppi_value) keep(match master) gen(merge_def_2007)
                    save "C:\Falco\Data\Countries`i'_orig_data.dta", replace
                    }


                    Where is the error in the loop?

                    Comment


                    • #11
                      So there are 2 problems that I can see:
                      1) You need an extra "/" after Countries and before "`i'_orig_data"
                      2) Your local inflation_file "C:\Falco\Data\Countries" lists the path, but does not include the actual inflation_index file name

                      Code:
                      * Text in red is stuff I added / changed from your code
                      local inflation_file "C:\Falco\Data\Countries\inflation_index.dta"  // obviously, change to actual full path & filename of actual inflation index file
                      
                      foreach i in "CY" "IE" "MT" {
                      use "C:\Falco\Data\Countries\`i'_orig_data", clear
                      merge m:1 country year using "`inflation_file'", keepusing(def_2007 ppi_value) keep(match master) gen(merge_def_2007)
                      save "C:\Falco\Data\Countries\`i'_orig_data.dta", replace
                      }
                      A shortcut if you are using Windows, if you SHIFT + Right Click on a file (so, right click while holding down SHIFT), and then select "Copy as Path", you have saved the full filename and path to the clipboard and can then merely CTRL + V to paste it into the do file.

                      Click image for larger version

Name:	Copy as Path - smaller.png
Views:	1
Size:	57.3 KB
ID:	1475658

                      Comment


                      • #12
                        Backslashes before local macro references are a no-no-no. This is documented in

                        18.3.11 Constructing Windows filenames by using macros

                        Stata uses the \ character to tell its parser not to expand macros.

                        Windows uses the \ character as the directory path separator.

                        Mostly, there is no problem using a \ in a filename. However, if you are writing a program that
                        contains a Windows path in macro path and a filename in fname, do not assemble the final result as

                        `path'\`fname'

                        because Stata will interpret the \ as an instruction to not expand `fname'. Instead, assemble the
                        final result as

                        `path'/`fname'

                        Stata understands / as a directory separator on all platforms.
                        and also in https://www.stata-journal.com/sjpdf....iclenum=pr0042

                        Comment


                        • #13
                          Thanks Benson and Nick.

                          I'm still not getting it to run.

                          local inflation_file "C:\Falco\Data\Countries\inflation_index.dta" // This I understand
                          . foreach i in "CY" "IE" "MT" { //What does i do? does this mean i is all the files I have names CY, IE, MT?
                          2. use "C:\Falco\Data/`i', clear // It says CY.dta is not found when I do actually have the file in the directory and this is the path.
                          3. merge m:1 country year using "`inflation_file'", keepusing(def_2007 ppi_value) keep(match master) gen(merge_def_2007)
                          4. save "C:\Falco\Data/`i'_orig_data.dta", replace 5. } file C:\Falco\Data/CY, clear.dta not found
                          5.}
                          Last edited by Falco Wolf; 20 Dec 2018, 07:56.

                          Comment


                          • #14
                            Hi Falco Wolf ,

                            . foreach i in "CY" "IE" "MT" { //What does i do? does this mean i is all the files I have names CY, IE, MT?
                            No, what it means is that the first time it goes through the loop, every time there is an `i', Stata will replace `i' with "CY". The 2nd time through it will put "IE" every place it sees `i', and so on. Using the letter i is just a personal preference. You can use a, b, c, d, or whatever letter or word you want. People tend to use "v" or "var" when looping over variable lists. I often use "yr" when I have a different file for each year and I am looping over those years.

                            So in my loop that I posted in post #11, the first time through, it would be as if you had manually typed in:
                            Code:
                            * foreach i in "CY" "IE" "MT" {
                            
                            * First time through it would be as if you had manually typed in:
                            use "C:\Falco\Data\Countries/CY_orig_data", clear
                            merge m:1 country year using "`inflation_file'", keepusing(def_2007 ppi_value) keep(match master) gen(merge_def_2007)
                            save "C:\Falco\Data\Countries/CY_orig_data.dta", replace
                            * The second time through it would replace "CY" with "IE", 3rd time with "MT", and so on
                            2. use "C:\Falco\Data/`i', clear // It says CY.dta is not found when I do actually have the file in the directory and this is the path.
                            So I hope my explanation above explains this. The file you are looking for is CY_orig_data, but with "C:\Falco\Data/`i'" you've told it to look for "C:\Falco\Data/CY.dta" but Stata needs an exact match. (This isn't a command that finds every file with "CY" anywhere in the title). Also note, you are missing the closing parenthesis on line 2. That's why on line 4 Stata responds with:
                            Code:
                            file "C:\Falco\Data/CY, clear.dta" not found

                            I would go read the following 2 articles to get an overview of local macros, and foreach and forvalues loops. They are easy to follow and will make your Stata life so much easier.

                            Comment


                            • #15
                              Thank you very much !

                              Comment

                              Working...
                              X