Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creation and multiplication of different variables in a loop

    I have drawn consumer price index (cpi) variables from Excel into Stata. The variables are by quarter and year from 2002-20022 (e.g., wq12002_cpi wq22002_cpi wq32002_cpi wq42002_cpi. . . wq12022_cpi wq22022_cpi wq32022_cpi wq42022_cpi). They each have one value between 0-1 to represent the percentage quarter-year change of cpi. I need to adjust wages by cpi for each quarter. (e.g., wq12002_cpi * wq12002_wage. . . wq42022_cpi * wq42022_wage).

    I created the following loop to open each file, do this calculation, and append the new variables to the main file BUT two issues 1). the new cpi variables show as missing 2). it only keeps the last variable of any set (e.g., wq42022_). This is true also for the year and quarter variables that I have created in the same loop:


    ************************************************** ******************************

    {
    preserve

    capture log close

    forvalues yr = 2002(1)2022 {
    foreach qtr of numlist 1/4 {

    global yr

    global qtr

    global filepath "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q`qtr'/"

    global filename "lfs_q`qtr'y`yr'" // e.g. lfs_q2y2004

    global suffix ".dta"

    use "${filepath}${filename}${suffix}" ,clear


    ************************************************** ******************************

    cd "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/"
    use "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q`qtr'/lfs_q`qtr'y`yr'.dta"

    rename *, lower

    ************************************************** ******************************

    ** TIME

    // Year - Reference Week/Year

    gen wQ`qtr'`yr'_yr = `yr'

    label variable wQ`qtr'`yr'_yr "Q`qtr'`yr' Year (`yr')"
    label values wQ`qtr'`yr'_yr wQ`qtr'`yr'_yrla

    ta wQ`qtr'`yr'_yr
    sum wQ`qtr'`yr'_yr


    // Calendar Quarter

    gen wQ`qtr'`yr'_qtr = `qtr'

    label variable wQ`qtr'`yr'_qtr "Q`qtr'`yr' Quarter (`qtr)"
    label values wQ`qtr'`yr'_qtr wQ`qtr'`yr'_qtrla

    ta wQ`qtr'`yr'_qtr
    sum wQ`qtr'`yr'_qtr


    ************************************************** ******************************
    ************************************************** ******************************

    append using "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/CPI.dta"

    * Wages/Pay

    rename (gross99 net99 grsswk netwk hourpay) (wQ`qtr'`yr'_paygro wQ`qtr'`yr'_paynet wQ`qtr'`yr'_paygro_wk wQ`qtr'`yr'_paynet_wk wQ`qtr'`yr'_paygro_hr)

    recode wQ`qtr'`yr'_paygro wQ`qtr'`yr'_paynet wQ`qtr'`yr'_paygro_wk wQ`qtr'`yr'_paynet_wk wQ`qtr'`yr'_paygro_hr (-9/-1=.)

    label variable wQ`qtr'`yr'_paygro "wQ`qtr'`yr' Gross Pay (GBP) - Continuous"
    label values wQ`qtr'`yr'_paygro wQ`qtr'`yr'_paygrola

    label variable wQ`qtr'`yr'_paynet "wQ`qtr'`yr' Net Pay (GBP) - Continuous"
    label values wQ`qtr'`yr'_paynet wQ`qtr'`yr'_paynetla

    label variable wQ`qtr'`yr'_paygro_wk "wQ`qtr'`yr' Weekly Gross Pay (GBP) - Continuous"
    label values wQ`qtr'`yr'_paygro_wk wQ`qtr'`yr'_paygro_wkla

    label variable wQ`qtr'`yr'_paynet_wk "wQ`qtr'`yr' Weekly Net Pay (GBP) - Continuous"
    label values wQ`qtr'`yr'_paynet_wk wQ`qtr'`yr'_paynet_wkla

    label variable wQ`qtr'`yr'_paygro_hr "wQ`qtr'`yr' Hourly Gross Pay (GBP) - Continuous"
    label values wQ`qtr'`yr'_paygro_hr wQ`qtr'`yr'_paygro_hrla


    ************************************************** ******************************

    keep wQ`qtr'`yr'_*

    save "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q`qtr'/lfs_q`qtr'y`yr'.dta", replace

    }
    }

    restore
    }

    ************************************************** ******************************
    ************************************************** ******************************

    In doing some checks, I found that when I have the original converted Excel file and check, for example:

    display wQ12002_cpi * wQ12002_cpi (i.e., cpi for quarter 1 wave 2002), the output is 1 (as expected because wQ2002 is a variable that equals 1 and it has one observation).

    BUT

    When I append the file to the main file and repeat the above, it gives me . (missing)


    See an example of what the variables look like:


    ************************************************** ******************************
    . ta wQ12002_cpi

    wQ12002_cpi | Freq. Percent Cum.
    ------------+-----------------------------------
    1 | 1 100.00 100.00
    ------------+-----------------------------------
    Total | 1 100.00

    . ta wQ12003_cpi

    wQ12003_cpi | Freq. Percent Cum.
    ------------+-----------------------------------
    .9853799 | 1 100.00 100.00
    ------------+-----------------------------------
    Total | 1 100.00

    ************************************************** ******************************

    -> ta of wQ42003_yr

    Q42003 Year |
    (2003) | Freq. Percent Cum.
    ------------+-----------------------------------
    2003 | 127,021 100.00 100.00
    ------------+-----------------------------------
    Total | 127,021 100.00


    -> ta of wQ42003_qtr

    Q42003 |
    Quarter |
    (`qtr) | Freq. Percent Cum.
    ------------+-----------------------------------
    4 | 127,021 100.00 100.00
    ------------+-----------------------------------
    Total | 127,021 100.00

    ************************************************** ******************************

    . sum wQ42003_wage wQ42003_wage_cpi wQ42003_wage_cpi_categorical

    Variable | Obs Mean Std. dev. Min Max
    -------------+---------------------------------------------------------
    wQ42003_wage | 13,814 9175.29 14361.97 0 99995
    wQ42003_wa~pi | 0
    wQ42003_wa~al | 0

    ************************************************** ******************************


    Anyone have any insights?

    Thanks in advance!

    O

  • #2
    Your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data from the CPI.dta dataset. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

    Be sure to use the dataex command to do this. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use the dataex command.

    Comment


    • #3
      Thank you William. Here is the output for seven cpi variables:

      . dataex wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input double(wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi)
      .6373314856290198 .6769704354518052 .6810971244573216 .6924518636062623 .7054353360488796 .724606604211476 .7401387618278098 .7427053214550157
      end
      ------------------ copy up to and including the previous line ------------------

      Listed 1 out of 1 observations


      Alter, does this attachment help?

      Comment


      • #4
        Thank you William. Here is the output for seven cpi variables:

        . dataex wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi

        ----------------------- copy starting from the next line -----------------------
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double(wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi)
        .6373314856290198 .6769704354518052 .6810971244573216 .6924518636062623 .7054353360488796 .724606604211476 .7401387618278098 .7427053214550157
        end
        ------------------ copy up to and including the previous line ------------------

        Listed 1 out of 1 observations


        Alternatively, the attachment may help?

        Comment


        • #5
          Not that I can help at this moment but I don't understand the structure at all. From your code in #1 it seems your data are structured like this, with 21 files inside 4 different folders broken down by quarters.

          Code:
          "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q1/"
            lfs_q1y2002.dta
            lfs_q1y2003.dta
            lfs_q1y2004.dta
            lfs_q1y2005.dta
            .
            .
            .
            lfs_q1y2022.dta
          
          "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q2/"
            lfs_q2y2002.dta
            lfs_q2y2003.dta
            lfs_q2y2004.dta
            lfs_q2y2005.dta
            .
            .
            .
            lfs_q2y2022.dta
          
          "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q3/"
            lfs_q3y2002.dta
            lfs_q3y2003.dta
            lfs_q3y2004.dta
            lfs_q3y2005.dta
            .
            .
            .
            lfs_q3y2022.dta
          
          "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q4/"
            lfs_q4y2002.dta
            lfs_q4y2003.dta
            lfs_q4y2004.dta
            lfs_q4y2005.dta
            .
            .
            .
            lfs_q4y2022.dta
          But that does not look like the code sample provided in #3 and #4 at all, because in those files all the years are already inside the file. Would be great if there is a brief description of the file structure, what does each file look like (with dataex), and a mock up of what you'd like to create.

          Comment


          • #6
            It is not clear to me that using append was the appropriate way of adding your CPI data to your quarterly datasets. And it is more useful to use Stata numeric quarterly date values than to have a separate year and quarter variables.

            So the first step is to work on your understanding of Stata's "date and time" variables, which are complicated with is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

            All Stata manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

            The next step is to understand that you don't want to have lots of variables with names like wQ12003_wage, wQ22003_wage, .... . That is what is known here as a "wide" layout of your data. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a "long" layout of your data rather than a wide layout of the same data.

            That means instead of one observation per individual (or whatever the unit of observation is in these datasets) you will have one observation for each combination of individual and quarter. The sort of problems you will encounter trying to use your wide data will almost certainly be solved by reshaping the data. It is much easier, for example, to compare the second observation to the first, the third to the second, and so on, than it is to compare the second variable to the first, the third to the second, etc.

            So at this point you reshape your CPI data from the wide layout to a long layout and create a Stata numeric date variable. Here's example code, applied to your example data. The tools you use are the rename and reshape commands.
            Code:
            help rename group // not just help rename
            help reshape
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input double(wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi)
            .6373314856290198 .6769704354518052 .6810971244573216 .6924518636062623 .7054353360488796 .724606604211476 .7401387618278098 .7427053214550157
            end
            rename (wQ(#)(####)_cpi) (w_cpi#_#)
            ds
            generate seq = _n
            reshape long w_cpi, i(seq) j(when) string
            drop seq
            generate date = quarterly(when,"QY") // create a Stata quarterly date variable
            format date %tq
            list, clean
            tempfile cpi
            save "`cpi'"
            Code:
            . rename (wQ(#)(####)_cpi) (w_cpi#_#)
            
            . ds
            w_cpi1_2022  w_cpi1_2020  w_cpi1_2018  w_cpi1_2016
            w_cpi1_2021  w_cpi1_2019  w_cpi1_2017  w_cpi1_2015
            
            . generate seq = _n
            
            . reshape long w_cpi, i(seq) j(when) string
            (j = 1_2015 1_2016 1_2017 1_2018 1_2019 1_2020 1_2021 1_2022)
            
            Data                               Wide   ->   Long
            -----------------------------------------------------------------------------
            Number of observations                1   ->   8           
            Number of variables                   9   ->   3           
            j variable (8 values)                     ->   when
            xij variables:
            w_cpi1_2015 w_cpi1_2016 ... w_cpi1_2022   ->   w_cpi
            -----------------------------------------------------------------------------
            
            . drop seq
            
            . generate date = quarterly(when,"QY") // create a Stata quarterly date variable
            
            . format date %tq
            
            . list, clean
            
                     when       w_cpi     date  
              1.   1_2015   .74270532   2015q1  
              2.   1_2016   .74013876   2016q1  
              3.   1_2017    .7246066   2017q1  
              4.   1_2018   .70543534   2018q1  
              5.   1_2019   .69245186   2019q1  
              6.   1_2020   .68109712   2020q1  
              7.   1_2021   .67697044   2021q1  
              8.   1_2022   .63733149   2022q1  
            
            . tempfile cpi
            
            . save "`cpi'"
            file /var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//S_01058.000001 saved as .dta format
            
            .
            Next, you then append all your quarterly datasets such as lfs_q2y2004, creating a single dataset, creating a Stata quarterly date variable in the process. The following (untested) code shows what I have in mind.
            Code:
            // create empty Combined dataset
            clear
            save "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined.dta", replace
            
            forvalues yr = 2002(1)2022 {
            forvalues qtr = 1/4 {
            
            global filepath "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q`qtr'/"
            global filename "lfs_q`qtr'y`yr'" // e.g. lfs_q2y2004
            global suffix ".dta"
            
            use "${filepath}${filename}${suffix}", clear
            
            ** TIME
            generate quarter = tq(`yr'q`qtr')
            format quarter %tq
            
            append "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined.dta"
            save "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined.dta", replace
            
            }
            }
            Finally, once that is done you can add your CPI data to this data.
            Code:
            help merge
            Code:
            use "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined.dta", clear
            merge m:1 quarter using "`cpi'"

            Comment


            • #7
              Re. Ken's query. Really appreciate your response - thanks! There are two types of data. The first follows the format that you have listed. The second is the cpi data that was imported from Excel and that is structured as I outlined. The problem is perhaps the incompatibility of each.

              Re. William's response. Thank you so much! I will try this. I am certainly not experienced with time data in Stata by any means (at least by my own assessment) but I will attempt to do as instructed... Finger's crossed!

              Comment


              • #8
                The above code works, so thank you! But, I am not quite seeing how I can adjust each year/quarter wage by the 'correct' year/quarter cpi? (Note: my current syntax for a continuous and categorical variable is below; _paygro is wages and _paygro_cpi is wages [ideally] adjusted by year/quarter cpi)

                Code:
                gen wQ`qtr'`yr'_paygro_cpi = wQ`qtr'`yr'_paygro * wq_cpi        // CPI Quarter Year Change from 2002
                
                gen wQ`qtr'`yr'_paygro_cpi_c = wQ`qtr'`yr'_paygro_cpi
                replace wQ`qtr'`yr'_paygro_cpi_c = 1 if wQ`qtr'`yr'_paygro_cpi >= 1 & wQ`qtr'`yr'_paygro_cpi < 10000
                replace wQ`qtr'`yr'_paygro_cpi_c = 2 if wQ`qtr'`yr'_paygro_cpi >= 10000 & wQ`qtr'`yr'_paygro_cpi < 20000
                replace wQ`qtr'`yr'_paygro_cpi_c = 3 if wQ`qtr'`yr'_paygro_cpi >= 20000 & wQ`qtr'`yr'_paygro_cpi < 30000
                replace wQ`qtr'`yr'_paygro_cpi_c = 4 if wQ`qtr'`yr'_paygro_cpi >= 30000 & wQ`qtr'`yr'_paygro_cpi < 100000
                P.S. I am still in the midst of digesting the datetime documentation - thanks for the suggestion.

                Comment


                • #9
                  First of all, in the code in post #6 that reshapes the CPI data, change
                  Code:
                  generate date = quarterly(when,"QY") // create a Stata quarterly date variable
                  to
                  Code:
                  generate quarter = quarterly(when,"QY") // create a Stata quarterly date variable
                  After you append your quarterly datasets and merge the reshaped CPI dataset, your Combined dataset will have the variables
                  Code:
                  quarter gross99 net99 grsswk netwk hourpay when w_cpi
                  Going forward, something like the following.
                  Code:
                  rename (gross99 net99 grsswk netwk hourpay) (paygro paynet paygro_wk paynet_wk paygro_hr)
                  
                  recode paygro paynet paygro_wk paynet_wk paygro_hr (-9/-1=.)
                  
                  label variable paygro "Gross Pay (GBP) - Continuous"
                  label variable paynet "Net Pay (GBP) - Continuous"
                  label variable paygro_wk "Weekly Gross Pay (GBP) - Continuous"
                  label variable paynet_wk "Weekly Net Pay (GBP) - Continuous"
                  label variable paygro_hr "Hourly Gross Pay (GBP) - Continuous"
                  
                  gen paygro_cpi = paygro * w_cpi        // CPI Quarter Year Change from 2002
                  
                  gen paygro_cpi_c = paygro_cpi
                  replace paygro_cpi_c = 1 if paygro_cpi >= 1     & paygro_cpi < 10000
                  replace paygro_cpi_c = 2 if paygro_cpi >= 10000 & paygro_cpi < 20000
                  replace paygro_cpi_c = 3 if paygro_cpi >= 20000 & paygro_cpi < 30000
                  replace paygro_cpi_c = 4 if paygro_cpi >= 30000 & paygro_cpi < 100000
                  This works because with your data in a long layout, you no longer have different variable names for different quarters, and each observation has both the CPI and the wages data for the same quarter.

                  Comment


                  • #10
                    It all makes perfect sense and works brilliantly! You're genius!!!

                    Comment

                    Working...
                    X