Creation and multiplication of different variables in a loop

Olivia Hamilton

Join Date: Oct 2022

Posts: 18
#1

Creation and multiplication of different variables in a loop

24 Oct 2022, 09:36

I have drawn consumer price index (cpi) variables from Excel into Stata. The variables are by quarter and year from 2002-20022 (e.g., wq12002_cpi wq22002_cpi wq32002_cpi wq42002_cpi. . . wq12022_cpi wq22022_cpi wq32022_cpi wq42022_cpi). They each have one value between 0-1 to represent the percentage quarter-year change of cpi. I need to adjust wages by cpi for each quarter. (e.g., wq12002_cpi * wq12002_wage. . . wq42022_cpi * wq42022_wage).

I created the following loop to open each file, do this calculation, and append the new variables to the main file BUT two issues 1). the new cpi variables show as missing 2). it only keeps the last variable of any set (e.g., wq42022_). This is true also for the year and quarter variables that I have created in the same loop:

************************************************** ******************************

{
preserve

capture log close

forvalues yr = 2002(1)2022 {
foreach qtr of numlist 1/4 {

global yr

global qtr

global filepath "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q`qtr'/"

global filename "lfs_q`qtr'y`yr'" // e.g. lfs_q2y2004

global suffix ".dta"

use "${filepath}${filename}${suffix}" ,clear

************************************************** ******************************

cd "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/"
use "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q`qtr'/lfs_q`qtr'y`yr'.dta"

rename *, lower

************************************************** ******************************

** TIME

// Year - Reference Week/Year

gen wQ`qtr'`yr'_yr = `yr'

label variable wQ`qtr'`yr'_yr "Q`qtr'`yr' Year (`yr')"
label values wQ`qtr'`yr'_yr wQ`qtr'`yr'_yrla

ta wQ`qtr'`yr'_yr
sum wQ`qtr'`yr'_yr

// Calendar Quarter

gen wQ`qtr'`yr'_qtr = `qtr'

label variable wQ`qtr'`yr'_qtr "Q`qtr'`yr' Quarter (`qtr)"
label values wQ`qtr'`yr'_qtr wQ`qtr'`yr'_qtrla

ta wQ`qtr'`yr'_qtr
sum wQ`qtr'`yr'_qtr

************************************************** ******************************
************************************************** ******************************

append using "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/CPI.dta"

* Wages/Pay

rename (gross99 net99 grsswk netwk hourpay) (wQ`qtr'`yr'_paygro wQ`qtr'`yr'_paynet wQ`qtr'`yr'_paygro_wk wQ`qtr'`yr'_paynet_wk wQ`qtr'`yr'_paygro_hr)

recode wQ`qtr'`yr'_paygro wQ`qtr'`yr'_paynet wQ`qtr'`yr'_paygro_wk wQ`qtr'`yr'_paynet_wk wQ`qtr'`yr'_paygro_hr (-9/-1=.)

label variable wQ`qtr'`yr'_paygro "wQ`qtr'`yr' Gross Pay (GBP) - Continuous"
label values wQ`qtr'`yr'_paygro wQ`qtr'`yr'_paygrola

label variable wQ`qtr'`yr'_paynet "wQ`qtr'`yr' Net Pay (GBP) - Continuous"
label values wQ`qtr'`yr'_paynet wQ`qtr'`yr'_paynetla

label variable wQ`qtr'`yr'_paygro_wk "wQ`qtr'`yr' Weekly Gross Pay (GBP) - Continuous"
label values wQ`qtr'`yr'_paygro_wk wQ`qtr'`yr'_paygro_wkla

label variable wQ`qtr'`yr'_paynet_wk "wQ`qtr'`yr' Weekly Net Pay (GBP) - Continuous"
label values wQ`qtr'`yr'_paynet_wk wQ`qtr'`yr'_paynet_wkla

label variable wQ`qtr'`yr'_paygro_hr "wQ`qtr'`yr' Hourly Gross Pay (GBP) - Continuous"
label values wQ`qtr'`yr'_paygro_hr wQ`qtr'`yr'_paygro_hrla

************************************************** ******************************

keep wQ`qtr'`yr'_*

save "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q`qtr'/lfs_q`qtr'y`yr'.dta", replace

}
}

restore
}

************************************************** ******************************
************************************************** ******************************

In doing some checks, I found that when I have the original converted Excel file and check, for example:

display wQ12002_cpi * wQ12002_cpi (i.e., cpi for quarter 1 wave 2002), the output is 1 (as expected because wQ2002 is a variable that equals 1 and it has one observation).

BUT

When I append the file to the main file and repeat the above, it gives me . (missing)

See an example of what the variables look like:

************************************************** ******************************
. ta wQ12002_cpi

wQ12002_cpi | Freq. Percent Cum.
------------+-----------------------------------
1 | 1 100.00 100.00
------------+-----------------------------------
Total | 1 100.00

. ta wQ12003_cpi

wQ12003_cpi | Freq. Percent Cum.
------------+-----------------------------------
.9853799 | 1 100.00 100.00
------------+-----------------------------------
Total | 1 100.00

************************************************** ******************************

-> ta of wQ42003_yr

Q42003 Year |
(2003) | Freq. Percent Cum.
------------+-----------------------------------
2003 | 127,021 100.00 100.00
------------+-----------------------------------
Total | 127,021 100.00

-> ta of wQ42003_qtr

Q42003 |
Quarter |
(`qtr) | Freq. Percent Cum.
------------+-----------------------------------
4 | 127,021 100.00 100.00
------------+-----------------------------------
Total | 127,021 100.00

************************************************** ******************************

. sum wQ42003_wage wQ42003_wage_cpi wQ42003_wage_cpi_categorical

Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
wQ42003_wage | 13,814 9175.29 14361.97 0 99995
wQ42003_wa~pi | 0
wQ42003_wa~al | 0

************************************************** ******************************

Anyone have any insights?

Thanks in advance!

O
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

24 Oct 2022, 09:57

Your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data from the CPI.dta dataset. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

Be sure to use the dataex command to do this. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use the dataex command.
1 like
Comment
Olivia Hamilton

Join Date: Oct 2022

Posts: 18
#3

24 Oct 2022, 10:48

Thank you William. Here is the output for seven cpi variables:

. dataex wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex clear input double(wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi) .6373314856290198 .6769704354518052 .6810971244573216 .6924518636062623 .7054353360488796 .724606604211476 .7401387618278098 .7427053214550157 end

------------------ copy up to and including the previous line ------------------

Listed 1 out of 1 observations

Alter, does this attachment help?
Comment
Olivia Hamilton

Join Date: Oct 2022

Posts: 18
#4

24 Oct 2022, 11:16

Thank you William. Here is the output for seven cpi variables:

. dataex wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex clear input double(wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi) .6373314856290198 .6769704354518052 .6810971244573216 .6924518636062623 .7054353360488796 .724606604211476 .7401387618278098 .7427053214550157 end

------------------ copy up to and including the previous line ------------------

Listed 1 out of 1 observations

Alternatively, the attachment may help?
Comment

Ken Chui

Join Date: Aug 2014
Posts: 1063

24 Oct 2022, 12:19

Not that I can help at this moment but I don't understand the structure at all. From your code in #1 it seems your data are structured like this, with 21 files inside 4 different folders broken down by quarters.

Code:

"/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q1/"
  lfs_q1y2002.dta
  lfs_q1y2003.dta
  lfs_q1y2004.dta
  lfs_q1y2005.dta
  .
  .
  .
  lfs_q1y2022.dta

"/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q2/"
  lfs_q2y2002.dta
  lfs_q2y2003.dta
  lfs_q2y2004.dta
  lfs_q2y2005.dta
  .
  .
  .
  lfs_q2y2022.dta

"/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q3/"
  lfs_q3y2002.dta
  lfs_q3y2003.dta
  lfs_q3y2004.dta
  lfs_q3y2005.dta
  .
  .
  .
  lfs_q3y2022.dta

"/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q4/"
  lfs_q4y2002.dta
  lfs_q4y2003.dta
  lfs_q4y2004.dta
  lfs_q4y2005.dta
  .
  .
  .
  lfs_q4y2022.dta

But that does not look like the code sample provided in #3 and #4 at all, because in those files all the years are already inside the file. Would be great if there is a brief description of the file structure, what does each file look like (with dataex), and a mock up of what you'd like to create.

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

24 Oct 2022, 13:19

It is not clear to me that using append was the appropriate way of adding your CPI data to your quarterly datasets. And it is more useful to use Stata numeric quarterly date values than to have a separate year and quarter variables.

So the first step is to work on your understanding of Stata's "date and time" variables, which are complicated with is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

All Stata manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

The next step is to understand that you don't want to have lots of variables with names like wQ12003_wage, wQ22003_wage, .... . That is what is known here as a "wide" layout of your data. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a "long" layout of your data rather than a wide layout of the same data.

That means instead of one observation per individual (or whatever the unit of observation is in these datasets) you will have one observation for each combination of individual and quarter. The sort of problems you will encounter trying to use your wide data will almost certainly be solved by reshaping the data. It is much easier, for example, to compare the second observation to the first, the third to the second, and so on, than it is to compare the second variable to the first, the third to the second, etc.

So at this point you reshape your CPI data from the wide layout to a long layout and create a Stata numeric date variable. Here's example code, applied to your example data. The tools you use are the rename and reshape commands.

Code:

help rename group // not just help rename
help reshape

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(wQ12022_cpi wQ12021_cpi wQ12020_cpi wQ12019_cpi wQ12018_cpi wQ12017_cpi wQ12016_cpi wQ12015_cpi)
.6373314856290198 .6769704354518052 .6810971244573216 .6924518636062623 .7054353360488796 .724606604211476 .7401387618278098 .7427053214550157
end
rename (wQ(#)(####)_cpi) (w_cpi#_#)
ds
generate seq = _n
reshape long w_cpi, i(seq) j(when) string
drop seq
generate date = quarterly(when,"QY") // create a Stata quarterly date variable
format date %tq
list, clean
tempfile cpi
save "`cpi'"

Code:

. rename (wQ(#)(####)_cpi) (w_cpi#_#)

. ds
w_cpi1_2022  w_cpi1_2020  w_cpi1_2018  w_cpi1_2016
w_cpi1_2021  w_cpi1_2019  w_cpi1_2017  w_cpi1_2015

. generate seq = _n

. reshape long w_cpi, i(seq) j(when) string
(j = 1_2015 1_2016 1_2017 1_2018 1_2019 1_2020 1_2021 1_2022)

Data                               Wide   ->   Long
-----------------------------------------------------------------------------
Number of observations                1   ->   8           
Number of variables                   9   ->   3           
j variable (8 values)                     ->   when
xij variables:
w_cpi1_2015 w_cpi1_2016 ... w_cpi1_2022   ->   w_cpi
-----------------------------------------------------------------------------

. drop seq

. generate date = quarterly(when,"QY") // create a Stata quarterly date variable

. format date %tq

. list, clean

         when       w_cpi     date  
  1.   1_2015   .74270532   2015q1  
  2.   1_2016   .74013876   2016q1  
  3.   1_2017    .7246066   2017q1  
  4.   1_2018   .70543534   2018q1  
  5.   1_2019   .69245186   2019q1  
  6.   1_2020   .68109712   2020q1  
  7.   1_2021   .67697044   2021q1  
  8.   1_2022   .63733149   2022q1  

. tempfile cpi

. save "`cpi'"
file /var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//S_01058.000001 saved as .dta format

.

Next, you then append all your quarterly datasets such as lfs_q2y2004, creating a single dataset, creating a Stata quarterly date variable in the process. The following (untested) code shows what I have in mind.

Code:

// create empty Combined dataset
clear
save "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined.dta", replace

forvalues yr = 2002(1)2022 {
forvalues qtr = 1/4 {

global filepath "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined_Q`qtr'/"
global filename "lfs_q`qtr'y`yr'" // e.g. lfs_q2y2004
global suffix ".dta"

use "${filepath}${filename}${suffix}", clear

** TIME
generate quarter = tq(`yr'q`qtr')
format quarter %tq

append "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined.dta"
save "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined.dta", replace

}
}

Finally, once that is done you can add your CPI data to this data.

Code:

help merge

Code:

use "/Users/hamilton/Dropbox/- Projects/-Quiet Quitting/DATA/Combined.dta", clear
merge m:1 quarter using "`cpi'"

Comment

Olivia Hamilton

Join Date: Oct 2022

Posts: 18
#7

25 Oct 2022, 13:26

Re. Ken's query. Really appreciate your response - thanks! There are two types of data. The first follows the format that you have listed. The second is the cpi data that was imported from Excel and that is structured as I outlined. The problem is perhaps the incompatibility of each.

Re. William's response. Thank you so much! I will try this. I am certainly not experienced with time data in Stata by any means (at least by my own assessment) but I will attempt to do as instructed... Finger's crossed!
Comment

Olivia Hamilton

Join Date: Oct 2022
Posts: 18

26 Oct 2022, 02:17

The above code works, so thank you! But, I am not quite seeing how I can adjust each year/quarter wage by the 'correct' year/quarter cpi? (Note: my current syntax for a continuous and categorical variable is below; _paygro is wages and _paygro_cpi is wages [ideally] adjusted by year/quarter cpi)

Code:

gen wQ`qtr'`yr'_paygro_cpi = wQ`qtr'`yr'_paygro * wq_cpi        // CPI Quarter Year Change from 2002

gen wQ`qtr'`yr'_paygro_cpi_c = wQ`qtr'`yr'_paygro_cpi
replace wQ`qtr'`yr'_paygro_cpi_c = 1 if wQ`qtr'`yr'_paygro_cpi &gt;= 1 &amp; wQ`qtr'`yr'_paygro_cpi &lt; 10000
replace wQ`qtr'`yr'_paygro_cpi_c = 2 if wQ`qtr'`yr'_paygro_cpi &gt;= 10000 &amp; wQ`qtr'`yr'_paygro_cpi &lt; 20000
replace wQ`qtr'`yr'_paygro_cpi_c = 3 if wQ`qtr'`yr'_paygro_cpi &gt;= 20000 &amp; wQ`qtr'`yr'_paygro_cpi &lt; 30000
replace wQ`qtr'`yr'_paygro_cpi_c = 4 if wQ`qtr'`yr'_paygro_cpi &gt;= 30000 &amp; wQ`qtr'`yr'_paygro_cpi &lt; 100000

P.S. I am still in the midst of digesting the datetime documentation - thanks for the suggestion.

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

26 Oct 2022, 07:42

First of all, in the code in post #6 that reshapes the CPI data, change

Code:

generate date = quarterly(when,"QY") // create a Stata quarterly date variable

Code:

generate quarter = quarterly(when,"QY") // create a Stata quarterly date variable

After you append your quarterly datasets and merge the reshaped CPI dataset, your Combined dataset will have the variables

Code:

quarter gross99 net99 grsswk netwk hourpay when w_cpi

Going forward, something like the following.

Code:

rename (gross99 net99 grsswk netwk hourpay) (paygro paynet paygro_wk paynet_wk paygro_hr)

recode paygro paynet paygro_wk paynet_wk paygro_hr (-9/-1=.)

label variable paygro "Gross Pay (GBP) - Continuous"
label variable paynet "Net Pay (GBP) - Continuous"
label variable paygro_wk "Weekly Gross Pay (GBP) - Continuous"
label variable paynet_wk "Weekly Net Pay (GBP) - Continuous"
label variable paygro_hr "Hourly Gross Pay (GBP) - Continuous"

gen paygro_cpi = paygro * w_cpi        // CPI Quarter Year Change from 2002

gen paygro_cpi_c = paygro_cpi
replace paygro_cpi_c = 1 if paygro_cpi >= 1     & paygro_cpi < 10000
replace paygro_cpi_c = 2 if paygro_cpi >= 10000 & paygro_cpi < 20000
replace paygro_cpi_c = 3 if paygro_cpi >= 20000 & paygro_cpi < 30000
replace paygro_cpi_c = 4 if paygro_cpi >= 30000 & paygro_cpi < 100000

This works because with your data in a long layout, you no longer have different variable names for different quarters, and each observation has both the CPI and the wages data for the same quarter.

Comment

Olivia Hamilton

Join Date: Oct 2022

Posts: 18
#10

29 Oct 2022, 08:54

It all makes perfect sense and works brilliantly! You're genius!!!
Comment

Announcement