interpreting advanced stata code: global, word count

Nora Schwaller

Join Date: Jul 2019

Posts: 7
#1

02 Dec 2022, 16:19

I know basic Stata coding but primarily use R at this point, where I know how to do a lot more advanced work. I was given a piece of Stata code that I need to recreate in R, and I have no idea what is going on. It looks like this:

Code:

global n_medians : word count $medians
global i = 1
while $i <= $n_medians {
    global median : word $i of $medians
    global weight : word $i of $median_weights
    gen weight_$median = $weight if ($median != .)
    replace $median = $median * weight_$median
    global missweights $missweights weight_$median
    global i = $i + 1
}

Can someone please explain what each of these steps is doing? I don't understand global macros in Stata, I'm not sure what the $ does, and I'm completely lost on what word count is supposed to do. Any help would be appreciated.

Last edited by Nora Schwaller; 02 Dec 2022, 16:44.
Tags: global, while loop
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

02 Dec 2022, 17:54

I can't describe the use of macros (global or local) better than does Chapter 18 Programming in Stata of the Stata User's Guide PDF (included in your Stata installation and accessible from Stata's Help menu, or online at https://www.stata.com/manuals/u.pdf).

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata, for example through the PDF Documentation section of Stata's Help menu.

The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

Stata also supplies YouTube videos, if that's your thing.
Nick Cox

Join Date: Mar 2014

Posts: 35711
#3

02 Dec 2022, 18:49

@William Lisowski's advice is excellent. He is probably wise in not offering any translation or guesses at what is going on.

This strikes me as fairly weird code by many, perhaps most, Stata standards, and knowing some Stata is necessary but not sufficient to explain it.

It could also be very old code, as most Stata programmers would not now write a while loop if they could write a foreach or forvalues loop.

Yet further, most Stata programmers would not use global macros in this way, but local macros, or move to Mata.
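For comparison, here is a minimal sketch of how the same loop might look with local macros and forvalues. The variable names a, b, d, e and the toy data are hypothetical stand-ins, since the thread does not show what $medians and $median_weights actually contain.

```stata
* Toy data: a and b stand in for the "median" variables, d and e for their weights
* (all names and values are hypothetical; they are not from the original code)
clear
input a b d e
10 1 0.5  2
 . 3 0.5  2
20 5 0.25 2
end

local medians a b
local median_weights d e
local n_medians : word count `medians'
local missweights

forvalues i = 1/`n_medians' {
    * pick the i-th name from each list
    local median : word `i' of `medians'
    local weight : word `i' of `median_weights'
    * copy the weight, but only where the median is not system missing
    gen weight_`median' = `weight' if (`median' != .)
    * overwrite the median with median times weight
    replace `median' = `median' * weight_`median'
    * keep a running list of the new weight variables
    local missweights `missweights' weight_`median'
}

display "`missweights'"
list
```

Local macros vanish when the do-file or program ends, so they cannot collide with macros of the same name defined elsewhere, which is the usual reason for preferring them. If later code relies on $missweights, the list would need to be copied into a global at the end.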

On that and other grounds, it could also have been written by someone who knew much more about some other language than about Stata.

You should be able to give some context on what the code is about -- and if you can't explain the context it is not only difficult but also dangerous for you to translate it even into a language you know better, the danger being the risk of producing utterly incorrect if not meaningless code. It could also be dangerous for you to take my guesses too seriously unless they make complete sense given what else you know.

I guess that the global macro medians contains a list of variable names and the global macro median_weights another list of variable names.

The word count just counts the number of words, where words are separated by spaces (there is another part of the definition unlikely to be relevant here). Thus there are three words in this global:

Code:

. global foo "a b c"
. global wc : word count $foo
. di $wc
3
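The companion to word count is word # of, which the posted loop uses to pull out one name at a time. A small sketch, written for a do-file; the macro names foo, wc and second are purely illustrative:

```stata
* foo, wc and second are illustrative names only
global foo "a b c"

* number of space-separated words in foo
global wc : word count $foo

* the second word of foo
global second : word 2 of $foo

display $wc
display "$second"

* $name is replaced by the macro's contents before the command runs,
* so the next line is executed as: display "weight_b"
display "weight_$second"
```

That substitution is all the $ does: by the time Stata executes a line, every $name has already been replaced by the text stored in that global.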

The code then loops in parallel over those "words" and a corresponding list of words that are, presumably, names of variables holding weights and creates new variables (columns in the data) that are products of each first variable -- evidently holding a median or medians of some kind -- and of a second variable holding corresponding weights.

I can't rule out that in R this would just be element-wise multiplication of two vectors, but the detail about missing values remains a puzzle.

Holding constants in Stata in variables (columns in a dataset) is sometimes a good idea but not usually.

Last edited by Nick Cox; 02 Dec 2022, 19:16. Reason: Note various edits, as I have struggled to work out what is going on and to explain some of it.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1409
#4

02 Dec 2022, 20:41

To carry forward the example in #3, and fully endorsing all the caveats there as well, I think this corresponds to a scenario where there are some variables/columns (say, a, b and c) that store medians, and corresponding variables/columns (say, d, e and f) that store weights, declared beforehand like

Code:

global medians a b c
global median_weights d e f

Then n_medians will evaluate to 3 ("words" in Stata are just things separated by spaces, so each of these macros has three words). The loop then iterates over i = 1 to 3. Each time, it picks out the i'th word of each of the above macros. So, in the first iteration, it picks out a and assigns it to the macro median, and picks out d and assigns it to the macro weight. Then it generates a new variable/column called weight_a, which is identical to column d wherever a is non-missing and is missing elsewhere. Next it replaces the medians stored in column a with a weighted version, by multiplying each row element of column a by the same row element of column weight_a (which, as we said, is essentially the same thing as the original column d).

Finally, the code appears to append the name of the newly created column/variable weight_a to the global macro missweights. If missweights was initialised as empty prior to this chunk of code, then after fully executing all iterations of the loop, it will look like

Code:

global missweights weight_a weight_b weight_c

i.e. it is a listing of the newly named variables/columns that store the weights (again, essentially the same as the original columns d, e and f)
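To make the first iteration concrete, here is a sketch of what Stata would actually execute once $median has been replaced by a and $weight by d. The toy data are hypothetical and only there so the result can be inspected.

```stata
* Hypothetical toy data: a holds "medians", d the matching weights
clear
input a d
10 0.5
 . 0.5
20 0.25
end

* start with an empty list, as assumed in the text
global missweights

* the loop body after macro substitution, first pass only
gen weight_a = d if (a != .)
replace a = a * weight_a
global missweights $missweights weight_a

list
display "$missweights"
```

In the second row a is missing, so weight_a is missing there too, and a stays missing after the replace.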

(I share Nick's bafflement. There are just so many ways this seems to be weird code, as if it were written by someone who doesn't really work with Stata).

Last edited by Hemanshu Kumar; 02 Dec 2022, 20:48.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#5

03 Dec 2022, 08:32

My question here would be: who wrote this code? What exactly was their purpose? I've used Stata for 6 years now, and I wouldn't write code like this, but even if I wrote it normally, I don't get the point of it. I guess that's my point: whoever wrote this had a task they wanted to accomplish. The moment we see that, we can get a better idea of what's happening and, more importantly, why.
Nora Schwaller

Join Date: Jul 2019

Posts: 7
#6

03 Dec 2022, 11:11

Jared Greathouse - great point, I should have provided more context. The code is part of a larger piece that weights census tract values to interpolate from 2000 to 2010 census geographies. That bit is pretty simple: it just multiplies the 2000 value by the weight related to a 2010 census tract and summarizes by 2010 census tracts. This specific code segment is supposed to adjust values that are averages (e.g., per capita income) by the corresponding total (so some function of combining/weighting per capita income with the number of individuals in the census tract) before they are used in the normal weighting process. While I know that's what it does generally, I have no idea what is happening with the specifics.

@William I will check out that chapter, thanks.