Duplication of ID variables in panel data

Chris Boulis

Join Date: Feb 2019
Posts: 368

02 Apr 2020, 21:05

Hi All.

I have the following duplication of data in id & p_id, which I need to resolve. The data pair up couples where they exist. As an example of the duplication problem, I have included data on one couple; some variables are included for both members of the couple.

While not evident here, we sometimes observe an id with multiple partners (p_ids) that remains in the survey for many waves. We also find p_ids may drop out of the survey at the end of their relationship with a given id. In such cases, the id has more observations than some p_ids.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte wave int(age p_age) byte(marstat p_marstat) float length byte(educ p_educ) long(inc p_inc) byte lifesat double volunteer byte fired
14  15  1 47 44 1 1 16 5 8  79958  79958  4 . .
14  15  2 48 45 1 1 16 5 8 108000 108000  7 0 1
14  15  3 49 46 1 1 16 5 8 139000 139000  7 0 1
14  15  4 50 47 1 1 16 5 8 141718 141718  8 0 1
14  15  5 51 48 1 1 16 5 8 140723 140723  7 0 1
14  15  6 52 49 1 1 16 5 8 113008 113008  8 0 1
14  15  7 53 50 1 1 16 5 8 122832 122832  8 0 1
14  15  8 54 51 1 1 16 5 5 137991 137991  6 0 1
14  15  9 55 52 1 1 16 5 5 137140 137140  8 0 2
14  15 10 56 53 1 1 16 5 5  85330  85330  8 0 2
14  15 11 57 54 1 1 16 5 5  51148  51148  7 0 1
14  15 12 58 55 1 1 16 5 5  49378  49378  5 0 1
14  15 13 59 56 1 1 16 5 5  42822  42822  8 0 1
14  15 14 60 57 1 1 16 5 5  48164  48164  7 0 1
14  15 15 61 58 1 1 16 5 5  57011  57011  8 0 1
14  15 16 62 59 1 1 16 5 5  57930  57930  9 0 1
14       . 18 64  . 3 .  0 5 .      0      .  8 1 1
15  14  1 44 47 1 1 16 8 5  79958  79958  7 . .
15  14  2 45 48 1 1 16 8 5 108000 108000  8 0 1
15  14  3 46 49 1 1 16 8 5 139000 139000  7 0 2
15  14  4 47 50 1 1 16 8 5 141718 141718  8 0 1
15  14  5 48 51 1 1 16 8 5 140723 140723  7 0 1
15  14  6 49 52 1 1 16 8 5 113008 113008  8 0 1
15  14  7 50 53 1 1 16 8 5 122832 122832  6 0 1
15  14  8 51 54 1 1 16 5 5 137991 137991  7 0 1
15  14  9 52 55 1 1 16 5 5 137140 137140  7 0 1
15  14 10 53 56 1 1 16 5 5  85330  85330  7 0 1
15  14 11 54 57 1 1 16 5 5  51148  51148  7 0 1
15  14 12 55 58 1 1 16 5 5  49378  49378  8 0 1
15  14 13 56 59 1 1 16 5 5  42822  42822  7 0 1
15  14 14 57 60 1 1 16 5 5  48164  48164  9 0 1
15  14 15 58 61 1 1 16 5 5  57011  57011  7 0 1
15  14 16 59 62 1 1 16 5 5  57930  57930  8 0 1
15       . 17 60  . 3 .  0 5 .  51000      .  7 0 1
15       . 18 61  . 3 .  0 5 .  50000      .  8 0 1
end

I believe this duplication issue may have arisen from the code used to merge the data (see below).

Code:

// assumes locals savingdir and origdatadir are defined earlier
local variables id p_id age marstat length educ inc lifesat volunteer fired
local filename allwaves
clear
save "`savingdir'/`filename'", replace emptyok
forvalues wave = 1/18 {
    local waveprefix = word(c(alpha), `wave')
    quietly use "`origdatadir'/Combined_`waveprefix'180c.dta", clear
    rename `waveprefix'* *
    isvar `variables'                  // -isvar- (SSC): keeps only names that exist in this wave
    keep `r(varlist)'
    generate byte wave = `wave'
    display "Wave `wave' (`waveprefix') - kept `c(k)' variables"
    append using "`savingdir'/`filename'"
    save "`savingdir'/`filename'", replace
}

// partner data
tempfile partners
drop if p_id == ""
drop id
rename * p_*                           // p_id becomes p_p_id, wave becomes p_wave
rename (p_p_id p_wave) (id wave)
save `partners'

// merge
use "`savingdir'/`filename'", clear
merge 1:1 id wave using `partners', nolabel
assert (p_id != "" & _merge == 3) | (p_id == "" & _merge == 1)
drop _merge

// make panel
destring id p_id, replace
sort id wave
xtset id wave
order id wave
save "`savingdir'/`filename'", replace

I appreciate any assistance to help address the duplication.

Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

02 Apr 2020, 21:42

Well, even if a different data-management approach up to this point might have given you a more compact data organization, I think it is simpler and better to re-order the data at this point, rather than mucking around with the code that came before and perhaps introducing new errors.

Here's an interesting thing we can exploit. id and p_id are never the same: no person is partnered with him/herself. And in Stata, missing value is larger than all real numbers. So each pairing occurs twice, once with id < p_id, and again with p_id < id. For those who are unpaired at any particular point, p_id is missing in that observation, so id < p_id. So we can reduce this data set by retaining only those observations with id < p_id. This will select one observation for each pair (in each wave) and retain the unpaired observations. We do, however, need to verify that the two versions of each paired observation agree with each other, so that there are no inconsistencies and no information is lost by dropping those with p_id < id. The code below does this simply:

Code:

clear*
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte wave int(age p_age) byte(marstat p_marstat) float length byte(educ p_educ) long(inc p_inc) byte lifesat double volunteer byte fired
14  15  1 47 44 1 1 16 5 8  79958  79958  4 . .
14  15  2 48 45 1 1 16 5 8 108000 108000  7 0 1
14  15  3 49 46 1 1 16 5 8 139000 139000  7 0 1
14  15  4 50 47 1 1 16 5 8 141718 141718  8 0 1
14  15  5 51 48 1 1 16 5 8 140723 140723  7 0 1
14  15  6 52 49 1 1 16 5 8 113008 113008  8 0 1
14  15  7 53 50 1 1 16 5 8 122832 122832  8 0 1
14  15  8 54 51 1 1 16 5 5 137991 137991  6 0 1
14  15  9 55 52 1 1 16 5 5 137140 137140  8 0 2
14  15 10 56 53 1 1 16 5 5  85330  85330  8 0 2
14  15 11 57 54 1 1 16 5 5  51148  51148  7 0 1
14  15 12 58 55 1 1 16 5 5  49378  49378  5 0 1
14  15 13 59 56 1 1 16 5 5  42822  42822  8 0 1
14  15 14 60 57 1 1 16 5 5  48164  48164  7 0 1
14  15 15 61 58 1 1 16 5 5  57011  57011  8 0 1
14  15 16 62 59 1 1 16 5 5  57930  57930  9 0 1
14       . 18 64  . 3 .  0 5 .      0      .  8 1 1
15  14  1 44 47 1 1 16 8 5  79958  79958  7 . .
15  14  2 45 48 1 1 16 8 5 108000 108000  8 0 1
15  14  3 46 49 1 1 16 8 5 139000 139000  7 0 2
15  14  4 47 50 1 1 16 8 5 141718 141718  8 0 1
15  14  5 48 51 1 1 16 8 5 140723 140723  7 0 1
15  14  6 49 52 1 1 16 8 5 113008 113008  8 0 1
15  14  7 50 53 1 1 16 8 5 122832 122832  6 0 1
15  14  8 51 54 1 1 16 5 5 137991 137991  7 0 1
15  14  9 52 55 1 1 16 5 5 137140 137140  7 0 1
15  14 10 53 56 1 1 16 5 5  85330  85330  7 0 1
15  14 11 54 57 1 1 16 5 5  51148  51148  7 0 1
15  14 12 55 58 1 1 16 5 5  49378  49378  8 0 1
15  14 13 56 59 1 1 16 5 5  42822  42822  7 0 1
15  14 14 57 60 1 1 16 5 5  48164  48164  9 0 1
15  14 15 58 61 1 1 16 5 5  57011  57011  7 0 1
15  14 16 59 62 1 1 16 5 5  57930  57930  8 0 1
15       . 17 60  . 3 .  0 5 .  51000      .  7 0 1
15       . 18 61  . 3 .  0 5 .  50000      .  8 0 1
end

assert id != p_id
frame put _all if p_id < id, into(seconds)
keep if id < p_id
frlink 1:1 id p_id wave, frame(seconds p_id id wave)
foreach v of varlist age marstat educ inc {
    assert `v' == frval(seconds, p_`v') if !missing(p_id)
    assert p_`v' == frval(seconds, `v') if !missing(p_id)
    assert missing(p_`v') if missing(p_id)
}
foreach v of varlist lifesat volunteer fired {
    gen p_`v' = frval(seconds, `v'), after(`v')
}

At the end of this code, the data set has been verified consistent with respect to id and p_id's reports and contains only one version of each pair (the one with id < p_id) and retains all unpaired observations. I would save this data file and use it for your pair-based work, including the calculations about duration of pairings that we worked on in your previous thread. If you also need to do some analyses requiring individual level observations, it is easy enough to reshape this into long layout. It isn't often that I recommend the use of a wide-layout data set, but in this case the natural unit of analysis is not the person but the pair, and the individual-level variables constitute, in effect, distinct attributes of the pair.
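To see why the rule keep if id < p_id keeps exactly one copy of each couple plus every unpaired record, here is a minimal sketch on a cut-down, hypothetical four-row version of the example data (only id, p_id and wave). It relies on Stata treating system missing (.) as larger than any nonmissing number, so an unpaired row always satisfies id < p_id.

```stata
* Toy data (hypothetical): one couple observed in wave 1, and each
* partner observed alone in a later wave
clear
input long(id p_id) byte wave
14 15  1
15 14  1
14  . 18
15  . 17
end

* System missing compares greater than every nonmissing number
assert . > c(maxdouble)

* Keeps 14-15 (wave 1) and both unpaired rows; drops 15-14 (wave 1)
keep if id < p_id
assert _N == 3
assert (id == 14 & p_id == 15) | missing(p_id)
```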

Now, I'll just comment on some things I've noticed about this data that leave me somewhat uneasy. In every observation p_inc and inc are equal. That seems wrong to me, but maybe I'm misunderstanding what this variable represents. Also, the variables lifesat, volunteer, and fired are not consistent across the two versions of each pair, and there is no p_ version of them. Perhaps I do not understand what these variables are, or perhaps these observations are mistakes. But to prevent loss of information, I've created new p_* versions of these variables in the data that is retained.
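For the individual-level analyses mentioned above, one way to go from the pair-level (wide) file back to one row per person is -reshape long-. This is only an illustrative sketch: it uses a hypothetical three-row extract, assumes numeric id and p_id, and introduces made-up names (pair_row, member, and the shared stub pid for id and p_id).

```stata
* Hypothetical pair-level extract: one couple in two waves, one unpaired row
clear
input long(id p_id) byte wave int(age p_age)
14 15  1 47 44
14 15  2 48 45
14  . 18 64  .
end

gen long pair_row = _n                     // reshape needs a unique i() identifier
* Suffix 1 = the person in id, suffix 2 = the partner in p_id
rename (id p_id age p_age) (pid1 pid2 age1 age2)
reshape long pid age, i(pair_row) j(member)
drop if missing(pid)                       // unpaired rows have no second member
isid pid wave                              // now one row per person-wave
sort pid wave
```

Note that this drops the link to the partner; if you need it, copy the other member's pid into a new variable before reshaping.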

Last edited by Clyde Schechter; 02 Apr 2020, 21:46.

Chris Boulis
#3

03 Apr 2020, 20:13

Hi Clyde Schechter. Thank you for your quick and valuable reply. This looks really good, and I like your clear explanation. I am still running Stata 15.1 (sorry I didn't mention this in my post). Could you kindly provide an alternative to the code that is new in Stata 16, such as -frames-? I searched help for -frlink- and -frval()- but found nothing, which could be because they are also specific to Stata 16.

The analysis will tend to be about the pair. Will wide format allow for analysis looking at behavioural changes in the pair over time, e.g. with respect to volunteering, lifesat, fired, etc?

In every observation p_inc and inc are equal

I mistakenly included household income instead of individual income, so that makes sense.

Also, there are these other variables lifesat, volunteer, and fired that are not consistent and wher there is no p_ version

I did not include the partner (p_) equivalents of these variables (my error), although your workaround addressed that. Thus, I will exclude the last foreach loop.

Finally, with respect to

Code:

foreach v of varlist age marstat educ inc {

Given my code above, should I replace it with

Code:

foreach v of local variables {

As my kept variables list is very long, I will refer to the local macro `variables' correct?

Clyde Schechter
#4

03 Apr 2020, 20:43

Code:

assert id != p_id
//  SPLIT THE DATA INTO TWO DATA SETS, ONE HAVING THE OBSERVATIONS
//  WHERE p_id < id, AND THE OTHER THOSE WITH p_id >= id (INCLUDING
//  p_id MISSING)
preserve
keep if p_id < id
//  SWITCH NAMES: ALL p_ VARIABLES BECOME PLAIN, AND ALL PLAIN ONES
//  BECOME p_
ds p_* wave, not
local non_p_vbles `r(varlist)'
rename p_* _*
rename (`non_p_vbles') p_=
rename _* *
tempfile holding
save `holding'
restore
keep if id < p_id
merge 1:1 id p_id wave using `holding', update
assert _merge <= 4
drop _merge
local variables lifesat volunteer fired // AND PERHAPS OTHERS IN YOUR DATA SET
foreach v of local variables {
    order p_`v', after(`v')
}

This uses a tempfile instead of a frame. And because there are no frames, frlink and frval() are not available, but -merge- takes their place. The -asserts- are a bit hard to replicate in this framework, but by using the -update- option in the -merge- command and verifying that _merge <= 4 we verify the consistency of the reports between the original pair of records that are now merged together.
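The codes that -merge, update- assigns to _merge are what make assert _merge <= 4 a consistency check: 3 means the two versions agree, 4 means a missing value in the master was filled in from the using data, and 5 means the two versions hold different nonmissing values. A toy illustration on hypothetical data (the tempfile name other_version is made up):

```stata
* Hypothetical data: one couple, already name-swapped so both versions should agree
clear
input long(id p_id) byte wave int(age p_age)
14 15 1 47 44
14 15 2 48  .
14 15 3 49 46
end

tempfile other_version
preserve
replace p_age = 45 in 2      // other version fills a gap: expect _merge == 4
replace age = 50 in 3        // other version disagrees: expect _merge == 5
save `other_version'
restore

merge 1:1 id p_id wave using `other_version', update
* 3 = match, identical; 4 = missing updated; 5 = nonmissing conflict (master kept)
assert _merge == 3 if wave == 1
assert _merge == 4 if wave == 2
assert _merge == 5 if wave == 3
count if _merge > 4          // rows like these make -assert _merge <= 4- fail
```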

Chris Boulis
#5

07 Apr 2020, 05:23

Hi Clyde Schechter. Thank you for your updated code. However, I'm getting an error at the "merge 1:1" command line:

Code:

merge 1:1 id p_id wave using `holding', update
variable id not found
r(111);

I don't quite follow all the code. Would you mind explaining how to interpret the parts you didn't cover in #4, particularly everything up to "drop _merge"? Kind regards, Chris.

Last edited by Chris Boulis; 07 Apr 2020, 05:25.

Clyde Schechter
#6

07 Apr 2020, 09:59

I tested the code in #4 before posting it using the sample data in #2, and it ran without error messages and produced results that matched my expectations from hand-calculations. I just tried it again with the same results, so I can't replicate the problem you are having in #5. You must be doing something different.

Here is the code again, with additional comments to explain what is being done:

Code:

// VERIFY NO PERSON IS PARTNERED WITH HIM/HERSELF
assert id != p_id

// SPLIT THE DATA INTO TWO DATA SETS, ONE HAVING THE OBSERVATIONS
// WHERE p_id < id, AND THE OTHER THOSE WITH p_id > id (INCLUDING
// p_id MISSING). DO THE p_id < id SEGMENT FIRST.
preserve
keep if p_id < id

// SWITCH VARIABLE NAMES: ALL p_ VARIABLES BECOME PLAIN, AND ALL PLAIN ONES
// BECOME p_
ds p_* wave, not
local non_p_vbles `r(varlist)'
rename p_* _*
rename (`non_p_vbles') p_=
rename _* *

// SAVE THIS PART OF THE DATA IN A TEMPORARY FILE
tempfile holding
save `holding'

// BRING BACK THE ORIGINAL DATA AND RETAIN ONLY THOSE WHERE id < p_id.
// THIS AUTOMATICALLY INCLUDES THOSE WHO HAVE NO PARTNER,
// BECAUSE p_id WILL BE A MISSING VALUE, HENCE > id
restore
keep if id < p_id

// NOW PUT THE TWO PIECES TOGETHER
merge 1:1 id p_id wave using `holding', update

// VERIFY THAT NO VARIABLES HAVE CONFLICTING NON-MISSING VALUES
// IN THE TWO DATA SETS
assert _merge <= 4
drop _merge

// RE-ARRANGE THE VARIABLES TO PUT p_var IMMEDIATELY AFTER var
// IF THEY AREN'T ALREADY IN THAT ORDER
local variables lifesat volunteer fired // AND PERHAPS OTHERS IN YOUR DATA SET
foreach v of local variables {
    order p_`v', after(`v')
}

Note: The code itself is not changed from #4

The top-level view of this is that the data you start with has, for people in partnerships, two versions of the same data: one has person 1 as id and person 2 as p_id, and the other has it the other way around. This also means that var in the first version is (or, rather, should be) the same as p_var in the second version. The overall approach of this code is to split the data set into two parts, each containing one of the versions for each partnership. Then the second version has its variable names swapped, so that what it called var is now called p_var and vice versa. Now the two parts of the data set are (or should be) identical. [We allow for the possibility of conflict between a reported value in one version and a missing value in the other, but not for conflicts between different non-missing values.] When we put the two pieces together, because they have been made identical, the -merge- operation mostly just confirms what is already there: the observations for people who have no partner are already in the retained part, and for partnered people it only fills in values that are missing in one version (and brings in any pair record that exists only in the other version).
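The renaming step in the middle of that code can be checked on a single hypothetical record for the couple from #1, stored "the other way around" (id 15, p_id 14). After the three -rename- commands, the same values sit under swapped names, which is what lets -merge- line the two versions up. This is a sketch on toy data, not the full data set.

```stata
* One hypothetical record in the p_id < id version
clear
input long(id p_id) byte wave int(age p_age)
15 14 1 44 47
end

ds p_* wave, not
local non_p_vbles `r(varlist)'   // id age
rename p_* _*                    // p_id -> _id, p_age -> _age
rename (`non_p_vbles') p_=       // id -> p_id, age -> p_age
rename _* *                      // _id -> id, _age -> age

* Now matches the id < p_id version of the same couple-wave
assert id == 14 & p_id == 15 & age == 47 & p_age == 44
```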

Chris Boulis
#7

08 Apr 2020, 05:36

Thank you for explaining the code, Clyde Schechter. It was very helpful. I also found no issue when using the sample data in #1. Below is the Stata output from running the code in #4 on my full dataset. I've double-checked, but I'm not sure what's wrong. Can you see an issue?

Code:

. assert id != p_id
. preserve
. keep if p_id < id
(276,145 observations deleted)
. ds p_* wave, not                                     // can you please explain how the 'not' option works?
id inc age p_id lifesat volunteer fired marstat educ   // brief list of vars displayed
. local non_p_vbles `r(varlist)'
. rename p_* _*
. rename (`non_p_vbles') p_=
. rename _* *
. tempfile holding
. save `holding'
file C:\Users\chris\AppData\Local\Temp\ST_a98_000002.tmp saved
. restore
. keep if id < p_id
(88,282 observations deleted)
. merge 1:1 id p_id wave using `holding', update
variable id not found
r(111);

Clyde Schechter
#8

08 Apr 2020, 10:47

This may be the oddest thing I've ever seen. It's actually your little comment asking me to explain the -not- option that drew my attention to why you are having this problem, though I do not have a solution for you.

So first let me answer that question. As you know -ds varlist- lists the names of the variables in varlist after expanding any wildcards. When the -not- option is specified, it instead lists the names of all of the variables in the data set except the ones mentioned in the varlist. So, in particular, -ds p_* wave, not- should list all variables except wave and except any variable that begins with p_.
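A quick way to see what -ds ..., not- returns, using a hypothetical one-observation data set whose values are just placeholders:

```stata
* Hypothetical one-row data set; the values are placeholders
clear
set obs 1
foreach v in id p_id wave age p_age educ p_educ {
    generate byte `v' = 0
}

* List every variable EXCEPT wave and those whose names start with p_
ds p_* wave, not
assert "`r(varlist)'" == "id age educ"
```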

Now look at the output Stata gave you from that command: it includes p_id. That's clearly wrong. p_id begins with p_ and should not be there, and that is what causes your later problem at the -merge-: the variable id in `holding' is supposed to be the result of renaming p_id to id. For some reason, your Stata thinks that p_id does not begin with p_. Consequently, when we reach the -rename p_* _*- and -rename _* *- commands, p_id is not processed, and the variable id is not created.

There is another indication that something is wrong with p_id. Look at the command -rename (`non_p_vbles') p_=-. The variable id is part of the output of that -ds, not- command and is therefore included in `r(varlist)'. Consequently, the -rename- command will rename id to p_id. But since p_id didn't get renamed to _id, p_id should still exist, and Stata should throw an error message there complaining that you can't rename a variable to a name that is already in use. So for some reason Stata is not recognizing what looks to our eyes like p_id as consisting of the characters p, _, i, and d.

I do not know why -ds- should have made this mistake. Perhaps the variable whose name we read as p_id is an optical illusion: it may actually contain some other non-printing character that does not match the p_* wildcard.

Here's what I suggest you do. Load in your data set and then run the following code (exactly: copy and paste it into the do-file editor so you don't inadvertently change it in any way):

Code:

keep *id
ds
local vbles `r(varlist)'
clear
local n_vars: word count `vbles'
set obs `n_vars'
gen var1 = ""
forvalues i = 1/`n_vars' {
    replace var1 = `"`:word `i' of `vbles''"' in `i'
}
chartab var1

(Note: -chartab- is written by Robert Picard and is available from SSC. You must install this to run the above code, unless you already have it.)

The output I get from doing this with the -dataex- example is:

Code:

. chartab var1

   decimal  hexadecimal   character |     frequency    unique name
------------------------------------+--------------------------------------
        95       \u005f       _     |             1    LOW LINE
       100       \u0064       d     |             2    LATIN SMALL LETTER D
       105       \u0069       i     |             2    LATIN SMALL LETTER I
       112       \u0070       p     |             1    LATIN SMALL LETTER P
------------------------------------+--------------------------------------

                                            freq. count   distinct
ASCII characters              =                6             4
Multibyte UTF-8 characters    =                0             0
Unicode replacement character =                0             0
Total Unicode characters      =                6             4

If my theory of the problem is correct, your output will be different and will identify what is wrong with the variable that is supposed to be named p_id.

Chris Boulis
#9

08 Apr 2020, 17:50

Thank you so much Clyde Schechter. I promised I'd wake up early today to figure out what's going on and I believe I found a small issue that may have been causing the problem. I ran the code you gave in #8 and Stata provided this output:

Code:

. chartab var1

   decimal  hexadecimal   character |     frequency    unique name
------------------------------------+--------------------------------------
        95       \u005f       _     |             1    LOW LINE
       100       \u0064       d     |             2    LATIN SMALL LETTER D
       105       \u0069       i     |             2    LATIN SMALL LETTER I
       112       \u0070       p     |             1    LATIN SMALL LETTER P
------------------------------------+--------------------------------------

                                            freq. count   distinct
ASCII characters              =                6             4
Multibyte UTF-8 characters    =                0             0
Unicode replacement character =                0             0
Total Unicode characters      =                6             4

Now all my code (for the dataset) seems to be good, except it stops when it runs:

Code:

. foreach v of local varlist {
  2. order p_`v', after(`v')
  3. }
after(): too many variables specified
r(103);

I do have quite a lot of variables though I'm not sure that would cause the issue. I found this error occurs when the loop ordering p_`v' after `v' runs directly after "local varlist". So after receiving this error, I ran the loop separately and it worked. As such, it appears that I can run my code, but in two stages. Any ideas on how to 'fix' this?

I also wanted to ask whether it is an issue to use "local variables" twice in the code, as I also use it in earlier code to merge the waves (see #1). I noticed that your "local variables" line of code in #4 did not include id or p_id, so I removed them from my list. Can you explain why we don't include these or the wave variable in "local variables"?

Clyde Schechter
#10

08 Apr 2020, 18:35

I believe I found a small issue that may have been causing the problem.

Well, what was the issue you found, and how did you fix it?

I do have quite a lot of variables though I'm not sure that would cause the issue. I found this error occurs when the loop ordering p_`v' after `v' runs directly after "local varlist". So after receiving this error, I ran the loop separately and it worked. As such, it appears that I can run my code, but in two stages. Any ideas on how to 'fix' this?

I don't understand how that could be. If you run the loop separately, then local varlist is simply undefined, which means that the loop is skipped altogether. Yes, it won't throw an error message, but it also won't do anything. Now, what it does isn't crucial: it just moves the variables around so that each p_ variable is immediately after the corresponding non-p_ variable. But that's just a matter of convenience and esthetics, not anything important. If it isn't running after the -local variables- command, then something is wrong in the -local variables- command, but there is nothing wrong with the one I wrote in #6, and I don't know what yours (which presumably includes some other things) looks like. I'd actually be very interested to see what your definition of local macro variables was that could cause this error. I'm having trouble imagining what it could be, because no matter what is in local macro variables, the -foreach- loop should parse it into single tokens, so that there would only be one token in -after()- at each iteration. So it is really hard for me to explain how you ended up with that particular error message. I could see the possibility that something in local macro variables was not an actual variable name in your data, but that would produce a different error message. I don't see how you could have ended up with more than one thing in -after()-, so I'm really curious to learn what caused that.

I also wanted to ask whether it is an issue to use "local variables" twice in the code, as I also use it in earlier code to merge the waves (see #1). I noticed that your "local variables" line of code in #4 did not include id or p_id, so I removed them from my list. Can you explain why we don't include these or the wave variable in "local variables"?

No, it is not a problem to use the same name for a local macro twice. You just have to understand that when you use it the second time, the definition from the first time no longer applies. The only problem that could arise is if you also needed the original definition later; in that case, you would need a different name for the second local macro. But here, the local macro variables that you first defined in #1 has served its purpose and is no longer needed by the time we get to where I define local macro variables in #6, so there is no problem: the definition in #6 prevails until the end of the code. My definition does not include id or p_id because the purpose of local variables in #6 is to list those variables that occur in both plain and p_ forms where the plain and p_ forms are not located next to each other in the data set. The loop that follows iterates over that list to move each pair of variables next to each other. I didn't include id and p_id because: a) id and p_id are already next to each other in the data, so no reordering is needed for them, and b) including p_id would cause an error because there is no variable p_p_id (which the loop would look for if p_id were in local variables). The same reasoning applies to wave, which has no p_wave counterpart.
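Both points can be checked directly. In this sketch (hypothetical variables, placeholder values), the second -local- definition simply replaces the first, and -confirm variable- shows that p_p_id, the name the loop would look for if p_id were in the list, does not exist:

```stata
* Hypothetical one-row data set
clear
set obs 1
foreach v in id p_id wave age p_age {
    generate byte `v' = 0
}

* Reusing a local macro name: the second definition replaces the first
local variables id p_id age marstat
local variables age
assert "`variables'" == "age"

* With v = p_id, -order p_`v', after(`v')- would need a variable p_p_id
capture confirm variable p_p_id
assert _rc == 111                // "variable p_p_id not found"

* With only names that have a p_ partner, the loop runs cleanly
foreach v of local variables {
    order p_`v', after(`v')
}
```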

Chris Boulis
#11

08 Apr 2020, 21:43

Hi Clyde Schechter. Thanks for your clarifications - appreciated. In response to your questions:

Well, what was the issue you found, and how did you fix it?

The id variables for the respondent and partner have different names in my dataset; in my example I used id and p_id because these are more straightforward. While I could adapt most of the code to suit, the code in #4 could not work as it was, so it was easier to rename my variables to id and p_id, which enabled your code to work as intended.

I could see the possibility that something in local macro variables was not an actual variable name in your data, but that would produce a different error message.

I use the wildcard * (e.g., le*, which covers many variables, including lemarr, lefrd, and lemvd), because doing so saves time by not having to list them all in the code. I note that the list of variables in "local variables" only includes non-p_ variables, that is, age, sex, ... and not p_age, p_sex. Is that right, or do I need to include both? And apart from leaving out id, p_id, and wave, it is the same list of variables coded in #1.

Clyde Schechter
#12

09 Apr 2020, 10:47

I use the wildcard * (e.g., le*, which covers many variables, including lemarr, lefrd, and lemvd), because doing so saves time by not having to list them all in the code. I note that the list of variables in "local variables" only includes non-p_ variables, that is, age, sex, ... and not p_age, p_sex. Is that right, or do I need to include both? And apart from leaving out id, p_id, and wave, it is the same list of variables coded in #1.

Yes, that's correct. The idea is to generate a list of the things that occur in both plain and p_ varieties, and then feed that list to a loop that moves the p_ version immediately after the plain version. So no p_ things can be in there, because there is no p_p_ variable to move, and id and wave are excluded because p_id is already located after id (at least in the data example I've been working with), and wave has no p_ counterpart.

Chris Boulis
#13

09 Apr 2020, 18:32

Hi Clyde Schechter. I understand, thanks for clarifying. I have been experimenting with the error in #9 ("too many variables specified") and found that the loop works with all of the variables except those specified with the wildcard *. I've not been able to find a solution to this. I did see Nick Cox use -egen- in place of a loop in https://stackoverflow.com/questions/...-in-their-name, but I'm not sure whether this would work and, if so, how to adapt it to my issue.

Clyde Schechter
#14

09 Apr 2020, 19:15

Oh, I see what you're doing and what's going wrong. My code has -local variables lifesat volunteer fired-. You are expanding that with something like -local variables lifesat volunteer fired something*-, am I right?

That's not going to work. The -local- command does not expand a wildcard: local macro variables will still literally contain something*; it will not contain a list of all the variables that start with something.

Then the foreach loop, when it gets to something*, passes something*, again, not expanded, into the loop. So when you get to the -order- command, it will read -order p_something*, after(something*)-. Now, for the first time, something* will be expanded to a list of variables, and inside -after()- that's illegal.

You have two alternatives. If the wildcard with * is standing in for a small number of variables you can just write them out instead of using the wildcard. Alternatively, you can get a local macro with the expanded wildcard as follows:

Code:

ds lifesat volunteer fired something*
local variables `r(varlist)'

-ds- does expand the wildcard, so local variables will in fact contain the list of variables that begin with something, and the loop will then feed them one at a time into the -order- command.
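A minimal demonstration of the difference, on a hypothetical data set whose le* names echo the ones mentioned in #11: -local- stores le* as literal text, while -ds- expands it, so the loop then sees one variable per iteration.

```stata
* Hypothetical one-row data set with plain and p_ versions
clear
set obs 1
foreach v in lifesat lemarr lefrd p_lifesat p_lemarr p_lefrd {
    generate byte `v' = 0
}

local raw lifesat le*
assert "`raw'" == "lifesat le*"                  // -local- keeps the text verbatim

ds lifesat le*
local variables `r(varlist)'
assert "`variables'" == "lifesat lemarr lefrd"   // -ds- expanded le*

foreach v of local variables {
    order p_`v', after(`v')                      // one name inside after() each time
}
ds
```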

Chris, here's a tip for better posting. It appears that several of the difficult mysteries we have encountered arose because you are posting example data using different variable names from the ones you are actually using in your own code. Many Stata commands are very sensitive to the names of variables, or references to variables. So we get the situation where my code works with what you have given me, and then, without you showing what you are doing differently, it doesn't work with yours.

So when you post example data in the future, use the actual variable names that you are using. Also, if you are adapting code that I (or somebody else) have provided and it isn't working, show the exact and complete code that is giving you trouble: what you may think of as minor changes (like using a wildcard) are sometimes crucial, and when you don't reveal what is going on, troubleshooting is very difficult.

Chris Boulis
#15

09 Apr 2020, 23:39

Thanks Clyde Schechter. Yes I see your point, thanks for the tip, consider it done.

That's great. And I'm very happy to report that my code is now running in full

Code:

ds id hgage hgsex mrcurr esbrd edhigh1 tchave tchad tcr tcyng hiwsfei hwhmhl jbmtuea ///
    hxycig hxyalc xpycig xpyalc anengf ancobn hhsos gh1 fmlwop edsstyp fiprbeg rpevown lssupvl pdsad ///
    ordfnum mrn hxyoi atwkadc lsclubn lstrust lshrvol lsvol lsnwmc le* losat* cety* rel*
local variables `r(varlist)'
foreach v of local variables {
    order p_`v', after(`v')
}

Actually, while looking to address this earlier, I read a tip by Nick Cox noting that you can expand a wildcard with "describe, varlist", which seems similar to what -ds- does here. I now have a new trick up my sleeve for future coding. Thanks a lot.
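For completeness, the -describe, varlist- route can be sketched on toy data as below. The assumption here is that the varlist option makes -describe- leave the expanded list in r(varlist), much as -ds- does; check -help describe- on your Stata version before relying on it.

```stata
* Hypothetical one-row data set
clear
set obs 1
foreach v in id lifesat lemarr lefrd {
    generate byte `v' = 0
}

* -describe, varlist- also stores the expanded names in r(varlist)
describe lifesat le*, varlist
local variables `r(varlist)'
assert "`variables'" == "lifesat lemarr lefrd"
```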