
  • How to keep data regarding reference person only?

    Hi, I want to know how I can select a reference person from each household using Stata.

    I am working with household data and I want to choose a reference person for each household. I have decided to use the person with the highest income as the reference person. I have looked online for ways to do this but have not been able to find the right command. I know it's a beginner's question, and I am sorry if you feel it is a waste of your time.

    The variables in my data are named hhid (household ID) and income (disposable income for each respondent). I need a command that identifies the reference person, preferably keeping the data on the reference person only and dropping the others.

    My data looks like this:

    hhid income
    217 50
    218 40
    218 60
    218 30
    219 20
    219 40
    220 90
    220 70
    220 80

    Preferably, I need it to look like this:

    hhid income
    217 50
    218 60
    219 40
    220 90

    There are more than 9,000 households in the data set, so obviously I cannot check the income of each person in each household manually. Kindly guide me on what code to use to achieve this. (If income is the same for two people, I would like to use age as the second filter.)

    P.S. I feel nervous asking such a beginner question (even though I have read that most people here are polite and helpful). I have tried my best to follow the guides on how to ask a good question on Statalist, and I have searched Statalist and Google for an answer but have failed; maybe I am searching with the wrong words.

  • #2
    Hi,

    You may use the -collapse- command:

    . collapse (max) income, by(hhid)

    This keeps one observation per household, containing that household's maximum income.
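
    As a quick illustration (a sketch built from the example values in the original post; note that -collapse- keeps only the variables listed, so other variables, such as age, are dropped and cannot be used to break ties):

    Code:
    clear
    input int hhid byte income
    217 50
    218 40
    218 60
    218 30
    219 20
    219 40
    220 90
    220 70
    220 80
    end

    collapse (max) income, by(hhid)
    list, clean
    * result: one row per household: 217 50, 218 60, 219 40, 220 90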



    • #3
      I'm a bit confused by your description of your data. What you show cannot be right, because you cannot have two variables with the same name in a Stata data set. It would have made life easier had you posted an example of your actual Stata data using the -dataex- command. (More on that below.) So I'm going to make a guess as to what your data looks like. And if I'm wrong, I will have wasted both your time and mine.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int hhid1 float age1 byte income1 int hhid2 float age2 byte income2
      217 41 50 217 40 50
      218 40 40 218 36 60
      218 41 60 219 34 40
      218 41 30 220 26 90
      219 38 20   . 36  .
      219 41 40   . 34  .
      220 42 90   . 39  .
      220 41 70   . 35  .
      220 39 80   . 34  .
      end
      
      //    GET DATA INTO LONG LAYOUT
      gen long obs_no = _n
      reshape long hhid age income, i(obs_no) j(_j)
      drop _j
      drop if missing(hhid)
      
      //    IDENTIFY PERSON WITH HIGHEST INCOME
      //    (AND IF A TIE, BREAK TIE WITH OLDEST)
      gen byte income_missing = missing(income)
      by hhid income_missing (income age), sort: gen byte reference = (_n == _N & !income_missing)

      The example data above assumes that you have multiple people in each observation. So the first step is to regularize the data structure by making each person a separate observation. This is called long layout, and it is the preferred data organization in Stata for nearly all data management and analysis commands. (There are a few things that work better with wide layout, but they don't come up very often.)
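
      For example (an illustrative sketch, using the second observation of the -dataex- example above), the -reshape long- step turns each wide observation into one row per person:

      Code:
      * before reshape:  obs_no=2  hhid1=218 age1=40 income1=40  hhid2=218 age2=36 income2=60
      * after -reshape long hhid age income, i(obs_no) j(_j)-:
      *
      *   obs_no   _j   hhid   age   income
      *        2    1    218    40       40
      *        2    2    218    36       60
      list obs_no _j hhid age income if obs_no == 2, noobs   // run before -drop _j-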

      Then it is a matter of sorting the data by income (and by age within income) within households and tagging the last observation in each household. Things are slightly complicated because in Stata a missing value of a numeric variable always sorts last: missing is larger than every nonmissing number. So we have to segregate the observations with missing income and avoid tagging them.
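
      A quick way to see this sorting rule (a minimal check, separate from the solution above):

      Code:
      display . > 1e300     // displays 1: missing is greater than any nonmissing number
      sort income           // so observations with missing income end up last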

      In the future, please show real Stata data examples, and use -dataex- to do it, as I have done above. Unfortunately, -dataex- is not part of official Stata. But you can get it by running -ssc install dataex-. Then run -help dataex- to read the simple instructions for using it. When you use -dataex-, you enable those who want to help you to create a complete, detailed, and faithful replica of your Stata example with a simple copy/paste operation.
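
      For example, with the variable names from the original question, the whole workflow is just:

      Code:
      ssc install dataex
      dataex hhid income in 1/10
      * then copy the output from the Results window into your post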

      Added: Crossed with #2, which relies on a different interpretation of what is wanted and produces a dataset containing only the reference person information. The code shown here identifies the reference observations in a 0/1 variable but retains all the original observations.
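
      If a dataset containing only the reference persons is in fact what is wanted, a one-line follow-up to the code above would be (a sketch, relying on the reference variable created there):

      Code:
      keep if reference     // keep only the tagged highest-income person in each household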



      • #4
        Thank you, Gurpreet and Clyde. I will be sure to use the -dataex- command next time; I am new to both Statalist and Stata and still learning. Thanks.

        Just to clarify, this is how I meant it to look:

        My data looks like this:

        hhid income
        217 50
        218 40
        218 60
        218 30
        219 20
        219 40
        220 90
        220 70
        220 80

        Preferably, I need it to look like this:

        hhid income
        217 50
        218 60
        219 40
        220 90

        Sorry for the confusion and thanks again.



        • #5
          Hi Statalist members,

          I am trying to estimate a QUAIDS model with censored demand, following the Shonkwiler and Yen approach, using the -nlsur quaids- command in Stata.
          When I run the -nlsur quaids- command, I always get the following messages:

          1. nlsurquaids returned 199
             verify that nlsurquaids is a function evaluator program
             r(199);

          2. varlist required
             r(100);

          How can I get rid of these messages? Do you have any advice regarding my problem?

          Thank you in advance for your kindness.

          Atchara Patoom

          Here is my Stata code:

          program nlsurquaids
          *version13
          syntax varlist(min=38 max=38) if, at(name)
          tokenize `varlist'
          args w1 w2 w3 w4 w5 w6 lnp1 lnp2 lnp3 lnp4 lnp5 lnp6 lnexp x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 pdf1 pdf2 pdf3 pdf4 pdf5 pdf6 cdf1 cdf2 cdf3 cdf4 cdf5 cdf6


          tempname a1 a2 a3 a4 a5 a6
          scalar `a1' = `at'[1,1]
          scalar `a2' = `at'[1,2]
          scalar `a3' = `at'[1,3]
          scalar `a4' = `at'[1,4]
          scalar `a5' = `at'[1,5]
          scalar `a6' = `at'[1,6]

          tempname b1 b2 b3 b4 b5 b6
          scalar `b1' = `at'[1,7]
          scalar `b2' = `at'[1,8]
          scalar `b3' = `at'[1,9]
          scalar `b4' = `at'[1,10]
          scalar `b5' = `at'[1,11]
          scalar `b6' = `at'[1,12]


          tempname g11 g12 g13 g14 g15 g16
          tempname g21 g22 g23 g24 g25 g26
          tempname g31 g32 g33 g34 g35 g36
          tempname g41 g42 g43 g44 g45 g46
          tempname g51 g52 g53 g54 g55 g56
          tempname g61 g62 g63 g64 g65 g66



          scalar `g11' = `at'[1,13]
          scalar `g12' = `at'[1,14]
          scalar `g13' = `at'[1,15]
          scalar `g14' = `at'[1,16]
          scalar `g15' = `at'[1,17]
          scalar `g16' = `at'[1,18]


          scalar `g21' = `g12'
          scalar `g22' = `at'[1,19]
          scalar `g23' = `at'[1,20]
          scalar `g24' = `at'[1,21]
          scalar `g25' = `at'[1,22]
          scalar `g26' = `at'[1,23]


          scalar `g31' = `g13'
          scalar `g32' = `g23'
          scalar `g33' = `at'[1,24]
          scalar `g34' = `at'[1,25]
          scalar `g35' = `at'[1,26]
          scalar `g36' = `at'[1,27]


          scalar `g41' = `g14'
          scalar `g42' = `g24'
          scalar `g43' = `g34'
          scalar `g44' = `at'[1,28]
          scalar `g45' = `at'[1,29]
          scalar `g46' = `at'[1,30]


          scalar `g51' = `g15'
          scalar `g52' = `g25'
          scalar `g53' = `g35'
          scalar `g54' = `g45'
          scalar `g55' = `at'[1,31]
          scalar `g56' = `at'[1,32]


          scalar `g61' = `g16'
          scalar `g62' = `g26'
          scalar `g63' = `g36'
          scalar `g64' = `g46'
          scalar `g65' = `g56'
          scalar `g66' = `at'[1,33]

          tempname l1 l2 l3 l4 l5 l6
          scalar `l1' = `at'[1,34]
          scalar `l2' = `at'[1,35]
          scalar `l3' = `at'[1,36]
          scalar `l4' = `at'[1,37]
          scalar `l5' = `at'[1,38]
          scalar `l6' = `at'[1,39]

          **add household demographics variables
          *
          tempname r11 r12 r13 r14 r15 r16 r17 r18 r19 r110 r111 r112 r113
          tempname r21 r22 r23 r24 r25 r26 r27 r28 r29 r210 r211 r212 r213
          tempname r31 r32 r33 r34 r35 r36 r37 r38 r39 r310 r311 r312 r313
          tempname r41 r42 r43 r44 r45 r46 r47 r48 r49 r410 r411 r412 r413
          tempname r51 r52 r53 r54 r55 r56 r57 r58 r59 r510 r511 r512 r513
          tempname r61 r62 r63 r64 r65 r66 r67 r68 r69 r610 r611 r612 r613


          scalar `r11' = `at'[1,40]
          scalar `r12' = `at'[1,41]
          scalar `r13' = `at'[1,42]
          scalar `r14' = `at'[1,43]
          scalar `r15' = `at'[1,44]
          scalar `r16' = `at'[1,45]
          scalar `r17' = `at'[1,46]
          scalar `r18' = `at'[1,47]
          scalar `r19' = `at'[1,48]
          scalar `r110' = `at'[1,49]
          scalar `r111' = `at'[1,50]
          scalar `r112' = `at'[1,51]
          scalar `r113' = `at'[1,52]



          scalar `r21' = `at'[1,53]
          scalar `r22' = `at'[1,54]
          scalar `r23' = `at'[1,55]
          scalar `r24' = `at'[1,56]
          scalar `r25' = `at'[1,57]
          scalar `r26' = `at'[1,58]
          scalar `r27' = `at'[1,59]
          scalar `r28' = `at'[1,60]
          scalar `r29' = `at'[1,61]
          scalar `r210' = `at'[1,62]
          scalar `r211' = `at'[1,63]
          scalar `r212' = `at'[1,64]
          scalar `r213' = `at'[1,65]



          scalar `r31' = `at'[1,66]
          scalar `r32' = `at'[1,67]
          scalar `r33' = `at'[1,68]
          scalar `r34' = `at'[1,69]
          scalar `r35' = `at'[1,70]
          scalar `r36' = `at'[1,71]
          scalar `r37' = `at'[1,72]
          scalar `r38' = `at'[1,73]
          scalar `r39' = `at'[1,74]
          scalar `r310' = `at'[1,75]
          scalar `r311' = `at'[1,76]
          scalar `r312' = `at'[1,77]
          scalar `r313' = `at'[1,78]



          scalar `r41' = `at'[1,79]
          scalar `r42' = `at'[1,80]
          scalar `r43' = `at'[1,81]
          scalar `r44' = `at'[1,82]
          scalar `r45' = `at'[1,83]
          scalar `r46' = `at'[1,84]
          scalar `r47' = `at'[1,85]
          scalar `r48' = `at'[1,86]
          scalar `r49' = `at'[1,87]
          scalar `r410' = `at'[1,88]
          scalar `r411' = `at'[1,89]
          scalar `r412' = `at'[1,90]
          scalar `r413' = `at'[1,91]


          scalar `r51' = `at'[1,92]
          scalar `r52' = `at'[1,93]
          scalar `r53' = `at'[1,94]
          scalar `r54' = `at'[1,95]
          scalar `r55' = `at'[1,96]
          scalar `r56' = `at'[1,97]
          scalar `r57' = `at'[1,98]
          scalar `r58' = `at'[1,99]
          scalar `r59' = `at'[1,100]
          scalar `r510' = `at'[1,101]
          scalar `r511' = `at'[1,102]
          scalar `r512' = `at'[1,103]
          scalar `r513' = `at'[1,104]

          scalar `r61' = `at'[1,105]
          scalar `r62' = `at'[1,106]
          scalar `r63' = `at'[1,107]
          scalar `r64' = `at'[1,108]
          scalar `r65' = `at'[1,109]
          scalar `r66' = `at'[1,110]
          scalar `r67' = `at'[1,111]
          scalar `r68' = `at'[1,112]
          scalar `r69' = `at'[1,113]
          scalar `r610' = `at'[1,114]
          scalar `r611' = `at'[1,115]
          scalar `r612' = `at'[1,116]
          scalar `r613' = `at'[1,117]

          *r11, r12, r13 estimated with loops:
          loc start=118
          forv i=1(1)13 {
          scalar `r11`i''=`at'[1,`start']
          loc start=`start'+1
          }
          *
          *
          loc start=131
          forv i=1(1)13 {
          scalar `r12`i''=`at'[1,`start']
          loc start=`start'+1
          }
          *
          *
          loc start=144
          forv i=1(1)13 {
          scalar `r13`i''=`at'[1,`start']
          loc start=`start'+1
          }
          *
          *
          *
          **pdf
          *
          tempname d1 d2 d3 d4 d5 d6
          scalar `d1' = `at'[1,157]
          scalar `d2' = `at'[1,158]
          scalar `d3' = `at'[1,159]
          scalar `d4' = `at'[1,160]
          scalar `d5' = `at'[1,161]
          scalar `d6' = `at'[1,162]


          quietly {
          // First get the price index
          // I set a_0 = 5
          tempvar lnpindex
          gen double `lnpindex' = 5 + `a1'*`lnp1' + `a2'*`lnp2'+ `a3'*`lnp3' + `a4'*`lnp4'+ `a5'*`lnp5'+ `a6'*`lnp6'
          forvalues i = 1/6 {
          forvalues j = 1/6 {
          replace `lnpindex' = `lnpindex' + 0.5*`g`i'`j''*`lnp`i''*`lnp`j''
          }
          }
          // The b(p) term in the QUAIDS model:
          tempvar bofp
          gen double `bofp' = 0
          forvalues i = 1/6 {
          replace `bofp' = `bofp' + `lnp`i''*`b`i''
          }
          replace `bofp' = exp(`bofp')

          replace `w1' = (`a1' + `g11'*`lnp1' + `g12'*`lnp2' +`g13'*`lnp3' + `g14'*`lnp4' + `g15'*`lnp5'+ `g16'*`lnp6' + `b1'*(`lnexp' - `lnpindex') + `l1'/`bofp'*(`lnexp' - `lnpindex')^2 +`r11'*`x1' +`r12'*`x2' + `r13'*`x3'+ `r14'*`x4' +`r15'*`x5' +`r16'*`x6' +`r17'*`x7' + `r18'*`x8' + `r19'*`x9' + `r110'*`x10'+`r111'*`x11' +`r112'*`x12' + `r113'*`x13') * `cdf1' + `d1'*`pdf1'


          replace `w2' = (`a2' + `g21'*`lnp1' + `g22'*`lnp2' +`g23'*`lnp3' + `g24'*`lnp4' + `g25'*`lnp5' + `g26'*`lnp6' +`b2'*(`lnexp' - `lnpindex') + `l2'/`bofp'*(`lnexp' - `lnpindex')^2 +`r21'*`x1' +`r22'*`x2' + `r23'*`x3'+ `r24'*`x4' +`r25'*`x5' +`r26'*`x6' +`r27'*`x7' + `r28'*`x8' + `r29'*`x9' + `r210'*`x10'+`r211'*`x11' +`r212'*`x12' + `r213'*`x13')*`cdf2' +`d2'*`pdf2'


          replace `w3' = (`a3' + `g31'*`lnp1' + `g32'*`lnp2' +`g33'*`lnp3' + `g34'*`lnp4' + `g35'*`lnp5' + `g36'*`lnp6' + `b3'*(`lnexp' - `lnpindex') + `l3'/`bofp'*(`lnexp' - `lnpindex')^2 +`r31'*`x1' +`r32'*`x2' + `r33'*`x3'+ `r34'*`x4' +`r35'*`x5' +`r36'*`x6' +`r37'*`x7' + `r38'*`x8' + `r39'*`x9' + `r310'*`x10'+`r311'*`x11' +`r312'*`x12' + `r313'*`x13')*`cdf3' +`d3'*`pdf3'



          replace `w4' = (`a4' + `g41'*`lnp1' + `g42'*`lnp2' +`g43'*`lnp3' + `g44'*`lnp4' + `g45'*`lnp5' + `g46'*`lnp6' +`b4'*(`lnexp' - `lnpindex') + `l4'/`bofp'*(`lnexp' - `lnpindex')^2 +`r41'*`x1' +`r42'*`x2' + `r43'*`x3'+ `r44'*`x4' +`r45'*`x5' +`r46'*`x6' +`r47'*`x7' + `r48'*`x8' + `r49'*`x9' + `r410'*`x10'+`r411'*`x11' +`r412'*`x12' + `r413'*`x13')*`cdf4' +`d4'*`pdf4'



          replace `w5' = (`a5' + `g51'*`lnp1' + `g52'*`lnp2' +`g53'*`lnp3' + `g54'*`lnp4' + `g55'*`lnp5' + `g56'*`lnp6' +`b5'*(`lnexp' - `lnpindex') + `l5'/`bofp'*(`lnexp' - `lnpindex')^2 +`r51'*`x1' +`r52'*`x2' + `r53'*`x3'+ `r54'*`x4' +`r55'*`x5' +`r56'*`x6' +`r57'*`x7' + `r58'*`x8' + `r59'*`x9' + `r510'*`x10'+`r511'*`x11' +`r512'*`x12' + `r513'*`x13')*`cdf5' +`d5'*`pdf5'



          replace `w6' = (`a6' + `g61'*`lnp1' + `g62'*`lnp2' +`g63'*`lnp3' + `g64'*`lnp4' + `g65'*`lnp5' + `g66'*`lnp6' +`b6'*(`lnexp' - `lnpindex') + `l6'/`bofp'*(`lnexp' - `lnpindex')^2 +`r61'*`x1' +`r62'*`x2' + `r63'*`x3'+ `r64'*`x4' +`r65'*`x5' +`r66'*`x6' +`r67'*`x7' + `r68'*`x8' + `r69'*`x9' + `r610'*`x10'+`r611'*`x11' +`r612'*`x12' + `r613'*`x13')*`cdf6' +`d6'*`pdf6'



          }
          end

          set trace off

          /* cdfs have to be added to command */
          glo cdfs ""
          forv i=1(1)6 {
          glo cdfs "${cdfs} cdf`i'"
          }

          glo A_NOT =5
          *
          noi nlsur quaids @ w1 w2 w3 w4 w5 w6 lnp1 lnp2 lnp3 lnp4 lnp5 lnp6 lnexp x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 pdf1 pdf2 pdf3 pdf4 pdf5 pdf6 ${cdfs}, ifgnls nequations(6) param(a1 a2 a3 a4 a5 a6 b1 b2 b3 b4 b5 b6 g11 g12 g13 g14 g15 g16 g22 g23 g24 g25 g26 g33 g34 g35 g36 g44 g45 g46 g55 g56 g66 l1 l2 l3 l4 l5 l6 r11 r12 r13 r14 r15 r16 r17 r18 r19 r110 r111 r112 r113 r21 r22 r23 r24 r25 r26 r27 r28 r29 r210 r211 r212 r213 r31 r32 r33 r34 r35 r36 r37 r38 r39 r310 r311 r312 r313 r41 r42 r43 r44 r45 r46 r47 r48 r49 r410 r411 r412 r413 r51 r52 r53 r54 r55 r56 r57 r58 r59 r510 r511 r512 r513 r61 r62 r63 r64 r65 r66 r67 r68 r69 r610 r611 r612 r613 d1 d2 d3 d4 d5 d6)

          est store quaidsNNP2

          set trace on
          set tracedepth 4

          * Share means and price means
          quietly {
          foreach x of varlist w* lnp* lnexp {
          sum `x'
          scalar `x'mean=r(mean)
          }
          * Price indexes
          glo asum "_b[a1]*lnp1mean"
          forv i=2(1)6 {
          glo asum "${asum} + _b[a`i']*lnp`i'mean"
          }
          glo gsum ""
          forv i=1(1)6 {
          forv j=1(1)6 {
          glo gsum "${gsum} + 0.5*_b[g`i'`j']*lnp`i'mean*lnp`j'mean"
          }
          }
          glo ap "6.11 + ${asum} ${gsum}"
          glo bp "_b[b1]*lnp1mean"
          forv i=2(1)6 {
          glo bp "${bp} + _b[b`i']*lnp`i'mean"
          }
          glo bp "(exp(${bp}))"
          * Mus
          forv i=1(1)6 {
          glo mu`i' "_b[b`i'] + 2*_b[l`i']/${bp}*(lnexpmean-(${ap}))"
          }
          forv j=1(1)6 {
          glo gsum2`j' ""
          forv k=1(1)6 {
          glo gsum2`j' "${gsum2`j'} + _b[g`j'`k']*lnp`k'mean"
          }
          }
          }
          *
          *ereturn list
          *

          forv i=1(1)6 {
          forv j=1(1)6 {
          glo delta=cond(`i'==`j',1,0)
          glo mu`i'`j' "_b[g`i'`j'] - ${mu`i'}*(_b[a`j'] ${gsum2`j'})-_b[l`i']*_b[b`j']/${bp}*(lnexpmean - (${ap}))^2"
          * If expression is too long, split it
          cap nlcom (elasexp`i': ${mu`i'}/w`i'mean + 1) (mu`i'`j': ${mu`i'`j'}), post noheader
          if _rc {


          qui nlcom (elasexp`i': ${mu`i'}/w`i'mean + 1) (mu`i'`j'f: (1e+2)*(${mu`i'`j'})), post noheader
          qui nlcom (elasexp`i': _b[elasexp`i']) (mu`i'`j':_b[mu`i'`j'f]/(1e+2)), post noheader
          }
          * Uncompensated price elasticity
          nlcom (elasexp`i': _b[elasexp`i']) (elu`i'`j':_b[mu`i'`j']/w`i'mean - ${delta}) , post noheader
          * Compensated price elasticity
          nlcom (elc`i'`j': _b[elu`i'`j'] + _b[elasexp`i']*w`j'mean), noheader
          qui est restore quaidsNNP2
          }
          }
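
          As background (a general sketch of the -nlsur- function-evaluator contract, not a fix for the specific model above): -nlsur myeval @ ...- looks for a program named -nlsurmyeval-; that program receives the dependent and independent variables as a varlist, plus the current parameter vector in the -at()- matrix, and it must -replace- the dependent variables with their predicted values. The indices used with `at'[1,#] have to line up with the number and order of names given in the parameters() option. The first message above ("nlsurquaids returned 199") suggests that the evaluator program itself exited with an error, and "varlist required, r(100)" is what -syntax varlist(...)- issues when the program is invoked without a varlist (for example, when it is run directly rather than through -nlsur-). A minimal, purely illustrative sketch (hypothetical names, one equation, two parameters):

          Code:
          * minimal illustrative nlsur function evaluator (hypothetical example)
          program nlsurmydemo
              version 13
              syntax varlist(min=3 max=3) if, at(name)
              tokenize `varlist'
              args y x1 x2                      // dependent variable comes first
              tempname b1 b2
              scalar `b1' = `at'[1,1]           // parameters arrive in the at() row vector
              scalar `b2' = `at'[1,2]
              quietly replace `y' = `b1'*`x1' + `b2'*`x2' `if'
          end

          * called as:
          * nlsur mydemo @ y x1 x2, nequations(1) parameters(b1 b2)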



          • #6
            Harris Mazari: Gurpreet's code in #2 will do what you want. I had understood you to want something different.

            Atchara Patoom: Your post is unrelated to the topic of this thread. Please repost it as a new topic, with a title that is informative about the question you are asking. Also, before reposting, please read the FAQ, especially #12, on the preferred way to show code here.

