
  • How to keep data regarding reference person only?

    Hi, I want to know how I can select a reference person from each household using Stata.

    I am working with household data and I want to choose a reference person for each household. I have decided to use the person with the highest income as the reference person. I have looked online for ways to do this but have not been able to find the right command. I know it's a beginner's question, and I am sorry if you feel it is a waste of your time.

    The variables in my data are named hhid (household ID) and income (disposable income for each respondent). I need a command that identifies the reference person, preferably keeping the data on the reference person only and dropping the others.

    My data looks like this:

    hhid income
    217 50
    218 40
    218 60
    218 30
    219 20
    219 40
    220 90
    220 70
    220 80

    Preferably, I need it to look like this:

    hhid income
    217 50
    218 60
    219 40
    220 90

    There are more than 9,000 households in the data set, so obviously I cannot check the income of each person in each household manually. Kindly guide me on what code to use to achieve this. (If income is the same for two people, I would like to use age as the second filter.)

    P.S. I feel nervous asking such a beginner question (even though I have read that most people here are polite and helpful). I have tried my best to follow the guides on how to ask a good question on Statalist, and I have searched Statalist and Google for an answer but have failed; maybe I am searching with the wrong words.

  • #2
    Hi,

    You may use the -collapse- command:

    . collapse (max) income, by(hhid)

    This keeps one observation per household, containing that household's maximum income.
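
    As a quick illustration (a sketch built from the example values in the original post; note that -collapse- keeps only the variables listed, so other variables, such as age, are dropped and cannot be used to break ties):

    Code:
    clear
    input int hhid byte income
    217 50
    218 40
    218 60
    218 30
    219 20
    219 40
    220 90
    220 70
    220 80
    end

    collapse (max) income, by(hhid)
    list, clean
    * result: one row per household: 217 50, 218 60, 219 40, 220 90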



    • #3
      I'm a bit confused by your description of your data. What you show cannot be right, because you cannot have two variables with the same name in a Stata data set. It would have made life easier had you posted an example of your actual Stata data using the -dataex- command. (More on that below.) So I'm going to make a guess as to what your data looks like. And if I'm wrong, I will have wasted both your time and mine.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int hhid1 float age1 byte income1 int hhid2 float age2 byte income2
      217 41 50 217 40 50
      218 40 40 218 36 60
      218 41 60 219 34 40
      218 41 30 220 26 90
      219 38 20   . 36  .
      219 41 40   . 34  .
      220 42 90   . 39  .
      220 41 70   . 35  .
      220 39 80   . 34  .
      end
      
      //    GET DATA INTO LONG LAYOUT
      gen long obs_no = _n
      reshape long hhid age income, i(obs_no) j(_j)
      drop _j
      drop if missing(hhid)
      
      //    IDENTIFY PERSON WITH HIGHEST INCOME
      //    (AND IF A TIE, BREAK TIE WITH OLDEST)
      gen byte income_missing = missing(income)
      by hhid income_missing (income age), sort: gen byte reference = (_n == _N & !income_missing)

      The example data above assumes that you have multiple people in each observation. So the first step is to regularize the data structure by making each person a separate observation. This is called long layout, and it is the preferred data organization in Stata for nearly all data management and analysis commands. (There are a few things that work better with wide layout, but they don't come up very often.)
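
      For example (an illustrative sketch, using the second observation of the -dataex- example above), the -reshape long- step turns each wide observation into one row per person:

      Code:
      * before reshape:  obs_no=2  hhid1=218 age1=40 income1=40  hhid2=218 age2=36 income2=60
      * after -reshape long hhid age income, i(obs_no) j(_j)-:
      *
      *   obs_no   _j   hhid   age   income
      *        2    1    218    40       40
      *        2    2    218    36       60
      list obs_no _j hhid age income if obs_no == 2, noobs   // run before -drop _j-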

      Then it is a matter of sorting the data by income (and by age within income) within households and tagging the last observation in each household. Things are slightly complicated because in Stata a missing value of a numeric variable always sorts last: missing is larger than every nonmissing number. So we have to segregate the observations with missing income and avoid tagging them.
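
      A quick way to see this sorting rule (a minimal check, separate from the solution above):

      Code:
      display . > 1e300     // displays 1: missing is greater than any nonmissing number
      sort income           // so observations with missing income end up last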

      In the future, please show real Stata data examples, and use -dataex- to do it, as I have done above. Unfortunately, -dataex- is not part of official Stata. But you can get it by running -ssc install dataex-. Then run -help dataex- to read the simple instructions for using it. When you use -dataex-, you enable those who want to help you to create a complete, detailed, and faithful replica of your Stata example with a simple copy/paste operation.
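
      For example, with the variable names from the original question, the whole workflow is just:

      Code:
      ssc install dataex
      dataex hhid income in 1/10
      * then copy the output from the Results window into your post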

      Added: Crossed with #2, which relies on a different interpretation of what is wanted and produces a dataset containing only the reference person information. The code shown here identifies the reference observations in a 0/1 variable but retains all the original observations.
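
      If a dataset containing only the reference persons is in fact what is wanted, a one-line follow-up to the code above would be (a sketch, relying on the reference variable created there):

      Code:
      keep if reference     // keep only the tagged highest-income person in each household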



      • #4
        Thank you, Gurpreet and Clyde. I will be sure to use the -dataex- command next time; I am new to both Statalist and Stata and still learning. Thanks.

        Just to clarify, this is how I meant it to look:

        My data looks like this:

        hhid income
        217 50
        218 40
        218 60
        218 30
        219 20
        219 40
        220 90
        220 70
        220 80

        Preferably, I need it to look like this:

        hhid income
        217 50
        218 60
        219 40
        220 90

        Sorry for the confusion and thanks again.



        • #5
          Hi Statalist members,

          I am trying to estimate a QUAIDS model with censored demand, following the Shonkwiler and Yen approach, using the -nlsur quaids- command in Stata.
          When I run the -nlsur quaids- command, I always get the following messages:

          1. nlsurquaids returned 199
             verify that nlsurquaids is a function evaluator program
             r(199);

          2. varlist required
             r(100);

          How can I get rid of these messages? Do you have any advice regarding my problem?

          Thank you in advance for your kindness.

          Atchara Patoom

          Here is my Stata code:

          program nlsurquaids
          *version13
          syntax varlist(min=38 max=38) if, at(name)
          tokenize `varlist'
          args w1 w2 w3 w4 w5 w6 lnp1 lnp2 lnp3 lnp4 lnp5 lnp6 lnexp x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 pdf1 pdf2 pdf3 pdf4 pdf5 pdf6 cdf1 cdf2 cdf3 cdf4 cdf5 cdf6


          tempname a1 a2 a3 a4 a5 a6
          scalar `a1' = `at'[1,1]
          scalar `a2' = `at'[1,2]
          scalar `a3' = `at'[1,3]
          scalar `a4' = `at'[1,4]
          scalar `a5' = `at'[1,5]
          scalar `a6' = `at'[1,6]

          tempname b1 b2 b3 b4 b5 b6
          scalar `b1' = `at'[1,7]
          scalar `b2' = `at'[1,8]
          scalar `b3' = `at'[1,9]
          scalar `b4' = `at'[1,10]
          scalar `b5' = `at'[1,11]
          scalar `b6' = `at'[1,12]


          tempname g11 g12 g13 g14 g15 g16
          tempname g21 g22 g23 g24 g25 g26
          tempname g31 g32 g33 g34 g35 g36
          tempname g41 g42 g43 g44 g45 g46
          tempname g51 g52 g53 g54 g55 g56
          tempname g61 g62 g63 g64 g65 g66



          scalar `g11' = `at'[1,13]
          scalar `g12' = `at'[1,14]
          scalar `g13' = `at'[1,15]
          scalar `g14' = `at'[1,16]
          scalar `g15' = `at'[1,17]
          scalar `g16' = `at'[1,18]


          scalar `g21' = `g12'
          scalar `g22' = `at'[1,19]
          scalar `g23' = `at'[1,20]
          scalar `g24' = `at'[1,21]
          scalar `g25' = `at'[1,22]
          scalar `g26' = `at'[1,23]


          scalar `g31' = `g13'
          scalar `g32' = `g23'
          scalar `g33' = `at'[1,24]
          scalar `g34' = `at'[1,25]
          scalar `g35' = `at'[1,26]
          scalar `g36' = `at'[1,27]


          scalar `g41' = `g14'
          scalar `g42' = `g24'
          scalar `g43' = `g34'
          scalar `g44' = `at'[1,28]
          scalar `g45' = `at'[1,29]
          scalar `g46' = `at'[1,30]


          scalar `g51' = `g15'
          scalar `g52' = `g25'
          scalar `g53' = `g35'
          scalar `g54' = `g45'
          scalar `g55' = `at'[1,31]
          scalar `g56' = `at'[1,32]


          scalar `g61' = `g16'
          scalar `g62' = `g26'
          scalar `g63' = `g36'
          scalar `g64' = `g46'
          scalar `g65' = `g56'
          scalar `g66' = `at'[1,33]

          tempname l1 l2 l3 l4 l5 l6
          scalar `l1' = `at'[1,34]
          scalar `l2' = `at'[1,35]
          scalar `l3' = `at'[1,36]
          scalar `l4' = `at'[1,37]
          scalar `l5' = `at'[1,38]
          scalar `l6' = `at'[1,39]

          **add household demographics variables
          *
          tempname r11 r12 r13 r14 r15 r16 r17 r18 r19 r110 r111 r112 r113
          tempname r21 r22 r23 r24 r25 r26 r27 r28 r29 r210 r211 r212 r213
          tempname r31 r32 r33 r34 r35 r36 r37 r38 r39 r310 r311 r312 r313
          tempname r41 r42 r43 r44 r45 r46 r47 r48 r49 r410 r411 r412 r413
          tempname r51 r52 r53 r54 r55 r56 r57 r58 r59 r510 r511 r512 r513
          tempname r61 r62 r63 r64 r65 r66 r67 r68 r69 r610 r611 r612 r613


          scalar `r11' = `at'[1,40]
          scalar `r12' = `at'[1,41]
          scalar `r13' = `at'[1,42]
          scalar `r14' = `at'[1,43]
          scalar `r15' = `at'[1,44]
          scalar `r16' = `at'[1,45]
          scalar `r17' = `at'[1,46]
          scalar `r18' = `at'[1,47]
          scalar `r19' = `at'[1,48]
          scalar `r110' = `at'[1,49]
          scalar `r111' = `at'[1,50]
          scalar `r112' = `at'[1,51]
          scalar `r113' = `at'[1,52]



          scalar `r21' = `at'[1,53]
          scalar `r22' = `at'[1,54]
          scalar `r23' = `at'[1,55]
          scalar `r24' = `at'[1,56]
          scalar `r25' = `at'[1,57]
          scalar `r26' = `at'[1,58]
          scalar `r27' = `at'[1,59]
          scalar `r28' = `at'[1,60]
          scalar `r29' = `at'[1,61]
          scalar `r210' = `at'[1,62]
          scalar `r211' = `at'[1,63]
          scalar `r212' = `at'[1,64]
          scalar `r213' = `at'[1,65]



          scalar `r31' = `at'[1,66]
          scalar `r32' = `at'[1,67]
          scalar `r33' = `at'[1,68]
          scalar `r34' = `at'[1,69]
          scalar `r35' = `at'[1,70]
          scalar `r36' = `at'[1,71]
          scalar `r37' = `at'[1,72]
          scalar `r38' = `at'[1,73]
          scalar `r39' = `at'[1,74]
          scalar `r310' = `at'[1,75]
          scalar `r311' = `at'[1,76]
          scalar `r312' = `at'[1,77]
          scalar `r313' = `at'[1,78]



          scalar `r41' = `at'[1,79]
          scalar `r42' = `at'[1,80]
          scalar `r43' = `at'[1,81]
          scalar `r44' = `at'[1,82]
          scalar `r45' = `at'[1,83]
          scalar `r46' = `at'[1,84]
          scalar `r47' = `at'[1,85]
          scalar `r48' = `at'[1,86]
          scalar `r49' = `at'[1,87]
          scalar `r410' = `at'[1,88]
          scalar `r411' = `at'[1,89]
          scalar `r412' = `at'[1,90]
          scalar `r413' = `at'[1,91]


          scalar `r51' = `at'[1,92]
          scalar `r52' = `at'[1,93]
          scalar `r53' = `at'[1,94]
          scalar `r54' = `at'[1,95]
          scalar `r55' = `at'[1,96]
          scalar `r56' = `at'[1,97]
          scalar `r57' = `at'[1,98]
          scalar `r58' = `at'[1,99]
          scalar `r59' = `at'[1,100]
          scalar `r510' = `at'[1,101]
          scalar `r511' = `at'[1,102]
          scalar `r512' = `at'[1,103]
          scalar `r513' = `at'[1,104]

          scalar `r61' = `at'[1,105]
          scalar `r62' = `at'[1,106]
          scalar `r63' = `at'[1,107]
          scalar `r64' = `at'[1,108]
          scalar `r65' = `at'[1,109]
          scalar `r66' = `at'[1,110]
          scalar `r67' = `at'[1,111]
          scalar `r68' = `at'[1,112]
          scalar `r69' = `at'[1,113]
          scalar `r610' = `at'[1,114]
          scalar `r611' = `at'[1,115]
          scalar `r612' = `at'[1,116]
          scalar `r613' = `at'[1,117]

          *r11, r12, r13 estimated with loops:
          loc start=118
          forv i=1(1)13 {
          scalar `r11`i''=`at'[1,`start']
          loc start=`start'+1
          }
          *
          *
          loc start=131
          forv i=1(1)13 {
          scalar `r12`i''=`at'[1,`start']
          loc start=`start'+1
          }
          *
          *
          loc start=144
          forv i=1(1)13 {
          scalar `r13`i''=`at'[1,`start']
          loc start=`start'+1
          }
          *
          *
          *
          **pdf
          *
          tempname d1 d2 d3 d4 d5 d6
          scalar `d1' = `at'[1,157]
          scalar `d2' = `at'[1,158]
          scalar `d3' = `at'[1,159]
          scalar `d4' = `at'[1,160]
          scalar `d5' = `at'[1,161]
          scalar `d6' = `at'[1,162]


          quietly {
          // First get the price index
          // I set a_0 = 5
          tempvar lnpindex
          gen double `lnpindex' = 5 + `a1'*`lnp1' + `a2'*`lnp2'+ `a3'*`lnp3' + `a4'*`lnp4'+ `a5'*`lnp5'+ `a6'*`lnp6'
          forvalues i = 1/6 {
          forvalues j = 1/6 {
          replace `lnpindex' = `lnpindex' + 0.5*`g`i'`j''*`lnp`i''*`lnp`j''
          }
          }
          // The b(p) term in the QUAIDS model:
          tempvar bofp
          gen double `bofp' = 0
          forvalues i = 1/6 {
          replace `bofp' = `bofp' + `lnp`i''*`b`i''
          }
          replace `bofp' = exp(`bofp')

          replace `w1' = (`a1' + `g11'*`lnp1' + `g12'*`lnp2' +`g13'*`lnp3' + `g14'*`lnp4' + `g15'*`lnp5'+ `g16'*`lnp6' + `b1'*(`lnexp' - `lnpindex') + `l1'/`bofp'*(`lnexp' - `lnpindex')^2 +`r11'*`x1' +`r12'*`x2' + `r13'*`x3'+ `r14'*`x4' +`r15'*`x5' +`r16'*`x6' +`r17'*`x7' + `r18'*`x8' + `r19'*`x9' + `r110'*`x10'+`r111'*`x11' +`r112'*`x12' + `r113'*`x13') * `cdf1' + `d1'*`pdf1'


          replace `w2' = (`a2' + `g21'*`lnp1' + `g22'*`lnp2' +`g23'*`lnp3' + `g24'*`lnp4' + `g25'*`lnp5' + `g26'*`lnp6' +`b2'*(`lnexp' - `lnpindex') + `l2'/`bofp'*(`lnexp' - `lnpindex')^2 +`r21'*`x1' +`r22'*`x2' + `r23'*`x3'+ `r24'*`x4' +`r25'*`x5' +`r26'*`x6' +`r27'*`x7' + `r28'*`x8' + `r29'*`x9' + `r210'*`x10'+`r211'*`x11' +`r212'*`x12' + `r213'*`x13')*`cdf2' +`d2'*`pdf2'


          replace `w3' = (`a3' + `g31'*`lnp1' + `g32'*`lnp2' +`g33'*`lnp3' + `g34'*`lnp4' + `g35'*`lnp5' + `g36'*`lnp6' + `b3'*(`lnexp' - `lnpindex') + `l3'/`bofp'*(`lnexp' - `lnpindex')^2 +`r31'*`x1' +`r32'*`x2' + `r33'*`x3'+ `r34'*`x4' +`r35'*`x5' +`r36'*`x6' +`r37'*`x7' + `r38'*`x8' + `r39'*`x9' + `r310'*`x10'+`r311'*`x11' +`r312'*`x12' + `r313'*`x13')*`cdf3' +`d3'*`pdf3'



          replace `w4' = (`a4' + `g41'*`lnp1' + `g42'*`lnp2' +`g43'*`lnp3' + `g44'*`lnp4' + `g45'*`lnp5' + `g46'*`lnp6' +`b4'*(`lnexp' - `lnpindex') + `l4'/`bofp'*(`lnexp' - `lnpindex')^2 +`r41'*`x1' +`r42'*`x2' + `r43'*`x3'+ `r44'*`x4' +`r45'*`x5' +`r46'*`x6' +`r47'*`x7' + `r48'*`x8' + `r49'*`x9' + `r410'*`x10'+`r411'*`x11' +`r412'*`x12' + `r413'*`x13')*`cdf4' +`d4'*`pdf4'



          replace `w5' = (`a5' + `g51'*`lnp1' + `g52'*`lnp2' +`g53'*`lnp3' + `g54'*`lnp4' + `g55'*`lnp5' + `g56'*`lnp6' +`b5'*(`lnexp' - `lnpindex') + `l5'/`bofp'*(`lnexp' - `lnpindex')^2 +`r51'*`x1' +`r52'*`x2' + `r53'*`x3'+ `r54'*`x4' +`r55'*`x5' +`r56'*`x6' +`r57'*`x7' + `r58'*`x8' + `r59'*`x9' + `r510'*`x10'+`r511'*`x11' +`r512'*`x12' + `r513'*`x13')*`cdf5' +`d5'*`pdf5'



          replace `w6' = (`a6' + `g61'*`lnp1' + `g62'*`lnp2' +`g63'*`lnp3' + `g64'*`lnp4' + `g65'*`lnp5' + `g66'*`lnp6' +`b6'*(`lnexp' - `lnpindex') + `l6'/`bofp'*(`lnexp' - `lnpindex')^2 +`r61'*`x1' +`r62'*`x2' + `r63'*`x3'+ `r64'*`x4' +`r65'*`x5' +`r66'*`x6' +`r67'*`x7' + `r68'*`x8' + `r69'*`x9' + `r610'*`x10'+`r611'*`x11' +`r612'*`x12' + `r613'*`x13')*`cdf6' +`d6'*`pdf6'



          }
          end

          set trace off

          /* cdfs have to be added to command */
          glo cdfs ""
          forv i=1(1)6 {
          glo cdfs "${cdfs} cdf`i'"
          }

          glo A_NOT =5
          *
          noi nlsur quaids @ w1 w2 w3 w4 w5 w6 lnp1 lnp2 lnp3 lnp4 lnp5 lnp6 lnexp x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 pdf1 pdf2 pdf3 pdf4 pdf5 pdf6 ${cdfs}, ifgnls nequations(6) param(a1 a2 a3 a4 a5 a6 b1 b2 b3 b4 b5 b6 g11 g12 g13 g14 g15 g16 g22 g23 g24 g25 g26 g33 g34 g35 g36 g44 g45 g46 g55 g56 g66 l1 l2 l3 l4 l5 l6 r11 r12 r13 r14 r15 r16 r17 r18 r19 r110 r111 r112 r113 r21 r22 r23 r24 r25 r26 r27 r28 r29 r210 r211 r212 r213 r31 r32 r33 r34 r35 r36 r37 r38 r39 r310 r311 r312 r313 r41 r42 r43 r44 r45 r46 r47 r48 r49 r410 r411 r412 r413 r51 r52 r53 r54 r55 r56 r57 r58 r59 r510 r511 r512 r513 r61 r62 r63 r64 r65 r66 r67 r68 r69 r610 r611 r612 r613 d1 d2 d3 d4 d5 d6)

          est store quaidsNNP2

          set trace on
          set tracedepth 4

          * Share means and price means
          quietly {
          foreach x of varlist w* lnp* lnexp {
          sum `x'
          scalar `x'mean=r(mean)
          }
          * Price indexes
          glo asum "_b[a1]*lnp1mean"
          forv i=2(1)6 {
          glo asum "${asum} + _b[a`i']*lnp`i'mean"
          }
          glo gsum ""
          forv i=1(1)6 {
          forv j=1(1)6 {
          glo gsum "${gsum} + 0.5*_b[g`i'`j']*lnp`i'mean*lnp`j'mean"
          }
          }
          glo ap "6.11 + ${asum} ${gsum}"
          glo bp "_b[b1]*lnp1mean"
          forv i=2(1)6 {
          glo bp "${bp} + _b[b`i']*lnp`i'mean"
          }
          glo bp "(exp(${bp}))"
          * Mus
          forv i=1(1)6 {
          glo mu`i' "_b[b`i'] + 2*_b[l`i']/${bp}*(lnexpmean-(${ap}))"
          }
          forv j=1(1)6 {
          glo gsum2`j' ""
          forv k=1(1)6 {
          glo gsum2`j' "${gsum2`j'} + _b[g`j'`k']*lnp`k'mean"
          }
          }
          }
          *
          *ereturn list
          *

          forv i=1(1)6 {
          forv j=1(1)6 {
          glo delta=cond(`i'==`j',1,0)
          glo mu`i'`j' "_b[g`i'`j'] - ${mu`i'}*(_b[a`j'] ${gsum2`j'})-_b[l`i']*_b[b`j']/${bp}*(lnexpmean - (${ap}))^2"
          * If expression is too long, split it
          cap nlcom (elasexp`i': ${mu`i'}/w`i'mean + 1) (mu`i'`j': ${mu`i'`j'}), post noheader
          if _rc {


          qui nlcom (elasexp`i': ${mu`i'}/w`i'mean + 1) (mu`i'`j'f: (1e+2)*(${mu`i'`j'})), post noheader
          qui nlcom (elasexp`i': _b[elasexp`i']) (mu`i'`j':_b[mu`i'`j'f]/(1e+2)), post noheader
          }
          * Uncompensated price elasticity
          nlcom (elasexp`i': _b[elasexp`i']) (elu`i'`j':_b[mu`i'`j']/w`i'mean - ${delta}) , post noheader
          * Compensated price elasticity
          nlcom (elc`i'`j': _b[elu`i'`j'] + _b[elasexp`i']*w`j'mean), noheader
          qui est restore quaidsNNP2
          }
          }
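
          As background (a general sketch of the -nlsur- function-evaluator contract, not a fix for the specific model above): -nlsur myeval @ ...- looks for a program named -nlsurmyeval-; that program receives the dependent and independent variables as a varlist, plus the current parameter vector in the -at()- matrix, and it must -replace- the dependent variables with their predicted values. The indices used with `at'[1,#] have to line up with the number and order of names given in the parameters() option. The first message above ("nlsurquaids returned 199") suggests that the evaluator program itself exited with an error, and "varlist required, r(100)" is what -syntax varlist(...)- issues when the program is invoked without a varlist (for example, when it is run directly rather than through -nlsur-). A minimal, purely illustrative sketch (hypothetical names, one equation, two parameters):

          Code:
          * minimal illustrative nlsur function evaluator (hypothetical example)
          program nlsurmydemo
              version 13
              syntax varlist(min=3 max=3) if, at(name)
              tokenize `varlist'
              args y x1 x2                      // dependent variable comes first
              tempname b1 b2
              scalar `b1' = `at'[1,1]           // parameters arrive in the at() row vector
              scalar `b2' = `at'[1,2]
              quietly replace `y' = `b1'*`x1' + `b2'*`x2' `if'
          end

          * called as:
          * nlsur mydemo @ y x1 x2, nequations(1) parameters(b1 b2)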



          • #6
            Harris Mazari: Gurpreet's code in #2 will do what you want. I had understood you to want something different.

            Atchara Patoom: Your post is unrelated to the topic of this thread. Please repost it as a new topic, with a title that is informative about the question you are asking. Also, before reposting, please read the FAQ, especially #12, on the preferred way to show code here.

