  • Creating a unique personal ID using SIPP (Survey of Income and Program Participation)

    Dear Stata Professionals,

    How are you today?
    I want to run a panel data analysis using the SIPP (Survey of Income and Program Participation) data set from NBER.
    The SIPP is a nationally representative survey that collects information from a large sample of households every four months (a wave) over a
    period of two to three years. For example, the 1992 panel contains 9 waves of interviews, and each wave covers 4 consecutive monthly interviews, so the data are at the person-month level.
    However, not all individuals appear in all 9 waves. The calendar month of interview in each wave differs across rotation groups, but within a wave it is uniquely identified by the 'reference month' variable. The reference month variable "refmth" takes values 1 to 4 to distinguish the interview months within a wave, but not across waves.
    Some researchers use observations from all reference months, while others use only the 1st or the 4th reference month of each wave because of recall problems.

    For panel regression in Stata, I have to declare the panel ID and time variable as follows:
    xtset panelID timevariable

    xtset has failed several times with the error message "repeated time values within panel".
    Whenever I run "isid panelID timevariable", it reports that the two variables do not uniquely identify the observations.
    I found that I had not uniquely identified individuals within the household. According to the user's guide, individuals can be identified using the following four variables:

    1. suid: Sample Unit Identification number, assigned to the household and common to all individuals in that household. This identifier is constructed the same way in each wave
    regardless of moves, to enable matching from wave to wave.
    2. addid: The two-digit address ID code identifies each household associated with the same sample unit identification number. For example, the address ID code is 11
    for all sample addresses that are the same as in Wave 1.
    3. entry: Entry Address ID of the household that this person belonged to at the time this person first became part of the sample.
    4. pnum: Person Number. The person ID is a five-digit number consisting of the two-digit entry address ID and a three-digit person number.
    Person numbers 101, 102, etc., are assigned in Wave 1; 201, 202, etc., are assigned to persons added to the roster
    in Wave 2, and so forth. Usually 101 is the household reference person, 102 the spouse of the reference person, and 103 a child.


    Using the above four variables, I tried the following methods of identifying an individual, which I learned from this website, the Stata Forum.
    1. gen newvar=(suid*10000000) + (addid*100000) + ( entry*1000) + pnum, expecting to see a result like

    10740691111101 in which 1074069 is suid, 11 is addid, 11 is entry, and 101 is pnum.
    That didn't happen, however, because suid*10000000 became 10740689797120 instead of 10740690000000.
    The trailing digits were rounded. I tried to show more digits with "format %25.0g" and to change the storage type with "recast double", but I still saw the rounding.

    2. egen id=concat(suid addid entry pnum) , format(%25.0g)
    This gave me the combined number, for example 10740691111101, as a string variable.
    I tried to convert it to numeric with 'encode', but that failed with the error message "too many variables".
    So I used 'destring', which converted it to numeric, apparently without rounding, and I applied "format %25.0g" to display all the digits.
    The problem is that some numbers are exactly the same as string and as numeric, but many numeric values differ from their strings.

    For example,
    suid addid entry pnum id(string) id2(numeric)
    1074969 11 11 101 10749691111101 10749691111101
    1074969 11 11 102 10749691111102 10749691111102
    1074969 11 11 103 10749691111103 10749691111102

    For pnum 103, the string is combined correctly, but after conversion to numeric it becomes a different number, specifically in the last digit.
    This happens if suid>=90123026. The first 36891 observations are o.k., but the remaining observations out of 435757 have this problem: only the last digit is different.

    1. How should I handle the rounding of some digits in the first method above?
    2. Why do the string and the numeric values differ in the second method above?
    3. If I want to use all four reference months with as many waves as possible, how should I declare 'xtset' with SIPP?
    4. If I want to use only one reference month with as many waves as possible, how should I declare 'xtset' with SIPP?

    I have experience using monthly data from IPUMS-CPS for panel regression. There the variable 'mish' (month in sample, household level) indicates the number of times (from 1 to 8) the occupants of a housing unit have been interviewed for the CPS. I had no problem declaring 'xtset' with it, so I want to build a similar panel structure with SIPP.

    Could you help me identify a unique individual identifier and a time variable in SIPP for panel regression?

    I generated the following sample from the data using 'dataex'.

    dataex suseqnum suid addid panel wave month year rot refmth entry pnum in 36800/435757

    For example,
    suid addid entry pnum id2(string) id(numeric) suseqnum panel wave month year rot refmth
    934358343 31 11 103 "9343583433111103" 9343583433111102 16831 1992 5 2 1993 2 1

    suseqnum: sequence number or serial number
    suid: sample unit identification number, household level
    addid: two-digit address ID code that identifies each household
    panel: the panel; in this case, the 1992 panel
    wave: wave number; the 1992 panel contains 9 waves
    month: calendar month, ranging from 1 to 12
    year: year; the 1992 panel interviews the same households over one or two years
    rot: rotation group; the sample is divided into 4 rotation groups with different interview periods
    refmth: reference month, ranging from 1 to 4; for example, 1 means a month before the interview
    entry: entry address ID of the household that this person belonged to at the time this person first became part of the sample
    pnum: person number; details are given above

    input double suid byte(addid entry) int pnum str16 id2 double id int(suseqnum panel) byte(wave month) int year byte(rot refmth)
    934358343 31 11 103 "9343583433111103" 9343583433111102 16831 1992 5 2 1993 2 1
    934358343 31 11 103 "9343583433111103" 9343583433111102 16124 1992 7 10 1993 2 1
    934358343 31 11 103 "9343583433111103" 9343583433111102 15517 1992 9 6 1994 2 1
    934358343 31 11 103 "9343583433111103" 9343583433111102 17190 1992 4 10 1992 2 1
    934358343 31 11 104 "9343583433111104" 9343583433111104 15517 1992 9 6 1994 2 1
    934358343 31 11 104 "9343583433111104" 9343583433111104 16831 1992 5 2 1993 2 1
    934358343 31 11 104 "9343583433111104" 9343583433111104 15819 1992 8 2 1994 2 1
    934358343 31 11 104 "9343583433111104" 9343583433111104 16473 1992 6 6 1993 2 1
    934358343 31 11 104 "9343583433111104" 9343583433111104 17190 1992 4 10 1992 2 1
    934358343 31 11 104 "9343583433111104" 9343583433111104 16124 1992 7 10 1993 2 1
    934358343 31 11 102 "9343583433111102" 9343583433111102 15517 1992 9 6 1994 2 1
    934358343 31 11 102 "9343583433111102" 9343583433111102 16124 1992 7 10 1993 2 1
    934358343 31 11 102 "9343583433111102" 9343583433111102 16473 1992 6 6 1993 2 1
    934358343 31 11 102 "9343583433111102" 9343583433111102 15819 1992 8 2 1994 2 1
    934358343 31 11 102 "9343583433111102" 9343583433111102 16831 1992 5 2 1993 2 1
    934358343 31 11 102 "9343583433111102" 9343583433111102 17190 1992 4 10 1992 2 1
    934358343 31 31 301 "9343583433131301" 9343583433131300 16124 1992 7 10 1993 2 1
    934358343 31 31 301 "9343583433131301" 9343583433131300 16473 1992 6 6 1993 2 1
    934358343 31 31 301 "9343583433131301" 9343583433131300 15517 1992 9 6 1994 2 1
    934358343 31 31 301 "9343583433131301" 9343583433131300 15819 1992 8 2 1994 2 1
    934358343 31 31 301 "9343583433131301" 9343583433131300 17190 1992 4 10 1992 2 1
    934358343 31 31 301 "9343583433131301" 9343583433131300 16831 1992 5 2 1993 2 1
    934358746 11 11 401 "9343587461111401" 9343587461111400 16832 1992 5 2 1993 2 1
    934358746 11 11 401 "9343587461111401" 9343587461111400 16474 1992 6 6 1993 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 16832 1992 5 2 1993 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 15820 1992 8 2 1994 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 16474 1992 6 6 1993 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 16125 1992 7 10 1993 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 18103 1992 2 2 1992 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 15518 1992 9 6 1994 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 17819 1992 3 6 1992 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 17191 1992 4 10 1992 2 1
    934358746 11 11 101 "9343587461111101" 9343587461111100 18103 1992 1 10 1991 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 18103 1992 1 10 1991 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 17819 1992 3 6 1992 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 17191 1992 4 10 1992 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 16474 1992 6 6 1993 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 16125 1992 7 10 1993 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 15518 1992 9 6 1994 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 15820 1992 8 2 1994 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 16832 1992 5 2 1993 2 1
    934358746 11 11 102 "9343587461111102" 9343587461111102 18103 1992 2 2 1992 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 16126 1992 7 10 1993 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 18104 1992 1 10 1991 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 15821 1992 8 2 1994 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 15519 1992 9 6 1994 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 18104 1992 2 2 1992 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 16833 1992 5 2 1993 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 17820 1992 3 6 1992 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 16475 1992 6 6 1993 2 1
    934358765 11 11 101 "9343587651111101" 9343587651111100 17192 1992 4 10 1992 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 18104 1992 2 2 1992 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 16833 1992 5 2 1993 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 17192 1992 4 10 1992 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 16126 1992 7 10 1993 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 15821 1992 8 2 1994 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 18104 1992 1 10 1991 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 17820 1992 3 6 1992 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 16475 1992 6 6 1993 2 1
    934358765 11 11 103 "9343587651111103" 9343587651111102 15519 1992 9 6 1994 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 16126 1992 7 10 1993 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 15519 1992 9 6 1994 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 16833 1992 5 2 1993 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 17192 1992 4 10 1992 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 17820 1992 3 6 1992 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 16475 1992 6 6 1993 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 15821 1992 8 2 1994 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 18104 1992 2 2 1992 2 1
    934358765 11 11 102 "9343587651111102" 9343587651111102 18104 1992 1 10 1991 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 18104 1992 2 2 1992 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 16475 1992 6 6 1993 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 16833 1992 5 2 1993 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 17820 1992 3 6 1992 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 17192 1992 4 10 1992 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 15821 1992 8 2 1994 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 15519 1992 9 6 1994 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 18104 1992 1 10 1991 2 1
    934358765 11 11 104 "9343587651111104" 9343587651111104 16126 1992 7 10 1993 2 1
    934651207 11 11 101 "9346512071111101" 9346512071111100 18105 1992 1 10 1991 2 1
    934651207 11 11 101 "9346512071111101" 9346512071111100 18105 1992 2 2 1992 2 1
    934651259 11 11 101 "9346512591111101" 9346512591111100 17822 1992 3 6 1992 2 1
    934651259 11 11 101 "9346512591111101" 9346512591111100 18106 1992 2 2 1992 2 1
    934651259 11 11 101 "9346512591111101" 9346512591111100 18106 1992 1 10 1991 2 1
    934651259 11 11 102 "9346512591111102" 9346512591111102 18106 1992 2 2 1992 2 1
    934651259 11 11 102 "9346512591111102" 9346512591111102 18106 1992 1 10 1991 2 1
    934651259 11 11 102 "9346512591111102" 9346512591111102 17822 1992 3 6 1992 2 1
    934651259 31 11 101 "9346512593111101" 9346512593111100 17194 1992 4 10 1992 2 1
    934651259 31 11 102 "9346512593111102" 9346512593111102 17194 1992 4 10 1992 2 1
    934651259 41 11 101 "9346512594111101" 9346512594111100 16834 1992 5 2 1993 2 1
    934651259 41 11 102 "9346512594111102" 9346512594111102 16834 1992 5 2 1993 2 1
    934651259 51 11 101 "9346512595111101" 9346512595111100 16476 1992 6 6 1993 2 1
    934651259 51 11 101 "9346512595111101" 9346512595111100 16127 1992 7 10 1993 2 1
    934651259 51 11 101 "9346512595111101" 9346512595111100 15520 1992 9 6 1994 2 1
    934651259 51 11 101 "9346512595111101" 9346512595111100 15822 1992 8 2 1994 2 1
    934651259 51 11 102 "9346512595111102" 9346512595111102 15822 1992 8 2 1994 2 1
    934651259 51 11 102 "9346512595111102" 9346512595111102 15520 1992 9 6 1994 2 1
    934651259 51 11 102 "9346512595111102" 9346512595111102 16127 1992 7 10 1993 2 1
    934651259 51 11 102 "9346512595111102" 9346512595111102 16476 1992 6 6 1993 2 1
    934651831 11 11 101 "9346518311111101" 9346518311111100 15823 1992 8 2 1994 2 1
    934651831 11 11 101 "9346518311111101" 9346518311111100 18107 1992 2 2 1992 2 1
    end


    I am sorry that it doesn't look neat. It's my first time using 'dataex', and the columns are not well aligned.
    In the data above, some values are the same in the string and the numeric variable, but many are different.
    The combined value that I want is the number shown in the string, but the converted numeric values are wrong.
    I don't know why.

    Thank you so much in advance for sharing your knowledge and time.
    Have a good day.
    David.
    Last edited by david krupp; 14 Aug 2017, 00:44.

  • #2
    For question 1, you need to generate double from the beginning; you cannot first generate a float and then recast it to a double in order to recover precision that you've already lost.
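
    A minimal sketch of the difference, using the suid, addid, entry and pnum values quoted in #1 (the names newvar_float and newvar_double are made up for illustration). The expression is evaluated in double precision either way; what matters is the storage type of the variable that receives the result.

    ```stata
    clear
    input long suid byte(addid entry) int pnum
    1074069 11 11 101
    end

    * default storage type is float (unless -set type- was changed),
    * so only about 7 significant digits survive
    generate newvar_float = suid*10000000 + addid*100000 + entry*1000 + pnum

    * declaring double up front keeps all 14 digits of this particular ID
    generate double newvar_double = suid*10000000 + addid*100000 + entry*1000 + pnum

    format newvar_float newvar_double %20.0f
    list newvar_float newvar_double, noobs
    ```

    Running recast double on newvar_float afterwards would only widen the storage; the digits dropped when the float was created stay lost.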

    For the second question, I cannot replicate your problem.

    . clear *

    .
    . input long suid int (addid entry pnum) str14 (id id2)

                 suid     addid     entry      pnum              id             id2
      1. 1074969 11 11 101 10749691111101 10749691111101
      2. 1074969 11 11 102 10749691111102 10749691111102
      3. 1074969 11 11 103 10749691111103 10749691111102
      4. end

    . drop id id2

    .
    . egen str id = concat(suid addid entry pnum), format(%25.0g)

    . destring id, generate(id2)
    id: all characters numeric; id2 generated as double

    .
    . format id2 %25.0f

    .
    . list, noobs

      +------------------------------------------------------------------+
      |    suid   addid   entry   pnum               id              id2 |
      |------------------------------------------------------------------|
      | 1074969      11      11    101   10749691111101   10749691111101 |
      | 1074969      11      11    102   10749691111102   10749691111102 |
      | 1074969      11      11    103   10749691111103   10749691111103 |
      +------------------------------------------------------------------+

    .
    . isid id2

    .
    . exit

    end of do-file


    .


    Likewise with your longer dataset excerpt. The numerical id (double-precision) is fine even when suid is greater than or equal to 90123026

    . clear *

    .
    . quietly input int suseqnum double suid byte addid int panel byte(wave month) int year byte(rot refmth entry) int pnum

    .
    .
    . egen str id = concat(suid addid entry pnum), format(%25.0g)

    . destring id, generate(id2)
    id: all characters numeric; id2 generated as double

    .
    . format id2 %`=strlen(id)'.0f

    .
    . contract id id2, freq(discard)

    . isid id2

    .
    . exit

    end of do-file


    .


    I'm not familiar with the survey, and so cannot comment on identifying individuals and individual-time combinations.



    • #3
      The OP changed the longer excerpt after I posted. Here's the output using the second excerpt. (Same result.)


      . clear *

      .
      . quietly input double suid byte(addid entry) int pnum str16 id2 double id int(suseqnum panel) byte(wave month) int year byte(rot refmth)

      .
      . egen str id3 = concat(suid addid entry pnum), format(%25.0g)

      . destring id3, generate(id4)
      id3: all characters numeric; id4 generated as double

      .
      . format id4 %25.0f

      .
      . contract id id4, freq(discard)

      . isid id4

      .
      . exit

      end of do-file


      .



      • #4
        Thanks, Mr. Joseph Coveney, for your valuable comments.

        For question 1, this is what I did before:
        gen newvar=(suid*10000000) + (addid*100000) + ( entry*1000) + pnum

        Following your advice, I did this:
        gen double newvar=(suid*10000000) + (addid*100000) + ( entry*1000) + pnum

        and then "format %25.0g newvar".
        It works well for most observations, but I still find errors like the following:

        9760009701111100 101

        101 is pnum, so I should have 9760009701111101. I don't know why some observations still have a different last digit.

        Another example: 9957014092111104 103 should be 9957014092111103.

        But thanks to your advice, I now know the difference between "gen newvar" and "gen double newvar".

        For question 2, I followed your commands:
        egen str id = concat(suid addid entry pnum), format(%25.0g)
        destring id, generate(id2)
        format id2 %25.0f

        But I still find the same problem:
        id(string) id2(numeric)
        9957014091111103 9957014091111102

        The last digit differs after conversion to numeric.

        By the way, I have a question about your comment.
        You wrote the following:

        egen str id = concat(suid addid entry pnum), format(%25.0g)
        destring id, generate(id2)
        id: all characters numeric; id2 generated as double

        I think id is a string, not a numeric variable, because of "egen str id = concat( )".
        It creates a string, so we need to convert it to numeric with 'destring'.
        Am I misunderstanding this?


        Anyway, the last digit is still different after conversion to numeric.
        It generally happens for large values of suid. I don't know why.
        That is why I cannot identify individuals uniquely.

        I wanted to attach a small data file to my post, but it didn't work.
        If you don't mind, may I send the data to you?

        Thanks for your concerns and comments.
        David.
        Last edited by david krupp; 14 Aug 2017, 02:00.



        • #5
          With the large integers, you're reaching the limits of double precision to represent them. My suggestion is to keep the integers smaller. Try something like the following.
          Code:
          egen str id = concat(suid addid entry pnum), format(%25.0g)
          tempfile temporary_file
          quietly save `temporary_file'
          contract id
          drop _freq
          generate double id2 = _n
          merge 1:m id using `temporary_file', assert(match) nogenerate noreport
          Now you have a series of numerical id values (id2) that fits into a double (probably even into a float).
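
          To see roughly where the limit lies, here is a small sketch (not part of the original analysis). A double has a 53-bit significand, so every integer is exact only up to 2^53; between 2^53 and 2^54 only even integers exist, so a 16-digit ID ending in an odd digit cannot be stored exactly. The three ID strings are taken from the excerpt in #1; the variable names roundtrip and id3 are hypothetical.

          ```stata
          * every integer up to 2^53 = 9007199254740992 has an exact double
          display %20.0f 2^53

          clear
          input str16 id
          "9343583433111102"
          "9343583433111103"
          "9343583433131301"
          end

          destring id, generate(id2)
          format id2 %20.0f

          * roundtrip is 1 when the double prints back as the original string
          generate byte roundtrip = (id == trim(string(id2, "%20.0f")))
          list, noobs

          * sequential group numbers are tiny by comparison (floats are exact up to 2^24)
          egen long id3 = group(id)
          ```

          The odd-ending strings come back with a different last digit, which is the pattern reported in #4; the group numbers carry no such risk.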

          You could try
          Code:
          egen double id2 = group(id)
          too.

          As to

          id: all characters numeric; id2 generated as double

          it's not a comment. It's the output reported by the destring command. It means just what it says, namely, that all of the characters in the string variable id are numerals and so id can be converted to numbers. It does not mean that the variable id is numeric.
          Last edited by Joseph Coveney; 14 Aug 2017, 02:45.



          • #6
            Thanks, Mr. Joseph Coveney, for your valuable comments.

            I think you have identified the source of the problem, namely the limits of double precision.
            That is why precision is lost when suid is longer.

            First, I am very sorry that I didn't read the output on the screen carefully.
            Now I see that it is output reported by Stata, not your comment.

            Second, I think the logic of your code above is right, but I should briefly explain the data structure, because the command "id2=_n" seems to assume that each individual appears only once in the data.

            The SIPP 1992 panel contains 9 waves, with interviews running from October 1991 to September 1994.
            Although interviews take place once per wave, 4 different rotation groups are interviewed at different times over the period.
            Therefore, I have monthly observations, and an individual may appear for as little as one wave (4 months) or for as many as 9 waves (36 months).
            The four months in a wave are called reference months, coded 1 to 4.

            4 reference months in a wave.
            9 waves in the SIPP 1992 panel.

            Even if I keep only one reference month per wave, many individuals appear more than once in the data, in different waves.
            That is why it is called panel data, not cross-sectional data.

            Therefore, when an individual appears more than once, I think the command "id2=_n" cannot identify individuals uniquely, but gives different IDs to the same person.
            If I still misunderstand, please enlighten me.

            I really appreciate your help.
            Please keep helping me solve the problem of declaring 'xtset'.

            Thanks for your valuable help.
            David.



            • #7
              After
              Code:
              contract id
              each id will appear only once in the dataset that remains in memory.

              But first, try
              Code:
              egen str id = concat(suid addid entry pnum), format(%25.0g)
              egen double id2 = group(id)
              and see whether that works for you.
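
              As a quick check that group() handles repeated observations the way a panel needs, here is a sketch on four rows adapted from the excerpt in #1 (two persons, two waves each). Every row of the same person receives the same id2.

              ```stata
              clear
              input long suid byte(addid entry) int pnum byte wave
              934358343 31 11 103 5
              934358343 31 11 103 7
              934358343 31 11 104 5
              934358343 31 11 104 7
              end

              egen str id = concat(suid addid entry pnum), format(%25.0g)
              egen double id2 = group(id)

              * id2 is constant within person; with one row per wave, (id2, wave) is unique
              isid id2 wave
              xtset id2 wave
              list, sepby(id2) noobs
              ```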



              • #8
                Mr. Joseph Coveney,

                I looked carefully at the data in the Data Browser after running your code above.
                As you said, after running 'contract id' each id appeared only once, and the frequency variable confirmed the number of appearances.
                Then I ran the second code right above, 'egen double id2=group(id)', and found that it takes fewer steps than the code suggested before.

                Moreover, I have succeeded in declaring 'xtset' using the id2 created by your code.
                Thank you so much for your help.

                If you don't mind, may I ask a few more questions?

                As I mentioned above, the structure of the SIPP 1992 panel is as follows:
                4 reference months in a wave.
                9 waves in the SIPP 1992 panel.
                Therefore, the variable refmth (reference month) ranges from 1 to 4, and the variable wave ranges from 1 to 9.

                1. To avoid recall bias and the "repeated time values within panel" error, I kept only the 1st reference month of each wave.
                So I have one reference month for each of 9 waves, and with that sample I succeeded in declaring 'xtset' using the id2 created by your code.
                If I want to use all four reference months and nine waves, how can I generate a unique time variable?

                2. In addition to question 1, suppose I want to use the SIPP 1992 panel and the SIPP 1993 panel together by combining the two data sets.
                How can I generate a unique time variable? The SIPP 1993 panel also has 4 reference months per wave and 9 waves in total.

                Thanks for your help and valuable comments on my problem.
                David.



                • #9
                  Mr. Joseph Coveney,

                  I just checked whether 'xtset' works with the 'id2' created by your code and a time variable that I had made before.
                  I simply combined year, wave, and month.
                  I have succeeded in declaring 'xtset' for multiple reference months over many waves and years.
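
                  The post does not show how year, wave, and month were combined. One possible construction, offered only as a sketch, is a Stata monthly date built from the calendar year and month variables; mdate is a made-up name, and the rows are one person from the excerpt in #1.

                  ```stata
                  clear
                  input long suid byte(addid entry) int pnum byte(wave month) int year byte refmth
                  934358343 31 11 103 4 10 1992 1
                  934358343 31 11 103 5  2 1993 1
                  934358343 31 11 103 7 10 1993 1
                  934358343 31 11 103 9  6 1994 1
                  end

                  egen long id2 = group(suid addid entry pnum)

                  * ym() returns months since 1960m1, so each calendar month gets one value
                  generate int mdate = ym(year, month)
                  format mdate %tm

                  isid id2 mdate
                  xtset id2 mdate
                  ```

                  Because a calendar month occurs only once per person, the same time variable should also work when all four reference months are kept. With a single reference month per wave from one panel, xtset id2 wave would serve as well.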

                  Thanks for sharing your knowledge and time with new users like me.
                  Have a good day.
                  David.



                  • #10
                    Glad to hear that you worked things out.

                    As an aside, you could have taken a shortcut to id2 by specifying the components directly in the egen . . . group() command, bypassing the string id variable as an intermediate step.
                    Code:
                    egen long id2 = group(suid addid entry pnum)
                    And if, for whatever reason, you wanted to have the string ID variable anyway (for example, for visual inspection of listings), then I recommend placing some kind of delimiter in it for improved readability. The punct() option of egen . . . concat() allows this.

                    . input long suid int (addid entry pnum)

                                 suid     addid     entry      pnum
                      1. 1074969 11 11 101
                      2. 1074969 11 11 102
                      3. 1074969 11 11 103
                      4. end

                    .
                    . egen str id = concat(suid addid entry pnum), format(%9.0g) punct(-)

                    . egen long id2 = group(suid addid entry pnum)

                    .
                    . list, noobs

                      +----------------------------------------------------------+
                      |    suid   addid   entry   pnum                  id   id2 |
                      |----------------------------------------------------------|
                      | 1074969      11      11    101   1074969-11-11-101     1 |
                      | 1074969      11      11    102   1074969-11-11-102     2 |
                      | 1074969      11      11    103   1074969-11-11-103     3 |
                      +----------------------------------------------------------+



                    • #11
                      Mr. Joseph Coveney,

                      Thanks for your further valuable suggestions.

                      I have learned two important things from you.

                      1. The difference between "gen newvar" and "gen double newvar".
                      As you explained, I should generate a 'double' from the beginning, because I cannot first generate a 'float' and then recast it to a 'double' to recover precision that I have already lost.

                      2. When a number has many digits, I lose precision even if the storage type is 'double', because of its limits.
                      With your code and logic, I can generate a new unique identifier for individuals.
                      I had no idea that I could generate a new ID.

                      With your help, I have succeeded in declaring 'xtset' for this complex panel data.
                      If I encounter another problem while running the regression, I have no doubt that you will help me.


                      Thanks for your help. Have a good day.
                      David.



                      • #12
                        Mr. Joseph Coveney,

                        How are you today?

                        egen str id = concat(panel suid addid entry pnum), format(%30.0g) punct(-)

                        The above is your code, which works very well for me.
                        I just added 'panel' because I use more than one data set.
                        Without 'panel', I found the same 'suid' for different persons.
                        Anyway, it worked very well.

                        May I ask a question about 'foreach', which loops over items?

                        I combined SIPP panel 1992 and SIPP panel 1993.
                        Each panel contains 9 waves.
                        Each wave contains 4 interview months, but some individuals have fewer than 4 interview months in a wave.
                        It means that the person didn't complete the interview within that wave.
                        So, I want to keep individuals who completed the interview for at least one wave, regardless of how many waves they took part in.
                        The variable wave ranges from 1 to 9.
                        The variable refmth (reference month) ranges from 1 to 4 in every wave, whether it is wave 1 or wave 2.
                        Therefore, the sum of refmth within a wave is 1+2+3+4 = 10 if an individual completed the interview for that wave.
                        I want to screen observations this way, but the code takes a lot of space.
                        I tried to shorten the following commands using 'foreach', but I have not figured it out yet.
                        The first set is for panel 1992 and the second is for panel 1993.

                        Do you have an idea how to convert them into 'foreach'?

                        Thanks.
                        David.

                        // sample appears in every wave in panel 1992 //

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==1
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==2
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==3
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==4
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==5
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==6
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==7
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==8
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1992 & wave==9
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1


                        // sample appears in every wave in panel 1993 //

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==1
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==2
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==3
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==4
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==5
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==6
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==7
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==8
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1

                        bysort id2: egen sum1=sum(refmth) if panel==1993 & wave==9
                        tab sum1
                        drop if sum1<10 // sum of 4 reference months is 10
                        drop sum1
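
                        One way to shorten the blocks above, as a sketch only: loop over panels with foreach and over waves with forvalues. The loop body is the same as each hand-written block (total() is the documented name for the egen function written as sum() above). The toy data are made up so that the example is self-contained.

                        ```stata
                        clear
                        input long id2 int panel byte(wave refmth)
                        1 1992 1 1
                        1 1992 1 2
                        1 1992 1 3
                        1 1992 1 4
                        2 1992 1 1
                        2 1992 1 2
                        3 1993 2 1
                        3 1993 2 2
                        3 1993 2 3
                        3 1993 2 4
                        end

                        foreach p of numlist 1992 1993 {
                            forvalues w = 1/9 {
                                * sum of reference months within person for this panel and wave
                                bysort id2: egen sum1 = total(refmth) if panel == `p' & wave == `w'
                                * a complete wave has refmth 1+2+3+4 = 10; missing sum1 is not < 10
                                drop if sum1 < 10
                                drop sum1
                            }
                        }
                        ```

                        The same screening can probably be done without a loop, by running bysort id2 panel wave: egen sum1 = total(refmth) and then drop if sum1 < 10.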
