
  • Unique identifier for observation

    Hi everyone,

    I need help generating a new variable, caseid, for my data set. The data set has a variable hhid with some repeated observations, and caseid should number the distinct values of hhid as 1, 2, 3, and so on, so that repeated observations of the same hhid share the same caseid.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long hhid
    1210103088
    1210103088
    1210105069
    1210105069
    1210105070
    1210105070
    1210109083
    1210109083
    1210109087
    1210109087
    1210109090
    1210109090
    1210109100
    1210109100
    1210110069
    1210110069
    1210112079
    1210112079
    1210203073
    1210203076
    1210203078
    1210205049
    1210205049
    1210205051
    1210205051
    1210209070
    1210210059
    1210210059
    1210210065
    1210212064
    1210212064
    1220101001
    1220101001
    1220101002
    1220101002
    1220101003
    1220101003
    1220101007
    1220101007
    1220102001
    1220102001
    1220102002
    1220102002
    1220102004
    1220102004
    1220103001
    1220103001
    1220103002
    1220103002
    1220103003
    1220103003
    1220104001
    1220104001
    1220104002
    1220104003
    1220105001
    1220105001
    1220105002
    1220105002
    1220105004
    1220106004
    1220106005
    1220106005
    1220106006
    1220106006
    1220106007
    1220106007
    1220107001
    1220107001
    1220107002
    1220107003
    1220107003
    1220107004
    1220107004
    1220107005
    1220107005
    1220201019
    1220201019
    1220201039
    1220202014
    1220202014
    1220204028
    1220204031
    1220204039
    1220204039
    1220204041
    1220204041
    1220206027
    1220206031
    1220206031
    1220206033
    1220206033
    1220206036
    1220208025
    1220208025
    1220208043
    1220212025
    1220212025
    1220212029
    1220212029
    end


    The expected output would be
    hhid caseid
    1210103088 1
    1210103088 1
    1210105069 2
    1210105069 2
    1210105070 3
    1210105070 3
    1210109083 4
    1210109083 4
    1210109087 5
    1210109087 5
    1210109090 6
    1210109090 6
    1210203073 7
    The last one has only one observation.


    Thanks

    Ashish

  • #2
    Ashish:
    Do you mean something along the following lines?
    Code:
    . gen count=1
    
    . collapse (count) count, by(hhid)
    
    . list in 1/10
    
         +--------------------+
         |       hhid   count |
         |--------------------|
      1. | 1210103088       2 |
      2. | 1210105069       2 |
      3. | 1210105070       2 |
      4. | 1210109083       2 |
      5. | 1210109087       2 |
         |--------------------|
      6. | 1210109090       2 |
      7. | 1210109100       2 |
      8. | 1210110069       2 |
      9. | 1210112079       2 |
     10. | 1210203073       1 |
         +--------------------+
    
    .
    If what is shown above is the way to go, save a copy of your original dataset before invoking -collapse-.
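    One way to keep the original data without writing a separate file is -preserve- and -restore-. This is a minimal sketch, assuming the counts are only needed temporarily and that no variable named count exists yet:

    ```stata
    * take a snapshot of the data currently in memory
    preserve
    gen count = 1
    * one row per hhid, with the number of observations for each
    collapse (count) count, by(hhid)
    list in 1/10
    * bring back the original, uncollapsed observations
    restore
    ```

    After -restore- the dataset is back to one row per original observation, so the collapsed counts are gone unless they were saved first.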
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      See FAQ https://www.stata.com/support/faqs/d...p-identifiers/

      Code:
      egen caseid = group(hhid)
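
      For comparison, the same identifier can be built by hand, which shows what group() is doing here. This is a minimal sketch and not necessarily the code in the FAQ; the variable name first is made up for illustration, and it assumes caseid has not already been created:

      ```stata
      * flag the first observation of each hhid (bysort also sorts the data by hhid)
      bysort hhid: gen byte first = _n == 1
      * the running sum of the flags goes up by 1 at each new hhid
      gen long caseid = sum(first)
      drop first
      ```

      One difference to keep in mind: by default group() leaves caseid missing where hhid is missing, whereas this version would number those observations too.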



      • #4
        Thank you both

