  • Grouping small communities to avoid small cell problem

    Hello everyone,

    Thank you for your help in advance! I am quite desperate at the moment, as this should be rather elementary, but I can't seem to figure it out.

    I am using Stata 15. I have a clustered dataset with households nested in communities, streets, and districts, for a multilevel analysis. There are 5482 households in 837 communities clustered in 227 districts.
    I am trying to do the following before the analysis:
    1. group communities with fewer than 5 households together, so that each cluster has at least 5 households;
    2. keep the households that get grouped together in the same street (if enough households are found in that street), or otherwise in the same district.
    Here is a glimpse of the dataset:


    Code:
    bysort district: list street community hid
    -> district = 荆州市沙市区

    +-----------------------------------+
    | street community hid |
    |-----------------------------------|
    1. | 中山路街道 梅台社区 6470518 |
    2. | 中山路街道 梅台社区 6470517 |
    3. | 中山路街道 江汉社区 6470214 |
    4. | 中山路街道 健康社区 6470118 |
    5. | 中山路街道 江汉社区 6470215 |
    |-----------------------------------|
    6. | 解放路 九曲桥社区 6480515 |
    7. | 中山路街道 梅台社区 6470515 |
    8. | 解放路 武德社区 6480315 |
    9. | 中山路街道 文化坊社区 6470416 |
    10. | 中山路 黄家塘社区 6470316 |
    |-----------------------------------|
    11. | 中山路街道 江汉社区 6470217 |
    12. | 中山路街道 江汉社区 6470219 |
    13. | 中山路街道 文化坊社区 6470414 |
    14. | 中山路街道 健康社区 6470117 |
    15. | 解放路 十方庵社区 6480216 |
    |-----------------------------------|
    16. | 中山路街道 江汉社区 6470216 |
    17. | 解放据街道 十方庵社区 6480214 |
    18. | 解放路 九曲桥社区 6480514 |
    19. | 中山路街道 梅台社区 6470516 |
    20. | 中山路街道 健康社区 6470115 |
    |-----------------------------------|
    21. | 中山路街道 文化坊社区 6470415 |
    22. | 中山路街道 健康社区 6470116 |
    +-----------------------------------+

    -------------------------------------------------------------------------------------
    -> district = 荆州市沙市县

    +-----------------------------------+
    | street community hid |
    |-----------------------------------|
    1. | 解放路 九曲桥社区 6480516 |
    2. | 解放路 北湖社区 6480116 |
    3. | 解放路街道 武德社区 6480317 |
    4. | 解放路 武德社区 6480316 |
    5. | 中山路 黄家塘社区 6470314 |
    |-----------------------------------|
    6. | 解放路 九曲桥社区 6480517 |
    7. | 解放路 武德社区 6480314 |
    8. | 中山路 黄家塘社区 6470317 |
    9. | 中山路 江汉社区 6470218 |
    10. | 中山路 健康社区 6470119 |
    +-----------------------------------+

    -------------------------------------------------------------------------------------
    -> district = 荆州市监利

    +------------------------------+
    | street commun~y hid |
    |------------------------------|
    1. | 容城 团结 6611106 |
    2. | 客城镇 茶庵社区 66110514 |
    +------------------------------+


    I tried the following:
    Code:
    egen tag = tag(community hid)
    bysort community:  egen N_comm = total(tag)  //number of Households in each community
    
    /*egen tag2 = tag(street hid)
    bysort street:  egen N_street = total(tag2)  //number of observations in each street
    egen aggcomm = group(street community) if N_comm < 5  //group communities by street when N_comm<5
    */
    
    egen tag2 = tag(street community)
    bysort street:  egen N_street = total(tag2)  //number of communities in each street
    egen aggcomm = group(street community) if N_comm < 5  //group communities by street when N_comm<5
    
    gen neighborhood = .     //neighborhood ID
    replace neighborhood = community if N_comm>=5
    replace neighborhood = aggcomm if N_comm<5  
    fre neighborhood  
    
    egen tag4 = tag(neighborhood hid)
    bysort neighborhood:  egen N = total(tag4)
    But this doesn't seem right. The group() function I used here creates an ID for each unique street-community combination, not the pooled clusters I wanted.

    I hope the explanation makes sense. Let me know if I can clarify anything!

    Thank you again!!
    Last edited by Yunxi Yue; 20 Jan 2022, 15:10.

  • #2
    Do you have latitude/longitude data, or a shapefile? If so, you can group by adjacent communities.

    • #3
      Originally posted by George Ford
      Do you have latitude/longitude data, or a shapefile? If so, you can group by adjacent communities.
      Unfortunately I do not...

      • #4
        egen's tag() function just marks one observation per group. Not what you want, methinks.
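
        As a small illustration of that point, the sketch below runs on made-up data (the community labels "A" and "B" and the hid values are invented, not taken from the dataset above). It contrasts tag(), which flags a single observation per group, with _N under by:, which returns the group size. It also shows that summing tag(community hid) within community counts distinct households, which matches _N only when each hid appears once.

```stata
* Toy data, invented for illustration only. -clear- drops the data in memory,
* so run this in a fresh session rather than on the real dataset.
clear
input str1 community hid
"A" 1
"A" 2
"A" 3
"B" 4
"B" 5
end

* tag() is 1 for exactly one observation in each community and 0 for the rest
egen tag = tag(community)

* _N under by: is the number of observations in each community
bysort community: gen N_comm = _N

* Summing a (community hid) tag within community counts distinct households
egen tag_hh = tag(community hid)
bysort community: egen N_distinct = total(tag_hh)

list community hid tag N_comm N_distinct, sepby(community)
```

        For the small-cell check, the size variable (N_comm or N_distinct) is the one to condition on; the tag itself only identifies one row per group.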

        Use dataex to produce a sample of your data that includes the meaningful parts (some of the trouble areas).

        Streets might cross multiple communities. That could be a problem, because you would have to pick which community to link to.
        And there may be multiple streets in a community, so you would then have to pick which of those streets to group on.


        It might be useful to see the results of this:
        Code:
        bys community: gen N_comm = _N  //count of observations by community (total() cannot be applied to a string variable such as community)
        tab community if N_comm<5  //how many communities are a problem?
        tab street if N_comm<5   //which streets are in these problem communities
        If there are only a few problem cases, it may be sensible to recode the community variable for them individually. Run gen community_fix = community first and edit the copy, so that the original data stay intact.
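
        If there turn out to be too many cases to fix by hand, one possible rule-based alternative is sketched below. It is only an illustration under stated assumptions, not a tested solution for the real data: it assumes one observation per household, that street and community names have been cleaned first (the listing in #1 shows variants such as 中山路街道 / 中山路), and that each community sits in exactly one street and one district. The toy data, the "street pool: " and "district pool: " labels, and the variable names N_fix, neighborhood, and N_nbhd are all invented for the example. Small communities are pooled within their street; street pools that are still under 5 households are pooled within their district; anything still too small is listed for a manual decision.

```stata
* Toy data, invented for illustration; n is the number of households per community.
* On the real data, skip this block (down to -drop n-) and start at "Step 0".
clear
input str2 district str2 street str1 community n
"D1" "S1" "a" 6
"D1" "S1" "b" 2
"D1" "S1" "c" 3
"D1" "S2" "d" 2
"D1" "S3" "e" 1
"D1" "S3" "f" 2
"D2" "S4" "g" 2
end
expand n              // one row per household
gen hid = _n
drop n

* Step 0: households per community; work on a copy of the original variable
bysort district street community: gen N_comm = _N
gen community_fix = community

* Step 1: pool all small communities within the same street
replace community_fix = "street pool: " + street if N_comm < 5
bysort district street community_fix: gen N_fix = _N

* Step 2: street pools that are still too small are pooled within the district
replace community_fix = "district pool: " + district if N_fix < 5

* Numeric cluster ID and final cluster sizes
egen neighborhood = group(district community_fix)
bysort neighborhood: gen N_nbhd = _N

* Clusters that remain under 5 households need a manual decision
tab community_fix if N_nbhd < 5
```

        In the toy data, district D2 has only two households in total, so its cluster stays below 5 and shows up in the final tab; cases like that (for example a district with only two households, as in the last listing in #1) would still have to be merged or dropped by hand.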