  • I am analyzing data from 30 districts, focusing on 8 questions (q01 to q08). Here is the dataset I am working with:

    district q01 q02 q03 q04 q05 q06 q07 q08 population
    District 1 0 1 0 0 0 1 0 0 892
    District 2 0 0 0 1 0 1 0 0 138
    District 3 0 1 0 1 0 0 0 1 923
    District 4 1 1 0 1 1 0 0 0 887
    District 5 0 1 1 0 1 1 0 0 514
    District 6 1 0 1 1 1 0 1 1 578
    District 7 0 0 0 1 1 0 1 0 393
    District 8 1 0 0 1 0 1 0 0 566
    District 9 0 0 0 0 1 0 0 1 514
    District 10 1 0 0 0 0 0 0 1 770
    District 11 0 0 0 0 1 1 1 1 207
    District 12 1 1 0 1 0 0 0 1 625
    District 13 0 1 0 0 0 1 1 0 550
    District 14 1 0 1 1 0 0 0 0 596
    District 15 1 1 0 0 1 0 0 1 250
    District 16 1 1 1 0 1 1 0 0 481
    District 17 1 1 1 0 1 0 1 0 553
    District 18 1 0 1 1 0 0 1 1 652
    District 19 1 0 0 0 0 1 0 0 503
    District 20 0 1 1 1 0 0 0 1 234
    District 21 1 1 0 1 0 1 0 1 883
    District 22 1 1 0 0 0 1 0 0 344
    District 23 1 1 0 0 1 0 0 0 238
    District 24 1 1 1 1 0 1 0 1 944
    District 25 1 0 1 1 1 0 1 0 940
    District 26 0 0 0 1 0 1 1 0 730
    District 27 0 0 0 0 1 1 0 0 890
    District 28 1 1 1 1 1 0 0 0 916
    District 29 0 0 0 1 0 0 1 1 959
    District 30 0 1 0 1 0 1 0 1 404
    Code:
    ```stata
    recode c01 (1=1 "1") (2=0 "0"), generate(binary_c01)
    recode c02 (1=1 "1") (2=0 "0"), generate(binary_c02)
    recode c03 (1=1 "1") (2=0 "0"), generate(binary_c03)
    recode c04 (1=1 "1") (2=0 "0"), generate(binary_c04)
    recode c05 (1=1 "1") (2=0 "0"), generate(binary_c05)
    recode c06 (1=1 "1") (2=0 "0"), generate(binary_c06)
    recode c07 (1=1 "1") (2=0 "0"), generate(binary_c07)
    recode c08 (1=1 "1") (2=0 "0"), generate(binary_c08)
    egen total_sum = rowtotal ( binary_c01 binary_c02 binary_c03 binary_c04 binary_c05 binary_c06 binary_c07 binary_c08)
    gen R_Sum = total_sum / 8
    collapse (mean) R_Sum , by(district)
    gen aggregate_population = R_Sum * Population
    **Normalized R_Sum**
    egen R_Sum_min = min(R_Sum)
    egen R_Sum_max = max(R_Sum)
    gen R_Sum_normalized = (R_Sum - R_Sum_min) / (R_Sum_max - R_Sum_min)
    **Normalized aggregate_population**
    egen aggregate_population_min = min(aggregate_population)
    egen aggregate_population_max = max(aggregate_population)
    gen aggregate_population_normalized = (aggregate_population - aggregate_population_min) / (aggregate_population_max - aggregate_population_min)
    **Listing**
    list district R_Sum aggregate_population R_Sum_normalized aggregate_population_normalized
    ```
    The code runs without errors, and I observe the following:
    • One district has a normalized aggregate population value of 1.
    • One district has a normalized aggregate population value of 0.
    • The remaining districts have values between 0 and 1 after normalization.
    I understand that normalization should result in values between 0 and 1, but here one district is exactly 1 and another is exactly 0. I would also like to confirm whether this is the correct approach for normalizing aggregate population values.

    Any advice or improvements to my approach would be greatly appreciated.

    Thank you for your assistance.
    Last edited by aafaque ali; 13 Jul 2024, 12:28.

  • #2
    There are several different ways to "normalize" a variable. The code you show is a correct implementation of one of them. Moreover, with the particular kind of "normalization" you are using, there will always be at least one observation with a "normalized" value of 1, specifically any one whose unnormalized value was the maximum, and at least one with a "normalized" value of 0, specifically any one whose unnormalized value was the minimum.
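
    To make the endpoint behavior concrete, here is a minimal sketch on made-up values (the variable x and its three values are hypothetical, not taken from the posted data). It applies the same min-max formula as the code in the original post and, for contrast, the z-score from -egen, std()-, which is another common meaning of "normalize":

    ```stata
    * Hypothetical toy data, not the poster's data set
    clear
    input x
    4
    10
    7
    end

    * Min-max scaling: (x - min) / (max - min)
    summarize x, meanonly
    generate x_minmax = (x - r(min)) / (r(max) - r(min))

    * z-score standardization: (x - mean) / sd
    egen x_z = std(x)

    list
    ```

    The observation holding the maximum gets (max - min)/(max - min) = 1 and the one holding the minimum gets 0/(max - min) = 0, so those two values are built into the formula rather than being a sign of a problem.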

    As an aside, I don't understand why you have the -collapse (mean) R_Sum, by(district)- command. Your data has only one observation for each district, so -collapse-ing -by(district)- leaves the value of R_Sum unchanged. But -collapse- also has a side effect: any variables not mentioned in it are dropped. So after -collapse-, the only variables left in your data set are R_Sum and district. The very next command uses the variable population, which no longer exists. So the code must halt with an error at that point. You said that your code runs without errors, but that cannot be true if this is truly the code you ran.
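
    The side effect is easy to see on a small example. The sketch below uses two made-up observations (the values are for illustration only) and assumes that, if -collapse- is wanted at all, population should be carried along with the (first) statistic; that choice is my assumption, not something stated in the original post:

    ```stata
    * Hypothetical toy data with one observation per district
    clear
    input str10 district R_Sum population
    "District 1" .25 892
    "District 2" .25 138
    end

    * -collapse- as written in the post: population is dropped
    preserve
    collapse (mean) R_Sum, by(district)
    capture confirm variable population
    * A nonzero return code means the variable no longer exists
    display "return code after collapse: " _rc
    restore

    * One option: name every variable that must survive
    collapse (mean) R_Sum (first) population, by(district)
    describe
    ```

    With one observation per district, simply omitting -collapse- has the same effect and is the simpler choice.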

    In fact, even earlier in the code there are problems. Your -recode- commands all target a variable c0#, but there are no such variables in your data set: I think you mean q0#. Anyway, you can see that what I said in the last paragraph is true, as here is what I get when I run your code after replacing c0# by q0#:
    Code:
    . recode q01 (1=1 "1") (2=0 "0"), generate(binary_c01)
    (0 differences between q01 and binary_c01)
    
    . recode q02 (1=1 "1") (2=0 "0"), generate(binary_c02)
    (0 differences between q02 and binary_c02)
    
    . recode q03 (1=1 "1") (2=0 "0"), generate(binary_c03)
    (0 differences between q03 and binary_c03)
    
    . recode q04 (1=1 "1") (2=0 "0"), generate(binary_c04)
    (0 differences between q04 and binary_c04)
    
    . recode q05 (1=1 "1") (2=0 "0"), generate(binary_c05)
    (0 differences between q05 and binary_c05)
    
    . recode q06 (1=1 "1") (2=0 "0"), generate(binary_c06)
    (0 differences between q06 and binary_c06)
    
    . recode q07 (1=1 "1") (2=0 "0"), generate(binary_c07)
    (0 differences between q07 and binary_c07)
    
    . recode q08 (1=1 "1") (2=0 "0"), generate(binary_c08)
    (0 differences between q08 and binary_c08)
    
    . egen total_sum = rowtotal ( binary_c01 binary_c02 binary_c03 binary_c04 binary_c05 binary_c06 binary_c07 binary_c08)
    
    . gen R_Sum = total_sum / 8
    
    . collapse (mean) R_Sum , by(district)
    
    . gen aggregate_population = R_Sum * Population
    Population not found
    r(111);
    And even if -collapse- hadn't eliminated that variable, you still would have had an error at that command, because the name of the variable is population, not Population. Stata variable names are case-sensitive.
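
    Putting these corrections together, one possible version of the calculation is sketched below. It assumes that the intended variables are q01-q08 and population, that q01-q08 are already coded 0/1 as in the posted data (so the -recode- step is skipped), and that -collapse- is not needed because there is one observation per district. Only the first three districts from the post are entered, to keep the example short:

    ```stata
    clear
    input str11 district q01 q02 q03 q04 q05 q06 q07 q08 population
    "District 1" 0 1 0 0 0 1 0 0 892
    "District 2" 0 0 0 1 0 1 0 0 138
    "District 3" 0 1 0 1 0 0 0 1 923
    end

    * Share of the 8 questions answered 1
    egen total_sum = rowtotal(q01-q08)
    generate R_Sum = total_sum / 8

    * Note the lowercase name: population, not Population
    generate aggregate_population = R_Sum * population

    * Min-max normalize both variables
    foreach v of varlist R_Sum aggregate_population {
        summarize `v', meanonly
        generate `v'_normalized = (`v' - r(min)) / (r(max) - r(min))
    }

    list district R_Sum aggregate_population R_Sum_normalized aggregate_population_normalized
    ```

    Using -summarize, meanonly- and r(min)/r(max) avoids creating the four helper variables that hold the minima and maxima, but the -egen- approach in the original post gives the same normalized values.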

    It is very important, when asking for help here, that you show the exact, actual code that you are asking for help with. When you substitute "similar" code, you can, as here, introduce new errors that can prevent those who want to help you from even getting to the problem you are seeking help with. Always show the exact, actual code involved; in code there is no such thing as a "minor" change. The best way to be sure you do that is to copy the code from your do file, log file, or the Results window, to your computer's clipboard and then paste it here.



    • #3
      Thank you, Clyde Schechter, for pointing out the mistakes in this post. I've updated my query and am waiting for your assistance.
