
  • How to calculate the frequency of one variable occurring in the other?

    Hello everyone,

    I want to calculate the percentage of migrants in each district. I tried the commands below, but I keep getting the error message "invalid 'by'". Thank you for the help.
    Code:
    egen iv = total (migrant), if migrant => 1, by( district)
    egen iv = total (migrant), if migrant >= 1, by( district)
    egen iv = total (migrant), if migrant >= 1, by(district)

    Code:
    tab district migrant
    
               |      migrant
      district |         0          1 |     Total
    -----------+----------------------+----------
             1 |        65         79 |       144 
             2 |        72         60 |       132 
             3 |        24         24 |        48 
             4 |        15          9 |        24 
             5 |        50         22 |        72 
             6 |        60         84 |       144 
             7 |        46         38 |        84 
             8 |        58         86 |       144 
             9 |        49         47 |        96 
            10 |        57         39 |        96 
            11 |        28         20 |        48 
            12 |        27         21 |        48 
            13 |        92         76 |       168 
            14 |        16          8 |        24 
            15 |        18          6 |        24 
            16 |        53         43 |        96 
            17 |        70         74 |       144 
            18 |        26         22 |        48 
            19 |        31         17 |        48 
            20 |        62         34 |        96 
            21 |        51         45 |        96 
            22 |        46         50 |        96 
    -----------+----------------------+----------
         Total |     1,016        904 |     1,920

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte district float migrant
    1 0
    6 0
    1 1
    1 1
    6 0
    1 1
    6 1
    1 1
    6 1
    6 1
    1 1
    6 1
    1 1
    1 0
    6 0
    6 1
    6 1
    1 1
    6 1
    1 0
    6 1
    1 1
    6 0
    1 0
    6 0
    6 0
    6 0
    6 1
    6 1
    6 1
    6 0
    6 0
    6 0
    6 1
    1 1
    6 0
    1 0
    6 1
    1 0
    6 1
    1 1
    6 0
    1 1
    6 1
    6 1
    1 1
    1 1
    6 0
    1 1
    6 1
    1 0
    6 1
    1 0
    6 0
    1 0
    6 0
    6 0
    1 1
    6 1
    6 0
    6 0
    6 1
    6 1
    6 1
    6 1
    6 0
    6 0
    6 1
    6 1
    1 1
    6 1
    1 1
    6 1
    1 1
    6 0
    6 1
    1 1
    6 1
    1 1
    6 1
    1 0
    6 1
    6 0
    1 0
    1 0
    1 1
    1 1
    1 1
    1 1
    1 1
    1 1
    1 0
    1 0
    1 0
    1 0
    1 0
    1 1
    1 0
    1 0
    1 0
    end

  • #2
    Code:
    sort district
    egen iv = total(migrant>=1) , by( district)


    • #3
      That worked! Thank you so much!


      • #4
        Since you're trying to get percentages, I am confused why you're trying to get the counts. Why not just get the percentages?

        Code:
        egen frac_immg = mean(migrant) if !missing(migrant), by(district)
        gen perc_immg = frac_immg * 100
        And just a very broad-stroke comment: Stata syntax is usually a command followed by its required arguments, then a comma ",", and then the options. If you are using two or more commas at the first level of the syntax, it's usually a mistake. For example, "if" goes before the comma, not after it. You also put another comma before "if", so it did not work.

        Also, I would not use (migrant >= 1), because Stata treats a missing value (coded as ".") as larger than any number, so any case with missing data in migrant would be counted as a migrant.
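
        To illustrate both points, here is a minimal sketch on a tiny made-up dataset. The data values and the variable names n_migrants and n_wrong are assumptions for illustration only, not taken from the original data. It places the if qualifier before the comma and shows how (migrant >= 1) treats a missing value:

        ```stata
        clear
        input byte district float migrant
        1 0
        1 1
        1 .
        2 1
        2 1
        end

        * The if qualifier goes before the comma; by() is an option after it
        egen n_migrants = total(migrant) if !missing(migrant), by(district)

        * Missing (.) is larger than any number, so (migrant >= 1) is true for it
        * and the missing observation in district 1 is counted as a migrant
        egen n_wrong = total(migrant >= 1), by(district)

        list, sepby(district)
        ```

        In this sketch, district 1 has one migrant, but n_wrong reports two because the missing observation is counted as well.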


        • #5
          Thank you Ken for the advice and clarification!! I am going to use the command you provided. Working in percentages was more accurate for me, as the results I got were similar to the results from other methods. I appreciate your help!!
          Last edited by Naika Sangroo; 10 Nov 2021, 17:33.


          • #6
            Come to that


            Code:
            egen frac_immg = mean(100 * migrant), by(district)
            gets you there in one step, as mean() accepts expressions and ignores missing values anyway.
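
            As a quick check, the sketch below compares the one-step and two-step calculations on a small made-up dataset. The data values and the variable names frac, pct_two_step and pct_one_step are assumptions for illustration. The comparison uses a small tolerance because egen stores results as float by default:

            ```stata
            clear
            input byte district float migrant
            1 0
            1 1
            1 1
            1 .
            2 0
            2 1
            end

            * Two steps: district mean of the 0/1 indicator, then scale to percent
            egen frac = mean(migrant), by(district)
            gen pct_two_step = 100 * frac

            * One step: mean() accepts an expression and skips missing values
            egen pct_one_step = mean(100 * migrant), by(district)

            * The two results should agree up to float rounding
            assert abs(pct_one_step - pct_two_step) < 1e-4

            tabulate district, summarize(pct_one_step)
            ```

            Note that, without an if qualifier, the district percentage is also filled in for the observation whose migrant value is missing.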


            • #7
              Thank you Nick!
