  • Loop to create new variables based on cross-tabulation outputs

    Dear Statalist,
    I am trying to produce some statistics but have run into a problem.
    I have two variables: a wealth index (WI) with five categories,
    and region (REG), which has 13 levels.

    I would like the proportion of each wealth index level within each region. First, the script below gives me the total population per region, to use as the denominator:

    forvalues i =1/13 {
    egen w`i'= count(WI) if REG==`i', by(REG)
    }

    Next, I would like the population of each wealth index level in each region, to use as the numerator, so that I can compute the proportion of each wealth index level per region. But the script below does not give me what I expect:

    forvalues i =1/5 {
    forvalues j =1/13 {
    egen wi`i'_`j'= count(WI) if REG==`j' & WI==`j' , by(REG)
    }
    }

    I would appreciate any help with this issue.
    Best!



  • #2
    Without seeing a sample of your data it will be difficult to offer concrete advice. For example, it isn't clear from your post whether each observation in your data set corresponds to a person, or to a region, or even to something else. In addition, "don't give me what I am expecting" is not very informative--it tells us neither what you got nor what you were expecting.

    Please post a representative small excerpt of your data using the -dataex- command (-ssc install dataex-; instructions are at -help dataex-), and then show what the results for that example should look like.



    • #3
      Dear Clyde,
      Thanks for the response. To be more precise: each observation corresponds to a person. Sorry about "don't give me what I am expecting"; what I meant is that I have the wealth index quintiles for the whole country, and I would like the number of persons in each wealth index quintile within each of the 13 regions in the country, which my script did not produce.

      I followed your instructions, and here is an example. I hope it is helpful.
      Best regards!


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(REG WI)
      1 2
      1 2
      1 3
      1 2
      1 3
      1 4
      1 2
      1 4
      1 3
      1 4
      1 4
      1 4
      1 2
      1 3
      1 3
      1 2
      1 3
      1 2
      1 1
      1 2
      1 4
      1 4
      1 5
      1 4
      1 3
      1 2
      1 5
      1 5
      1 3
      1 3
      1 2
      1 5
      1 4
      1 4
      1 5
      1 5
      1 3
      1 5
      1 4
      1 4
      1 1
      1 3
      1 3
      1 5
      1 3
      1 5
      1 3
      1 4
      1 3
      1 4
      1 4
      1 2
      1 2
      1 1
      1 4
      1 3
      1 3
      1 2
      1 2
      1 2
      1 4
      1 2
      1 2
      1 2
      1 4
      1 2
      1 4
      1 3
      1 4
      1 2
      1 1
      1 1
      1 4
      1 4
      1 5
      1 3
      1 5
      1 3
      1 3
      1 4
      1 4
      1 1
      1 4
      1 3
      1 2
      1 2
      1 2
      1 3
      1 2
      1 2
      1 4
      1 3
      1 2
      1 2
      1 4
      1 4
      1 3
      1 4
      1 2
      1 4
      end
      label values REG WI
      label def REG 1 "BMH", modify
      label values WI WI
      label def WI 1 "Poorest", modify
      label def WI 2 "Poorer", modify
      label def WI 3 "Middle", modify
      label def WI 4 "Richer", modify
      label def WI 5 "Richest", modify


      • #4
        Thanks for posting the example with -dataex-. Now that I understand your data, I think that you can see, at the same time, the population and the proportion of each wealth group in each region by running:

        Code:
        tab REG WI, row
        If you need to have these same statistics as Stata variables in your data set, not just see them in the Results window, you can do this:
        Code:
        by REG WI, sort: gen count_wi_in_reg = _N
        by REG: gen count_all_in_reg = _N
        gen prop_wi_in_reg = count_wi_in_reg/count_all_in_reg
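
        If separate count and proportion variables per wealth level are wanted, as the loops in #1 attempted, the following is a minimal sketch. Note that the condition in that nested loop compares WI with the region index j rather than the wealth index i. The variable names n_reg, n_wi1 to n_wi5, and prop_wi1 to prop_wi5 are made up for illustration, and the sketch assumes WI is coded 1 to 5 with no missing values.

        ```stata
        * Denominator: number of persons with nonmissing WI in each region
        egen n_reg = count(WI), by(REG)

        forvalues i = 1/5 {
            * Numerator: persons at wealth level i within each region
            egen n_wi`i' = total(WI == `i'), by(REG)
            * Proportion of wealth level i in the region
            gen prop_wi`i' = n_wi`i' / n_reg
        }
        ```

        Because by(REG) already handles every region at once, no second loop over the 13 regions is needed.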

        By the way, I think there is an error in your data set. You have the variable REG labeled with WI. I think you want it labeled with value label REG--that would make more sense.
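
        A minimal sketch of that fix, assuming the value label named REG is defined as in the -dataex- example in #3:

        ```stata
        * Attach the value label REG (rather than WI) to the variable REG
        label values REG REG
        * Confirm which value label each variable now uses
        describe REG WI
        ```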



        • #5
          Dear Clyde,
          I ran the script; everything worked and the result matches what I was trying to get. Many thanks.
          All my best regards!



          • #6
            Dear Clyde,
            I would like some help with how to do a cluster analysis on categorical data after a multiple correspondence analysis, mainly clustering of the variables. What I have found so far deals mostly with clustering of observations.
            Best, and thanks for your support.

