  • Can I convert variable frequency to a new variable that represents frequency range?

    I am using Stata 14.2

    I have a categorical variable district_name and another categorical variable store_name in each district. I encoded district to get the number of stores in each district (encode dist_name, gen(dist_n)). Now my new variable looks like this:

    Dist_n Freq (no of store names)
    DistA 10
    DistB 1200
    DistC 450
    DistD 80
    DistE 690

    Is it possible to generate a new variable that represents ranges of the number of stores per district? For example:

    District_size N
    0 to 99 stores 2
    100 to 999 stores 2
    1000+ stores 1
    Last edited by Johny Daniel; 09 May 2017, 10:19.

  • #2
    Try this:
    Code:
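    * Ranges must not overlap: with inrange(freq, 99, 999), a district with exactly 99 stores would end up in group 2
    gen size_group = 1 if freq <= 99
    replace size_group = 2 if inrange(freq, 100, 999)
    replace size_group = 3 if freq >= 1000 & !missing(freq)
    label define size_group 1 "0 to 99" 2 "100 to 999" 3 "1000+"
    label values size_group size_group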
    to create a variable that characterizes each district's number of stores according to your scheme. I'm not sure what your final example represents, but perhaps you want to do
    Code:
    tab size_group



    • #3
      I am still a newbie and may have represented the data wrongly. This is what my data look like before encoding:

      Code: tab district

      dist Freq. Percent Cum.

      DistA 10,686 1.02 1.02
      Dist B 10,510 1.01 2.03
      Dist C 10,375 0.99 3.02
      Dist D 10,259 0.98 4.00

      After encoding (encode district, gen(dist_n))
      Code: tab dist_n

      dist_n Freq. Percent Cum.

      1 10,686 1.02 1.02
      2 10,510 1.01 2.03
      3 10,375 0.99 3.02
      4 10,259 0.98 4.00
      and so on..

      When I use:
      gen size_group = 1 if freq <= 99 (I get an error r(111) freq not found)
      I may be wrong but shouldn't the code be gen size_group = 1 if dist_n <= 99

      But the problem with this is that dist_n takes the values 1, 2, 3, and so on, which have no meaning as counts, so it does not give me a new variable holding the frequency. My goal is to get a range of the number of stores in each district (which is my outcome variable) and tabulate it with other variables such as district education level. So, I want to create a table that would look like:
      Dist size Freq
      0 to 1000 110
      1001 to 2000 210
      2001 to 3000 150
      3000+ 220
      (110 districts with 0 to 1000 registered stores) (210 districts with 1001 to 2000 registered stores) and so on
      Last edited by Johny Daniel; 09 May 2017, 13:30.



      • #4
        I may be wrong but shouldn't the code be gen size_group = 1 if dist_n <= 99
        Well, at this point I'm pretty confused about what your data set looks like, so I'm hesitant to comment. But as far as I can figure out from what you've shown, dist_n is just a consecutive sequence number that runs from 1 through however many districts you have in your data set in alphabetic order. So what you propose here would classify the districts based on where they ranked in alphabetic order of their names, not by the number of stores they contain.

        If your data set contains a variable for the name of the district and another variable with the number of stores, then the code I showed in #2 will work if you simply replace -freq- by the actual name of the variable with the number of stores everywhere it appears. I had misunderstood what you showed in #1 to imply that you had a variable named freq (or maybe Freq) that contains the number of stores for each district.

        If you don't have any such variable, then you are starting from someplace different than I imagined. In that case, I suggest you post back and include an example of your data and an explanation of what the variables in it mean. Be sure to use the -dataex- command to do that. Run -ssc install dataex- to get the -dataex- command (if you don't already have it), and then run -help dataex- to read the simple instructions for using it.
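
        As a rough sketch of what that looks like in practice (the variable names district, shop, and employ are placeholders, not names known to exist in the poster's data set):

        ```stata
        * Install dataex from SSC (needed only once), then read its help file
        ssc install dataex
        help dataex

        * List the first 20 observations of the chosen variables in a form
        * that others can copy and paste straight into Stata.
        * district, shop, and employ are hypothetical variable names.
        dataex district shop employ in 1/20
        ```

        The output that -dataex- writes to the Results window is what gets pasted into the forum post.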



        • #5
          First, Clyde, thanks for taking the time to help me with my Stata question. This is a sample of my data:
          District Shop Total Employ
          A1 X1 80
          A1 X1 90
          A1 X1 150
          A1 Y1 90
          A1 Y1 55
          A1 Y1 72
          A2 X11 19
          A2 X11 13
          A2 X12 88
          A2 X12 213
          A2 X13 345
          A2 X13 44
          A2 X13 79
          A2 X14 333
          As shown, District A1 has a total of 6 stores and District A2 has 8 stores. There can be more than one of the same store in a district too; for example, X1 has three stores in district A1.

          So, I want to create a new variable district_size, where the value (not the frequency) for each district is the number of stores in that district. For example,
          District No of stores
          A1 6
          A2 8



          • #6
            Code:
            by district, sort: gen size = _N
            Your example data was posted in a way that is easy to read by human eyes; and in this case that is all that was needed. But it would have been cumbersome to import into Stata had it been necessary to try out and test some code. As requested previously, in the future always use the -dataex- command to post example data so that those who want to help you can easily create a faithful replica of your Stata example with just a simple copy and paste operation. Doing so will increase your chances of getting a timely helpful response.



            • #7
              Clyde gives excellent advice as always. But it occurs to me that Johny may just be thinking about "data" in a way that Stata doesn't.

              Code:
              tab District
              would show the number of stores. Perhaps that is what is wanted.



              • #8
                Clyde, thank you so much for the solution. That works for me. In future I will take your advice and post using dataex.

                Nick, thanks for your comment. But tab district does not let me break districts into groups, e.g. districts with 1 to 100 stores, 101 to 1000 stores, etc.



                • #9
                  You're changing the question back and forth, which is both OK and confusing. My answer in #7 is a solution to #5.

                  If you want tabulate to reflect a classification, then indeed the classification must exist beforehand. Clyde's answers in #2 and #6 already answer that, but in reverse order given the lack of clarity about your data. You create a frequency variable first, then reduce it to a classification.

                  Given a data example as requested in #6 I would be happy to demonstrate.
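
                  In the meantime, here is a minimal sketch of the two steps, typed in by hand from the example in #5. The variable names district, shop, and employ, the new variable one_per_district, and the cutoffs (borrowed from #8) are assumptions for illustration only, not the poster's actual names or categories.

                  ```stata
                  clear
                  input str2 district str3 shop employ
                  "A1" "X1"  80
                  "A1" "X1"  90
                  "A1" "X1"  150
                  "A1" "Y1"  90
                  "A1" "Y1"  55
                  "A1" "Y1"  72
                  "A2" "X11" 19
                  "A2" "X11" 13
                  "A2" "X12" 88
                  "A2" "X12" 213
                  "A2" "X13" 345
                  "A2" "X13" 44
                  "A2" "X13" 79
                  "A2" "X14" 333
                  end

                  * Step 1 (as in #6): number of stores (observations) per district
                  bysort district: gen size = _N

                  * Step 2 (as in #2): reduce the count to a classification
                  * The cutoffs are illustrative only
                  gen size_group = 1 if inrange(size, 1, 100)
                  replace size_group = 2 if inrange(size, 101, 1000)
                  replace size_group = 3 if size > 1000 & !missing(size)
                  label define size_group 1 "1 to 100" 2 "101 to 1000" 3 "1001+"
                  label values size_group size_group

                  * Flag one observation per district so that each district
                  * is counted once, not once per store
                  egen byte one_per_district = tag(district)
                  tab size_group if one_per_district
                  ```

                  Without the if qualifier, tab would count stores rather than districts, because the data have one observation per store.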



                  • #10
                    Nick, I want to give you an example, but I am unable to install dataex in my Stata.

                    ssc install dataex
                    checking dataex consistency and verifying not already installed...
                    cannot write in directory D:\StataAdo\ado\plus\d


                    I get this error when trying to do so.



                    • #11
                      Why that does not work is a matter for your local IT support to explain.
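
                      One thing that may be worth trying first, assuming you have some folder you are allowed to write to, is to point Stata's PLUS directory at that folder for the current session. This is only a sketch: the path below is a placeholder, and whether it helps depends on how your machine is set up.

                      ```stata
                      * Show the directories Stata currently uses, including PLUS
                      sysdir

                      * Point PLUS at a writable folder for this session only.
                      * "C:/Users/yourname/ado/plus" is a hypothetical path; substitute your own.
                      sysdir set PLUS "C:/Users/yourname/ado/plus"

                      * Retry the installation
                      ssc install dataex
                      ```

                      The sysdir set change lasts only until Stata is closed, so it would have to be repeated in each session or placed in a profile.do file.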
