  • Graph bar with custom labels

    Dear Statalist

    I have been struggling with the following issue for quite a while now. Unfortunately I am not able to post a dataex example, as my data sit on a remote desktop at my workplace. Hopefully my description will do - my apologies!

    And a disclaimer: my issue relates to this question, but I have not been able to grasp exactly how I should approach it, so I took the liberty of asking in a new thread.

    Description of my situation
    I have population data on 0 to 24 year olds. I have three relevant variables: one showing which age bracket the individual belongs to (0-4; 5-9; 10-14; 15-19; 20-24), a categorical variable showing which of 11 specific illnesses - if any - an individual has had, and a variable indicating the individual's gender. An individual appears more than once if they have had two or more illnesses (approx. 50% do). All three variables are strings at the moment but can be changed accordingly.

    I am trying to create 11 bar graphs - one for each illness - with the percentage of each sub-group who have had the specific illness on the Y-axis, split by both age bracket and gender on the X-axis. So for example: the first two bars should reflect the percentage of 0-4 year old boys and girls, respectively, who have had the illness. The next two bars should reflect the percentage of 5-9 year old boys and girls with the same illness, etc. To be more exact: my population consists of almost 2.5 million unique IDs, but because individuals can reappear, my dataset has almost 4.0 million observations. The percentage should only reflect the share of unique individuals.

    As labels, I would like to have the number of individuals who have had the illness, and this is where I run into trouble. I am able to create the graphs using -graph bar-, but it is not possible to use N as custom labels. In the earlier question on Statalist mentioned above, it is argued that -collapse- and -twoway- can be used. If that is applicable in my situation as well, I could really use a description of how to approach it.

    Thank you!
    Last edited by Anders Green; 17 Mar 2019, 04:46.

  • #2
    Extracts from the FAQ Advice:

    12.2 What to say about your data

    We can understand your dataset only to the extent that you explain it clearly.

    The best way to explain it is to show an example. The community-contributed command dataex makes it easy to give simple example datasets in postings. [...]

    If your dataset is confidential [or -- we perhaps need to spell this out -- otherwise difficult or impossible to post], then provide a fake example instead.

    The second best way to explain your situation is to use one of Stata's own datasets and adapt it to your problem. Examples are the auto data and the Grunfeld data (a simple panel dataset). That may be more work for you and you may not find an analog of your problem with such a dataset.

    The worst way to explain your situation is to describe your data vaguely without a concrete example. Note that it doesn't help us much even to be given your variable names. Often that leaves unclear both your data structure and whether variables are numeric or string or their exact contents. If you explain only vaguely, quick answers to your question, or even any answers at all, are less likely.


    • #3
      Dear Mr. Cox

      Thank you for the quick reply. I see that the 'best way' is what I had thought of myself, but unfortunately I have just recently bought a new computer and do not have Stata installed yet, which means that I am not able to produce even a pseudo-example. Therefore, the 'second best way' is not a possibility either. I am sorry for the inconvenience but nonetheless hope that someone out there is able to see through the example's ambiguity and help me.

      All the best


      • #4
        I see. One implication is that any Stata code will be no use to you today, so why not post an example tomorrow? If someone else puzzles this out sooner, that's more than fine by me.

        The principle of division of labour is important here! If you want someone's help, make the question really good, so the answer is ideally easy for someone who is more fluent on coding.


        • #5
          I agree completely with your point of view on the division of labour - and in the future I will try my utmost to raise the quality of my questions so as to demand less and provide more myself.

          I am a university student and the university's IT department issues the licenses for our personal computers, which easily takes a couple of weeks; on the other hand, I am able to access the remote desktop all the time (just not export anything on my own), so - actually - Stata code will be much appreciated. I'll keep on trying myself and keep my fingers crossed that someone puzzles it out.


          • #6
            So, you can make up an example with simplified data using dataex and copy and paste that into Statalist.


            • #7
              We do not have internet access on the remote desktop, so unfortunately not - all these messages are written from my personal computer, on which I do not have access to Stata - otherwise that would have been my first choice for sure.

              Sorry for the obscurity.


              • #8
                I am lost here, but good will remains. If you post a good example tomorrow, I will look at it. If someone else gets there earlier, as said, more than fine.
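
For readers following the -collapse- and -twoway- route mentioned in #1, here is a minimal sketch of one way it might look. It is not taken from the thread. The variable names (id, agegrp, gender, illness), the gender codes "M" and "F", and the tiny made-up dataset are all assumptions for illustration. It also assumes that individuals with no illness appear once with an empty illness value, and that each id has a single age bracket and gender.

```stata
* Made-up example data: one row per person and illness (hypothetical names)
clear
input str3 id str5 agegrp str1 gender str10 illness
"p1" "0-4" "M" "asthma"
"p1" "0-4" "M" "eczema"
"p2" "0-4" "M" ""
"p3" "0-4" "F" "asthma"
"p4" "5-9" "F" "asthma"
"p5" "5-9" "M" ""
"p6" "5-9" "M" "eczema"
end

* Denominator: unique individuals in each age bracket and gender
egen byte tag_person = tag(id)
bysort agegrp gender: egen long n_total = total(tag_person)

* Numerator: unique individuals per illness, age bracket and gender
egen byte tag_case = tag(id illness)
keep if illness != ""
collapse (sum) n_cases = tag_case (first) n_total, by(illness agegrp gender)
gen pct = 100 * n_cases / n_total

* Fix the order of the age brackets (plain encode would sort "5-9" last)
label define agelbl 1 "0-4" 2 "5-9" 3 "10-14" 4 "15-19" 5 "20-24"
encode agegrp, gen(age_n) label(agelbl)

* x positions: a pair of bars per age bracket with a gap between pairs
gen x = 3 * (age_n - 1) + cond(gender == "M", 1, 2)

* One graph per illness, with the case count shown above each bar
local k = 0
levelsof illness, local(ills)
foreach i of local ills {
    local ++k
    twoway (bar pct x if illness == `"`i'"' & gender == "M", barwidth(0.9)) ///
           (bar pct x if illness == `"`i'"' & gender == "F", barwidth(0.9)) ///
           (scatter pct x if illness == `"`i'"', msymbol(none) ///
               mlabel(n_cases) mlabposition(12)), ///
           xlabel(1.5 "0-4" 4.5 "5-9" 7.5 "10-14" 10.5 "15-19" 13.5 "20-24", noticks) ///
           legend(order(1 "Boys" 2 "Girls")) ///
           ytitle("Percent of age-gender group") xtitle("Age bracket") ///
           title(`"`i'"') name(g`k', replace)
}
```

The idea is that -egen, tag()- marks one observation per distinct person (or person-illness pair), so sums of the tags count unique individuals rather than rows. The invisible scatter layer exists only to carry the marker labels, which is what -graph bar- does not offer directly. Groups with zero cases for an illness drop out after -keep- and would show no bar.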