  • Calculating percentages for use in Markstat

    Hello all,

    I am new to Stata and I'm having some trouble working out the easiest way to display percentages in a markstat report.

    I have to prepare a report using markstat which requires me to list simple percentages of many variables. I have a dataset with around 50,000 observations and 20 or so variables, which report on things like ethnicity, health status, location, gender etc. Most of these variables have around 4 or 5 possible categories.

    In the report there are around 50 or so instances of stats like "x percent of the students are female", "x percent of the students have a health condition", and so on. Ideally I would like to be able to calculate these numbers dynamically, so they can go into a markstat document inline, using a single line of code if possible. I'm looking for something like

    Code:
    count if gender == "female" / _N
    but that gives me a type mismatch error. I know that the count command will store a result as r(N), so I could do something like

    Code:
    count if gender == "female"
    display r(N) / _N
    but that's two lines of code which is a bit unwieldy to use inline with markstat. There are also a few occasions where I'll need two different values of r(N) and I'm not sure how to do that.

    Any help would be much appreciated.

    Regards,
    Tex

  • #2
    See also https://www.reddit.com/r/stata/comme...te_percentage/ (I post on Reddit very occasionally)

    https://www.statalist.org/forums/help#crossposting

    I don't have any easy answers here. With this approach, all the pain is in the coding and all the gain comes later. In essence, each bit is really a two-step: to calculate a number and to say how it should be reported -- including how many decimal places are shown, a key point not raised here but discussed in the Reddit thread.

    The good news is that many of your commands will be very similar, so much of the file can be prepared by copy and paste.

    If I were doing this for myself I might write a helper command, but it's hard to know quite how general or how flexible that should be. As you are a learner -- we all are, really -- I guess that someone else's command that did some of what you wanted, or did it awkwardly, is not the best idea for you. You're better off focusing on what can be done with

    count
    summarize
    display
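
    As a rough sketch of how these commands combine, and of one way to keep two different values of r(N) at once, each count can be copied into a local macro before the next count overwrites r(N). The example below runs on Stata's shipped auto dataset rather than the data in the question; the macro names n_foreign and n_good are made up for illustration.

    ```stata
    * Sketch on Stata's shipped auto data, not the dataset in the question
    sysuse auto, clear

    * Copy each count into its own local before the next count overwrites r(N)
    count if foreign == 1
    local n_foreign = r(N)

    count if rep78 >= 4 & !missing(rep78)
    local n_good = r(N)

    * Both counts are still available; the display format sets the decimal places
    display %4.1f 100 * `n_foreign' / _N
    display %4.1f 100 * `n_good' / _N
    ```

    The calculation and the reporting format stay separate here, so the number of decimal places can be changed without touching the counts.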


    Some of your percentages will be easiest with taking means over an indicator (dummy) variable.
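
    A minimal sketch of the indicator approach, again on the auto dataset rather than the data in the question: the mean of a 0/1 variable is the proportion of ones, so summarize with the meanonly option leaves the proportion in r(mean). The variable name good_repair is hypothetical.

    ```stata
    * Sketch on Stata's shipped auto data, not the dataset in the question
    sysuse auto, clear

    * foreign is already a 0/1 indicator, so its mean is a proportion
    summarize foreign, meanonly
    display %4.1f 100 * r(mean)

    * Otherwise build the indicator first; the if qualifier keeps
    * missing values of rep78 out of the denominator
    generate byte good_repair = rep78 >= 4 if !missing(rep78)
    summarize good_repair, meanonly
    display %4.1f 100 * r(mean)
    ```

    For a string variable such as gender, the indicator would be built from a comparison like gender == "female", with a decision made separately about how empty strings should be treated.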
    Last edited by Nick Cox; 13 Jun 2022, 01:43.



    • #3
      One workaround I thought of would be to use putexcel to make a sort of helper file that stores all of the 50-odd percentages I will need in the report, then convert that Excel file to a Stata file and just read from that using variable_name[_n].

      However, I can't get Stata to store certain results using putexcel. For example, if I want to store _N, that works fine, using for example
      Code:
      putexcel A2 = _N
      But when I try and do that with a count, using something like the following:
      Code:
      count if gender == "Male"
      putexcel B2 = r(N)
      it gives me an 'r not found' error. But if I just write 'display r(N)', it returns the number of males in the dataset, so it seems to be storing something. What am I doing wrong?

      UPDATE: Sorry, should have googled this a bit more, using
      Code:
      putexcel B2 = `r(N)'
      works just fine.
      Last edited by Tex Stevens; 13 Jun 2022, 06:10.
