Too many variables?

Mark Benjamin

Join Date: Jul 2015

Posts: 4
#1

07 Jul 2015, 04:22

Admittedly, I'm using Small Stata 14 (for learning purposes), which is limited to 100 variables. However, for some reason I can't use metan for anything without this error message appearing at the bottom:

"no room to add more variables
Up to 120 variables are allowed with this version of Stata. Versions are available that
allow up to 32,767 variables.
"

But example datasets like the streptokinase set don't have anything like 100 variables. In fact, I created my own meta-analysis data (two trials with binary outcomes) and the same message turned up when I ran metan.

I also don't get a forest plot after the typical metan table. Not sure if this is somehow related.
Nick Cox

Join Date: Mar 2014

Posts: 35725
#2

07 Jul 2015, 04:29

Many programs create temporary variables to hold results for calculation and/or graphics. Unfortunately they can't do that if it would mean breaking the limits on the number of variables in your Stata. You may be able to solve this problem by dropping variables that are not of interest to a particular calculation and then reading in the original dataset again when you are done.

Code:

help drop
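A minimal sketch of that drop-and-reload workflow, using preserve and restore rather than rereading the file. The variable names (trial, deaths_t, alive_t, deaths_c, alive_c) are hypothetical stand-ins for a study label and four binary-outcome counts; substitute your own.

```stata
* Sketch only: variable names below are hypothetical placeholders.
preserve                                   // snapshot the full dataset
keep trial deaths_t alive_t deaths_c alive_c
describe, short                            // shows how many variables remain
metan deaths_t alive_t deaths_c alive_c, label(namevar=trial)
restore                                    // bring back the full dataset
```

Note that restore also discards any variables metan leaves behind, so the dataset returns to exactly what it was before preserve.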
Mark Benjamin

Join Date: Jul 2015

Posts: 4
#3

07 Jul 2015, 04:55

It adds these:

_SS float %9.0g Sample size
_LCI float %9.0g Lower CI (ES)
_UCI float %9.0g Upper CI (ES)
_WT float %9.0g I-V weight

but that's only four, and I assume if I drop them and run metan again, they would be recreated anyway.
Nick Cox

Join Date: Mar 2014

Posts: 35725
#4

07 Jul 2015, 05:02

I was referring to temporary variables used while the program runs, not new variables created by the program. You can't usually see any evidence of these temporary variables, but they do count when Stata counts variables. If you want to know more, read or skim the chapter of the User's Guide on programming.
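To see that a temporary variable counts towards the total while it exists, something like the following can be run as a do-file. It is only an illustration: it uses the auto dataset shipped with Stata and the c(k) system value, which reports the number of variables in memory; the name scratch is arbitrary.

```stata
* Illustration only: run as a do-file. The tempvar name is arbitrary.
sysuse auto, clear
display "variables before:        " c(k)
tempvar scratch
generate `scratch' = price / 1000          // the tempvar now exists in memory
display "variables with tempvar:  " c(k)   // count has gone up by one
drop `scratch'
display "variables after drop:    " c(k)   // back to the original count
```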
David Fisher

Join Date: Apr 2014

Posts: 407
#5

07 Jul 2015, 05:04

Mark,

What Nick is saying is that programs, including metan, create hidden variables while they are running, which the user never sees but which nevertheless count towards the total number of variables in your dataset while they (temporarily) exist. You are correct that metan also leaves behind those four variables, but there will be more than that while the program is running. Hence, Nick's advice is to strip your own dataset down to the bare minimum before running metan (or anything else) to leave more room for this. I would be very surprised if metan were creating anything like 100 temporary variables, and you should only need a maximum of 10 or so variables to run metan (depending on precisely what you're doing, of course).
David Fisher

Join Date: Apr 2014

Posts: 407
#6

07 Jul 2015, 05:06

Sorry, Nick's response appeared while I was posting mine!
Nick Cox

Join Date: Mar 2014

Posts: 35725
#7

07 Jul 2015, 05:10

That's OK; yours develops the point very helpfully.
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#8

07 Jul 2015, 07:31

Originally posted by Mark Benjamin

"no room to add more variables
Up to 120 variables are allowed with this version of Stata. Versions are available that
allow up to 32,767 variables.
"

So is it 120 variables, as in this message, or 100 variables, as in the official documentation?
http://www.stata.com/help.cgi?limits

The message looks like it is produced by Stata itself, not metan, since it appears also in
http://hsphsun3.harvard.edu/cgi-bin/...ticle-660.html
and elsewhere.

In this blog post, Benjamin Gregory Carlisle reports that metan can't run with 11 variables in memory:
http://www.bgcarlisle.com/blog/2015/05/
which, I agree, is far from the 100 or 120 limit and is the bare minimum for all practical purposes.
If metan fundamentally can't work with Small Stata, it should be clearly marked as such.
David Fisher

Join Date: Apr 2014

Posts: 407
#9

07 Jul 2015, 07:59

That blog post is worrying -- I don't like to think of users equating Stata's licensing model with a form of DRM. I'd be interested to know how many (simultaneously existing) tempvars my own programs produce -- is there a way of doing this, short of temporarily littering my code with noisily summarize? Is it fair to say that built-in Stata programs are designed to be "tempvar-efficient" as far as possible?
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#10

07 Jul 2015, 08:51

David,
1. There is a big difference: Stata's licensing model works. As for the DRM... I was actually referring only to the top post in that blog; the rest was irrelevant.
2. On the tempvars: no. I asked about this a while ago. A similar issue I was interested in was whether a program ever "touches" particular variables during its work, or whether they can be safely removed. There is also no way of determining this, except by dropping the variables one after the other and trying all sorts of combinations.
3. On "as far as possible": there are usually alternatives and tradeoffs. One can always use disk space instead of memory for the purposes for which the tempvars are created, but that doesn't mean it is the preferable solution. tempvar is called routinely in the StataCorp-supplied ado-files, and by a number of built-in commands. For ado-files you can investigate the efficiency by analyzing the source yourself. For built-ins I think it is safe to assume they are efficient, unless reasonable suspicions exist (I have none).
Best, Sergiy.
Mark Benjamin

Join Date: Jul 2015

Posts: 4
#11

08 Jul 2015, 00:35

Thanks for the responses.
I wouldn't be surprised if there are temporary variables, but I have created a stripped-down set of two trials, with control yes/no and treatment yes/no and nothing else. Thinking about the mechanics, you could potentially go through a few variables converting to and from log scales, but it is hard to see a hundred. Stata's licensing model seems fine to me, but this is odd, especially as right now I am actually learning about meta-analysis, so the teaching edition is not so useful without it. It has gone to technical support for further information. I think I'll upgrade to Stata IC anyway.
The funny thing is that the actual meta-analysis seems to work just fine; the only thing missing is the forest plot at the end, and I can even do a funnel plot.
Nick Cox

Join Date: Mar 2014

Posts: 35725
#12

08 Jul 2015, 02:54

Mark: This doesn't have to be about anybody's guesses, as anyone can look at the code of metan and see that it creates lots of temporary variables for lots of purposes. There may be scope for tightening up there, but unfortunately none of the people who last worked with metan seems to be active in this field any more and it's not trivial to revise a very long program.

It is unfortunate that you were told, or perhaps rather that various recommendations implied, that you could use metan on Small Stata.

Although it's no comfort to you, it is perhaps revealing that this limitation of metan doesn't seem to have become obvious before, I guess because people who do meta-analysis don't use Small Stata.
David Fisher

Join Date: Apr 2014

Posts: 407
#13

08 Jul 2015, 03:31

If the forestplot is more of the problem than the meta-analysis itself (as Mark states), that suggests it's the forestplot subroutine which is causing (most of) the problem. I've just taken a look at the metan source code (viewsource metan.ado at the command line) and counted over 80 tempvars created by the forestplot subroutine ("_dispgby") in normal use (of which only 5 or so are dropped soon after creation)! That is a phenomenal number, and does indeed suggest that this is something course organizers intending to use the metan command need to consider.

Mark: I doubt if it's much help to you, as presumably you just want to get metan working so you can follow your course notes; but I have re-worked the code in metan's forestplot subroutine ("_dispgby") for my own program, forestplot (available within the ipdmetan SSC package). I make it around 35 tempvars created by my program in normal use (of which, again, 5 or so are dropped soon afterwards) -- still a lot, but far fewer than before. (And I could probably bring it down by another 5 or so.)
Mark Benjamin

Join Date: Jul 2015

Posts: 4
#14

08 Jul 2015, 04:29

No course notes - I did a diploma years ago and am just refreshing. I think we used RevMan for the meta-analysis component way back then. I assume that Small Stata will have problems with various sample datasets that I come across along the way, and the upgrade is cheap.
David Fisher

Join Date: Apr 2014

Posts: 407
#15

12 May 2017, 09:35

Just to follow up on this (nearly 2 years later!!):

I have just released (via SSC [1] with thanks to Kit Baum) an update to my program admetan (within the ipdmetan package) such that it now has nearly all of the functionality of metan, but is hopefully a lot more tempvar-efficient. If Mark, or anyone else, still has access to a resource-limited setting such as Mark describes, I would be very interested to learn whether admetan runs successfully, with or without a forest plot.

Thanks,

David.

[1] see http://www.statalist.org/forums/foru...13#post1392313