  • I reached maximal variables in StataBE

    Dear all,

    I'm currently working on a research project with the German SOEP datasets. I reached the variable limit (maxvar) in my Stata/BE version and got the following error:
    "no room to add more variables
    Up to 2,048 variables are allowed with this version of Stata. Versions are available that allow up to 120,000 variables."

    I tried to search online for fixes but didn't find anything that seems to work for me yet. So is there any other solution than an upgrade to StataSE?

    Best regards
    Philipp

  • #2
    Let me first encourage you to take a step back and think about whether the research question(s) that you are trying to answer really require(s) more than 2,000 variables.

    If your answer is yes, then remember that the variable limits appear to apply to a single frame, and because you can have many frames, you can play around with frames.
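
    A minimal sketch of the frames idea, assuming the per-frame limit works as described above. The variable names (pid, syear, plh0182, plb0022_h) and the frame names are placeholders chosen for illustration; substitute names that actually exist in your file. The linking step also assumes that pid and syear together identify each observation uniquely.

    ```stata
    * Load one subset of variables into its own frame (names are hypothetical)
    frame create persons
    frame persons: use pid syear plh0182 using pl.dta

    * Load a second subset into another frame
    frame create extra
    frame extra: use pid syear plb0022_h using pl.dta

    * Each frame reports its own variable count
    frame persons: describe, short
    frame extra: describe, short

    * Optionally pull a variable across frames via a link
    * (assumes pid and syear uniquely identify observations in both frames)
    frame persons {
        frlink 1:1 pid syear, frame(extra)
        frget plb0022_h, from(extra)
    }
    ```

    Note that every variable copied with frget counts toward the limit of the frame that receives it.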
    Last edited by daniel klein; 02 Sep 2022, 04:57.


    • #3
      In fact, let me be a little more emphatic: unless you're working on some wild machine learning problem with ultra-high-dimensional data, you don't need 2000 variables, ever. Before we figure out how to get around the variable limit, let's talk about your question and why you would need or want that many predictors.


      • #4
        SOEP is panel data, and your Stata dataset should be organized as a panel dataset, with separate observations for each combination of individual and wave in your data - what is often called a long layout. It should not be organized in a wide layout, where each individual has just one observation and variables are repeated for each wave - for example, income2001, income2002, etc.

        I've never needed thousands of variables when analyzing panel data in a long layout, but were I to reshape them into a wide layout, I expect there could be more than 2000 variables.

        If you have not done so already, you should read the Remarks and examples section of the Introduction to xt commands section of the Stata Longitudinal-Data/Panel-Data Reference Manual PDF included in your Stata installation and accessible from Stata's Help menu.
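
        To illustrate the difference between the two layouts, here is a small sketch with made-up data. The variable names id, income2001, and income2002 are invented for the example and are not SOEP variables.

        ```stata
        clear
        * Wide layout: one observation per individual, one income variable per wave
        input id income2001 income2002
        1 100 110
        2 200 220
        end

        * Long layout: one observation per individual and year
        reshape long income, i(id) j(year)

        * Declare the panel structure for the xt commands
        xtset id year
        list, sepby(id)
        ```

        In the long layout the number of variables stays the same no matter how many waves are added; only the number of observations grows.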


        • #5
          Thank you Daniel and Jared, you're right: more than 2048 variables aren't necessary in most cases.
          And yes, William, the SOEP datasets are in panel structure.
          Some more context:
          I am currently putting together a master dataset with the variables of interest for our research project. I have some Stata code from other researchers as a guide, but we are considering a bigger dataset with more covariates than the earlier research projects. To build it, I import the individual data files and look through them for interesting variables in the household and individual .dta files. The household file (household questionnaire) wasn't a problem, but pl.dta (individual questionnaire) seems too big for my Stata/BE.
          Regarding your remarks, William: the newer versions of the SOEP (which I'm using) are in long format from the start (as indicated by the l in pl.dta).

          Describing the .dta file without opening it (as described in the frames explanation, thanks to Daniel) produced the following output:

          Code:
          describe using pl, short
          
          Contains data                                 SOEP-Core, v37 (EU Edition),
                                                          doi:10.5684/soep.core.v37eu
           Observations:       742,822                  22 Apr 2022 11:27
              Variables:         4,690
          So this seems like one of the rare cases where more than 2048 vars are required :P
          But for my current task it would be sufficient to export a list of the variable names and labels. With the code below I can get this, but the output is too long to display all at once. With "set more on" I can copy the output piece by piece, but that seems too cumbersome.
          Code:
          describe using pl
          So, a new question: can I use a command like "asdoc" to generate a Word or Excel file with the output from the code above?

          best regards
          Philipp


          • #6
            So this seems like one of the rare cases where more than 2048 vars are required :P
            I had to pay $890 for health insurance by my wonderful university (uniquely American thing) which is being refunded to me, and I'm willing to bet $890 that you don't need that many variables. What you need is to reshape your dataset (look up gtools and greshape, just google gtools stata). William Lisowski says your data are likely wide. You'll need them in long format.


            • #7
              Originally posted by Jared Greathouse
              I had to pay $890 for health insurance by my wonderful university (uniquely American thing) which is being refunded to me, and I'm willing to bet $890 that you don't need that many variables. What you need, is to reshape your dataset (look up gtools and greshape, just google gtools stata). William Lisowski says your data are likely wide. You'll need it in long format
              I don't need that many variables; I just need the variable names and labels to decide which variables to include. But opening this .dta file would require more than 2048 variables.

              But you're not right regarding the wide format: as stated in my earlier post, pl.dta is in long format (see the screenshots from the SOEP website).

              Still remaining:
              Code:
              describe using pl
              So, a new question: can I use a command like "asdoc" to generate a Word or Excel file with the output from the code above?

              • #8
                Presumably. I've never used it, but I don't doubt it's possible. My only question would be why? You're outputting like what, 5 lines of text?


                • #9
                  I left out the "short" option and thus get far more output than the Results window can display.


                  • #10
                    Originally posted by Philipp Jonathan
                    I just need the var names and labels to consider which variables to include. But to open this dta file would require more than 2048 variables.
                    The data provider probably supplies a codebook that contains this information. Alternatively, you can ask someone to open the file and split it into two or more files, each containing a number of variables within your limit. Or do it yourself in third-party software such as R.

                    Code:
                    install.packages('haven')
                    library(haven)
                    dataframe <- read_dta('mydata.dta')


                    • #11
                      I doubt -asdoc- or any other such command will help because the initial limitation is loading your data into Stata first.

                      You state that you need to see the variable names to know which you need for further work. These are some options:

                      1) Refer to your dataset document to find those variables and note them down in your do file. Presumably you need far fewer than 2048 so this may not be very time-consuming. -import delimited- is somewhat the outlier in terms of syntax in that you cannot arbitrarily choose the subset of variables to load (for reasons that are probably due to having to parse the contents as the file is read). What you could do instead is read in a range of columns (refer to -help import delimited- for details).

                      Code:
                      tempfile chunk1 chunk2
                      
                      * First chunk: columns 1 through 2048
                      import delimited using pl.csv, colrange(1:2048) clear
                      keep <my var list in first chunk>
                      save `chunk1', replace
                      
                      * Second chunk: column 2049 onward. If you don't know the number of columns
                      * but are confident that it is less than 4096, omit the end of the range and
                      * Stata will read through the final column. If there are more, you will get an error.
                      import delimited using pl.csv, colrange(2049:) clear
                      keep <my var list in second chunk>
                      save `chunk2', replace
                      
                      use `chunk1', clear
                      merge 1:1 _n using `chunk2', nogen   // joins the chunks side by side, matching observations by row position
                      This method will work but is rather cumbersome.

                      2) You could first import your CSV into Excel and save it back out as an Excel file. Then Stata will let you import any arbitrary variable list using -import excel-. Caution is needed here to check for Excel changing the data, especially around dates.

                      3) If you have access to StatTransfer you can convert the CSV directly to a Stata datafile, after which you can also import arbitrary variable lists using -use-. Similar conversion facilities are available in R (haven package) and probably Python too.

                      Edit: crossed with #9 and #10.
                      Last edited by Leonardo Guizzetti; 03 Sep 2022, 08:00.


                      • #12
                        use lets you load subsets of the data:

                        Code:
                        use varlist using pl.dta
                        You might be able to get the varlist from

                        Code:
                        describe using pl.dta , varlist
                        If you cannot, then usesome (SSC, or https://github.com/kleindaniel81/usesome) is a more direct approach than #11. For example, you load the first 2,048 variables as

                        Code:
                        usesome (1/2048) using pl.dta


                        • #13
                          In #7, O.P. is asking for a way to save the output of describe in a document that can be read using other general software. Consider:

                          Code:
                          capture log close
                          set more off
                          log using description.log, replace
                          
                          describe using pl
                          
                          log close
                          This will create a plain text file, description.log that contains the output of describe. If you want to then import that to .docx or .pdf or .xlsx or the like, that is your pleasure. Or maybe the text file will serve your purposes as is.
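
                          If a spreadsheet is the goal, one possible route is sketched below. It is untested against the SOEP file and rests on several assumptions: that describe using with the varlist option returns all names in r(varlist), that the full list of names fits in a local macro, and that 2000 variables per chunk stays under the Stata/BE limit. The chunk size and the output file name pl_variables.xlsx are arbitrary choices for illustration.

                          ```stata
                          * Collect the variable names without loading any data
                          describe using pl.dta, varlist
                          local allvars `r(varlist)'
                          local nvars : word count `allvars'

                          tempfile meta
                          local first = 1
                          while `first' <= `nvars' {
                              local last = min(`first' + 1999, `nvars')

                              * Build the list of names for this chunk
                              local chunk
                              forvalues i = `first'/`last' {
                                  local name : word `i' of `allvars'
                                  local chunk `chunk' `name'
                              }

                              * Load a single observation; variable labels come along
                              use `chunk' in 1 using pl.dta, clear

                              * Turn the description into a dataset of names and labels
                              describe, replace clear
                              keep name varlab

                              * Append this chunk below the ones collected so far
                              if `first' > 1 {
                                  tempfile part
                                  save `part'
                                  use `meta', clear
                                  append using `part'
                              }
                              save `meta', replace
                              local first = `last' + 1
                          }

                          * One row per variable: name and label
                          export excel name varlab using pl_variables.xlsx, firstrow(variables) replace
                          ```

                          Loading only the first observation keeps each pass fast, since only the variable names and labels are of interest here.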
                          Last edited by Clyde Schechter; 03 Sep 2022, 10:34.


                          • #14
                            Dear Daniel,
                            the usesome command works. Thank you!

                            Dear Clyde,
                            Of course, creating a log file solves my problem. Many thanks to you as well!

                            best
                            Philipp


                            • #15
                              PS: I found a website that is also helpful in deciding which variables to include in your SOEP research project: paneldata.org
                              best
                              Philipp
