  • "variable _j contains all missing values" when reshaping long

    Hello everyone,

    I'm a bit surprised to get this error when reshaping my dataset to long form. From other threads on this forum, I gathered that the message usually appears because the string option must be specified when the variable names contain non-numeric characters besides the stub. In my case, however, I am trying to reshape 814 variables, all named varname### with ### being a number from 1 to 814. I used

    Code:
    rename * varname#, addnumber
    to do this. My reshape code is the following:

    Code:
    reshape long varname, i(obs_number) j(column)
    where obs_number is a variable I generated as equal to _n.

    I don't know whether a dataex example will be useful, but here is a reduced version of my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str17 varname1 str19 varname2 str14 varname3 str17 varname4 byte obs_number
    "ID de la réponse" "Date de soumission" "Dernière page" "Langue de départ"  1
    "INFO"              "INFO"               "INFO"           "INFO"               2
    "INFO"              "INFO"               "INFO"           "INFO"               3
    "INFO"              "INFO"               "INFO"           "INFO"               4
    "INFO"              "INFO"               "INFO"           "INFO"               5
    "INFO"              "INFO"               "INFO"           "INFO"               6
    "INFO"              "INFO"               "INFO"           "INFO"               7
    "INFO"              "INFO"               "INFO"           "INFO"               8
    "INFO"              "INFO"               "INFO"           "INFO"               9
    "INFO"              "INFO"               "INFO"           "INFO"              10
    end
    As expected, the reshape works on this part of the dataset. But since I don't know exactly where the problem lies among my 814 variables, I can't offer a more targeted example right now. If you have any lead, I'll gladly show more of my dataset in a more helpful way.

    Thank you for your help!

    EDIT: I noticed that if I drop half of my varname variables, so that my dataset goes from varname1 to varname400, the reshape command works. Could this be a storage problem? If so, why does the error message refer to _j containing missing values?
    Last edited by Thomas Brot; 03 Mar 2023, 09:02.

  • #2
    It's too late for me to edit now, but I would like to add something. When I specify the string option, the code works regardless of the number of variables to reshape. I could just carry on with my coding, but I am curious and would still like to know why it doesn't work without the string option when all my variables have the right name format.
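
    For reference, a minimal sketch of that workaround, using the variable names from post #1. The destring step is an addition that is not in the original post; it assumes every suffix after the stub is purely numeric, so that column can be turned back into a number after the reshape.

    Code:
    * the string option stores the suffix after the stub as a string j variable
    reshape long varname, i(obs_number) j(column) string
    * assumption: all suffixes are numeric (1 to 814), so j can be converted back
    destring column, replace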



    • #3
      This works for me. Notice that I shifted metadata out of observation 1, where it does not belong.

      Code:
      clear
      input str17 varname1 str19 varname2 str14 varname3 str17 varname4 byte obs_number
      "ID de la réponse" "Date de soumission" "Dernière page" "Langue de départ"  1
      "INFO"              "INFO"               "INFO"           "INFO"               2
      "INFO"              "INFO"               "INFO"           "INFO"               3
      "INFO"              "INFO"               "INFO"           "INFO"               4
      "INFO"              "INFO"               "INFO"           "INFO"               5
      "INFO"              "INFO"               "INFO"           "INFO"               6
      "INFO"              "INFO"               "INFO"           "INFO"               7
      "INFO"              "INFO"               "INFO"           "INFO"               8
      "INFO"              "INFO"               "INFO"           "INFO"               9
      "INFO"              "INFO"               "INFO"           "INFO"              10
      end
      
      foreach v of var varname* { 
          label var `v' "`=`v'[1]'"
      }
      
      drop in 1 
      
      reshape long varname, i(obs_number) j(which)



      • #4
        Dear Nick: Thank you for your help. Unfortunately, what works in this selected example doesn't work with my full dataset, and it's hard to tell exactly where the problem arises.

        I ran tests by dropping groups of variables to narrow it down. The problem seems to be that I'm using more than 800 different values of _j. My code works for varname1 to varname800. If I add varname801, the error message appears. The code also works for varname14 to varname814.

        Is this because I'm using Stata/BE (version 17.0) and my edition can't handle more than 800 variables? I thought that limit applied only to regressions.



        • #5
          According to -help limits-, Stata/BE allows up to 2,048 variables. I can imagine that, while -reshape- is running, the data set might contain some additional temporary variables, but I would be astonished if it blew up from 814 to over 2,048. So I don't think that's the problem. However, looking at the code in -reshape.ado-, there is a point at which it uses matrices, which have a limit of 800 rows and 800 columns in BE. I think this may be where it is failing.
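
          A quick way to see which limits apply to a given installation is to display the relevant c-class values. This is only a sketch: it assumes Stata 17, where c(edition_real) and c(max_matdim) are available, and it does not by itself show where -reshape- fails.

          Code:
          * edition actually running (BE, SE or MP)
          display c(edition_real)
          * maximum number of variables currently allowed
          display c(maxvar)
          * maximum matrix dimension for this edition
          display c(max_matdim)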

          You might try looking into one of the user-written commands that have been written to do -reshape long- faster. Perhaps they do not encounter this limitation. Maybe they do, but I think you have little to lose by trying. -tolong- and -greshape- are both available from SSC, the latter as part of the -gtools- package.

          Another approach might be to find a friend or colleague who has a larger edition of Stata installed and can run this for you.
          Last edited by Clyde Schechter; 03 Mar 2023, 12:50.



          • #6
            Clyde: Thanks a lot for the suggestion! I tried tolong (from SSC) and it worked like a charm. Since I report to someone, though, I don't know whether they would be fine with using community-contributed commands (even if, in this case, they are very useful!). Is there any workaround? Maybe I can try splitting the dataset into two parts, reshaping each, and then appending the two reshaped datasets together.

            EDIT: Actually, I can just make two do-files, and if they also have Stata/BE, they'll just have to use mine. Problem solved!
            Last edited by Thomas Brot; 03 Mar 2023, 13:21.
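
            The split-and-append idea can be sketched in a single do-file as follows. This is an illustration under stated assumptions, not the code the poster actually ran: it assumes the dataset contains only obs_number and varname1 to varname814, that each -reshape- call stays within about 800 values of j as the tests in #4 suggest, and it uses the split point at varname400 mentioned in post #1. The macro name first and the tempfile name part1 are made up for the example.

            Code:
            * collect the first batch of variable names (assumed: varname1 ... varname400)
            local first
            forvalues k = 1/400 {
                local first `first' varname`k'
            }

            * reshape the first batch and store it in a temporary file
            preserve
            keep obs_number `first'
            reshape long varname, i(obs_number) j(column)
            tempfile part1
            save `part1'
            restore

            * reshape the remaining variables (varname401 ... varname814), then combine
            drop `first'
            reshape long varname, i(obs_number) j(column)
            append using `part1'
            sort obs_number column

            Building the variable list with forvalues rather than a varname1-varname400 range avoids relying on the order of variables in the dataset.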

