  • Cluster analysis: problem of memory

    Hello,

    I'm working with a panel dataset of 3306 variables and 240000 observations (Stata/SE 13.0). When I run the cluster analysis "cluster wardslinkage", I receive the message "insufficient memory for ClusterMatrix r(950);". I dropped 1000 variables, but I still receive the same message.
    Clustering code:
    Code:
    cluster wardslinkage PTA, measure(L2)
    Thanks

  • #2
    Hello,

    I'm working with a panel dataset of 3306 variables and 240000 observations (Stata/SE 13.0). When I run the cluster analysis "cluster wardslinkage", I receive the message "insufficient memory for ClusterMatrix r(950);". I dropped 1000 variables, but I still receive the same message.
    Clustering code:
    Code:
    cluster wardslinkage PTA, measure(L2)
    Thanks


    • #3
      1. Don't bump. You have been asked not to: https://www.statalist.org/forums/help#adviceextras
      2. Try dropping 3300 variables. That likely will not help if the method is not sensitive to the number of variables (and I don't expect it to be for this method). Reducing the number of observations, on the other hand, should help. Try clustering the first 100 observations.
      3. https://github.com/scikit-learn/scik...rn/issues/3089
      mentions a similar problem and gives an indication of the memory requirement: roughly 10GB for clustering 52,674 observations in R.
      This corresponds roughly to 52,000^2*4 bytes = 10.8GB.
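
      The subsample test from point 2 could look like the following. This is only a sketch: it assumes the variable PTA from the original post, and the name test100 is a hypothetical label for the stored cluster analysis.

      ```stata
      * Sketch: same linkage and measure, restricted to the first 100 observations.
      * "test100" is a hypothetical name, not something from the original post.
      cluster wardslinkage PTA in 1/100, measure(L2) name(test100)

      * List what was stored, then drop it so it does not clutter the dataset.
      cluster list test100
      cluster drop test100
      ```

      If this runs, widening the range step by step (in 1/1000, in 1/10000, and so on) should show roughly where memory runs out.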

      You can confirm with the developers whether Stata stores the full NxN distance matrix. I am pretty sure it takes advantage of the matrix's symmetry under L2: I am seeing roughly 450MB of memory use for a 10,000-observation clustering task, where the full matrix would be 10,000^2*8 = 800,000,000 bytes. But if it does not, and since Mata uses doubles for this purpose, expect roughly 22GB (52,000^2*8 bytes) just to hold the distance matrix; with quad precision, multiply by 2. If you don't have that much memory, that fits the message Stata is issuing, and the solution is simple: add a few extra DIMMs to your PC.

      For 240,000 observations, update the figures accordingly.
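
      The updated figures can be worked out directly in Stata. A minimal sketch, assuming 8-byte doubles and 1GB = 10^9 bytes; whether Stata actually stores the full matrix or only one triangle is not confirmed in this thread, so both cases are shown.

      ```stata
      * Sketch: size of a pairwise distance matrix for N observations, in GB.
      * Assumes 8 bytes per entry (doubles); the storage layout is an assumption.
      local N = 240000

      * Full N x N matrix
      display "full N x N matrix:   " %8.1f `N'^2*8/1e9 " GB"

      * Lower triangle only, N*(N-1)/2 entries, if symmetry is exploited
      display "lower triangle only: " %8.1f `N'*(`N'-1)/2*8/1e9 " GB"
      ```

      For N = 240000 this gives about 460.8 GB for the full matrix and about 230.4 GB for the lower triangle, so either way the full dataset is far beyond what a few extra DIMMs would cover.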

      Best, Sergiy



      • #4
        Thank you for your reply, Sergiy.



        • #5
          Memory management differs in some ways in later versions of Stata, so upgrading to a more recent version might also help.
