Dear Forum Users,
I am working on a binomial logistic regression. My dataset consists of 16 million observations with 12,235 variables (195 GB in total). Currently, I am running the regression in Stata on only 100,000 of the 16 million observations, and it takes around 24-36 hours to complete. I am working on a virtual machine (accessed through VMware Horizon Client). Now I need to analyze the full dataset. I understand that the configuration of the virtual machine needs to be upgraded so that the regression on all observations finishes in a reasonable time. But I do not know how powerful it should be in terms of technical specifications, such as how many processors or how much memory are needed. How do I work out how powerful this virtual machine needs to be? It is easier to give an IT specialist a specific request than just "please, make it work faster". Do you have any idea how I can solve this issue?
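To make the request more concrete, here is a rough back-of-the-envelope sketch in Python of the numbers involved. It only uses the figures above. The linear scaling of run time with the number of observations and the 48-hour target are my own assumptions for illustration, not measurements, so the results are at best an order-of-magnitude guide.

```python
# Figures taken from the post above.
N_OBS_FULL = 16_000_000      # observations in the full dataset
N_VARS = 12_235              # variables
DATASET_GB = 195             # size of the dataset on disk
N_OBS_SAMPLE = 100_000       # observations used in the current runs
SAMPLE_HOURS = (24, 36)      # observed run time range for the sample

# HYPOTHETICAL target, chosen only for illustration.
TARGET_HOURS = 48


def bytes_per_cell(dataset_gb, n_obs, n_vars):
    """Average storage per value implied by the dataset size."""
    return dataset_gb * 1e9 / (n_obs * n_vars)


def scaled_hours(sample_hours, n_sample, n_full):
    """Project run time to the full data.

    ASSUMPTION: run time grows linearly with the number of observations.
    Real behaviour may be worse (e.g. if the data no longer fit in RAM).
    """
    factor = n_full / n_sample
    return tuple(h * factor for h in sample_hours)


def required_speedup(projected_hours, target_hours):
    """How many times faster the run must be to meet the target."""
    return projected_hours / target_hours


cell = bytes_per_cell(DATASET_GB, N_OBS_FULL, N_VARS)
low, high = scaled_hours(SAMPLE_HOURS, N_OBS_SAMPLE, N_OBS_FULL)

print(f"Average bytes per value: {cell:.2f}")
print(f"Projected run time: {low / 24:.0f} to {high / 24:.0f} days")
print(f"Speed-up needed for a {TARGET_HOURS} h run: "
      f"{required_speedup(low, TARGET_HOURS):.0f}x to "
      f"{required_speedup(high, TARGET_HOURS):.0f}x")
```

Under these assumptions the dataset alone needs roughly 195 GB of RAM just to be held in memory, and the projected run time is in the range of months, which suggests that more processors alone may not be enough.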
Kind regards,
Firangiz