As you may be aware from some of my posts, I am trying to get data from the internet to analyse in STATA; this is known as data mining. According to Wikipedia, a number of software packages are available for this, both free (open source) and paid for. Does anyone have experience of any of this software and of using it with STATA? Can you recommend one, please?
Software as follows:-
Software
Free open-source data mining software and applications
The following applications are available under free/open source licenses. Public access to application source code is also available.
- Carrot2: Text and search results clustering framework.
- Chemicalize.org: A chemical structure miner and web search engine.
- ELKI: A university research project with advanced cluster analysis and outlier detection methods written in the Java language.
- GATE: a natural language processing and language engineering tool.
- KNIME: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
- Massive Online Analysis (MOA): A tool for real-time big data stream mining with concept drift, written in the Java programming language.
- MEPX: Cross-platform tool for regression and classification problems based on a Genetic Programming variant.
- ML-Flex: A software package that enables users to integrate with third-party machine-learning packages written in any programming language, execute classification analyses in parallel across multiple computing nodes, and produce HTML reports of classification results.
- MLPACK library: a collection of ready-to-use machine learning algorithms written in the C++ language.
- NLTK (Natural Language Toolkit): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python language.
- OpenNN: Open neural networks library.
- Orange: A component-based data mining and machine learning software suite written in the Python language.
- R: A programming language and software environment for statistical computing, data mining, and graphics. It is part of the GNU Project.
- scikit-learn: An open-source machine learning library for the Python programming language.
- Torch: An open source deep learning library for the Lua programming language and scientific computing framework with wide support for machine learning algorithms.
- UIMA: The UIMA (Unstructured Information Management Architecture) is a component framework for analyzing unstructured content such as text, audio and video – originally developed by IBM.
- Weka: A suite of machine learning software applications written in the Java programming language.
The following applications are available under proprietary licenses.
- Angoss KnowledgeSTUDIO: data mining tool.
- Clarabridge: text analytics product.
- KXEN Modeler: data mining tool provided by KXEN Inc.
- LIONsolver: an integrated software application for data mining, business intelligence, and modeling that implements the Learning and Intelligent OptimizatioN (LION) approach.
- Megaputer Intelligence: data and text mining software called PolyAnalyst.
- Microsoft Analysis Services: data mining software provided by Microsoft.
- NetOwl: suite of multilingual text and entity analytics products that enable data mining.
- OpenText Big Data Analytics: Visual Data Mining & Predictive Analysis by Open Text Corporation.
- Oracle Data Mining: data mining software by Oracle Corporation.
- PSeven: platform for automation of engineering simulation and analysis, multidisciplinary optimization and data mining provided by DATADVANCE.
- Qlucore Omics Explorer: data mining software.
- RapidMiner: An environment for machine learning and data mining experiments.
- SAS Enterprise Miner: data mining software provided by the SAS Institute.
- SPSS Modeler: data mining software provided by IBM.
- STATISTICA Data Miner: data mining software provided by StatSoft.
- Tanagra: Visualisation-oriented data mining software, also for teaching.
- Vertica: data mining software provided by Hewlett-Packard.