{"id":149,"date":"2019-02-13T17:23:15","date_gmt":"2019-02-13T14:23:15","guid":{"rendered":"https:\/\/blog.artificialenergy.uk\/?p=149"},"modified":"2019-02-13T17:23:15","modified_gmt":"2019-02-13T14:23:15","slug":"the-best-open-source-machine-learning-frameworks","status":"publish","type":"post","link":"https:\/\/blog.artificialenergy.uk\/?p=149","title":{"rendered":"The Best Open Source Machine Learning Frameworks"},"content":{"rendered":"\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/i0.wp.com\/opensourceforu.com\/wp-content\/uploads\/2016\/12\/Machine-Learning-Framework-350x307.jpg?resize=640%2C561\" alt=\"machine-learning-framework\" class=\"wp-image-26524\"\/><\/figure>\n\n\n\n<p><em>In this article, we present what the author rates as the top eight open source machine learning frameworks.<\/em><\/p>\n\n\n\n<p>Learning may be defined as the process of improving one\u2019s ability to \nperform a task efficiently. Machine learning is a sub-field of \ncomputer science that enables computers to learn without being \nexplicitly programmed. It evolved from \nartificial intelligence via pattern recognition and computational \nlearning theory, and explores algorithms that\n can learn from data and make predictions on it. In recent times, machine \nlearning has been deployed in a wide range of computing tasks where \ndesigning efficient explicit algorithms and programs is difficult, \nsuch as email spam filtering, optical character recognition, search \nengine improvement, digital image processing and data mining.<br>\nTom M. 
Mitchell, renowned computer scientist and professor at Carnegie \nMellon University, USA, defined machine learning as: \u201cA computer program\n is said to learn from experience E with respect to some class of tasks T\n and performance measure P, if its performance at tasks in T, as \nmeasured by P, improves with experience E.\u201d<br>\nMachine learning tasks are broadly classified into three categories, \ndepending on the nature of the learning \u2018signal\u2019 or \u2018feedback\u2019 available\n to a learning system.<\/p>\n\n\n\n<ul><li><em>Supervised learning<\/em> is the machine learning task \nof inferring a function from labelled training data. In supervised \nlearning, each example is a pair consisting of an input object (typically a vector) \nand a desired output value (the supervisory signal).<\/li><li><em>Unsupervised learning<\/em> is the machine \nlearning task of inferring a function to describe hidden structure in\n unlabelled data. It is closely related to the problem of density \nestimation in statistics.<\/li><li><em>Reinforcement learning<\/em> is an area of machine learning that \nis concerned with how software agents should take actions in an environment so as \nto maximise some notion of cumulative reward. The problem is also studied in diverse \nareas like game theory, information theory, swarm intelligence, \nstatistics and genetic algorithms. In machine learning, the environment \nis typically formulated as a Markov decision process (MDP), because many reinforcement learning algorithms rely on dynamic \nprogramming techniques.<\/li><\/ul>\n\n\n\n<p>The application of machine learning to diverse areas of computing is \ngaining popularity rapidly, not only because of cheap and powerful \nhardware, but also because of the increasing availability of free and \nopen source software, which enables machine learning to be implemented \neasily. 
Machine learning practitioners and researchers, being a part of \nthe software engineering team, continuously build sophisticated \nproducts, integrating intelligent algorithms with the final product to \nmake software work more reliably, quickly and without hassles.<br>\nThere is a wide range of open source machine learning frameworks \navailable in the market, which enable machine learning engineers to \nbuild, implement and maintain machine learning systems, generate new \nprojects and create new impactful machine learning systems.<br>\nLet\u2019s take a look at some of the top open source machine learning frameworks available.<\/p>\n\n\n\n<p><strong>Apache Singa<\/strong><br>\nThe Singa Project was initiated by the DB System Group at the National \nUniversity of Singapore in 2014, with a primary focus on distributed \ndeep learning by partitioning the model and data onto nodes in a cluster\n and parallelising the training. Apache Singa provides a simple \nprogramming model and works across a cluster of machines. It is \nprimarily used in natural language processing (NLP) and image \nrecognition. A Singa prototype accepted by Apache Incubator in March \n2015 provides a flexible architecture of scalable distributed training \nand is extendable to run over a wide range of hardware.<br>\nApache Singa was designed with an intuitive programming model based on \nlayer abstraction. A wide variety of popular deep learning models are \nsupported, such as feed-forward models like convolutional neural \nnetworks (CNN), energy models like Restricted Boltzmann Machine (RBM), \nand recurrent neural networks (RNN).&nbsp; Based on a flexible architecture, \nSinga runs various synchronous, asynchronous and hybrid training \nframeworks.<br>\nSinga\u2019s software stack has three main components: Core, IO and Model. \nThe Core component is concerned with memory management and tensor \noperations. 
IO contains classes for reading and writing data to the disk\n and the network. Model includes data structures and algorithms for \nmachine learning models.<\/p>\n\n\n\n<p>Its main features are:<\/p>\n\n\n\n<ul><li>Includes a tensor abstraction that gives stronger support for more advanced machine learning models<\/li><li>Supports device abstraction for running on varied hardware devices<\/li><li>Makes use of <em>CMake<\/em> for compilation rather than <em>GNU Autotools<\/em><\/li><li>Has improved Python bindings and contains more deep learning models like VGG and ResNet<\/li><li>Includes enhanced IO classes for reading, writing, encoding and decoding files and data<\/li><\/ul>\n\n\n\n<p>The latest version is 1.0.<br>\n<strong>Website:<\/strong> <a href=\"http:\/\/singa.apache.org\/en\/index.html\"><em>http:\/\/singa.apache.org\/en\/index.html<\/em><\/a><\/p>\n\n\n\n<p><strong>Shogun<\/strong><br>\nShogun was initiated by Soeren Sonnenburg and Gunnar Raetsch in 1999 and\n is currently under rapid development by a large team of programmers. \nThis free and open source toolbox written in C++ provides algorithms and\n data structures for machine learning problems. Shogun can be used \nvia a unified interface from C++, Python, Octave, \nR, Java, Lua and C#, and can run on Windows, Linux and even macOS. \nShogun is designed for unified large-scale learning for a broad range of\n feature types and learning settings, like classification, regression, \ndimensionality reduction, clustering, etc. 
It contains a number of \nstate-of-the-art algorithms, such as a wealth of efficient SVM \nimplementations, multiple kernel learning, kernel hypothesis testing, \nKrylov methods, etc.<br>\nShogun supports bindings to other machine learning libraries like \nLibSVM, LibLinear, SVMLight, LibOCAS, libqp, VowpalWabbit, Tapkee, SLEP,\n GPML and many more.<br>\nIts features include one-class classification, multi-class \nclassification, regression, structured output learning, pre-processing, \nbuilt-in model selection strategies, visualisation and test frameworks; \nand semi-supervised, multi-task and large scale learning.<br>\nThe latest version is 4.1.0.<br>\n<strong>Website:<\/strong> <a href=\"http:\/\/www.shogun-toolbox.org\/\"><em>http:\/\/www.shogun-toolbox.org\/<\/em><\/a><\/p>\n\n\n\n<p><strong>Apache Mahout<\/strong><br>\nApache Mahout is a free and open source project of the Apache \nSoftware Foundation that aims to develop free distributed or scalable \nmachine learning algorithms for diverse areas like collaborative \nfiltering, clustering and classification. Mahout provides Java libraries\n and Java collections for various kinds of mathematical operations.<br>\nApache Mahout is implemented on top of Apache Hadoop using the MapReduce\n paradigm. 
Once Big Data is stored on the Hadoop Distributed File System\n (HDFS), Mahout provides the data science tools to automatically find \nmeaningful patterns in these Big Data sets, turning them into \u2018big \ninformation\u2019 quickly and easily.<\/p>\n\n\n\n<ul><li><em>Building a recommendation engine:<\/em> Mahout provides tools for building a recommendation engine via the Taste library, a fast and flexible engine for collaborative filtering (CF).<\/li><li><em>Clustering with Mahout:<\/em> Several clustering algorithms are supported by Mahout, like Canopy, k-Means, Mean-Shift, Dirichlet, etc.<\/li><li><em>Categorising content with Mahout:<\/em> Mahout uses the simple MapReduce-enabled na\u00efve Bayes classifier.<\/li><\/ul>\n\n\n\n<p>The latest version is 0.12.2.<br>\n<strong>Website:<\/strong> <a href=\"https:\/\/mahout.apache.org\/\"><em>https:\/\/mahout.apache.org\/<\/em><\/a><\/p>\n\n\n\n<p><strong>Apache Spark MLlib<\/strong><br>\nApache Spark MLlib is a machine learning library, the primary objective \nof which is to make practical machine learning scalable and easy. 
It \ncomprises common learning algorithms and utilities, including \nclassification, regression, clustering, collaborative filtering and \ndimensionality reduction, as well as lower-level optimisation primitives \nand higher-level pipeline APIs.<br>\nSpark MLlib is a distributed machine learning framework on \ntop of the Spark Core which, mainly due to the distributed memory-based \nSpark architecture, is reported to be almost nine times as fast as the disk-based \nimplementation used by Apache Mahout.<br>\nThe common machine learning and statistical algorithms implemented and included with MLlib are:<\/p>\n\n\n\n<ul><li>Summary statistics, correlations, hypothesis testing, random data generation<\/li><li><em>Classification and regression:<\/em> Support vector machines, logistic regression, linear regression, na\u00efve Bayes classification<\/li><li>Collaborative filtering techniques including Alternating Least Squares (ALS)<\/li><li>Cluster analysis methods including k-means and Latent Dirichlet Allocation (LDA)<\/li><li>Optimisation algorithms such as stochastic gradient descent and limited-memory BFGS (L-BFGS)<\/li><\/ul>\n\n\n\n<p>The latest version is 2.0.1.<br>\n<strong>Website:<\/strong> <a href=\"http:\/\/spark.apache.org\/mllib\/\"><em>http:\/\/spark.apache.org\/mllib\/<\/em><\/a><\/p>\n\n\n\n<p><strong>TensorFlow<\/strong><br>\nTensorFlow is an open source software library for machine learning \ndeveloped by the Google Brain Team for various sorts of perceptual and \nlanguage understanding tasks, and to conduct sophisticated research on \nmachine learning and deep neural networks. It is Google Brain\u2019s second \ngeneration machine learning system and can run on multiple CPUs and \nGPUs. TensorFlow is deployed in various products of Google like speech \nrecognition, Gmail, Google Photos and even Search.<br>\nTensorFlow performs numerical computations using data flow graphs. 
These\n describe the mathematical computations with a directed graph of nodes \nand edges. Nodes implement mathematical operations and can also \nrepresent endpoints to feed in data, push out results or read\/write \npersistent variables. <em>Edges<\/em> describe the input\/output relationships between nodes. <em>Data edges<\/em> carry dynamically-sized multi-dimensional data arrays, or tensors.<br>\nIts features are listed below.<\/p>\n\n\n\n<ul><li><em>Highly flexible:<\/em> TensorFlow enables users to write their \nown higher-level libraries on top of it by using C++ and Python, and \nexpress the neural network computation as a data flow graph.<\/li><li><em>Portable:<\/em> It can run on varied CPUs or GPUs, and even on mobile computing platforms. It also supports Docker and running via the cloud.<\/li><li><em>Auto-differentiation:<\/em> The user defines\n the computational architecture of a predictive model and combines it with an \nobjective function, and TensorFlow computes the derivatives automatically.<\/li><li><em>Diverse language options:<\/em> It has an easy Python-based interface that enables users to write code, and see visualisations and data flow graphs.<\/li><\/ul>\n\n\n\n<p>The latest version is 0.10.0.<br>\n<strong>Website:<\/strong> <em>www.tensorflow.org<\/em><\/p>\n\n\n\n<p><strong>Oryx 2<\/strong><br>\nOryx 2 is a realisation of Lambda architecture built on Apache Spark and\n Apache Kafka for real-time large scale machine learning. 
It is designed\n for building applications and includes packaged, end-to-end \napplications for collaborative filtering, classification, regression and\n clustering.<br>\nOryx 2 comprises the following three tiers.<\/p>\n\n\n\n<ul><li><em>General Lambda architecture tier:<\/em> Provides batch, speed and serving layers, which are not specific to machine learning.<\/li><li>A specialisation on top of the first tier, which provides machine learning abstractions for hyperparameter selection, etc.<\/li><li>An end-to-end implementation of standard machine learning \nalgorithms (ALS, random decision forests, k-means) as an application on \ntop.<\/li><\/ul>\n\n\n\n<p>Oryx 2 consists of the following layers of Lambda architecture as well as connecting elements.<\/p>\n\n\n\n<ul><li><em>Batch layer:<\/em> Used for computing new results from historical data and previous results.<\/li><li><em>Speed layer:<\/em> Produces and publishes incremental model updates from a stream of new data.<\/li><li><em>Serving layer:<\/em> Receives models and updates, and implements a synchronous API, exposing query operations on results.<\/li><li><em>Data transport layer:<\/em> Moves data between layers and takes input from external sources.<\/li><\/ul>\n\n\n\n<p>The latest version is 2.2.1.<br>\n<strong>Website:<\/strong> <a href=\"http:\/\/oryx.io\/\"><em>http:\/\/oryx.io\/<\/em><\/a><\/p>\n\n\n\n<p><strong>Accord.NET<\/strong><\/p>\n\n\n\n<p>Accord.NET is a .NET open source machine learning framework for \nscientific computing, and consists of multiple libraries for diverse \napplications like statistical data processing, pattern recognition, \nlinear algebra, artificial neural networks, image and signal processing,\n etc.<br>\nThe framework is divided into libraries that are available via the installer, compressed \narchives and NuGet packages. These include Accord.Math, \nAccord.Statistics, Accord.MachineLearning, Accord.Neuro, Accord.Imaging,\n Accord.Audio, Accord.Vision, Accord.Controls, Accord.Controls.Imaging, 
\nAccord.Controls.Audio, Accord.Controls.Vision, etc.<br>\nIts features are:<\/p>\n\n\n\n<ul><li>A matrix library built over standard .NET structures, which increases code reusability and allows existing algorithms to be changed gradually.<\/li><li>More than 40 statistical distributions, along with hidden Markov models and mixture models.<\/li><li>More than 30 hypothesis tests like ANOVA, two-sample, multiple-sample, etc.<\/li><li>More than 38 kernel functions for kernel methods such as kernel support vector machines, kernel principal component analysis (KPCA) and kernel discriminant analysis (KDA).<\/li><\/ul>\n\n\n\n<p>The latest version is 3.1.0.<br>\n<strong>Website:<\/strong> <a href=\"http:\/\/www.accord-framework.net\"><em>www.accord-framework.net<\/em><\/a><\/p>\n\n\n\n<p><strong>Amazon Machine Learning (AML)<\/strong><\/p>\n\n\n\n<p>Amazon Machine Learning (AML) is a machine learning service for \ndevelopers. It provides visualisation tools and wizards for creating \nsophisticated machine learning models without \nany need to learn complex ML algorithms and technologies. Via AML, \npredictions for applications can be obtained using simple APIs without \nusing custom prediction generation code or complex infrastructure.<\/p>\n\n\n\n<p>AML is based on simple, scalable, dynamic and flexible ML technology \nused by Amazon\u2019s internal data scientist community to create\n Amazon Cloud Services. 
AML connects to data stored in Amazon S3, \nRedshift or RDS, and can run binary classification, multi-class \nclassification or regression on this data to create models.<br>\nThe key concepts used in Amazon ML are listed below.<\/p>\n\n\n\n<ul><li><em>Datasources:<\/em> Contain metadata associated with data inputs to Amazon ML.<\/li><li><em>ML models:<\/em> Generate predictions using the patterns extracted from the input data.<\/li><li><em>Evaluations:<\/em> Measure the quality of ML models.<\/li><li><em>Batch predictions:<\/em> Asynchronously generate predictions for multiple input data observations.<\/li><li><em>Real-time predictions:<\/em> Synchronously generate predictions for individual data observations.<\/li><\/ul>\n\n\n\n<p>Its key features are:<\/p>\n\n\n\n<ul><li>Supports multiple data sources within its system.<\/li><li>Allows users to create a data source object from data residing in Amazon Redshift, the data warehouse Platform as a Service.<\/li><li>Allows users to create a data source object from data stored in a MySQL database.<\/li><li>Supports three types of models: binary classification, multi-class classification and regression.<\/li><\/ul>\n\n\n\n<p><strong>Website:<\/strong> <a href=\"https:\/\/aws.amazon.com\/machine-learning\/\"><em>https:\/\/aws.amazon.com\/machine-learning\/<\/em><\/a><\/p>\n\n\n\n<p>By <a href=\"https:\/\/opensourceforu.com\/author\/anand-nayyar\/\">Dr Anand Nayyar<\/a> &#8211; January 17, 2017 <br><br>https:\/\/opensourceforu.com\/2017\/01\/best-open-source-machine-learning-frameworks\/<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we present what the author rates as the top eight open source machine learning 
frameworks<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[40,67,4,39],"tags":[73,74],"_links":{"self":[{"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=\/wp\/v2\/posts\/149"}],"collection":[{"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=149"}],"version-history":[{"count":1,"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=\/wp\/v2\/posts\/149\/revisions"}],"predecessor-version":[{"id":150,"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=\/wp\/v2\/posts\/149\/revisions\/150"}],"wp:attachment":[{"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=149"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=149"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.artificialenergy.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=149"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}