Yapay Zeka – Figment by figment of my imagination

In this article, we present what the author rates as the top eight open source machine learning frameworks.

Learning may be defined as the process of improving one’s ability to perform a task efficiently. Machine learning is a sub-field of computer science that enables computers to learn without being explicitly programmed. It evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores algorithms that can learn from data and make predictions on it. In recent times, machine learning has been deployed in a wide range of computing tasks where designing and programming explicit algorithms is difficult, such as email spam filtering, optical character recognition, search engine improvement, digital image processing and data mining.
Tom M. Mitchell, renowned computer scientist and professor at Carnegie Mellon University, USA, defined machine learning as: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Machine learning tasks are broadly classified into three categories, depending on the nature of the learning ‘signal’ or ‘feedback’ available to a learning system.

Supervised learning: This is the machine learning task of inferring a function from labelled training data. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (the supervisory signal).
Unsupervised learning: This is the machine learning task of inferring a function to describe hidden structures from unlabelled data. It is closely related to the problem of density estimation in statistics.
Reinforcement learning: This is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximise some notion of cumulative reward. It is also studied in many other disciplines, such as game theory, information theory, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is typically formulated as a Markov decision process (MDP), since many reinforcement learning algorithms rely on dynamic programming techniques.
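
As a small illustration of the supervised case described above, the sketch below infers a function (a straight line) from labelled pairs of input and desired output. It is plain Python rather than the API of any framework discussed in this article; the name fit_line and the toy data are assumptions made for this example.

```python
# A minimal sketch of supervised learning: infer a function from labelled
# (input, desired output) pairs. Here the function is a straight line fitted
# by ordinary least squares. The helper name and data are illustrative only.

def fit_line(pairs):
    """Return (slope, intercept) of the least-squares line through pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Labelled training data generated from y = 2x + 1.
training_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
slope, intercept = fit_line(training_data)

def predict(x):
    """Apply the learned function to an unseen input."""
    return slope * x + intercept

prediction = predict(5.0)
```

With noise-free data like this, the learned line reproduces the rule that generated the labels, so the prediction for an unseen input follows the same rule.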

The application of machine learning to diverse areas of computing is gaining popularity rapidly, not only because of cheap and powerful hardware, but also because of the increasing availability of free and open source software, which enables machine learning to be implemented easily. Machine learning practitioners and researchers, as part of the software engineering team, continuously build sophisticated products, integrating intelligent algorithms into the final product to make software work more reliably, quickly and without hassles.
There is a wide range of open source machine learning frameworks available in the market, which enable machine learning engineers to build, implement and maintain machine learning systems, generate new projects and create new impactful machine learning systems.
Let’s take a look at some of the top open source machine learning frameworks available.

Apache Singa
The Singa Project was initiated by the DB System Group at the National University of Singapore in 2014, with a primary focus on distributed deep learning by partitioning the model and data onto nodes in a cluster and parallelising the training. Apache Singa provides a simple programming model and works across a cluster of machines. It is primarily used in natural language processing (NLP) and image recognition. The Singa prototype was accepted by the Apache Incubator in March 2015; it provides a flexible architecture for scalable distributed training and is extensible to run over a wide range of hardware.
Apache Singa was designed with an intuitive programming model based on layer abstraction. A wide variety of popular deep learning models are supported, such as feed-forward models like convolutional neural networks (CNN), energy models like Restricted Boltzmann Machine (RBM), and recurrent neural networks (RNN). Based on a flexible architecture, Singa runs various synchronous, asynchronous and hybrid training frameworks.
Singa’s software stack has three main components: Core, IO and Model. The Core component is concerned with memory management and tensor operations. IO contains classes for reading and writing data to the disk and the network. Model includes data structures and algorithms for machine learning models.

Its main features are:

Includes tensor abstraction for strong support for more advanced machine learning models
Supports device abstraction for running on varied hardware devices
Makes use of CMake for compilation rather than GNU Autotools
Improved Python bindings, with more deep learning models like VGG and ResNet
Includes enhanced IO classes for reading, writing, encoding and decoding files and data

The latest version is 1.0.
Website: http://singa.apache.org/en/index.html

Shogun
Shogun was initiated by Soeren Sonnenburg and Gunnar Raetsch in 1999 and is currently under rapid development by a large team of programmers. This free and open source toolbox written in C++ provides algorithms and data structures for machine learning problems. The toolbox can be used through a unified interface from C++, Python, Octave, R, Java, Lua and C#, and runs on Windows, Linux and MacOS. Shogun is designed for unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, dimensionality reduction, clustering, etc. It contains a number of state-of-the-art algorithms, such as a wealth of efficient SVM implementations, multiple kernel learning, kernel hypothesis testing, Krylov methods, etc.
Shogun supports bindings to other machine learning libraries like LibSVM, LibLinear, SVMLight, LibOCAS, libqp, VowpalWabbit, Tapkee, SLEP, GPML and many more.
Its features include one-class classification, multi-class classification, regression, structured output learning, pre-processing, built-in model selection strategies, visualisation and test frameworks; and semi-supervised, multi-task and large scale learning.
The latest version is 4.1.0.
Website: http://www.shogun-toolbox.org/

Apache Mahout
Apache Mahout is a free and open source project of the Apache Software Foundation. Its goal is to develop free, distributed or otherwise scalable machine learning algorithms for diverse areas like collaborative filtering, clustering and classification. Mahout also provides Java libraries and Java collections for various kinds of mathematical operations.
Apache Mahout is implemented on top of Apache Hadoop using the MapReduce paradigm. Once Big Data is stored on the Hadoop Distributed File System (HDFS), Mahout provides the data science tools to automatically find meaningful patterns in these Big Data sets, turning this into ‘big information’ quickly and easily.

Building a recommendation engine: Mahout provides tools for building a recommendation engine via the Taste library, a fast and flexible engine for collaborative filtering (CF).
Clustering with Mahout: Several clustering algorithms are supported by Mahout, like Canopy, k-Means, Mean-Shift, Dirichlet, etc.
Categorising content with Mahout: Mahout uses the simple Map-Reduce-enabled naïve Bayes classifier.
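
To make the clustering item above more concrete, here is a minimal single-machine sketch of the k-means idea in plain Python. It is not Mahout's API and does not use MapReduce; the function name k_means, the deterministic choice of starting centroids and the sample points are all assumptions made for illustration.

```python
# A minimal, single-machine sketch of k-means on 2-D points.
# Assumption: the first k points are used as starting centroids so that the
# example is deterministic; real implementations usually initialise randomly.

def k_means(points, k, iterations=10):
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
```

The two alternating steps (assign, then update) are the part that a distributed implementation would spread across a cluster.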
The latest version is 0.12.2.
Website: https://mahout.apache.org/

Apache Spark MLlib
Apache Spark MLlib is a machine learning library, the primary objective of which is to make practical machine learning scalable and easy. It comprises common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction as well as lower-level optimisation primitives and higher-level pipeline APIs.
Spark MLlib is regarded as a distributed machine learning framework on top of the Spark Core which, mainly due to the distributed memory-based Spark architecture, is almost nine times as fast as the disk-based implementation used by Apache Mahout.
The various common machine learning and statistical algorithms that have been implemented and included with MLlib are:

Summary statistics, correlations, hypothesis testing, random data generation
Classification and regression: Support vector machines, logistic regression, linear regression, naïve Bayes classification
Collaborative filtering techniques including Alternating Least Squares (ALS)
Cluster analysis methods including k-means and Latent Dirichlet Allocation (LDA)
Optimisation algorithms such as stochastic gradient descent and limited-memory BFGS (L-BFGS)
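
As a rough illustration of one of the optimisation primitives listed above, the following plain-Python sketch applies stochastic gradient descent to a one-variable linear regression. It does not use Spark or the MLlib API; the function name, learning rate, epoch count and toy data are assumptions chosen for this example.

```python
import random

# A minimal sketch of stochastic gradient descent (SGD) for fitting
# y = w * x + b. Each update uses the gradient of the squared error
# 0.5 * (prediction - y) ** 2 on a single sample.

def sgd_linear_regression(data, learning_rate=0.05, epochs=500, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)   # visit the samples in a random order
        for x, y in samples:
            error = (w * x + b) - y
            w -= learning_rate * error * x  # gradient with respect to w
            b -= learning_rate * error      # gradient with respect to b
    return w, b

# Noise-free toy data generated from y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(5)]
w, b = sgd_linear_regression(data)
```

On this toy data the estimates approach the generating slope and intercept; on real data the learning rate and number of epochs would need tuning.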
The latest version is 2.0.1.
Website: http://spark.apache.org/mllib/

TensorFlow
TensorFlow is an open source software library for machine learning developed by the Google Brain Team for various sorts of perceptual and language understanding tasks, and to conduct sophisticated research on machine learning and deep neural networks. It is Google Brain’s second generation machine learning system and can run on multiple CPUs and GPUs. TensorFlow is deployed in various products of Google like speech recognition, Gmail, Google Photos and even Search.
TensorFlow performs numerical computations using data flow graphs. These elaborate the mathematical computations with a directed graph of nodes and edges. Nodes implement mathematical operations and can also represent endpoints to feed in data, push out results or read/write persistent variables. Edges describe the input/output relationships between nodes. Data edges carry dynamically-sized multi-dimensional data arrays or tensors.
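
The idea of a data flow graph can be sketched in a few lines of plain Python. This is not the TensorFlow API: the Node class and the placeholder, constant, add, multiply and run helpers are hypothetical names invented for this illustration, and the values are scalars rather than tensors.

```python
# A minimal sketch of a data flow graph: nodes are operations or data
# endpoints, and the inputs tuple of each node represents its incoming edges.

class Node:
    def __init__(self, op, inputs=(), name=None, value=None):
        self.op = op          # "placeholder", "constant", "add" or "multiply"
        self.inputs = inputs  # incoming edges from other nodes
        self.name = name      # used to look up fed-in data
        self.value = value    # used by constants

def placeholder(name):
    return Node("placeholder", name=name)

def constant(value):
    return Node("constant", value=value)

def add(a, b):
    return Node("add", inputs=(a, b))

def multiply(a, b):
    return Node("multiply", inputs=(a, b))

def run(node, feed):
    """Evaluate a node by first evaluating the nodes on its incoming edges."""
    if node.op == "placeholder":
        return feed[node.name]
    if node.op == "constant":
        return node.value
    values = [run(i, feed) for i in node.inputs]
    if node.op == "add":
        return values[0] + values[1]
    if node.op == "multiply":
        return values[0] * values[1]
    raise ValueError("unknown operation: " + node.op)

# Build the graph for y = w * x + b once, then feed in data when running it.
x = placeholder("x")
y = add(multiply(constant(3.0), x), constant(2.0))
result = run(y, {"x": 4.0})
```

Separating graph construction from execution is the design choice this sketch tries to show: the same graph can be run many times with different fed-in data.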
Its features are listed below.

Highly flexible: TensorFlow enables users to write their own higher-level libraries on top of it by using C++ and Python, and express the neural network computation as a data flow graph.
Portable: It can run on varied CPUs or GPUs, and even on mobile computing platforms. It also supports Docker and running via the cloud.
Auto-differentiation: The user defines the computational architecture of a predictive model and combines it with an objective function; TensorFlow then computes the required derivatives automatically, which helps with gradient-based learning algorithms.
Diverse language options: It has an easy Python based interface and enables users to write code, and see visualisations and data flow graphs.
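
The auto-differentiation feature in the list above can be illustrated with a small reverse-mode sketch in plain Python. It is not how TensorFlow is implemented and not its API; the Var class and the backward function are hypothetical names used only for this example, and only scalar addition, subtraction and multiplication are covered.

```python
# A minimal sketch of reverse-mode automatic differentiation on scalars.
# Each Var remembers its inputs together with the local derivative of the
# operation with respect to each input.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # tuple of (input Var, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(output):
    """Propagate d(output)/d(node) to every node that output depends on."""
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)  # inputs are appended before the nodes using them

    visit(output)
    output.grad = 1.0
    for v in reversed(order):  # consumers are processed before their inputs
        for parent, local in v.parents:
            parent.grad += local * v.grad  # chain rule

# Objective: squared error of a one-variable linear model.
w, b = Var(3.0), Var(1.0)
x, y = Var(2.0), Var(5.0)
error = w * x + b - y
loss = error * error
backward(loss)
```

After the backward pass, w.grad and b.grad hold the derivatives of the loss with respect to the model parameters, which is what a gradient-based optimiser needs.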
The latest version is 0.10.0.
Website: www.tensorflow.org

Oryx 2
Oryx 2 is a realisation of Lambda architecture built on Apache Spark and Apache Kafka for real-time large scale machine learning. It is designed for building applications and includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering.
Oryx 2 comprises the following three tiers.

General Lambda architecture tier: Provides batch, speed and serving layers, which are not specific to machine learning.
Specialisation tier: Built on top of the general tier, it provides machine learning abstractions for hyperparameter selection, etc.
End-to-end implementation of the same standard machine learning algorithms as an application (ALS, random decision forests, k-means) on top.

Oryx 2 consists of the following layers of Lambda architecture as well as connecting elements.

Batch layer: Used for computing new results from historical data and previous results.
Speed layer: Produces and publishes incremental model updates from a stream of new data.
Serving layer: Receives models and updates, and implements a synchronous API, exposing query operations on results.
Data transport layer: Moves data between layers and takes input from external sources.
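
The division of work between these layers can be sketched in plain Python. This is not Oryx 2's API and it uses neither Spark nor Kafka: the function and class names below are hypothetical, the "model" is just a table of item counts, and a deque stands in for the data transport layer.

```python
from collections import Counter, deque

# A minimal sketch of the Lambda architecture layers described above.

def batch_layer(historical_events):
    """Compute a full model (item popularity counts) from historical data."""
    return Counter(historical_events)

def speed_layer(event):
    """Turn one new event into an incremental model update."""
    return (event, 1)

class ServingLayer:
    """Receives models and updates, and answers queries on the result."""

    def __init__(self):
        self.model = Counter()

    def load_model(self, model):
        self.model = Counter(model)

    def apply_update(self, update):
        item, delta = update
        self.model[item] += delta

    def most_popular(self):
        return self.model.most_common(1)[0][0]

# Data transport: a simple in-memory queue standing in for a message topic.
update_topic = deque()

serving = ServingLayer()
serving.load_model(batch_layer(["a", "b", "a"]))
before = serving.most_popular()

for event in ["b", "b"]:              # a stream of new data arrives
    update_topic.append(speed_layer(event))
while update_topic:                   # the serving layer consumes updates
    serving.apply_update(update_topic.popleft())
after = serving.most_popular()
```

The point of the sketch is that queries see the batch model immediately and then reflect new data through small updates, without waiting for the next batch run.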
The latest version is 2.2.1.
Website: http://oryx.io/

Accord.NET

Accord.NET is a .NET open source machine learning framework for scientific computing, and consists of multiple libraries for diverse applications like statistical data processing, pattern recognition, linear algebra, artificial neural networks, image and signal processing, etc.
The framework is divided into libraries that are available via the installer, compressed archives and NuGet packages. These include Accord.Math, Accord.Statistics, Accord.MachineLearning, Accord.Neuro, Accord.Imaging, Accord.Audio, Accord.Vision, Accord.Controls, Accord.Controls.Imaging, Accord.Controls.Audio, Accord.Controls.Vision, etc.
Its features are:

Matrix library for an increase in code reusability, and gradual change of existing algorithms over standard .NET structures.
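Provides more than 40 different statistical distributions, along with models such as hidden Markov models and mixture models.
Provides more than 30 hypothesis tests like ANOVA, two-sample, multiple-sample, etc.
Provides more than 38 kernel functions for kernel methods such as kernel support vector machines, kernel principal component analysis (KPCA) and kernel discriminant analysis (KDA).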

The latest version is 3.1.0.
Website: www.accord-framework.net

Amazon Machine Learning (AML)

Amazon Machine Learning (AML) is a machine learning service for developers. It has many visualisation tools and wizards for creating high-end sophisticated and intelligent machine learning models without any need to learn complex ML algorithms and technologies. Via AML, predictions for applications can be obtained using simple APIs without using custom prediction generation code or complex infrastructure.

AML is based on the same simple, scalable, dynamic and flexible ML technology used by Amazon’s internal community of data scientists to create Amazon Cloud Services. AML connects to data stored in Amazon S3, Redshift or RDS, and can run binary classification, multi-class categorisation or regression on this data to create models.
The key contents used in Amazon ML are listed below.

Datasources: Contain metadata associated with data inputs to Amazon ML.
ML models: Generate predictions using the patterns extracted from the input data.
Evaluations: Measure the quality of ML models.
Batch predictions: Asynchronously generate predictions for multiple input data observations.
Real-time predictions: Synchronously generate predictions for individual data observations.

Its key features are:

Supports multiple data sources within its system.
Allows users to create a data source object from data residing in Amazon Redshift – the data warehouse Platform as a Service.
Allows users to create a data source object from data stored in the MySQL database.
Supports three types of models: binary classification, multi-class classification and regression.

Website: https://aws.amazon.com/machine-learning/

By Dr Anand Nayyar – January 17, 2017

https://opensourceforu.com/2017/01/best-open-source-machine-learning-frameworks/

I returned from holiday this summer and met a friend who asked where I had been. I told him I had visited Istanbul and had taken advantage of two other trips within the country, which were super deals and by comparison cost no more than a one-way, second-class ticket on a National Rail train. One of these was a five-day tour of the south Mediterranean coast, including Antalya and Bodrum. The other was a three-day trip to Cappadocia. Both trips included inter-city flights, half-board hotels and tour guides supervising the group professionally on designated coaches, which made the trips extremely convenient, with collection to and from the airport. The pretty much all-inclusive five-day trip, including return flights from Istanbul, cost just 139 pounds. The Cappadocia tour was even less.

My friend said he had always fancied Turkey as a holiday destination, but he was particularly averse to the overwhelming attitude of certain sales people trying to draw tourists into their shops to buy things they didn’t really want. I understood exactly what he was talking about, as I had experienced that difficult scenario myself on former trips abroad. He was right: the last thing a tourist wants is to be pestered by salesmen with poor English trying to lure them off the street and into their shops to sell them souvenirs that they don’t want, or have perhaps already bought, when they are no longer in shopping mode but just sightseeing or trying to get to places they had planned to see.

Thankfully, this tradition of pestering tourists, and in some cases playing on their emotions to get them to buy goods they had no interest in, has come to an end. In recent years the Turkish government has not only outlawed street begging, but men standing outside independent stores drawing customers in are no longer a problem either.

Back in the UK, my combination boiler, which is over 15 years old, recently broke down. I had already exhausted all possible means of support to keep the boiler running, as the manufacturer with whom I had a maintenance contract had already written to me to say they would no longer be supporting this boiler and that our maintenance contract was terminated accordingly. All the third party service engineers who had visited to service the boiler since then also advised that I should simply give up on it, and that they would be happy to install a new one at a cost that equates to almost 300 hours of work at minimum wage rates… In other words, for a lot of money!

I have, one way or another, managed to keep my boiler going for at least five years since the industry totally gave up on it, and I recently came across another failure mode. I had turned off the Central Heating function at the start of summer, as the weather had settled into summer temperatures and heating was not needed. At the end of summer temperatures dropped again, so I tried to turn the Central Heating back on and it simply didn’t work, even though the hot water was still consistent and working as well as expected.

I got out my installation manual for the boiler and followed the diagnostics flow charts. This helped me to narrow the fault down to one component. Now all I had to do was find a spare part and replace it. It was at this point that I noticed a similarity between the pushy sales people of past holidays and what I was experiencing right here, in the cradle of advanced technologies and high standards of living.

So, it turns out that the primitive Artificial Intelligence (AI) used in modern search engines across Europe behaves in much the same way. In this instance I’m talking about eBay. I put in the part number as my search criterion and got very few hits, since this is an old part number and no doubt the original manufacturer had sold off that part of the business to companies who changed the part numbers to suit themselves, so the information I was looking for was buried and difficult to find. Obviously eBay, like many other online retail platforms, is geared to sell ‘something’ even if it is not what I am looking for. So, in an effort to allow for changed part numbers and for other boilers where the same component might have been used, I began entering a description of the part instead of the original part number, for example a permutation of keywords such as “Worcester Bosch boiler Central Heating Temperature sensor”. Relentlessly, the site spewed out dozens of options, ranging from complete boilers to hundreds of parts that I am not at all interested in.

This is where the similarity with pushy sales people on holiday dawned on me. This is not the first instance where I have specified exactly what I am looking for to a search engine and, in return, it has suggested a thousand and one other things that I have no interest in buying. And of course nowadays it is not uncommon for cookies to kick in, so that search engine service providers fire off our search behaviour online to thousands of ‘associates’, who all take advantage of our geolocation and the keywords we used to throw useless information at us about what they think we should be looking for and ultimately buying. I wondered if this was any different to pushy sales people who approach us while we are trying to enjoy our holidays abroad, to sell us useless products that we have no interest in whatsoever.

It turns out that my friend is actively looking for a holiday destination, and I hear he has settled on Majorca, again presumably because he will not be encouraged to spend money there on things he does not want. Yet back at home, wherever that is, we are persistently bothered by virtual sales entities trying to make us stray from what we are looking for as we source items of interest during our daily routines, and this doesn’t seem to bother us. The mere thought of a human equivalent approaching us on holiday, on the other hand, is enough to decide where we will not be going.


The Best Open Source Machine Learning Frameworks

GDPR and pushy sales people on holiday