Pattern recognition is a key human skill that has helped people to become the dominant species.
However, there are limits to this crucial ability, especially when confronted with masses of information that differ only slightly. Small, but significant, variations are more easily recognised by machines that can minutely inspect and compare differences without fatigue and with low margins of error.
Humans are, thus, using machines to augment their pattern-recognition capability by teaching machines how to recognise patterns and correlate seemingly disparate data to gain new insights.
“Specialised machine-learning algorithms are used to evaluate large quantities of data and derive and/or exploit relationships in the data,” says IBM Watson Advanced Cognitive Technology and Solutions data scientist Stefan van der Stockt.
“The basic idea behind machine learning is that we want to learn relationships and correlations between the different elements of the data, whether it be recognising a face or identifying a potentially cancerous lesion on an X-ray image,” says Council for Scientific and Industrial Research (CSIR) Mobile Intelligent Autonomous Systems unit principal researcher Dr Benjamin Rosman.
Machine-learning algorithms are designed to determine which features best describe the data and thereby extract latent patterns, adds CSIR Mobile Intelligent Autonomous Systems unit data science senior researcher Nyalleng Moorosi.
“Pattern-recognition algorithms are typically used for predictive analytics. The first phase is to determine the patterns and then, by projecting the patterns onto other similar data, predict the behaviour of the data,” she says.
Two major categories of machine learning algorithms are classification and clustering. Classification involves the program assigning data to specific categories, while clustering involves the program grouping data based on the similarity of characteristics. Both types are used to determine features and relationships in the data.
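The difference between the two categories can be illustrated with a minimal sketch in Python. It is not drawn from any of the systems mentioned in this article: the toy measurements, the labels "small" and "large", and the function names are all assumptions chosen for illustration. The classifier learns from examples that a person has already labelled, while the clustering routine (a basic k-means) receives the same points without labels and groups them purely by similarity.

```python
from math import dist

# Hypothetical toy data: (height_cm, weight_kg) measurements with human-supplied labels.
labelled = [((150.0, 50.0), "small"), ((155.0, 55.0), "small"),
            ((185.0, 90.0), "large"), ((190.0, 95.0), "large")]

def centroid(points):
    """Mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def train_classifier(examples):
    """Classification: learn one centroid per human-supplied label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def classify(model, features):
    """Assign the label whose centroid is nearest to the new point."""
    return min(model, key=lambda label: dist(model[label], features))

def cluster(points, k, iterations=10):
    """Clustering: a basic k-means that groups unlabelled points by similarity."""
    centres = list(points[:k])  # naive initialisation: the first k points
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centres[i], p))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [centroid(g) if g else centres[i] for i, g in enumerate(groups)]
    return groups

model = train_classifier(labelled)
predicted = classify(model, (152.0, 52.0))

# The same points with the labels stripped off.
unlabelled = [features for features, _ in labelled]
groups = cluster(unlabelled, k=2)
```

In this sketch the classifier can only return labels it was given during training, whereas the clustering routine returns unnamed groups that a person must still interpret.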
Machine learning diverges into supervised and unsupervised learning. Unsupervised machine learning uses clustering, while supervised machine learning relies on human knowledge and classification, Moorosi adds.
Expert Foundations
Supervised learning involves building expert models using hand-labelled examples to process data and then iteratively refining the accuracy of the models, says Van der Stockt.
Supervised machine learning involves training the machine learning algorithm to recognise specific data characteristics and patterns, elaborates Moorosi. The main use is to solve for a known pattern or output.
Supervised systems are typically given large amounts of data as part of the testing and training phase to refine the model and improve accuracy. This method of testing data against hypotheses, assessing the output and refining the model is what painstaking scientific research involves every day.
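The cycle of testing data against hypotheses, assessing the output and refining the model can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method used by any system named here: the data, the candidate thresholds and the function names are hypothetical, and the "model" is a single cut-off value rather than a full learning algorithm.

```python
# Hypothetical labelled data: (measurement, is_positive).
training = [(1.0, False), (2.0, False), (3.0, False),
            (6.0, True), (7.0, True), (8.0, True)]
testing = [(2.5, False), (6.5, True), (4.0, False), (9.0, True)]

def accuracy(threshold, examples):
    """Fraction of examples where 'value > threshold' matches the label."""
    correct = sum((value > threshold) == label for value, label in examples)
    return correct / len(examples)

def refine(examples, candidates):
    """Test each candidate hypothesis and keep the most accurate one."""
    best_threshold, best_score = None, -1.0
    for threshold in candidates:
        score = accuracy(threshold, examples)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Each candidate threshold is one hypothesis about where the classes separate.
candidates = [0.5, 2.5, 4.5, 6.5, 8.5]
threshold, train_score = refine(training, candidates)

# Assess the chosen model on data it has not seen during training.
test_score = accuracy(threshold, testing)
```

Keeping the test data separate from the training data is what allows the final accuracy figure to say something about how the model would behave on new cases.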
“Machine learning systems provide the capabilities to do this at a much more massive scale and rapid pace than has been possible without detracting from robust scientific practices,” explains Rosman.
A complex use of supervised machine learning systems is to support medical professionals and experts.
By training the machine learning model on hundreds or thousands of X-ray images of known cancerous lesions, the system will be able to develop a highly refined model of the characteristics of cancerous lesions in X-ray images, says Rosman.
“It can then compare new X-ray images of lesions against the model and provide an estimation of how strongly the new images correlate to images of known cancerous lesions, and thereby help an oncologist to make an informed and medically sound decision.”
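The idea of scoring how strongly a new case resembles known cases can be sketched as follows. This is an assumption-laden toy and not a description of how any medical system works: the three-number feature vectors are invented stand-ins for characteristics that a real system would have to extract from images, and cosine similarity is simply one common way to measure resemblance.

```python
from math import sqrt

def mean_vector(vectors):
    """Average the known examples into a single reference profile."""
    return [sum(axis) / len(vectors) for axis in zip(*vectors)]

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means they point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical feature vectors for known positive cases (illustrative values only).
known = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [1.0, 0.7, 0.1]]
profile = mean_vector(known)

# Score two new cases against the profile built from known cases.
similar = cosine_similarity(profile, [0.85, 0.8, 0.15])
dissimilar = cosine_similarity(profile, [0.1, 0.2, 0.9])
```

The output is a score rather than a verdict, which matches the supporting role described above: the number informs the specialist's decision and does not replace it.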
The IBM Watson for Oncology cognitive system is a pretrained decision support tool for oncologists, highlights IBM Watson Platform sales leader for IBM Middle East & Africa Andrew Quixley.
“Watson for Oncology is trained on millions of pages of relevant information, including the most recently published papers and journals. This helps oncologists to ensure that their diagnosis of the patient’s condition and the treatment recommendations reflect the entire body of domain knowledge, as well as the specific information about the patient,” he explains.
“Machine learning systems produce probabilities based on data associations and characteristics that can help humans to make more accurate, data-driven decisions. It helps to reduce the uncertainty of decisions by basing the decisions on large volumes of data,” says analytics company SAS Data Management business solutions manager Aneshan Ramloo.
While machines can make decisions faster than humans, the analytics models and machine learning systems serve to support strategic decision-making and must still be deployed and used according to a business’s strategy and in accordance with legal and regulatory requirements, he says.
“In healthcare, machine learning is used to improve diagnostics and treatments of patients – thereby improving patient outcomes – and to manage pharmaceutical supply chains, as well as for drug discovery research. The healthcare sector is a good example of how broadly applicable machine learning and analytics systems are,” says Ramloo.
Skilled User
Supervised and unsupervised methods depend heavily on analytics experience, as most of the work involves preparing, processing and analysing data, notes Van der Stockt.
The use of machine learning systems and analytics by businesses requires human resources with a broad range of fundamental and softer skills, ranging from a thorough understanding of data structural and stochastic principles to social skills and team participation, highlights University of Pretoria Department of Statistics senior lecturer Dr Frans Kanfer.
The growth in available data sources, including unstructured data sources, increased parallel computing power and the lower cost of distributed data storage add value only if the data are effectively analysed, he explains.
Tertiary machine learning courses focus on the use of these systems as scientific and business tools and are, therefore, cross-disciplinary. Data science training requires formal training in, or a combination of, statistics, computer science, informatics, mathematics, and electronic and computer engineering, he says.
“Courses currently available are intensive courses at master’s level and are designed to address the business case for and effective use of machine learning and analytics systems. General information about programming, statistics and computer science is included so that users are comfortable manipulating data to build predictive models and, thereby, effectively interrogating the data and deriving the expected value from machine learning and analytics systems,” he says.
“Developing a supervised machine learning system requires a highly skilled expert who knows the expected output of the algorithm and a skilled algorithm writer knowledgeable in the mathematical principles underlying the algorithms,” emphasises Rosman.
To develop the algorithms, it is important to have a deep understanding of mathematical and statistical principles. It is also important to understand the trade-offs between, for example, the speed at which an actionable output can be provided (usually inverse to the number of data characteristics the model assesses and the complexity of the model) and the accuracy of the output, which typically improves as more variables are included, but takes longer to process, he adds.
Humachine Experts
Data scientists must have domain knowledge. A deep understanding of a business, industry or technical topic is necessary to effectively leverage machine-learning systems, avers Ramloo.
South African machine learning start-up Snode believes that the most effective use of machine-learning systems is to augment the capabilities of people.
“People can do many things that machines cannot, but, conversely, cannot do many things that machines can easily do. Machine-learning systems can, therefore, add significantly to a professional’s capabilities. We believe that this is the best way to apply these systems,” says Snode CIO Nithen Naidoo.
Intelligence amplification, as Snode terms its machine-learning-based systems, requires three specific supporting elements: a fusion of data from many different data sources; interactive data visualisation systems, which also serve as human-machine interfaces; and machine-assisted analytics.
“Machine learning is not the answer to everything and, from our perspective, is an advanced form of statistical analysis, albeit in a scalable, useable format,” Naidoo states.
An effective hybrid human-machine system will provide the most value – for example by providing the resources and information to help an inexpert user perform to minimum standards or helping experts use their detailed knowledge to add more value to more processes and, thereby, improve their productive capabilities and strategic roles, he says.
Data Silhouette
A powerful use of machine-learning systems is to help researchers categorise and navigate vast amounts of data, as they enable them to create automated ways of detecting patterns and correlations in data, avers Rosman.
Unsupervised learning algorithms aim to automatically discover new insights into the data and present these insights to humans, confirms Van der Stockt.
Moorosi avers that machine-learning systems should not be restricted unnecessarily because they function best when given access to all available datasets to effectively fulfil their purpose of extracting patterns not recognised by people.
“Machine-learning tools and the patterns they identify provide a ‘silhouette’ in the data to direct further investigation. Machine learning is, therefore, a powerful tool for discovery. This is typically what is meant by data mining,” avers Van der Stockt.
Additionally, these systems can help users identify voids in the data and, thereby, help them to determine what additional data they require to make an algorithm more accurate and effective, he highlights.
Artificial intelligence systems – complex ensembles of supervised and unsupervised learning algorithms – are helping medical researchers to discover potentially effective new molecules and analyse genomic data.
IBM Watson for Drug Discovery is helping researchers at Barrow Neurological Institute, in the US, identify new targets for amyotrophic lateral sclerosis research. Watson can accelerate the identification of novel drug candidates and novel drug targets by harnessing the potential of Big Data.
Similarly, researchers at the New York Genome Center (NYGC), Rockefeller University and NYGC member institutions in July completed a proof of concept study that illustrated the potential of IBM Watson for Genomics to analyse complex genomic data from DNA sequencing of whole genomes.
The study also showed that whole genome sequencing identified more clinically actionable mutations than the current standard of examining a limited subset of genes, known as a targeted panel. Whole genome sequencing requires significant manual analysis; therefore, artificial intelligence can help doctors identify potential therapies from whole genome sequencing for more patients in less time.
Edited by: Martin Zhuwakinyu
Creamer Media Senior Deputy Editor