Artificial intelligence has become mainstream over the past few years and has generated a lot of interest from governments and private organisations. Several governments have created ministerial positions for the subject, and militaries around the world are preparing for an AI-based weapons race. The hype around AI puts its anticipated impact on par with that of the industrial revolution.
In this article, we will try to dig down to the basics and explore why artificial intelligence has become mainstream only now, and the promise it holds.
The idea behind machine learning was first suggested by Alan Turing in 1950 in his paper “Computing Machinery and Intelligence”, where he explores whether a computer can consistently perform a task in a way that is indistinguishable from a human. This idea is commonly known as the Turing test or the imitation game. Turing also explored how such a “learning machine” might be created, and predicted that, through technological advances, a machine able to play the imitation game would be programmed by the end of the century. He also suggested that problems like playing chess and understanding natural language would be good places to start.
Early work in the area of intelligent machines adopted one of two broad approaches: rule-based and statistical.
Some researchers worked on heuristics-based systems and believed that human behavior or intelligence could be effectively captured through a set of elaborate rules. This approach required building exhaustive grammars and formal systems to encompass a problem domain. Expert systems, which saw a rise in the 1970s and 80s, were part of this approach.
Other researchers leaned towards a statistical approach and leveraged several principles of statistics and mathematics to create machines which seemed to “learn”. Early efforts were focused more on pattern classification, and with time several sophisticated tools were developed which were used to solve problems in many domains.
The statistical approach required considerable data for experimentation and learning, as well as a lot of computing resources. These limitations kept the pace of research in this area slow until the 1990s.
In parallel to these efforts, some researchers were also working on artificial neural networks, which they modelled loosely on the human brain. Early research in neural networks focused on mimicking the brain and learning patterns. Later, the research moved away from biological models and towards solving domain-specific problems.
Early neural networks were simple and consisted of a few artificial neurons connected to each other. These neurons receive input signals which are assigned weights. The sum of the weighted signals determines whether the neuron passes the signal onwards or not. When combined with mathematical optimization techniques, each neuron progressively learns to assign better weight values to the input signals. When these neurons are arranged in layers, each layer incrementally learns more abstract ideas.
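To make the idea of a single neuron concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular system was built: the function names, the toy AND data set, the learning rate and the choice of the classic perceptron update rule as the "optimization technique" are all assumptions made for this example.

```python
def neuron_output(weights, bias, inputs):
    """Pass the signal on (return 1) only if the weighted sum is positive."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0


def train_neuron(samples, learning_rate=0.5, epochs=20):
    """Learn weights with the perceptron update rule (an illustrative choice)."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is +1, 0 or -1: nudge the weights towards the target.
            error = target - neuron_output(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy data: the logical AND of two binary input signals.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_neuron(AND_SAMPLES)
predictions = [neuron_output(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
print(predictions)  # [0, 0, 0, 1]
```

The neuron starts with all weights at zero and gets every "fire" decision wrong at first. After a handful of passes over the four examples, the weights settle on values that reproduce AND. Real networks stack many such units in layers and use gradient-based optimization rather than this simple rule.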
Like the statistical approach to machine learning, research on neural networks was slow until the 1990s due to a lack of computing resources and data for learning.
By the 1990s, the researchers who believed in rule-based systems were starting to hit limitations. In real-world systems, information is not always certain and many determinations are subjective. In areas such as speech recognition, statistical models such as Hidden Markov Models were already gaining success. Some limited image processing tasks, such as recognising handwritten ZIP codes, were being done with neural networks, but training them required a long time and a lot of computing resources.
However, a few technological and scientific breakthroughs led to greater adoption of, and progress in, neural networks over the next two decades. The first obvious factor was that computers became both more capable and much more affordable. Another factor was that more and more data was being digitized, especially with the widespread use of the internet. This made training data much easier to obtain, at least for the large internet companies.
The deep learning revolution started around 2010. In 2009, researchers at Google tried using an Nvidia GPU to train a neural network and found that GPUs were about 100 times faster than CPUs for this task. Neural networks with several layers (deep neural networks) had been around for a long time, but it was now actually feasible to train them on large data sets.
In the early 2010s, researchers found that using lower-quality data to train neural networks doesn’t necessarily hurt the ability of the trained model to make accurate predictions, because deep neural networks are very good at handling uncertainty. So instead of using high-resolution images to train image classification models and segmented speech data (manually separated words or phonemes) to train speech recognition models, researchers and engineers were suddenly using low-quality images and noisy, unsegmented speech data to train deep learning models with much higher accuracy.
The essence of end-to-end deep learning is that a deep neural network with many connected layers, trained on GPUs over a large volume of data, can learn patterns very effectively from low-quality, raw data. There is very little need to pre-process the data or identify its features through conventional, hand-crafted methods; the deep neural network will do that itself.
Images courtesy Christopher Manning & Russ Salakhutdinov, JSM, 2018-07-29
With these revolutionary breakthroughs, deep learning has been used to obtain much better results on old problems. In some use cases, such as image classification, it has even surpassed human performance.
Much of what you think can be done by humans can be done by deep learning, provided you narrow down the problem statement and collect enough data.
While deep learning holds a lot of promise, it doesn’t come without its challenges. Deep learning works well only when you have a lot of annotated data available for training. Often, the people who actually have access to such data are not aware of the power it can unleash. Managing such datasets and training models on them also requires a lot of computing power. Once you are aware of the power of deep learning and have an actual use case with commercial value, you will start to see the possibilities.
Imroz has worked on some very interesting problems using deep learning. We have built a facial recognition system which gives close to 100% accuracy. We have also worked on a wildlife detection system. We are currently working on an advanced use case in bio-medical imaging and a neural-network-based disease prediction system. We have worked hard to acquire the skills and develop the vision required to build successful products using deep learning, and we would like to work with people who want to change the world.