Soon your smartphone camera will be supercharged with artificial intelligence (AI). Deep learning in particular, and AI more generally, will be fundamental to how we use and share photos and videos taken with smartphone cameras in the near future. Today, virtually all of these AI and deep learning algorithms sit in the cloud: Google Photos and Apple Photos sort and tag photos, identifying flowers, beaches, or sunsets simply by processing image pixels. These algorithms have made impressive performance gains in recent years, with university and corporate research teams competing each year to detect and classify images at ever-lower error rates. With training and processing done in the cloud, the mobile phone is simply a device for capturing the image. However, this is likely to change, with chipmakers such as Qualcomm and device makers such as Apple pushing for on-device processing of AI algorithms. Qualcomm has talked about its Snapdragon 820 processors running deep learning algorithms via the company's Zeroth Platform, and its Snapdragon Scene Detect technology allows the phone to recognize images and objects, then tag and classify them. Rather than living in the cloud, the algorithms are present on the device and can be trained locally.
At WWDC 2016, Apple announced neural network libraries that developers can access through application programming interfaces (APIs). The primary application is image recognition: Apple has pre-trained the algorithms offline and ported the resulting models into these libraries. We will have to wait and see what developers build, but by keeping AI processing on the device, Apple is promoting a new local AI paradigm, driven largely by its belief in maintaining user privacy.
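Apple's actual libraries are Swift/C frameworks, but the core idea behind them, a network trained offline whose weights ship with the app and run a forward pass entirely on the device, can be sketched in plain Python. Everything below is illustrative: the weights, biases, and two-class label set are invented for this example, not taken from any real model.

```python
import math

# Hypothetical pre-trained parameters, bundled with the app (illustrative
# values only; a real app would ship weights trained offline on real photos).
# Rows correspond to output classes; columns to input features
# (here, four grayscale "pixels" standing in for a full image).
WEIGHTS = [
    [0.9, -0.4, 0.7, -0.1],   # class "beach"
    [-0.6, 0.8, -0.2, 0.5],   # class "flower"
]
BIASES = [0.1, -0.1]
LABELS = ["beach", "flower"]

def softmax(scores):
    """Convert raw class scores to probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(pixels):
    """One-layer forward pass, run entirely locally: no network call."""
    scores = [
        sum(w * p for w, p in zip(row, pixels)) + b
        for row, b in zip(WEIGHTS, BIASES)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, confidence = classify([0.8, 0.1, 0.9, 0.2])
print(label, round(confidence, 3))
```

The point of the sketch is the division of labor: training happens offline, while inference needs nothing more than the stored weights and a few multiplications, which is exactly what makes it feasible on a phone with no cloud connection.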
With both device and chipset makers moving toward on-device AI, computer vision and image recognition are the first and most viable applications. It has also been suggested that, by the year 2020, RGB images will carry neural network "deep interpretation vectors". These vectors, generated by the algorithms, are likely to be saved alongside the image, used by the algorithms for further training, or simply shared through APIs with other apps and services. This means that phones will possess the uncanny ability to tell you what you have just captured on your smartphone camera and put that information to good use. The functionality has powerful implications for context-based services, which will be able to link images and videos captured on the phone to other applications. Moreover, network and cloud connectivity will not be necessary for powerful image recognition and classification. This could also have big implications for wearable cameras, both action cameras and lifelogging cameras, which could extract useful information from pixels and learn more about you, as well as your likes and dislikes, from the images and videos that you capture.
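No standard exists yet for these "deep interpretation vectors", so the following is only one plausible shape for the idea: compute a feature vector for an image, write it as a sidecar file next to the photo, and let other apps compare images by vector similarity. The `interpretation_vector` function here is a stand-in that pools simple pixel statistics; a real system would use the activations of a trained network.

```python
import json
import math
from pathlib import Path

def interpretation_vector(pixels, dims=4):
    """Stand-in for a network's learned feature vector. Here we just
    average fixed-size chunks of pixels so the example is self-contained;
    a real implementation would run the image through a trained model."""
    chunk = max(1, len(pixels) // dims)
    return [
        sum(pixels[i:i + chunk]) / chunk
        for i in range(0, chunk * dims, chunk)
    ]

def save_sidecar(image_path, pixels):
    """Save the vector alongside the image, e.g. photo.jpg -> photo.jpg.vec.json
    (a hypothetical naming convention, not an existing standard)."""
    vec = interpretation_vector(pixels)
    Path(str(image_path) + ".vec.json").write_text(json.dumps({"vector": vec}))
    return vec

def cosine_similarity(a, b):
    """Compare two vectors; values near 1.0 mean similar image content."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two similar "sunset" pixel rows should yield a high similarity score.
sunset_a = interpretation_vector([0.9, 0.8, 0.7, 0.2, 0.1, 0.0, 0.9, 0.8])
sunset_b = interpretation_vector([0.8, 0.9, 0.6, 0.3, 0.2, 0.1, 0.8, 0.9])
print(round(cosine_similarity(sunset_a, sunset_b), 2))
```

Because the sidecar is plain JSON, any app granted access to the file could read the vector through an ordinary file or content API, without ever re-running the recognition model itself, which is the sharing scenario the paragraph above describes.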
In 10 years, I believe it will be commonplace for a local AI-based virtual digital assistant (VDA) to suggest further actions based on the images and videos taken on your phone: places to eat, the weather forecast, the quality of the image itself, similar products for purchase, places of interest, and many other applications and services yet to be tapped. I also see healthcare applications, some of which have already become reality, such as skin cancer detectors that allow medical conditions to be diagnosed simply by taking selfies. All of this will become possible because of powerful AI algorithms running locally on the device, instantly extracting data and information from your smartphone camera and sharing it through APIs with other applications and services. Your smartphone camera is about to get new superpowers, with computer vision, deep learning, and AI helping to accelerate context-based services for mobile.