Scientists at the Massachusetts Institute of Technology have stumbled upon an interesting problem with machine learning and image classification. Depending on what the system is being used for, the consequences could range from harmless to deadly. Simply put, a model can look at an image and make a prediction based on information that we humans can’t make sense of, and it can be wrong.
Image classification is used in both medical diagnostics and autonomous driving. The aim is to train a neural network to understand an image in a way similar to how a human does. MIT explained all of this in a blog post that started by pointing out just how much we don’t know about how neural networks make their decisions.
Yes, the decision-making process itself is unknown. What is known is that they can be trained to learn. It’s how they learn that isn’t well understood.
Although that opacity isn’t new, the problem that MIT scientists have identified and named “overinterpretation” is. It’s an issue that could affect both medical diagnostics and autonomous driving. Overinterpretation is simply an algorithm making a “confident” prediction based on details it sees that we humans can’t understand, producing a prediction it shouldn’t.
For example, it could see something in a patterned background that we can’t see or understand and factor it into a decision. It doesn’t make sense to us, but for whatever reason, the computer sees it and uses it to make a prediction or decision.
MIT noted that this subtle type of failure is something that an AI model trying to classify (or identify) an image could encounter, and it could be problematic for situations involving high-stakes environments.
This worries the researchers because high-stakes situations could be affected: 1) medical diagnostics for diseases that need immediate attention, and 2) split-second decisions made by self-driving cars.
Autonomous vehicles rely on systems that can accurately understand their surroundings almost immediately while making quick and safe decisions. The system uses specific backgrounds, edges, or even patterns in the sky to determine whether an object is a traffic light, a street sign, or something else. What worried the MIT scientists is their finding that neural networks trained on datasets such as CIFAR-10 and ImageNet suffer from overinterpretation. [Editor’s note: I have actually had a problem that probably falls into this category in my Tesla Model 3. It sometimes sees a reflection on a specific traffic light that it interprets as the light turning from red to green due to the angle, proximity, and lighting of the traffic light. —Zach]
They noted that models trained on CIFAR-10 were making confident predictions even when 95% of each input image was missing and what remained was senseless to humans. Brandon Carter, an MIT Computer Science and Artificial Intelligence Laboratory Ph.D. student and lead author of a paper on the research, explained the problem in further detail:
“Overinterpretation is a dataset problem that’s caused by these nonsensical signals in datasets. Not only are these high-confidence images unrecognizable, but they contain less than 10 percent of the original image in unimportant areas, such as borders. We found that these images were meaningless to humans, yet models can still classify them with high confidence.”
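To make those degraded inputs concrete, here is a minimal Python sketch (my own illustration, not code from the MIT paper; the `mask_image` helper and its parameters are hypothetical) of how one might blank out roughly 95% of an image’s pixels, producing the kind of sparse, human-meaningless input the study found models could still classify with high confidence:

```python
import random

def mask_image(pixels, keep_fraction=0.05, fill=0, seed=0):
    """Zero out all but `keep_fraction` of the pixels in a 2-D image.

    This mimics the kind of degraded input described in the MIT study:
    ~95% of the image removed, leaving a sparse pixel subset that is
    meaningless to a human viewer.
    """
    rng = random.Random(seed)
    height, width = len(pixels), len(pixels[0])
    coords = [(r, c) for r in range(height) for c in range(width)]
    # Randomly choose the small subset of pixel positions to keep.
    keep = set(rng.sample(coords, max(1, int(len(coords) * keep_fraction))))
    return [
        [pixels[r][c] if (r, c) in keep else fill for c in range(width)]
        for r in range(height)
    ]

# Example: a 10x10 "image" with every pixel set to 1.
image = [[1] * 10 for _ in range(10)]
masked = mask_image(image, keep_fraction=0.05)
remaining = sum(v for row in masked for v in row)
print(remaining)  # 5 of 100 pixels survive; the rest are blanked
```

A real experiment would feed such masked images into a trained classifier; the surprising result the researchers report is that confidence on these inputs can stay high.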
An example cited is an app that tells you whether or not something is a hot dog. Actually, a better example, in my opinion, is those apps that tell you what type of plant you have based on a photo of the plant or one of its leaves. I came across this scenario when someone gave me a shrimp plant that hadn’t yet started blooming. When I used those plant apps, they identified the plant as poison oak.
The system processes individual pixels from many pre-labeled images so the network can learn. Those apps are supposed to be able to identify the plant (or hot dog, as the MIT blog post used) based on pixels. A key challenge is that machine-learning models can latch onto these pixels, or what we perceive to be senseless subtle signals, and image classifiers trained on datasets such as ImageNet can then make what seem like reliable predictions based on what they “see.”
Regarding my plant, I downloaded 2–3 of those apps, and after several photos, I had various answers, including the correct one. Although that was pretty funny, it wouldn’t be funny if an AI in an autonomous vehicle mistook a person wearing a green shirt crossing the street for a green light. Although that scenario is a bit of a stretch at the moment, given some of the advances Tesla has made in AI, something like it could happen. Never say never.
Carter noted that this raises an important question: how can datasets be modified so that models trained on them mimic how a human would think about classifying images?
“There’s the question of how we can modify the datasets in a way that would enable models to be trained to more closely mimic how a human would think about classifying images, and therefore, hopefully, generalize better in these real-world scenarios, like autonomous driving and medical diagnosis, so that the models don’t have this nonsensical behavior.”
For now, the overinterpretation is happening with pictures extracted from publicly available datasets and then classified. In the case of Tesla, however, the company has hundreds or thousands of people working every day to identify the images its cars see and classify/label them correctly. “While it may seem that the model is the likely culprit here, the datasets are more likely to blame,” MIT aptly notes.