Artificial Intelligence Struggles to Describe Images for Disabled Internet Users

2021-05-21
Thomas Smith
Community Voice

https://img.particlenews.com/image.php?url=1zmboB_0a6qcowW00
An example of an image which is challenging to describe with AI. Photo by Eye for Ebony on Unsplash.

Yesterday was Global Accessibility Awareness Day (GAAD), a time to celebrate and explore the many ways users and publishers can make the Internet more accessible to people with disabilities.

Image alt text is a key element of web accessibility. About 7 million people in the United States have a visual disability, which can make it challenging to navigate an Internet which has become increasingly driven by graphics and videos. Many of these users access websites via screen readers, special software programs which transform webpages into audio, allowing visually impaired users to navigate and interact with them.

Screen reader technology has improved dramatically as the Internet has matured, and many modern web standards increase usability dramatically for visually impaired website visitors. But in many cases, screen readers still have a literal blind spot: images.

Making images accessible is a major challenge, but an incredibly important one for photographers, image licensors, publishers and web developers alike. The main technology for increasing images’ accessibility is alt text. Alt text is written text which can be attached to an image online, and often describes the visual contents of the image. It can be embedded into the metadata of the image itself, embedded into the webpage in which the image is displayed, or both.
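To make this concrete, here is a minimal sketch in Python, using only the standard library's `html.parser`, that audits a page for images missing alt text. The sample markup and file names below are invented for illustration, and a real audit tool would need to handle many more cases:

```python
from html.parser import HTMLParser

class AltTextAuditor(HTMLParser):
    """Collects the src of every <img> tag that lacks a non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # An absent or empty alt attribute leaves a screen reader
        # with nothing to announce for this image.
        if not attrs.get("alt"):
            self.missing_alt.append(attrs.get("src", "(no src)"))

sample_page = """
<article>
  <img src="portrait.jpg"
       alt="A smiling person wearing purple lipstick and a purple head scarf.">
  <img src="undescribed.jpg">
</article>
"""

auditor = AltTextAuditor()
auditor.feed(sample_page)
print(auditor.missing_alt)  # → ['undescribed.jpg']
```

Running a check like this across a site is one quick way for a publisher to find the images that are invisible to screen reader users.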

When screen reader software finds an image which has alt text, it can read the alt text aloud, which gives the user a sense for what the image depicts, even if they can’t see its visual content. Most web platforms allow designers to add alt text to their images, and social media platforms increasingly support alt text, too.

Alt text is a good solution from a technical perspective. But in practice, alt text has to be well-written in order to be informative and useful. Writing alt text well is more challenging than you might expect — especially for images which depict people, events or complex topics. Take, for example, the image at the top of this article. How would you describe it? Maybe you would say “A smiling African-American woman wearing purple lipstick and a purple scarf.” At first glance, that seems like a pretty good description of the image.

But pause for a moment and think about each part. You’ve presumably never met the person in the image. How can you know that the person is African-American? Maybe the photo was taken outside the United States. Maybe they have no connection to America at all. And for that matter, how do you know that the person depicted is a woman? You can’t look at a person and know their race or gender. Maybe the person in the image identifies as having a non-binary gender, or another gender identity.

Given how many images there are online — and the pressing challenges visually impaired users face in accessing the web — many companies have turned to Artificial Intelligence in order to automatically add alt text to images. That sounds like a good idea, but in practice it rarely works well. If a human struggles to accurately describe an image like the one at the top of this article, then a machine will almost certainly fare worse. Sometimes much worse — AI is notoriously biased, and these biases can alter the alt text that computers generate. Sometimes the results are offensive, but more often they're inaccurate, or just overly generic.

You can see an example of this if you use PowerPoint or another Microsoft product, as Microsoft was early to the AI alt text party. Add an image to your PowerPoint presentation, then right-click on it and select "Edit Alt Text". PowerPoint will pre-populate the image's alt text field with an automatically generated description. For the image at the top of this article, it generates the text: "A picture containing person, outdoor, grass."

On the one hand, that's somewhat useful. The image does indeed include those attributes, and PowerPoint is smart to avoid making assumptions about the person's race or gender. But a lot is missing from that description. The image's emotional content, for example, is entirely absent: the photo conveys joy, or at least excitement, and none of that comes through. So, too, does the description omit the person's clothing and fashion choices, which appear to be a critical element of this specific image. AI-generated alt text can give a vague sense of an image's content, but the deeper elements of its composition and meaning are often lost on AI algorithms.

In response to this, companies are taking two different approaches. One approach is to improve AI alt text, often by involving people in its creation. That’s the strategy applied by companies like CloudSight. The company is attempting to train AI to actually understand the content of an image and generate accurate descriptions based on that understanding, not to simply tag objects that it sees in the image. The company also specializes in hybrid recognition, which combines humans and machines for more accurate descriptions.

Another approach is that employed by Scribely, a company which specializes in super-high-quality, human-written alt text. Scribely’s founder Caroline Desrosiers told me that the company is organized as a “tribe” of specialized writers who band together to offer services to brands, photographers, artists and other users who need well-researched, carefully written alt text for their content. Scribely specializes in generating alt text which captures the content and feel of images, making them come alive — even when described verbally in text read aloud through a screen reader.

For an Instagram image of the company’s founder, for example, the Scribely team wrote the alt text description: “Scribely CEO Caroline Desrosiers lounges in a wooden Adirondack chair with a cup of coffee and her laptop. A thermos rests next to her on top of a tree stump. Behind her, the lush and overgrown foliage of Sonoma Valley.” That’s a hell of a lot richer than alt text written by a machine, and generates a much clearer mental picture of what was depicted in the original image.

In fact, try this experiment. Read that description again, and be mindful of the mental picture it conjures up for you. Then go to the original Instagram post and take a look at the real image. It's a good bet that your mental picture and the actual image have a lot in common — or at least far more than you'd get from an AI description like "person, chair, trees".

Now imagine that for every image you encounter online, a textual description is all you get; that's what surfing the web is actually like for the millions of Americans who use screen readers. Wouldn't you much prefer a detailed, human-written description to one generated by a computer that mechanically labels objects without understanding the image's context and emotional feel?

Writing human-generated alt text, of course, is expensive — at least relative to the cheap step of running an image through an AI program, which usually costs fractions of a penny. Especially for publishers and other companies who sell content, though, good alt text can actually be a source of major profits, not just a cost. Most search engines view web pages through a system which is similar to a screen reader. Like a blind user, search engine bots can’t see images — they have to rely on textual descriptions, either those in the image’s alt text or descriptions generated by their own AI.

Feeding a search engine a well-written piece of alt text gives it a much better sense of what's present visually on a webpage, and can lead to higher rankings and more earnings from a company's content. In its Search Engine Optimization best practices, Google covers alt text and tells developers to "focus on creating useful, information-rich content that uses keywords appropriately and is in context of the content of the page" when writing it. Desrosiers told me that many of her clients come to Scribely primarily to improve their search rankings, with the accessibility benefits as an added bonus. According to Desrosiers, improving accessibility is still Scribely's primary reason for existing, but she loves the fact that alt text aligns accessibility and profits in such a positive way.
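The crawler's-eye view described above can be sketched with a few lines of Python. This rough example (the markup is invented, and real crawlers are far more sophisticated) gathers the alt text a bot — or a screen reader — would read for each image on a page:

```python
from html.parser import HTMLParser

class AltTextExtractor(HTMLParser):
    """Gathers the alt text a crawler or screen reader would read for each image."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt", "")
            if alt:
                # This text is all a non-visual consumer knows about the image.
                self.descriptions.append(alt)

page = ('<img src="founder.jpg" alt="Scribely CEO Caroline Desrosiers '
        'lounges in a wooden Adirondack chair.">')
extractor = AltTextExtractor()
extractor.feed(page)
print(extractor.descriptions)
```

Whatever lands in `descriptions` is the entirety of the image's contribution to the page as a search engine indexes it — which is why rich, accurate alt text helps rankings as well as accessibility.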

Especially if you work in the content industry, take a moment this Global Accessibility Awareness Day to develop a deeper understanding of alt text, and then apply what you learn at your own company, doing your part to make the web more accessible.

Even if you’re a user and not a platform creator, you can help too. When uploading content to a social media platform like Instagram — or a website like Yelp or Google Local — take a moment to write a full description of your images whenever the platform gives you the chance. You know your images better than anyone else — by writing your own description, you’re often able to prevent the platform from relying only on AI to describe your image’s content, and that can do wonders to make your content more accessible for those with visual impairments.

