By leveraging AI, machine vision and spatial audio, ARIA Research is developing advanced augmented reality glasses that could be a boon for the visually impaired.
Not so long ago, technology entrepreneur Rob Yearsley’s priorities changed dramatically when his son was diagnosed with a lifelong disability.
“We got [an] insight into the world of managing disability as a family, and a visceral understanding of how that impacts your ability to function,” he said.
At the time, Yearsley was exploring the development of augmented reality (AR) glasses – a technology he describes as “a solution in search of a problem” that hadn’t found its market niche.
“From an engineering perspective, one of the central problems is the display,” he said. “The visual system inside AR glasses is very bulky, takes a lot of energy, and has to have a big battery.”
This prompted Yearsley to reconsider the product without the display, essentially transforming it into an AI-based vision computer, and to consider who would find such a device useful.
“I explored a few ideas, including developing glasses for special operations by the US military and firefighting,” he said.
In what he described as a “bolt from the blue”, Yearsley realised who his target market needed to be – the visually impaired.
“Once we made that leap, we quickly discovered human expert echolocation,” he said.
This skill was pioneered by Daniel Kish, who lost both eyes to cancer before the age of two and learned to navigate his surroundings by clicking his tongue to emit sound pulses and interpreting the echoes – a technique accurate enough to let him ride a bike and avoid collisions.
Through discussions with Kish about enhancing echolocation, Yearsley discovered improvements could be made to the ease of use and accuracy of the technique using technology. This formed the genesis of ARIA Research, co-founded by Yearsley and Mark Harrison, which develops advanced AR glasses by leveraging AI, machine vision and spatial audio.
Turning visuals into sound
The idea behind ARIA was to map visual spatial perception into a format that someone without vision could understand through sound.
ARIA’s system uses machine vision to map the environment, build a three-dimensional virtual model, immerse the user in that computer-generated space, and render the sound of objects in their actual locations, in real time.
The ARIA glasses update rapidly – between 20 and 40 times a second – maintaining a consistent illusion of object locations even during fast movement.
Instead of using power-hungry lidar for spatial accuracy, ARIA uses VI-SLAM (visual-inertial simultaneous localisation and mapping), a camera-based approach with stereoscopic depth perception that incorporates the user’s own movement.
“The 3D model is also created with the user’s head pose, tracking their head and space in real time to maintain a relational understanding between the user and surrounding objects, to enable the glasses to deliver an audio first-person perspective of the user’s environment,” Yearsley said.
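In rough terms, the head-pose tracking Yearsley describes lets the system express each mapped object’s position relative to the wearer’s head. The sketch below is illustrative only, not ARIA’s code – the coordinate convention and function names are assumptions – but it shows how a VI-SLAM head pose could be used to derive the azimuth, elevation and distance a spatial audio engine needs for each object.

```python
import numpy as np

def world_to_head(obj_pos_world, head_pos_world, head_rot_world):
    """Express a world-frame object position in the wearer's head frame.

    obj_pos_world, head_pos_world: (3,) arrays in metres.
    head_rot_world: (3, 3) rotation matrix for the head's orientation in the
    world frame, e.g. as estimated by a VI-SLAM tracker.
    """
    # Translate so the head sits at the origin, then rotate into the head frame.
    rel = head_rot_world.T @ (obj_pos_world - head_pos_world)
    x, y, z = rel  # assumed convention: x = right, y = up, z = forward

    distance = np.linalg.norm(rel)
    azimuth = np.degrees(np.arctan2(x, z))           # left/right angle
    elevation = np.degrees(np.arcsin(y / distance))  # up/down angle
    return azimuth, elevation, distance

# Re-running this for every tracked object on each update (the article cites
# 20-40 updates a second) keeps each rendered sound pinned to the object's
# real-world location even as the wearer moves their head.
azimuth, elevation, dist = world_to_head(
    np.array([1.0, 0.2, 2.0]),   # e.g. a cup roughly 2 m ahead, slightly right
    np.array([0.0, 0.0, 0.0]),   # head position from VI-SLAM
    np.eye(3),                   # head orientation from VI-SLAM
)
```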
Human perception relies heavily on movement and parallax – the difference between fixed and moving objects in space.
“We used these phenomena to help the user develop a mental model of where things are in space,” Yearsley said. “The spatial audio engine processes this information through binaural audio speakers in the glasses, maintaining a consistent illusion even during movement.”
Humans excel at lateral hearing within around five degrees of arc, allowing accurate sound localisation, but we struggle to judge distance and height from sound alone.
To address this, ARIA uses volume and echo cues to convey distance, and subtly modulates each object’s sound to indicate its height and azimuth.
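ARIA’s actual cue design has not been published, but the toy sketch below illustrates the kind of mapping described: loudness and echo standing in for distance, a subtle modulation for height, and panning for azimuth. All of the curves and constants here are invented for the example.

```python
import math

def spatial_cues(azimuth_deg, elevation_deg, distance_m):
    """Toy mapping from an object's egocentric position to audio cue parameters.

    Purely illustrative: the article says ARIA uses volume and echo for
    distance and subtle modulation for azimuth and height, but the real
    mappings are not public.
    """
    # Distance: quieter and more reverberant the further away the object is.
    gain = 1.0 / max(distance_m, 0.5) ** 2       # inverse-square-style falloff
    reverb_mix = min(0.8, 0.1 * distance_m)      # more "echo" with distance

    # Height: nudge pitch up or down with elevation (a common sonification trick).
    pitch_shift_semitones = elevation_deg / 15.0

    # Left/right: crude constant-power pan; a real system would use binaural
    # rendering through the glasses' speakers instead.
    pan = math.sin(math.radians(max(-90.0, min(90.0, azimuth_deg))))

    return {"gain": gain, "reverb_mix": reverb_mix,
            "pitch_shift_semitones": pitch_shift_semitones, "pan": pan}

print(spatial_cues(azimuth_deg=30, elevation_deg=-10, distance_m=2.0))
```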
Overcoming design challenges
As Yearsley knows all too well, creating AR glasses that demand this much processing power is no mean feat.
The glasses need to perform around five trillion operations per second while maintaining a power budget of approximately two watts, within a device weighing less than 70 grams. They also need to run simultaneous localisation and mapping efficiently.
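A little back-of-envelope arithmetic, using only the figures quoted here and the 20–40 Hz update rate mentioned earlier, puts those constraints in perspective; the per-update split is an inference from those numbers, not a published specification.

```python
# Back-of-envelope figures implied by the numbers in the article.
ops_per_second = 5e12      # ~5 trillion operations per second
power_watts = 2.0          # ~2 W power budget
update_rate_hz = 40        # upper end of the stated 20-40 Hz refresh

efficiency_tops_per_watt = ops_per_second / 1e12 / power_watts
ops_per_update = ops_per_second / update_rate_hz

print(f"~{efficiency_tops_per_watt:.1f} TOPS per watt required")
print(f"~{ops_per_update / 1e9:.0f} billion operations available per update")
# -> roughly 2.5 TOPS/W and ~125 billion operations per 40 Hz update,
#    all inside a frame weighing under 70 grams.
```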
“Our first proof of concept system ran for 35 minutes, and we’re currently up to about four hours in our current prototype,” Yearsley said. “Our end goal is an all-day device, and we’ve made significant progress in providing persistent spatial perception throughout the day in a very small form factor.”
Navigating these hurdles required a lot of creative thinking.
“Most machine vision is developed out of robotics, which assumes you need perfect data for algorithms to make good decisions,” he said. “Our use case is different because you don’t need perfect data; there’s a highly intelligent human in the middle who interprets spatial cues outside of the glasses.”
Users already process the natural soundscape around them, so it’s a question of providing enough data to fill in the gaps: portraying in sound the objects that a sighted person (and ARIA’s cameras) would see, but that are otherwise silent.
“This is a very different approach to traditional machine vision, where there’s more data and accuracy, and sharper algorithms,” Yearsley said. “ARIA’s model is the inverse, doing as little as possible while still providing complete perception.”
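One way to picture that “do as little as possible” philosophy – purely illustrative, with object classes and a range threshold invented for the example – is a filter that sonifies only the detections the natural soundscape doesn’t already cover.

```python
# Illustrative sketch of the "fill in the gaps" idea: only sonify detections
# that are silent in the real world and close enough to matter.
NATURALLY_AUDIBLE = {"person_talking", "running_tap", "television", "dog"}

def objects_to_sonify(detections, max_range_m=5.0):
    """detections: list of dicts like {"label": "door_handle", "distance_m": 1.2}."""
    return [
        d for d in detections
        if d["label"] not in NATURALLY_AUDIBLE   # the real soundscape covers these
        and d["distance_m"] <= max_range_m       # skip clutter the user can't act on yet
    ]

scene = [
    {"label": "glass_of_water", "distance_m": 0.6},
    {"label": "door_handle", "distance_m": 3.1},
    {"label": "television", "distance_m": 2.0},  # already audible, skipped
]
print(objects_to_sonify(scene))
```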
Putting safety first
As ARIA smart glasses are being developed as a medical device, safety is paramount.
Machine vision systems are only as good as their training data, so ARIA is launching the device as a secondary aid, supplementing primary safety tools such as canes and guide dogs.
The goal is for the technology to eventually complement or even substitute these aids as the machine vision matures, becoming reliable enough to serve as a visually impaired person’s primary safety and perception aid.
Key to this vision is considering vision impairment as an information access problem rather than a disability.
“Nearly 90 per cent of blind people rely on other humans as their primary mobility aid, with only 6–7 per cent using a guide dog and 10–11 per cent using a cane,” Yearsley said.
Having access to information about surrounding objects in space would be life changing and empowering. With ARIA, vision-impaired people would have much the same perceptual information as sighted people, enabling greater agency, autonomy and independence, and levelling the social playing field with sighted peers.
Ethically expanding the model
As users wear ARIA glasses in diverse environments, the model will continuously improve, though the level of detail it can provide will always be a moving target.
“The bar for a great experience is always going to increase as people’s perception of the world – and their access to it – opens up,” Yearsley said.
However, as the camera consistently captures information and interprets it via AI, this raises privacy concerns.
To preserve user privacy, processing happens on the device and data is encrypted, with ownership of that data remaining with the individual. Any data used to improve the model is strictly opt-in.
“It’s easy for these things to erode, but if you bake that design into the engineering itself, it gets a lot harder – and we made that decision early to go in that direction.”
Making easy-to-learn sounds
To ensure the sound design is intuitive, ARIA started with simple, real-world interactions. In an initial test, a visually impaired user heard the “tink” of a glass of water and reached out to pick it up.
Following neuro-linguistic programming principles, ARIA’s sound design creates strong physical associations when helping users process new information.
“The illusion is complete – the ARIA-created ‘glass’ sound sounds like a glass would in the real world – so in the user’s mind, it becomes real,” Yearsley said.
Early successes build user confidence, allowing them to expand their interactions incrementally.
“They might then hear a door handle, be able to walk around to find it and get out of the room.”
Bringing humour and joy into the sound design is critical too.
“We’ve discovered that it’s much easier to help someone learn a new sound if it’s fun,” Yearsley said.
For example, ARIA sonically represents clocks with the iconic “cuckoo clock” sound.
“People found that amusing and memorable,” he said. “So there’s these little opportunities to portray the world in an interesting and textured way.”
People who are visually impaired make trade-offs every day, weighing whether it’s worth venturing out into an unfamiliar environment to get something done.
“The answer is often no, because it’s just too hard, leading to a high incidence of depression and social isolation. So if we can remove the barriers around that information access, we can have an impact on all these other related comorbidities that come from not being able to perceive information.”