Thursday, 02 May 2024

Apple may teach an AI system to make Siri perform tasks for you

The Apple research paper describes how the company has been developing Ferret-UI, a generative AI system that focuses on making sense of app screens

Mathures Paul Published 11.04.24, 10:44 AM
File picture of Apple Siri being used on an iPhone. The Telegraph

Apple’s new Ferret-UI multimodal large language model may help artificial intelligence systems better understand smartphone screens, such as the iPhone’s, according to a research paper released this week. The paper describes how the company has been developing Ferret-UI, a generative AI system focused on making sense of app screens.

The paper is vague about potential applications, but the most exciting possibility is that Ferret-UI could power a much more advanced Siri voice assistant. Titled ‘Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs’, the paper discusses multimodal large language models (MLLMs), which are similar to the text-based large language models behind the likes of ChatGPT but can also work with images, audio and video.


“Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (for example icons, texts) than natural images, we incorporate ‘any resolution’ on top of Ferret to magnify details and leverage enhanced visual features. Specifically, each screen is divided into two sub-images based on the original aspect ratio (that is horizontal division for portrait screens and vertical division for landscape screens). Both sub-images are encoded separately before being sent to LLMs. We meticulously gather training samples from an extensive range of elementary UI tasks, such as icon recognition, find text, and widget listing. These samples are formatted for instruction-following with region annotations to facilitate precise referring and grounding,” the study says.
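A minimal Python sketch of the screen-splitting step described in the quote, assuming the split is a simple cut through the middle of the screenshot. It is not Apple’s code: the function name `split_screen` is hypothetical, and treating a square screen as landscape is an assumption, since the quote only covers portrait and landscape. The boxes use the (left, top, right, bottom) convention that Pillow’s `Image.crop()` accepts, so each one could be cropped out and encoded separately.

```python
from typing import List, Tuple

# A crop box in (left, top, right, bottom) pixel coordinates,
# the same convention Pillow's Image.crop() accepts.
Box = Tuple[int, int, int, int]


def split_screen(width: int, height: int) -> List[Box]:
    """Split a screenshot into two sub-images based on its aspect ratio.

    Hypothetical helper for illustration. Portrait screens (taller than
    wide) are divided horizontally into top and bottom halves; landscape
    screens are divided vertically into left and right halves. Square
    screens are treated as landscape (an assumption of this sketch).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if height > width:
        # Portrait: horizontal cut across the middle of the screen.
        mid = height // 2
        return [(0, 0, width, mid), (0, mid, width, height)]
    # Landscape (or square): vertical cut down the middle of the screen.
    mid = width // 2
    return [(0, 0, mid, height), (mid, 0, width, height)]


if __name__ == "__main__":
    # A portrait, iPhone-sized screenshot (1170 x 2532 pixels).
    print(split_screen(1170, 2532))  # [(0, 0, 1170, 1266), (0, 1266, 1170, 2532)]
    # The same screen rotated to landscape.
    print(split_screen(2532, 1170))  # [(0, 0, 1266, 1170), (1266, 0, 2532, 1170)]
```

Each half would then be resized and encoded on its own, so small icons and text take up more of the encoder’s input than they would in a single downscaled screenshot; the real Ferret-UI pipeline may differ in its details.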

The AI system could be useful in two ways. First, it could help evaluate how effective a UI is: a developer could create a draft version of an app, then let Ferret-UI determine how easy or difficult it is to understand and use. Second, it could have accessibility applications. Rather than a simple screen reader reading out everything on an iPhone screen to a blind person, for example, it could summarise what the screen shows and list the options available. The user could then tell iOS what they want to do and let the system do it for them.
