Imagine you are editing a video on your smartphone and need to add suitable sound effects, or you want to create custom sounds for ringtones, alarms, or social media posts. Instead of searching for or buying audio clips online, you simply type a description such as "gentle waves at sunset", and within a few seconds your device generates a matching sound, even without an internet connection. Thanks to a new collaboration between Arm and Stability AI, generating audio directly on the device is now a reality.
Arm and Stability AI collaborate to accelerate text-to-audio generation
Stability AI develops artificial intelligence (AI) models for image, video, 3D, and audio. Arm KleidiAI provides optimized, performance-critical routines (microkernels) designed specifically for Arm CPUs. Integrating KleidiAI into the XNNPACK library and the ExecuTorch framework, combined with Stability AI's own optimizations, has brought significant AI performance gains to Stable Audio Open, Stability AI's open text-to-audio model.
The results are striking: text-to-audio generation time drops from minutes to seconds, a 30-fold improvement in response speed. The Stable Audio Open model runs entirely on the Arm CPU of a smartphone without a network connection, a pioneering step for text-to-audio AI.
Stability AI benefits from KleidiAI's automatic acceleration, which speeds up the model and improves on-device AI performance without compromising quality. Because these gains come automatically, developers using Stable Audio Open need no additional engineering effort, saving both time and cost. Arm and Stability AI will continue to collaborate on further performance gains and better AI user experiences.
This performance leap shows that targeted hardware and software integration can make previously unattainable AI applications feasible on mobile devices, opening up new opportunities for innovation. Arm technology powers 99% of the world's smartphones, which means billions of smartphone users can now access advanced AI audio features.
Jointly tackle complex AI challenges
Although the Stable Audio Open model is highly efficient, running it directly on a smartphone CPU is still challenging. In the initial attempt, generating a single audio sample took more than four minutes, far too long to be acceptable for end users.
Working with Arm, Stability AI distilled the model down to a size suitable for mobile devices. With the new distilled model and the KleidiAI acceleration delivered through its integration into XNNPACK and ExecuTorch, audio clips are now generated within seconds on mobile Arm CPUs.
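As a rough sanity check on these figures, the following Python sketch combines two numbers from this article: the initial generation time of more than four minutes and the roughly 30-fold speedup. It assumes, although the article does not say so, that the 30-fold figure is measured against that same four-minute baseline. The function name `accelerated_time_s` is hypothetical and used only for illustration.

```python
# Illustrative arithmetic only. ASSUMPTION: the ~30x speedup quoted in the
# article is measured against the initial ~4-minute baseline; the article
# does not state this explicitly.

def accelerated_time_s(baseline_s: float, speedup: float) -> float:
    """Return the generation time after applying a multiplicative speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_s / speedup


BASELINE_S = 4 * 60  # initial attempt: "more than four minutes" (a lower bound)
SPEEDUP = 30         # "30-fold improvement in response speed"

estimate = accelerated_time_s(BASELINE_S, SPEEDUP)
print(f"~{estimate:.0f} s per sample")  # prints "~8 s per sample"
```

Because four minutes is only a lower bound on the original time, the resulting figure of about 8 seconds is an approximate estimate consistent with "within seconds", not a measured result.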
Prem Akkaraju, CEO of Stability AI, said, "As more and more creative professionals and businesses adopt generative AI to improve their production workflows, it is crucial that our models and workflows are readily available to builders and creators. We are pleased to collaborate with Arm on this. The Arm platform is used across the entire ecosystem, from servers to smartphones, and Arm is committed to accelerating AI models in mainstream frameworks by integrating Arm Kleidi into the software stack. That makes Arm our top choice."
The rise of text-to-audio AI
Since 2022, Stability AI has been at the forefront of generative AI, causing a sensation with its industry-leading image model, Stable Diffusion. Building on that success, the company launched Stable Audio, one of the first fully licensed audio models designed to generate high-quality music and sound effects from text prompts. These models rank among the most popular on major platforms such as Hugging Face, with millions of users and an active technical community.
Advanced audio AI experiences for everyone
This achievement is only the beginning of the collaboration. Arm and Stability AI have planned further performance optimizations to deliver an even better user experience. Together, they are laying the foundation for on-device AI in audio, image, video, and 3D, reshaping how people create content and interact with digital media. By distilling advanced models and deploying optimized software on widely used hardware, we are paving the way for a future in which everyone can enjoy advanced AI applications, models, and experiences directly on the devices in their pockets.