On January 24th, at the “New Architecture of Large Language Model” conference in Shanghai, Rock AI, a subsidiary of Shanghai Stonehill Technology Co., officially introduced the Yan Model, a groundbreaking large language model that is set to revolutionize the field of artificial intelligence. What sets the Yan Model apart is that it doesn’t rely on the popular Attention mechanism or the Transformer architecture that is commonly used in large language models.
The Yan Model boasts impressive advantages over Transformer models with similar parameter counts: training efficiency seven times higher, inference throughput five times higher, and three times the memory capacity. It also runs losslessly on CPUs, reduces hallucinations in its output, and fully supports private deployment.
Liu Fanping, the CEO of Rock AI, emphasized the significance of the Yan Model, stating, “We hope that the Yan architecture can serve as infrastructure for the artificial intelligence field and establish a developer ecosystem in the AI domain. Ultimately, we aim to enable anyone to use general-purpose large models on any device, providing more economical, convenient, and secure AI services, and to promote the construction of an inclusive artificial intelligence future.”
To address the limitations of the Transformer architecture, which include high computational power consumption, extensive memory usage, high costs, and difficulty processing long-sequence data, the Yan Model introduces its own generative “Yan Architecture.” The new architecture allows lossless inference over arbitrarily long sequences on consumer-grade CPUs, and the company says it matches the performance of models with hundreds of billions of parameters while using only tens of billions, making large models cheap and easy for enterprises to deploy.
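The internals of the Yan Architecture have not been published, but the long-sequence claim rests on a general principle worth illustrating: in a Transformer decoder, the KV cache grows by one entry per generated token, so memory scales with sequence length, whereas an attention-free, recurrence-style model folds the entire history into a fixed-size state. The sketch below is a generic illustration of that contrast, not Yan's actual design; the dimensions and update rules are illustrative assumptions.

```python
import numpy as np

D = 64  # hidden size (illustrative)

def attention_step(kv_cache, x):
    """Transformer-style decoding: the cache grows by one entry per
    token, so memory is O(t) after t tokens."""
    kv_cache.append(x)
    keys = np.stack(kv_cache)               # (t, D)
    scores = keys @ x / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys                   # attention readout, shape (D,)

def recurrent_step(state, x, W_h, W_x):
    """Recurrence-style decoding: all history is folded into one
    fixed-size state, so memory is O(1) per step at any length."""
    return np.tanh(state @ W_h + x @ W_x)

rng = np.random.default_rng(0)
W_h = rng.normal(size=(D, D)) / np.sqrt(D)
W_x = rng.normal(size=(D, D)) / np.sqrt(D)

kv_cache, state = [], np.zeros(D)
for _ in range(1000):
    x = rng.normal(size=D)
    attention_step(kv_cache, x)
    state = recurrent_step(state, x, W_h, W_x)

print(len(kv_cache))   # 1000 entries: storage grows with the sequence
print(state.shape)     # (64,): fixed, regardless of sequence length
```

The fixed-size state is what makes arbitrarily long inference feasible on commodity hardware: per-token cost and memory stop depending on how much has already been generated.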
During the press conference, the research team presented empirical comparisons between the Yan Model and a Transformer model of the same parameter scale. The results demonstrated that the Yan Model outperforms the Transformer model in terms of training efficiency, inference throughput, and memory capacity.
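The article does not describe how these comparisons were run. For readers unfamiliar with the metric, inference throughput is typically reported as decoded tokens per second; a minimal measurement harness might look like the following, where `dummy_decode` is a hypothetical stand-in for whichever model's token-by-token decoder is being benchmarked.

```python
import time

def measure_throughput(generate_fn, prompt, n_tokens):
    """Return decoded tokens per second for a generation callable.
    `generate_fn(prompt, history)` stands in for a model's decode step."""
    start = time.perf_counter()
    history = []
    for _ in range(n_tokens):
        history.append(generate_fn(prompt, history))
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Hypothetical stand-in decoder; a real comparison would plug in the
# two models under test and run them on identical prompts and hardware.
dummy_decode = lambda prompt, history: "tok"

tps = measure_throughput(dummy_decode, "hello", 10_000)
print(f"{tps:,.0f} tokens/s")
```

A fair head-to-head comparison would additionally fix the hardware, batch size, prompt set, and output length for both models, since throughput is sensitive to all four.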
Apart from these advantages, the Yan Model excels at handling long sequences, a long-standing weakness of Transformer-based models. Its potential for unlimited-length inference opens up applications in high-stakes areas such as healthcare, finance, and law.
The hardware advantage of the Yan Model is also worth noting. Unlike other large models that require compression or pruning and can only run on specialized hardware, the Yan Model can run on mainstream consumer-grade CPUs, significantly expanding the potential applications of large models across various industries.
Looking to the future, Liu Fanping expressed ambitious goals for Rock AI. The company intends to develop a full-modality real-time human-computer interaction system, achieve end-side training, and integrate training and inference. The ultimate aim is to create an intelligent loop for general artificial intelligence, connecting perception, cognition, decision-making, and action. This will provide more options for the foundational platform of large models in research areas such as general-purpose robots and embodied intelligence.
The introduction of the Yan Model marks a significant milestone in the field of artificial intelligence. With its unique architecture, improved efficiency, and broad applicability, the Yan Model has the potential to drive innovation and shape the future of AI. As the developer ecosystem is established and more industries embrace large models, we can anticipate exciting advancements and opportunities in the AI landscape.