ByteDance, the parent company of TikTok, has announced a significant advance in AI training efficiency, reporting a 1.71x improvement in the speed of its large language model training, according to a research paper published on the preprint platform arXiv.
The gain comes from the COMET system, which improves on the “Mixture of Experts” (MoE) technique: a machine learning approach in which multiple expert networks split the problem space among themselves, boosting the efficiency of large models while reducing computational cost.
According to the South China Morning Post, the technique is already widely used by leading AI models such as Grok and DeepSeek, because it allows models to be scaled to trillions of parameters without a proportional increase in computing costs.
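To illustrate the underlying idea (this is a minimal sketch of generic sparse expert routing, not ByteDance’s COMET system; the layer sizes and variable names are invented for the example), a gating network activates only the top-k experts for each token, so total parameters can grow with the number of experts while per-token compute stays roughly fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2   # illustrative sizes, not ByteDance's

# Each "expert" is a small feed-forward layer; only a few run per token.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts))        # gating network weights

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ gate_w                                # (tokens, n_experts)
    outputs = np.zeros_like(x)
    for i, token in enumerate(x):
        top = np.argsort(logits[i])[-top_k:]           # indices of chosen experts
        weights = np.exp(logits[i][top])
        weights /= weights.sum()                       # softmax over chosen experts
        for w, e in zip(weights, top):
            outputs[i] += w * (token @ experts[e])     # only k of n experts compute
    return outputs

tokens = rng.normal(size=(4, d_model))                 # a tiny batch of 4 tokens
print(moe_layer(tokens).shape)                         # (4, 16)
```

Because only k of the n experts run for each token, adding experts increases the parameter count far faster than the per-token compute, which is the scaling property the article refers to.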
ByteDance has deployed the method in its production environment across more than 10,000 graphics processing units (GPUs), saving millions of GPU hours. The advance could also reduce reliance on Nvidia chips, which are subject to stringent US export controls.
Despite the significant benefits of the Mixture of Experts technique, the research team behind ByteDance’s Doubao chatbot noted challenges related to “interference with computational processes” that can hold back efficiency, an aspect the company aims to improve going forward.
Source: South China Morning Post