• Tue, Oct 2024

How Chinese AI Firms Are Innovating to Dominate the Competitive Text-to-Video Market

Companies ranging from Zhipu AI, a small start-up, to ByteDance, one of the biggest players in the Chinese market, have recently introduced artificial intelligence (AI) video generation tools, but they have struggled to distinguish themselves from their many local competitors.
Other entrants include Kuaishou Technology, operator of the Kuaishou short-video platform, and the start-up Shengshu AI, both of which have released video generation tools to the public. E-commerce giant Alibaba Group Holding, which owns the South China Morning Post, has also introduced a tool similar to Sora. Analysts said Chinese firms appear to be a few months behind OpenAI's Sora in developing models that turn text directly into video, although they have shown they can progress rapidly in the field.

According to Lu Yanxia, research director for emerging technology at IDC China, text-to-video models have grown rapidly thanks to China's massive investment in AI models. Microsoft-funded OpenAI unveiled its text-to-video model Sora in February, but the product has yet to be released to the public, with only a limited number of pilot users given access.
Other players have since followed with their own Sora-like tools, including ByteDance, which launched a video tool called Jimeng in local Android app stores on July 31. Jimeng accepts both text and image inputs and produces clips of up to 12 seconds, the longest of the tools discussed here.

Kuaishou's model can produce clips of up to 10 seconds, while Zhipu AI's Qing and Shengshu's Vidu produce clips of 4 to 6 seconds. As Figures 5, 6 and 7 below show, Shengshu appears to be the fastest at generating video: its tool can produce a four-second clip in under 30 seconds, while most other services take longer to produce a video of similar length.

An employee of one of the AI firms, who asked to remain anonymous, said the Chinese firms' models were not very different from one another. Instead, the firms will differentiate themselves through the services they offer and the industries those services target. Of the four services, three offer free trials, although response times are comparatively slow, especially during periods of high usage. They also offer paid plans that shorten waiting times and add other benefits, such as higher-resolution clips.

According to IDC's Lu, the internet sector will be the first to adopt video models, followed by live streaming, video games, smart cities and, finally, manufacturing. ‘This will be the main arena of fierce competition among generative AI technologies,’ she said.