New AI Models From Twelve Labs Enable Deep Understanding Of Videos


Twelve Labs, a San Francisco-based startup, is paving the way for powerful applications by training AI models to comprehend videos and text simultaneously. With a vision to help developers create programs that can see, listen, and understand the world as humans do, Twelve Labs focuses on solving complex video-language alignment problems.

Key Takeaway

Twelve Labs is making significant progress in building AI models for comprehensive video understanding. Its technology unlocks new possibilities in video search, scene classification, and content summarization. By addressing bias concerns and collaborating with strategic partners, Twelve Labs aims to continue driving innovation in the field of video analysis and empower organizations to unlock the potential of their vast video data.

Mapping Natural Language to Videos

By developing advanced models, Twelve Labs enables natural language mapping to the content of videos. These models can analyze actions, objects, and background sounds within a video, allowing developers to design applications that offer functionalities such as video search, scene classification, topic extraction, automatic summarization, and chapter splitting.
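The video search capability described above typically rests on a shared text-video embedding space: the model embeds both the query and each video segment, and search reduces to ranking segments by similarity. The sketch below illustrates that idea with toy, hand-written vectors and segment labels; it is not Twelve Labs' actual API or model output, and the embeddings are placeholders.

```python
# Hypothetical sketch of natural-language video search over precomputed
# segment embeddings. Vectors and segment labels are toy placeholders,
# not outputs of Twelve Labs' models.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend a multimodal model has embedded each video segment into a
# vector space shared with text queries.
segment_embeddings = {
    "00:00-00:30 chef chopping vegetables": [0.9, 0.1, 0.2],
    "00:30-01:00 crowd cheering at stadium": [0.1, 0.8, 0.3],
    "01:00-01:30 narrator summarizing recipe": [0.7, 0.2, 0.6],
}

def search(query_embedding, top_k=1):
    """Rank segments by similarity to the query embedding."""
    ranked = sorted(
        segment_embeddings.items(),
        key=lambda kv: cosine(query_embedding, kv[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:top_k]]

# A query like "cooking instructions" would be embedded by the same
# model; here we hard-code a toy vector close to the cooking segments.
print(search([0.85, 0.15, 0.3]))
# → ['00:00-00:30 chef chopping vegetables']
```

Features like scene classification, topic extraction, and chapter splitting can be framed similarly: once segments live in a shared embedding space, many downstream tasks become nearest-neighbor or clustering problems over those vectors.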

One of the key advantages of Twelve Labs’ technology is its potential for driving ad insertion and content moderation. For instance, it can distinguish between videos with knives that are violent versus instructional. Additionally, the technology has applications in media analytics and can automatically generate highlight reels or create blog post headlines and tags from videos.

Addressing Bias and Ensuring Fairness

While concerns about bias in AI models are valid, Twelve Labs says it strives to meet internal bias and fairness metrics before developing and releasing its models. The company also plans to share model-ethics benchmarks and datasets in the future.

Differentiation and Collaborations

Twelve Labs sets itself apart by offering high-quality models and a platform with fine-tuning capabilities, which lets customers adapt the models to their own data for domain-specific video analysis. Unlike existing models such as Google’s MUM, Twelve Labs’ technology integrates visual, audio, and speech signals holistically, pushing the boundaries of video understanding.

With an expanding user base of 17,000 developers since its private beta launch in May, Twelve Labs is already collaborating with various industries, including sports, media and entertainment, e-learning, and security. The company’s strategic funding round, which raised $10 million from Nvidia, Intel, and Samsung Next, will fuel ongoing innovation and support research, product development, and distribution efforts.
