Hi3DGen creates detailed 3D models from single images, plausibly estimating even hidden parts. It captures more intricate detail than other generators, though it outputs geometry only, so a separate tool is needed for texturing. A free Hugging Face demo is available for testing.
HSMR generates 3D models of people, including their skeletons, from images or videos. This allows for accurate pose and movement estimation from different camera angles. Code and a Hugging Face demo are available.
AnimeGamer creates interactive anime game scenes from text prompts. Players control characters and environments by inputting text commands, and the AI generates scenes in response, updating character states like stamina and social energy. Models and code are available on Hugging Face and GitHub.
SkyReels-A2 is an AI that combines reference images to create videos, merging characters, objects, and backgrounds into coherent scenes. Models and a GitHub repository are available, licensed under Apache 2.0 for commercial use.
DreamActor-M1 transfers acting and movements from a reference video onto a still image, accurately applying body movements, hand gestures, and facial expressions. It can even animate deceased actors. Currently, only a technical paper is available.
Wondershare Virbo is an AI video maker that turns text, photos, or existing videos into videos with AI avatars, voice cloning, and AI voices in 90 languages, with translation features for global content creation.
EasyControl is an open-source image generator that supports many kinds of conditional image generation, combining multiple ControlNets into a single framework. A free Hugging Face space uses EasyControl with a Studio Ghibli style for image transformation.
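The idea of merging several conditioning branches into one generation pass can be sketched as weighted residual addition on the base features. This is a minimal conceptual illustration, not EasyControl's actual architecture or code; the branch names and weights are assumptions.

```python
import numpy as np

def apply_controls(base_features, control_residuals, weights):
    """Conceptual sketch: merge several conditioning branches
    (e.g. pose, depth, edges) by adding weighted residuals onto
    the base features. Illustrative only -- not EasyControl's code."""
    out = base_features.copy()
    for residual, w in zip(control_residuals, weights):
        out = out + w * residual
    return out

base = np.zeros((4, 4))
pose = np.ones((4, 4))        # stand-in for a pose-conditioning branch
depth = np.full((4, 4), 2.0)  # stand-in for a depth-conditioning branch
merged = apply_controls(base, [pose, depth], weights=[0.5, 0.25])
```

The appeal of a unified framework is that each condition is just another residual branch, so controls can be mixed and reweighted without retraining the base model for every combination.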
Lumina-mGPT 2.0 is an open-source autoregressive model for image generation. It can generate images from text prompts, edit existing images, and incorporate reference images. A GitHub repository with download instructions is available, but the standard model requires 80GB of VRAM.
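A back-of-the-envelope estimate helps put an 80GB VRAM figure in context: weights alone are only part of the footprint, with activations and the autoregressive KV cache adding substantially more. The parameter count and dtype below are hypothetical, for illustration only.

```python
def param_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough memory footprint of the model weights alone (excludes
    activations, KV cache, and framework overhead, which for long
    autoregressive image-token sequences can dominate)."""
    return n_params * bytes_per_param / 1e9

# Hypothetical 7B-parameter model stored in bf16 (2 bytes/param):
weights_only = param_memory_gb(7e9, 2)  # 14.0 GB for weights alone
```

The gap between weights-only memory and a much larger total requirement is typical for autoregressive generation, where every generated token extends the cached attention state.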
Meta's MoCha generates videos from text descriptions and speech audio. It creates realistic animations of people and scenes, but it is limited to 5-second clips and supports text-to-video only. Whether Meta will release the tool is uncertain.
OpenAI plans to release the o3 and o4-mini models before GPT-5. o3 is expected to be the more performant of the two, especially in coding, math, and science.
This AI tool identifies and segments moving objects in videos accurately, even with shaky cameras, motion blur, and complex shapes. The code is available on GitHub.
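For contrast with a learned approach, the classical baseline for moving-object masks is simple frame differencing, sketched below. This is a deliberately naive stand-in, not the tool's method, and the threshold value is an arbitrary assumption; learned methods exist precisely because this baseline breaks down under shaky cameras and motion blur.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=20):
    """Naive frame-differencing baseline: flag pixels whose intensity
    changed by more than `threshold` between consecutive frames.
    Not the method of the tool above -- shown only for contrast."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

prev = np.zeros((3, 3), dtype=np.uint8)
curr = prev.copy()
curr[1, 1] = 200  # one "moving" pixel
mask = motion_mask(prev, curr)
```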
Runway Gen-4 and Midjourney V7 have been released with only marginal improvements, which the presenter finds disappointing.
Alibaba has released models for VACE, a plugin for base video generators that can perform inpainting, add reference characters, transfer motion, and outpaint smaller videos.
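The outpainting step, extending a smaller frame onto a larger canvas and marking the new border region for the generator to fill, can be sketched as padding plus a fill mask. This is a conceptual illustration of the data preparation only, not VACE's actual interface; the function and parameter names are hypothetical.

```python
import numpy as np

def prepare_outpaint(frame, pad):
    """Place a frame on a larger canvas and return a boolean mask
    marking the border region a video model would be asked to fill.
    Conceptual sketch only -- not VACE's actual API."""
    h, w = frame.shape[:2]
    canvas = np.zeros((h + 2 * pad, w + 2 * pad), dtype=frame.dtype)
    canvas[pad:pad + h, pad:pad + w] = frame
    mask = np.ones_like(canvas, dtype=bool)
    mask[pad:pad + h, pad:pad + w] = False  # original content is kept
    return canvas, mask

frame = np.full((4, 4), 128, dtype=np.uint8)
canvas, mask = prepare_outpaint(frame, pad=2)
```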