Some serious updates on open-source AI, while OpenAI, Runway, and Suno have been unveiling their new models and making deals with content providers and streaming services to overcome copyright issues.
Let's start with CogVideoX, perhaps the best open-source video model to date. There are various ways to run it, including ComfyUI nodes.
CogVideo & CogVideoX
Experience the CogVideoX-5B model online at 🤗 Huggingface Space or 🤖 ModelScope Space
📚 View the paper and user guide
📍 Visit QingYing and API Platform to experience larger-scale commercial video generation models
Project Updates
- 🔥🔥 News: 2024/11/08: We have released the CogVideoX1.5 model. CogVideoX1.5 is an upgraded version of the open-source model CogVideoX. The CogVideoX1.5-5B series supports 10-second videos with higher resolution, and CogVideoX1.5-5B-I2V supports video generation at any resolution. The SAT code has already been updated, while the diffusers version is still under adaptation. Download the SAT version code here.
- 🔥 News: 2024/10/13: A more cost-effective fine-tuning framework for CogVideoX-5B that works with a single 4090 GPU, cogvideox-factory, has been released. It supports fine-tuning with multiple resolutions. Feel free to use it!
- 🔥 News: 2024/10/10: We have updated our technical…
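Besides the ComfyUI nodes, CogVideoX can also be run directly through the diffusers integration. A minimal sketch, following the Hugging Face documentation at the time of writing (the exact model ID, frame count, and memory-saving calls are assumptions; check the current docs before relying on them):

```python
# Sketch: text-to-video with CogVideoX-5B via diffusers.
# Requires a CUDA GPU and a multi-GB weight download from Hugging Face.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # fit the 5B model in limited VRAM
pipe.vae.enable_tiling()         # reduce VAE memory use during decode

video = pipe(
    prompt="A panda playing guitar in a bamboo forest, cinematic lighting",
    num_frames=49,               # roughly 6 seconds at 8 fps
    guidance_scale=6.0,
    num_inference_steps=50,
).frames[0]

export_to_video(video, "panda.mp4", fps=8)
```

The CPU-offload and VAE-tiling calls are what make the 5B model feasible on consumer cards; without them the pipeline can exceed 24 GB of VRAM.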
Temporal Labs offers a Python-based video suite that combines CogVideoX with Ollama to provide integrated LLM and video-generation services for filmmakers, artists, and others.
TemporalLabsLLC-SOL / TemporalPromptEngine
A comprehensive, click-to-install, fully open-source Video + Audio Generation AIO Toolkit using advanced prompt engineering plus the power of CogVideoX + AudioLDM2 + Python!
Temporal Prompt Engine: Local, Open-Source, Intuitive, Cinematic Prompt Engine + Video and Audio Generation Suite for Nvidia GPUs
Table of Contents
- Introduction
- Features Overview
- Installation
- Quick Start Guide
- API Key Setup
- Story Mode: Unleash Epic Narratives
- Inspirational Use Cases
- Harnessing the Power of ComfyUI
- Local Video Generation Using CogVideo
- Join the Temporal Labs Journey
- Donations and Support
- Additional Services Offered
- Attribution and Courtesy Request
- Contact
- Acknowledgments
1. Introduction
Welcome to the Temporal Prompt Engine, your ultimate tool for crafting immersive video and audio experiences. This engine empowers you to generate high-quality prompts with unparalleled control over cinematic elements, all while remaining intuitive and accessible. I'm still experimenting with the options and will hone the variety down a bit as I go.
Unleash Your Creativity
Imagine capturing the world through the eyes of an ancient philosopher contemplating the cosmos, visualizing crypto-animals roaming digital landscapes…
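The core idea behind a cinematic prompt engine — turning structured controls such as shot type, lighting, and palette into a single model-ready prompt, which a local LLM can then expand — can be sketched in a few lines. The function and field names here are illustrative, not Temporal's actual API:

```python
# Sketch of a cinematic prompt engine: structured controls are assembled
# into one text prompt for a video model. Names are illustrative only.

def build_cinematic_prompt(subject, shot="wide shot", lighting="golden hour",
                           palette="muted earth tones", lens="35mm"):
    """Combine cinematic controls into a single prompt string."""
    return (f"{shot} of {subject}, {lighting} lighting, "
            f"{palette} color palette, shot on a {lens} lens")

prompt = build_cinematic_prompt(
    "an ancient philosopher contemplating the cosmos",
    shot="slow dolly-in",
    lighting="candlelit",
)
print(prompt)

# With a local Ollama server running, an LLM could then expand this seed
# prompt into a richer scene description, roughly like:
#   import ollama
#   ollama.chat(model="llama3",
#               messages=[{"role": "user", "content": prompt}])
```

The expanded text is then what gets handed to CogVideoX (for video) and AudioLDM2 (for audio) downstream.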
This is not exactly news, but I recently tested Invoke AI; it is an interesting alternative for those looking for the modular aspects of ComfyUI without the complexity.
https://invoke-ai.github.io/InvokeAI/installation/installer/#running-the-installer
Finally, some cutting-edge new-generation image models: the long-awaited SD 3.5, and the Flux model from their competitors.
https://comfyanonymous.github.io/ComfyUI_examples/sd3/?ref=blog.comfy.org
https://stable-diffusion-art.com/flux-comfyui/
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide inference code so that everyone can explore more functionalities of OmniGen.
Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, we believe that the future image generation paradigm should be more simple and flexible, that is, generating various images directly through arbitrarily multi-modal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.
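That single-pipeline idea is visible in the inference code: one prompt can mix text with inline image references, with no ControlNet or IP-Adapter modules attached. A sketch following the usage shown in the project README (exact argument names and values may change between versions; the image path is a placeholder):

```python
# Sketch: unified text+image generation with OmniGen, per its README.
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# The reference image is addressed inline in the prompt itself --
# no face detection, pose estimation, or cropping steps beforehand.
images = pipe(
    prompt="A man in a black shirt is reading a book. "
           "The man is the person in <img><|image_1|></img>.",
    input_images=["./example_person.jpg"],  # placeholder path
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
)
images[0].save("output.png")
```

The `<img><|image_1|></img>` token is how OmniGen binds the first entry of `input_images` into the instruction, which is what replaces the usual stack of adapter plugins.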
VectorSpaceLab / OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
OmniGen: Unified Image Generation
News | Methodology | Capabilities | Quick Start | Finetune | License | Citation
1. News
- 2024-11-03:✨✨Added Replicate Demo and API:
- 2024-10-28:✨✨We release new version of inference code, optimizing the memory usage and time cost. You can refer to docs/inference.md for detailed information.
- 2024-10-22:🔥🔥We release the code for OmniGen. Inference: docs/inference.md Train: docs/fine-tuning.md
- 2024-10-22:🔥🔥We release the first version of OmniGen. Model Weight: Shitao/OmniGen-v1 HF Demo: 🤗
In the rapidly evolving landscape of Artificial General Intelligence (AGI), the emergence of Florence-2 signifies a monumental stride forward in computer vision. Developed by a team at Azure AI, Microsoft, this state-of-the-art vision foundation model aims to redefine the way machines comprehend and interpret visual data. Let's delve into this advancement and explore how Florence-2 is poised to revolutionize the field. (There are also ComfyUI nodes for using Florence-2.)
https://www.labellerr.com/blog/florence-2-vision-model-by-microsoft/
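Outside ComfyUI, Florence-2 runs through the standard transformers API, where a task prompt such as `<CAPTION>` or `<OD>` selects the vision task. A sketch based on the model card's usage (model ID and post-processing call are as documented at the time of writing; verify against the current card):

```python
# Sketch: image captioning with Florence-2 via transformers.
# Downloads the model on first run; needs trust_remote_code=True.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

url = ("https://huggingface.co/datasets/huggingface/documentation-images/"
       "resolve/main/transformers/tasks/car.jpg")
image = Image.open(requests.get(url, stream=True).raw)

task = "<CAPTION>"  # other tasks include "<OD>", "<DENSE_REGION_CAPTION>"
inputs = processor(text=task, images=image, return_tensors="pt")
ids = model.generate(input_ids=inputs["input_ids"],
                     pixel_values=inputs["pixel_values"],
                     max_new_tokens=128)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
result = processor.post_process_generation(
    raw, task=task, image_size=(image.width, image.height))
print(result)  # dict keyed by task, e.g. {"<CAPTION>": "..."}
```

Swapping the task prompt is all it takes to switch from captioning to object detection or region captioning, which is what makes it attractive as a general vision backbone.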
On a closing note, let me introduce again Stability Matrix, one system to manage all your AI art needs. It can install various interfaces such as SD WebUI and its variants (Forge, SD.Next), ComfyUI, InvokeAI, Fooocus, SwarmUI, and OneTrainer. It works mostly great; I'm only running InvokeAI separately as of now. It makes keeping track of things easy. Hopefully it will also integrate some LLM solutions, NeRF, and TouchDesigner/UE plugins in the future as a full multimodal system, though as of now these need to be installed separately.
LykosAI / StabilityMatrix
Multi-Platform Package Manager for Stable Diffusion
Stability Matrix
Multi-Platform Package Manager and Inference UI for Stable Diffusion
🖱️ One click install and update for Stable Diffusion Web UI Packages
- Supports
- Stable Diffusion WebUI reForge, Stable Diffusion WebUI Forge, Automatic 1111, Automatic 1111 DirectML, SD Web UI-UX, SD.Next
- Fooocus, Fooocus MRE, Fooocus ControlNet SDXL, Ruined Fooocus, Fooocus - mashb1t's 1-Up Edition, SimpleSDXL
- ComfyUI
- StableSwarmUI
- VoltaML
- InvokeAI
- SDFX
- Kohya's GUI
- OneTrainer
- FluxGym
- Manage plugins / extensions for supported packages (Automatic1111, Comfy UI, SD Web UI-UX, and SD.Next)
- Easily install or update Python dependencies for each package
- Embedded Git and Python dependencies, with no need for either to be globally installed
- Fully portable - move Stability Matrix's Data Directory to a new drive or computer at any time
✨ Inference - A Reimagined Interface for Stable Diffusion, Built-In to Stability Matrix
- Powerful auto-completion and…