VJ UNION

Sleepless Monk

Open Source AI Updates for Visual Artists

Some serious updates on open-source AI, while OpenAI, Runway, and Suno have been unveiling their new models and striking deals with content providers and streaming services to work around copyright issues.

Let's start with CogVideoX, perhaps the best open-source video model to date. There are various ways to run it, including ComfyUI nodes.

GitHub: THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

CogVideo & CogVideoX

中文阅读

日本語で読む

Experience the CogVideoX-5B model online at 🤗 Huggingface Space or 🤖 ModelScope Space

📚 View the paper and user guide

👋 Join our WeChat and Discord

📍 Visit QingYing and API Platform to experience larger-scale commercial video generation models

Project Updates

  • 🔥🔥 News: 2024/11/08: We have released the CogVideoX1.5 model. CogVideoX1.5 is an upgraded version of the open-source model CogVideoX. The CogVideoX1.5-5B series supports 10-second videos with higher resolution, and CogVideoX1.5-5B-I2V supports video generation at any resolution. The SAT code has already been updated, while the diffusers version is still under adaptation. Download the SAT version code here.
  • 🔥 News: 2024/10/13: A more cost-effective fine-tuning framework for CogVideoX-5B that works with a single 4090 GPU, cogvideox-factory, has been released. It supports fine-tuning with multiple resolutions. Feel free to use it!
  • 🔥 News: 2024/10/10: We have updated our technical…
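Beyond the ComfyUI nodes, CogVideoX can also be driven from a short script through Hugging Face `diffusers`, which ships a `CogVideoXPipeline`. The sketch below is a minimal, hedged example: the model ID and defaults follow the CogVideoX documentation, but treat the exact settings (and the VRAM they require) as assumptions to verify against the repo.

```python
# Minimal sketch: text-to-video with CogVideoX-5B via diffusers.
# Assumes a CUDA GPU plus the `diffusers`, `torch`, and `accelerate` packages.

def cogvideox_settings(seconds: float = 6.0, fps: int = 8) -> dict:
    """CogVideoX samples at 8 fps; the +1 accounts for the initial frame."""
    return {"num_frames": int(seconds * fps) + 1, "fps": fps}

def generate(prompt: str, out_path: str = "output.mp4") -> None:
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use

    cfg = cogvideox_settings()  # 49 frames -> roughly a 6-second clip
    frames = pipe(prompt=prompt, num_frames=cfg["num_frames"]).frames[0]
    export_to_video(frames, out_path, fps=cfg["fps"])
```

The CPU-offload call is the usual trick for fitting the 5B model on consumer GPUs; drop it if you have the VRAM to spare.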

Temporal Prompt Engine, by Temporal Labs, is a Python-based video suite that combines CogVideoX with Ollama to provide integrated LLM and video-generation services for filmmakers, artists, and others.

GitHub: TemporalLabsLLC-SOL / TemporalPromptEngine

A comprehensive, click to install, fully open-source, Video + Audio Generation AIO Toolkit using advanced prompt engineering plus the power of CogVideox + AudioLDM2 + Python!

Temporal Prompt Engine: Local, Open-Source, Intuitive, Cinematic Prompt Engine + Video and Audio Generation Suite for Nvidia GPUs

Table of Contents

  1. Introduction
  2. Features Overview
  3. Installation
  4. Quick Start Guide
  5. API Key Setup
  6. Story Mode: Unleash Epic Narratives
  7. Inspirational Use Cases
  8. Harnessing the Power of ComfyUI
  9. Local Video Generation Using CogVideo
  10. Join the Temporal Labs Journey
  11. Donations and Support
  12. Additional Services Offered
  13. Attribution and Courtesy Request
  14. Contact
  15. Acknowledgments


1. Introduction

Welcome to the Temporal Prompt Engine, your ultimate tool for crafting immersive video and audio experiences. This engine empowers you to generate high-quality prompts with unparalleled control over cinematic elements, all while being intuitive and accessible for users. I'm still experimenting with the options a lot and will hone the variety down a bit as I go.

Unleash Your Creativity

Imagine capturing the world through the eyes of an ancient philosopher contemplating the cosmos, visualizing crypto-animals roaming digital landscapes…
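The Ollama-plus-CogVideoX pairing described above boils down to a simple pattern: a local LLM expands a short idea into a cinematic prompt, which is then fed to the video model. This is an illustrative sketch, not Temporal Prompt Engine's actual code; the template wording and the `llama3` model name are assumptions, and the LLM call is injectable so it can be stubbed.

```python
# Illustrative sketch: LLM-assisted cinematic prompt expansion,
# in the spirit of Temporal Prompt Engine's Ollama integration.

STYLE_TEMPLATE = (
    "Rewrite this idea as one cinematic video prompt. "
    "Shot: {shot}. Lighting: {lighting}. Idea: {idea}"
)

def build_request(idea, shot="slow dolly-in", lighting="golden hour"):
    """Wrap a raw idea in explicit cinematic constraints for the LLM."""
    return STYLE_TEMPLATE.format(shot=shot, lighting=lighting, idea=idea)

def expand_prompt(idea, llm=None, **kwargs):
    """Expand `idea` via `llm`; defaults to a local Ollama server."""
    request = build_request(idea, **kwargs)
    if llm is None:
        import ollama  # assumes `pip install ollama` and a running server
        llm = lambda text: ollama.chat(
            model="llama3", messages=[{"role": "user", "content": text}]
        )["message"]["content"]
    return llm(request)
```

The expanded string would then go straight into a CogVideoX pipeline as its `prompt`.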




This is not news, but I recently tested InvokeAI; it is an interesting alternative for those looking for the modular aspects of ComfyUI without the complexity.
https://invoke-ai.github.io/InvokeAI/installation/installer/#running-the-installer

Finally, some cutting-edge new-generation image models: the long-awaited SD 3.5 from Stability AI, and the Flux model from their competitor Black Forest Labs.

https://comfyanonymous.github.io/ComfyUI_examples/sd3/?ref=blog.comfy.org

https://stable-diffusion-art.com/flux-comfyui/
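The two links above cover the ComfyUI route; SD 3.5 can also be run from Python through `diffusers`' `StableDiffusion3Pipeline`. A hedged sketch, assuming the gated `stabilityai/stable-diffusion-3.5-large` weights on Hugging Face, a recent `diffusers`, and a CUDA GPU:

```python
# Sketch: SD 3.5 text-to-image outside ComfyUI, via diffusers.

def snap_resolution(width: int, height: int, multiple: int = 16) -> tuple:
    """Latent diffusion models want dimensions divisible by the VAE factor."""
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)

def txt2img(prompt: str, width: int = 1024, height: int = 1024):
    import torch
    from diffusers import StableDiffusion3Pipeline

    w, h = snap_resolution(width, height)
    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
    ).to("cuda")
    return pipe(prompt=prompt, width=w, height=h).images[0]
```

Flux has an analogous `FluxPipeline` in `diffusers`; the resolution-snapping helper applies to both.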

OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide inference code so that everyone can explore more functionalities of OmniGen.
Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, we believe that the future image generation paradigm should be simpler and more flexible, that is, generating various images directly through arbitrary multi-modal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.

GitHub: VectorSpaceLab / OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

OmniGen: Unified Image Generation


1. News

  • 2024-11-03:✨✨Added Replicate Demo and API: Replicate
  • 2024-10-28:✨✨We release new version of inference code, optimizing the memory usage and time cost. You can refer to docs/inference.md for detailed information.
  • 2024-10-22:🔥🔥We release the code for OmniGen. Inference: docs/inference.md Train: docs/fine-tuning.md
  • 2024-10-22:🔥🔥We release the first version of OmniGen. Model Weight: Shitao/OmniGen-v1 HF Demo: 🤗
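What makes OmniGen's "no extra plugins" claim concrete is its prompt format: input images are referenced inline with `<img><|image_i|></img>` placeholders instead of going through a separate ControlNet or IP-Adapter. The pipeline call below follows the `OmniGenPipeline` interface published in the repo, but verify the parameters against docs/inference.md; the helper function is my own illustration.

```python
# Illustrative helper for OmniGen-style multi-modal prompts.

def with_image_refs(template: str, n_images: int) -> str:
    """Fill the {} slots in `template` with OmniGen image placeholders."""
    refs = [f"<img><|image_{i + 1}|></img>" for i in range(n_images)]
    return template.format(*refs)

def edit_image(instruction: str, image_paths: list):
    # OmniGenPipeline comes from the VectorSpaceLab/OmniGen repository.
    from OmniGen import OmniGenPipeline

    pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
    prompt = with_image_refs(instruction, len(image_paths))
    return pipe(prompt=prompt, input_images=image_paths,
                height=1024, width=1024, guidance_scale=2.5)[0]
```

For example, `with_image_refs("The woman in {} wearing the jacket from {}", 2)` yields a single text prompt that binds both input images, which is the whole trick.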





In the rapidly evolving landscape of Artificial General Intelligence (AGI), the emergence of Florence-2 marks a major stride forward in computer vision. Developed by a team at Azure AI, Microsoft, this state-of-the-art vision foundation model aims to redefine the way machines comprehend and interpret visual data. Let's delve into this advancement and explore how Florence-2 is poised to reshape the field. (ComfyUI nodes are available for running Florence-2.)
https://www.labellerr.com/blog/florence-2-vision-model-by-microsoft/
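For artists, Florence-2's practical draw is that one model handles captioning, OCR, and object detection, which is handy for auto-tagging reference libraries or building prompts from images. A hedged sketch using `transformers` (Florence-2 ships custom modeling code, hence `trust_remote_code=True`); the task tokens are from the model card:

```python
# Sketch: one Florence-2 model, several vision tasks, selected via prompt tokens.
# Assumes `transformers`, `torch`, and Pillow installed, and a PIL image as input.

TASKS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
}

def run_florence(image, task: str = "caption") -> dict:
    from transformers import AutoModelForCausalLM, AutoProcessor

    repo = "microsoft/Florence-2-large"
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)

    token = TASKS[task]
    inputs = processor(text=token, images=image, return_tensors="pt")
    ids = model.generate(input_ids=inputs["input_ids"],
                         pixel_values=inputs["pixel_values"],
                         max_new_tokens=256)
    text = processor.batch_decode(ids, skip_special_tokens=False)[0]
    # The processor parses the raw output into task-specific structure
    # (caption string, boxes + labels, etc.).
    return processor.post_process_generation(
        text, task=token, image_size=(image.width, image.height))
```

Swapping the task token is all it takes to go from captioning to detection, which is exactly the "unified" behavior the article describes.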

On a closing note, let me introduce again Stability Matrix, one system to manage all your AI art needs. It can install various interfaces, such as SD WebUI and its variants (Forge, SD.Next), ComfyUI, InvokeAI, Fooocus, SwarmUI, and OneTrainer. It works mostly great; I'm only running InvokeAI separately as of now. It makes keeping track of things easy. Hopefully it will also integrate some LLM solutions, NeRF, and TouchDesigner/Unreal Engine plugins in the future as a full multimodal system, though as of now these need to be installed separately.

GitHub: LykosAI / StabilityMatrix

Multi-Platform Package Manager for Stable Diffusion

Stability Matrix


Supported platforms: Windows, Linux (AppImage), Arch Linux (AUR), macOS

Multi-Platform Package Manager and Inference UI for Stable Diffusion

🖱️ One click install and update for Stable Diffusion Web UI Packages

✨ Inference - A Reimagined Interface for Stable Diffusion, Built-In to Stability Matrix

  • Powerful auto-completion and…
