Wildstyle Graf - VJ Loops Pack
ISOSCELES

Download Pack

This pack contains 137 VJ loops (124 GB)
https://www.patreon.com/posts/115158665

Behind the Scenes

I love catching only a glimpse of some street art while driving and being left with a feeling of surprised awe. So I keep trying to create my own warped version of graffiti and visualize what I've long imagined. After many years of daydreaming and inching towards this point, I feel like I've arrived. This is an epic pack because it's a topic that has continually inspired me, so I'm off the leash with this one.

While I was happy with the result of the prior 'Graffiti Reset' VJ pack, I felt like there was still new territory to explore in the wildstyle graffiti vein. I've also been really curious to experiment with the recently released Flux model. So I installed Forge for the first time and found it comparable to Automatic1111. I could immediately tell that Flux was going to be a big upgrade from Stable Diffusion since it follows text prompts much more consistently and the render quality is superb, although the render time per image is longer. Stable Diffusion would never quite follow my text prompt when I requested a subject "on a pure black background" without a special LoRA or IMG2IMG tricks.

First I tried a few different approaches using Flux with just text prompting to create graffiti imagery, as I had done with Stable Diffusion, but these foundation models just don't seem to be trained on what I'm looking to visualize. Just when I was about to give up, I headed over to CivitAI and found some amazing LoRAs that were hugely exciting to play with. So I nailed down a text prompt using Flux and started rendering out tons of images on my local computer. Holy smokes, Flux is very hungry for RAM, so I didn't have enough to run a second instance of Forge on my other GPU, which was a slight bummer. After letting it render overnight, I saw it was taking 9 seconds per image (at 512x512) and it was going to take too long to get a large dataset. So I used Google Colab to get another instance of Forge rendering out images: I bought 100 compute units and started rendering out loads of images at about 2 seconds per image on an A100 GPU. In total I rendered out 41,742 images. Then I manually curated the images and deleted any that didn't match the theme I was hunting for, which was a significant percentage. This was painful to do by hand, but the text prompt I created was so full of variety, and every time I tried to refine it I also killed its unhinged creativity. I ended up with a refined dataset of 7,256 images covering a wide range of wildstyle graffiti styles.
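For the curious, the bulk rendering can be scripted rather than clicked through. Here's a minimal sketch that batch-renders against a local Forge/Automatic1111-compatible API; it assumes the webui was launched with the --api flag, and the prompt, LoRA name, and output folder are placeholders rather than my actual settings:

```python
# Minimal batch-render sketch against a Forge/Automatic1111-compatible API.
# Assumes the webui was launched with --api on the default localhost:7860.
import base64, pathlib, requests

API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
OUT_DIR = pathlib.Path("renders")
OUT_DIR.mkdir(exist_ok=True)

payload = {
    "prompt": "wildstyle graffiti piece, <lora:placeholder-graffiti-lora:0.8>, "
              "on a pure black background",  # placeholder prompt, not my exact one
    "steps": 20,
    "width": 512,
    "height": 512,
    "batch_size": 4,
}

for batch in range(10):  # small test run: 10 batches of 4 images
    r = requests.post(API_URL, json=payload, timeout=600)
    r.raise_for_status()
    for i, img_b64 in enumerate(r.json()["images"]):
        (OUT_DIR / f"batch{batch:04d}_{i}.png").write_bytes(base64.b64decode(img_b64))
```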

The next step was to take the image dataset and use it to train StyleGAN2 and StyleGAN3. One thing I really dislike about this wild west time period is how quickly AI tech breaks. I was planning on doing some extensive training in the cloud using Google Colab, but my notebooks no longer function even though I haven't changed anything; within one year they're already broken. I suspect some change to CUDA or Torch wasn't backwards compatible. Plus I recently learned that I can't use a GPU newer than a 3090 because the StyleGAN codebase does JIT compiling while training and so relies on a certain version of CUDA. I hate wasting my time on these types of undocumented issues, so after trying a bunch of fixes I gave up on training in the cloud. Hence I had no choice but to train locally on my tower.

I was considering the ideal gamma value to start with for training StyleGAN2, which is tricky since there is a rough global pattern to wildstyle graffiti and yet it's also highly diverse. But the black background makes it easier for the neural network to converge since there is effectively only foreground visible. Even after all the experiments I've done with StyleGAN there are still some core questions that continue to plague me. So on a whim I asked ChatGPT-4o some questions about how the Gamma value (R1 regularization) functions. To directly quote ChatGPT: "Without sufficient regularization, the discriminator can focus on tiny, irrelevant details in the training images. The generator then tries to match those details exactly, rather than learning general, meaningful features from the data. As the model tries to generate images that fool the overly powerful discriminator, it may start memorizing the noise or specific patterns in the training data. Instead of generalizing to new, unseen data or creating varied images, it will overfit to the specific dataset it’s been trained on. If you notice repeating patterns then increase Gamma by 5 or 10 and after 1000kimg compare it against your prior results. This will help regularize the model and encourage it to generalize better across your dataset." This was super insightful because I've long thought the opposite was true: that by lowering the Gamma value I was giving the model more diversity to learn from, which is partially true. These tips from ChatGPT were helpful since I needed just a little more info to bring together my many experiences with StyleGAN training. Over the years I've cobbled together an understanding of the various attributes based on my experiments and notes, but I've reached a limit since I don't truly understand each attribute, and so it's been hard to judge what is successful and for what reason. I've read every article, paper, and comment available on the internet about regularization for neural networks, and it's still been very difficult to assemble my own understanding. So it's pretty amazing to have an actual problem that I've struggled with for years and have ChatGPT explain things that I know are not available elsewhere. Ironically I ended up using a gamma value of 10, which is what I typically start at, and it converged really well.
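To make the gamma conversation concrete: in StyleGAN2-style training, gamma scales an R1 penalty on the discriminator's gradient with respect to real images, which is what keeps it from latching onto tiny details. A simplified sketch of that term (illustrative only, not code from the training repo):

```python
# Simplified illustration of R1 regularization (the "gamma" knob in StyleGAN2).
# Penalizing the discriminator's gradient on real images discourages it from
# reacting to tiny details, which indirectly keeps the generator from overfitting.
import torch

def r1_penalty(discriminator, real_images, gamma=10.0):
    real_images = real_images.detach().requires_grad_(True)
    scores = discriminator(real_images)                # D's logits on real data
    grads, = torch.autograd.grad(outputs=scores.sum(),
                                 inputs=real_images,
                                 create_graph=True)    # keep graph so D can backprop
    penalty = grads.square().sum(dim=[1, 2, 3]).mean() # E[ ||grad_x D(x)||^2 ]
    return (gamma / 2) * penalty                       # added to the discriminator loss
```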

Over multiple training runs I ended up fine-tuning StyleGAN2 for 9024 kimg, which amounts to roughly 216 hours. I also fine-tuned StyleGAN3 for 4584 kimg, which amounts to roughly 220 hours. This makes sense since my two Quadro RTX 5000 cards can do about 1000 kimg per day for StyleGAN2 and 500 kimg per day for StyleGAN3. The most intense training run I'd done in the past was only half this duration, and so the quality of these interpolations is on another level, which is possible thanks to the highly refined dataset. An interesting thing I've come to believe is that Stable Diffusion starts to loosely repeat itself when rendering out a dataset with thousands of images, meaning there are global patterns that are difficult for a human eye to pick up. Flux, on the other hand, seems to generate much more diverse images across a dataset of thousands. In the past I could easily pick out recurring themes in a fine-tuned StyleGAN model and see where it was overfitting to a Stable Diffusion image dataset. And while there is still a little bit of overfitting in the model fine-tuned on the Flux image dataset, it's much more expressive. Now that overfitting is less of an issue, I can train for longer and get better results.
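For context, each of these fine-tuning runs is basically another invocation of the official train.py, resuming from the previous snapshot. A sketch of what that looks like (paths, dataset zip, and values are placeholders; the flags follow the public stylegan2-ada-pytorch interface, and StyleGAN3's train.py is similar but also requires --batch):

```python
# Rough sketch of launching one fine-tuning run, resuming from a prior snapshot.
# Flags follow the public stylegan2-ada-pytorch train.py; all paths and values
# here are placeholders, not my actual settings.
import subprocess

subprocess.run([
    "python", "train.py",
    "--outdir=training-runs",
    "--data=datasets/wildstyle-512.zip",                  # placeholder dataset archive
    "--gpus=2",                                           # two Quadro RTX 5000 cards
    "--cfg=stylegan2",
    "--gamma=10",                                         # R1 regularization weight
    "--kimg=2000",                                        # length of this run in kimg
    "--snap=10",                                          # snapshot interval
    "--resume=training-runs/prev/network-snapshot.pkl",   # placeholder snapshot path
], check=True)
```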

From here I rendered out 50,000 seeds for each of the SG2 and SG3 models so that I could pick out the best seeds by hand, sequence the seeds, and then render out the videos at 512x512. Then I took the videos into Topaz Video AI and uprezzed them to 3072x3072. Since the graffiti didn't fill up the entire frame, this huge uprez allowed me to then take the videos into After Effects and crop them to 3840x2160 without cropping out any graffiti content. I'm such a sucker for content that doesn't touch the frame edges and therefore allows you to place it anywhere on your canvas while VJing. But golly, rendering out 3840x2160 60fps content from After Effects created some very long renders. More tech, more problems!
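The seed-render step itself is just sampling the generator with a fixed random seed per image, so the keepers can be regenerated later. A minimal sketch following the usage pattern from the public StyleGAN3 repo (the .pkl path is a placeholder, and it needs to run from inside the repo so the custom classes unpickle):

```python
# Render one still per seed from a fine-tuned StyleGAN pickle so the best seeds
# can be handpicked later. Follows the public StyleGAN3 usage pattern; run this
# from inside the repo so dnnlib/torch_utils are importable for unpickling.
import os
import pickle
import numpy as np
import PIL.Image
import torch

with open("network-snapshot-wildstyle.pkl", "rb") as f:   # placeholder .pkl path
    G = pickle.load(f)["G_ema"].cuda()

os.makedirs("seeds", exist_ok=True)
for seed in range(1000):                                   # e.g. 50,000 in practice
    z = torch.from_numpy(np.random.RandomState(seed).randn(1, G.z_dim)).cuda()
    img = G(z, None, truncation_psi=0.7)                   # None: unconditional model
    img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    PIL.Image.fromarray(img[0].cpu().numpy(), "RGB").save(f"seeds/seed{seed:05d}.png")
```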

I had a fresh idea while rendering out the seed walk videos. Typically I set the truncation value to 0.7 and don't think further about it, since other values tend to distort the video in messy ways that I feel are undesirable. But in this context I wondered what would happen if I rendered out the same video at several different "trunc" values (0.7, 1.0, 1.5, 2.0) and then composited them together in After Effects. The experimental result is delicious and pushes the graffiti into uncharted territories where you can see the AI model leaking through into almost painterly realms.
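What makes the compositing work is keeping the latent sequence fixed and only varying truncation, so every render stays frame-aligned. A tiny sketch of the idea, reusing the generator G loaded above (the latent walk here is a stand-in, not my actual seed sequence):

```python
# Render the same latent walk at several truncation values so the resulting
# videos line up frame-for-frame and can be composited over each other.
# Reuses G from the previous sketch; this walk is a stand-in for a real one.
import numpy as np
import torch

frames = torch.from_numpy(np.random.RandomState(42).randn(300, G.z_dim)).cuda()

for psi in (0.7, 1.0, 1.5, 2.0):
    for frame_idx, z in enumerate(frames):
        img = G(z.unsqueeze(0), None, truncation_psi=psi)
        # ...convert to uint8 and write frame `frame_idx` of the video for this psi...
```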

Riding the wave of that successful experiment, I wondered how else I could further tweak the StyleGAN models and then composite the results together in After Effects. So I loaded up an SG2 model blending script that takes the higher-rez portions of one model and the lower-rez portions of a different model and merges the two disparate neural networks into a new blended model. Super experimental. At first I thought the rendered videos from these models were crap, but then I did some compositing experiments where I used the original model video to cut out details from the blended video... And the results were incredible. You'd never know it, but I combined the wildstyle graffiti model with some prior SG2 models such as Alien Guest, Human Faces, Graffiti Reset, Lightning, Cyborg Fomo, and Nature Artificial. Strange worlds merging into new worlds.
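The blending trick itself is conceptually simple: below a chosen resolution the synthesis blocks come from one network, above it they come from another, so one model supplies the overall layout and the other supplies the surface detail. Here's a rough sketch of that idea using the module layout from the stylegan2-ada-pytorch codebase; it's illustrative, not the exact script I used, and it assumes both models share the same architecture and resolution:

```python
# Rough sketch of StyleGAN2 network blending: keep the low-resolution synthesis
# blocks from model A and swap in the high-resolution blocks from model B above
# a chosen split. Assumes both pickles share the same architecture/resolution.
import copy
import pickle

with open("wildstyle.pkl", "rb") as f:          # placeholder: base model A
    G_a = pickle.load(f)["G_ema"]
with open("graffiti-reset.pkl", "rb") as f:     # placeholder: detail model B
    G_b = pickle.load(f)["G_ema"]

G_blend = copy.deepcopy(G_a)
split_res = 64                                   # blocks above this come from B

for res in G_blend.synthesis.block_resolutions:  # e.g. [4, 8, 16, ..., 512]
    if res > split_res:
        block_b = getattr(G_b.synthesis, f"b{res}")
        getattr(G_blend.synthesis, f"b{res}").load_state_dict(block_b.state_dict())

with open("blended.pkl", "wb") as f:
    pickle.dump({"G_ema": G_blend}, f)
```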

From there I took everything into After Effects and did some more compositing experiments using FX such as Deep Glow, Pixel Encoder, and Modulation 2. I also did some slitscan experiments, following my typical process of first taking the videos into Topaz Video AI and interpolating from 60fps to 240fps, which fixes most of the time aliasing that becomes visible when the slitscan FX is applied in After Effects. I tried generating a luminosity map to create an alpha channel for each video, but it removed too many dark gradients and so I scrapped it. However, I also did some tests in Resolume and the AutoMask FX works amazingly well if you crank up the Contrast attribute within the AutoMask settings.
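For reference, the luminosity-to-alpha idea is just keying transparency off of brightness, which is exactly why those dark gradients get eaten along with the black background. A minimal per-frame sketch in NumPy (frame layout and the Rec.709 weights are my assumptions, not any specific tool's settings):

```python
# Minimal sketch of a luminance-keyed alpha channel: alpha follows brightness,
# so dark (but intentional) gradients get cut along with the black background.
# Assumes an 8-bit RGB frame as a NumPy array of shape (H, W, 3).
import numpy as np

def luma_key(frame_rgb: np.ndarray) -> np.ndarray:
    """Return an RGBA frame whose alpha follows Rec.709 luminance."""
    rgb = frame_rgb.astype(np.float32) / 255.0
    luma = 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]
    alpha = np.clip(luma, 0.0, 1.0)              # dark pixels become transparent
    rgba = np.dstack([rgb, alpha])
    return (rgba * 255).astype(np.uint8)
```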

And of course, my brain had some last-minute ideas. I've been working on this pack so intensely that while trying to fall asleep I was seeing graffiti in my mind. In the moments before drifting off I imagined a brick wall warping and building itself behind the graffiti. So I bought a brick wall model on Turbosquid that featured individual geometry for each brick, deleted the mortar, and played with different animation ideas. Def worth the extra elbow grease to give it that final polish.

Overall this pack has brought together my StyleGAN experience and pushed it to a new threshold. So it's very satisfying to see the culmination of my recurring daydreams after so many experiments, tests, and failures that I sometimes gloss over. But I still have more graffiti related ideas for the future... More to come. Happy tagging!

https://www.jasonfletcher.info/vjloops/
