Stable Diffusion is an open-source latent diffusion model developed by the CompVis group at LMU Munich and released in 2022. It is a text-to-image model that generates realistic, high-resolution images from text prompts, using a frozen CLIP ViT-L/14 text encoder to condition the output on the prompt. At runtime, image generation is an iterative "diffusion" (denoising) process: the model starts from pure noise and removes it step by step, so that the picture moves progressively closer to the text description until no noise remains.
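The loop described above can be illustrated with a deliberately simplified sketch. It is not Stable Diffusion's actual sampler: the function name toy_denoise, the parameters target, steps and seed, and the stand-in "denoiser" are all assumptions made for illustration. In the real model, a neural network conditioned on the text embedding predicts the noise; here the target values simply play the role of the image the prompt describes.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy stand-in for an iterative denoising loop (NOT a real diffusion model).

    `target` is a hypothetical list of pixel values representing the image
    the prompt describes. A real model never sees such a target; it predicts
    the noise with a neural network conditioned on the text prompt.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in target]
    history = []
    for t in range(steps, 0, -1):
        # Hypothetical "denoiser": the estimated noise is the gap to the target.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove a fraction 1/t of the estimated noise; the final step (t == 1)
        # removes whatever noise is left.
        x = [xi - ni / t for xi, ni in zip(x, predicted_noise)]
        # Track the largest remaining deviation from the target.
        history.append(max(abs(xi - ti) for xi, ti in zip(x, target)))
    return x, history


# Usage: a four-"pixel" image, refined over ten steps.
image, errors = toy_denoise([0.2, 0.8, 0.5, 0.1], steps=10)
print("remaining error per step:", errors)
```

The printed list shrinks toward zero, mirroring how each step leaves the picture slightly less noisy than the one before.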
The same denoising mechanism also lets the model redraw existing pictures so that they include new features described by a text prompt (a technique known as "guided image synthesis").
Furthermore, prompts can be used to edit parts of an existing picture via inpainting (regenerating a selected region) and outpainting (extending the picture beyond its original borders). As of now, Stable Diffusion is completely free to use.
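Inpainting can be sketched in the same simplified terms. One common approach is to regenerate only the masked region and to paste the original pixels back after every denoising step. The example below assumes that approach and is not the model's actual implementation; toy_inpaint, mask and fill_value are hypothetical names, and fill_value stands in for whatever content the prompt requests.

```python
import random


def toy_inpaint(image, mask, fill_value, steps=20, seed=0):
    """Toy sketch of mask-based editing (NOT a real diffusion model).

    `image` is a list of pixel values, `mask` marks the pixels to regenerate,
    and `fill_value` is a hypothetical stand-in for the prompted content.
    """
    rng = random.Random(seed)
    # Masked pixels start as noise; unmasked pixels keep their original value.
    x = [rng.gauss(0.0, 1.0) if m else p for p, m in zip(image, mask)]
    for t in range(steps, 0, -1):
        # Hypothetical "denoiser": move every pixel toward the prompted content.
        denoised = [xi - (xi - fill_value) / t for xi in x]
        # Re-impose the original pixels wherever the mask is off, so that
        # only the masked region is ever changed.
        x = [d if m else p for d, p, m in zip(denoised, image, mask)]
    return x


# Usage: regenerate the two middle pixels, keep the outer two.
original = [0.1, 0.2, 0.3, 0.4]
edited = toy_inpaint(original, [False, True, True, False], fill_value=0.9)
print(edited)
```

Outpainting can be viewed the same way, under the same assumptions: the canvas is enlarged first, and the newly added border is treated as the masked region.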
Generative AI refers to the subset of artificial intelligence focused on creating new content, ranging from images and music to text and more.
Text-to-Image Translation (T2I)
Text-to-image (T2I) translation is an artificial intelligence task in which a model generates an image from a written description or text prompt.
Midjourney is a text-to-image AI system that produces unique images from human text input.