Apple has unveiled MGIE, an open-source AI model that enables image editing through natural language instructions. MGIE, short for MLLM-Guided Image Editing, harnesses multimodal large language models (MLLMs) to interpret user commands and carry out pixel-level edits.
The model boasts a wide range of editing capabilities, including Photoshop-style modification, global photo optimization, and local editing. This means that users can effortlessly enhance their images with a simple text command.
MGIE was developed in collaboration with researchers from the University of California, Santa Barbara. The model was presented in a paper accepted at the International Conference on Learning Representations (ICLR) 2024, a premier venue for AI research. The paper reports that MGIE improves results on both automatic metrics and human evaluation while maintaining competitive inference efficiency.
What is Apple MGIE?
Apple MGIE, short for MLLM-Guided Image Editing, is a system developed by Apple that uses machine learning to let users edit images with natural language instructions. Instead of working through complex editing tools or menus, users simply describe what they want changed, and MGIE makes the edits automatically.
Like other generative AI image tools such as Midjourney, Stable Diffusion, and DALL-E, Apple MGIE bridges the gap between human intention and image manipulation. It leverages multimodal learning, meaning it understands both visual information (the image itself) and textual information (your instructions).
How does Apple MGIE work?
A user could say “Make the sky in this image bluer” or “Remove the red car from this photo”, and MGIE would be able to understand and carry out these instructions. MGIE is still under development, but it has the potential to make image editing much easier and more accessible for everyone.
The core concept behind the Apple MGIE workflow is as follows (a brief code sketch appears after the list):
- Inputting your commands: You describe your desired edits in plain English, like “Make the trees in this photo taller” or “Change the color of the dress to blue.”
- Understanding your intent: MGIE’s language model deciphers your instructions, grasping the specific objects, attributes, and modifications you have in mind.
- Visual understanding: Simultaneously, MGIE analyzes the image, identifying key elements and their relationships.
- Guided editing: Combining linguistic and visual understanding, MGIE manipulates the image to reflect your commands. It doesn’t just blindly follow instructions; it interprets context and makes sensible adjustments.
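To make this workflow concrete, here is a minimal Python sketch of how an MLLM-guided editing pipeline could be wired together. The names below (EditPlan, mllm_interpret, guided_edit) are illustrative placeholders rather than Apple’s actual API; in the paper, the MLLM derives an expressive instruction that then guides a diffusion-based editing model.

```python
from dataclasses import dataclass
from PIL import Image

# Illustrative placeholders only; these names are not Apple's actual
# MGIE API. They mirror the four workflow steps described above.

@dataclass
class EditPlan:
    target_objects: list[str]     # e.g. ["trees"]
    modification: str             # e.g. "increase height"
    expressive_instruction: str   # detailed, visually grounded rewrite

def mllm_interpret(image: Image.Image, command: str) -> EditPlan:
    """Steps 2-3: the multimodal LLM reads both the command and the
    image, then rewrites the terse instruction into an explicit,
    visually grounded edit plan (placeholder body)."""
    raise NotImplementedError

def guided_edit(image: Image.Image, plan: EditPlan) -> Image.Image:
    """Step 4: a diffusion-based editor conditions on the expressive
    instruction to produce the pixel-level edit (placeholder body)."""
    raise NotImplementedError

def edit(image_path: str, command: str) -> Image.Image:
    image = Image.open(image_path)
    plan = mllm_interpret(image, command)  # language + visual understanding
    return guided_edit(image, plan)        # guided manipulation

# Example (hypothetical): edit("garden.jpg", "Make the trees taller")
```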
How to use MGIE
Apple MGIE is available as an open-source project on GitHub, offering image editing through natural language commands. This release allows users to explore and contribute to the project directly.
The project provides full access to its source code, training data, and pre-trained models on GitHub. This transparency enables developers and researchers to understand its inner workings and potentially contribute improvements.
A demo notebook is also available on GitHub, guiding users through various editing tasks using natural language instructions. This serves as a practical introduction to MGIE’s capabilities.
Users can also experiment with MGIE through a web demo hosted on Hugging Face Spaces. This online platform offers a quick and convenient way to try out the system without local setup.
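For programmatic access, Hugging Face Spaces can usually be queried with the gradio_client library. The Space id and endpoint signature below are assumptions for illustration; consult the actual MGIE Space page for the real values.

```python
from gradio_client import Client, handle_file

# Both the Space id and the endpoint parameters are assumptions for
# illustration; check the MGIE Space's API docs for the real signature.
client = Client("tsujuifu/ml-mgie")       # hypothetical Space id
result = client.predict(
    handle_file("photo.jpg"),             # input image
    "make the sky in this photo bluer",   # natural-language instruction
    api_name="/predict",                  # assumed endpoint name
)
print(result)  # typically a path to the edited image
```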
The system welcomes user feedback and allows for refining edits or requesting different modifications. This iterative approach aims to ensure the generated edits align with the user’s artistic vision.
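In code, that feedback loop might look like the sketch below, reusing the hypothetical mllm_interpret and guided_edit placeholders from the pipeline sketch above; each pass re-runs the model on the previous result.

```python
from PIL import Image

# Reuses the hypothetical mllm_interpret / guided_edit placeholders
# defined in the pipeline sketch above.
image = Image.open("portrait.jpg")
for command in [
    "blur the background",
    "a little less blur, and brighten the subject",
]:
    plan = mllm_interpret(image, command)  # re-interpret the latest result
    image = guided_edit(image, plan)       # apply the refined edit
image.save("portrait_edited.jpg")
```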
While open-sourcing makes MGIE accessible, it’s important to remember it remains under development. Ongoing research and user contributions will shape its future capabilities and potential applications.
Featured image credit: vecstock/Freepik.