Multimodal Concept Erasure Benchmark for Diffusion Models
Analysis
This paper introduces M-ErasureBench, a benchmark for evaluating concept erasure methods in diffusion models across three input modalities: text prompts, learned embeddings, and inverted latents. It shows that existing methods, which are largely designed and tested against text prompts, lose much of their effectiveness on the other two modalities, and it proposes a new method, IRECE, to improve robustness. The work matters because incomplete erasure leaves generative models open to producing harmful or copyright-infringing content; the paper contributes both a more comprehensive evaluation framework and a practical mitigation.
Key Takeaways
- M-ErasureBench provides a comprehensive multimodal evaluation framework for concept erasure in diffusion models.
- Existing concept erasure methods are vulnerable to attacks using learned embeddings and inverted latents.
- IRECE, a proposed plug-and-play module, improves robustness against concept reproduction.
- The research addresses a critical issue of harmful content generation in generative models.
“Existing methods achieve strong erasure performance against text prompts but largely fail under learned embeddings and inverted latents, with Concept Reproduction Rate (CRR) exceeding 90% in the white-box setting.”
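The summary does not define how the Concept Reproduction Rate is computed. A minimal sketch, assuming CRR is the fraction of generated images in which some detector still finds the erased concept, might look like the following. The function name, the modality keys, and the detection flags are all hypothetical placeholders chosen for illustration; they are not the paper's data or code.

```python
from typing import Dict, Sequence


def concept_reproduction_rate(detections: Sequence[bool]) -> float:
    """Return the fraction of generations flagged as containing the erased concept.

    Assumption: each entry is True when a (hypothetical) concept detector
    fires on one generated image. The paper's exact definition may differ.
    """
    if not detections:
        raise ValueError("need at least one generation to compute a rate")
    return sum(detections) / len(detections)


# Made-up detector outputs, one list per input modality (illustrative only).
detections_by_modality: Dict[str, Sequence[bool]] = {
    "text_prompt": [True] * 1 + [False] * 9,
    "learned_embedding": [True] * 9 + [False] * 1,
    "inverted_latent": [True] * 10,
}

crr_by_modality = {
    name: concept_reproduction_rate(flags)
    for name, flags in detections_by_modality.items()
}

for name, crr in crr_by_modality.items():
    print(f"{name}: CRR = {crr:.0%}")
```

Under this reading, a lower CRR means more effective erasure, and reporting the rate separately per modality is what exposes a method that holds up against text prompts but not against embeddings or latents.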