Empowering VLMs for Humorous Meme Generation

Published:Dec 31, 2025 01:35
1 min read
ArXiv

Analysis

This paper introduces HUMOR, a framework designed to improve the ability of Vision-Language Models (VLMs) to generate humorous memes. It addresses the challenge of moving beyond simple image-to-caption generation by incorporating hierarchical reasoning (Chain-of-Thought) and aligning with human preferences through a reward model and reinforcement learning. The approach is novel in its multi-path CoT and group-wise preference learning, aiming for more diverse and higher-quality meme generation.

Reference

HUMOR employs a hierarchical, multi-path Chain-of-Thought (CoT) to enhance reasoning diversity and a pairwise reward model for capturing subjective humor.