Large-Scale Multimodal Dataset for Industrial Defect Understanding
Analysis
Key Takeaways
- Introduces IMDD-1M, a large-scale multimodal dataset for industrial defect understanding.
- The dataset contains aligned image-text pairs covering a wide range of materials and defect types.
- A diffusion-based vision-language foundation model is trained on the dataset.
- The model adapts data-efficiently to specialized domains, reaching performance comparable to dedicated expert models with less than 5% of their task-specific data.
“The model achieves comparable performance with less than 5% of the task-specific data required by dedicated expert models.”
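
To make the takeaways concrete, here is a minimal sketch of what an aligned image-text record and a 5% adaptation budget might look like. The `DefectSample` schema, its field names, the `sample_fraction` helper, and the synthetic records are all assumptions made for illustration; none of them come from IMDD-1M or the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectSample:
    """One aligned image-text pair (hypothetical schema, not the dataset's)."""
    image_path: str
    description: str
    material: str
    defect_type: str


def sample_fraction(samples, fraction, seed=0):
    """Return a reproducible random subset holding at most `fraction` of `samples`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = int(len(samples) * fraction)  # round down so the budget is never exceeded
    rng = random.Random(seed)
    return rng.sample(samples, k)


# Synthetic stand-in for a task-specific dataset (placeholder values only).
full_task_set = [
    DefectSample(f"img_{i:04d}.png", f"scratch on steel plate #{i}", "steel", "scratch")
    for i in range(1000)
]

# Adaptation budget: no more than 5% of the task-specific data.
subset = sample_fraction(full_task_set, 0.05)
print(len(subset), "of", len(full_task_set), "samples selected for adaptation")
```

The fixed seed keeps the subset reproducible, which matters when comparing a model adapted on the small subset against an expert model trained on the full set.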