Tag: Masked Image Modeling

Machine Learning Computer Vision

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

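1 Research Background, Motivation, and Main Contributions

1.1 Existing Problems (Motivation)

Autoregressive generation is inefficient because of the large number of image tokens, while non-autoregressive methods (such as MIM) are limited in performance and cannot match state-of-the-art diffusion models.

1.2 Main Contributions

Enhanced transformer architecture: combines multi-modal and single-modal transformer layers, improving M...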