Adding conditional control to text-to-image diffusion models

L Zhang, A Rao, M Agrawala - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …
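A rough usage sketch, not code from the paper itself: ControlNet-style spatial conditioning is commonly run through the Hugging Face diffusers library as below; the checkpoint names, the Canny conditioning image, and the prompt are illustrative assumptions.

```python
# Minimal sketch (assumes the diffusers library, a CUDA device, and public Canny ControlNet weights).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Spatial-conditioning branch; the base text-to-image weights stay frozen ("locked").
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# A Canny edge map acts as the spatial control signal alongside the text prompt.
edge_map = load_image("https://example.com/canny_edges.png")  # placeholder URL
image = pipe("a bird on a branch, photorealistic", image=edge_map).images[0]
image.save("controlnet_out.png")
```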

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present an open-set object detector, called Grounding DINO, built by marrying
the Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …
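A hedged usage sketch, not from the paper: one common way to run Grounding DINO is the Hugging Face transformers integration shown below; the model id, image path, text prompt, and detection thresholds are illustrative assumptions.

```python
# Minimal sketch (assumes the transformers Grounding DINO integration and a local test image).
import torch
from PIL import Image
from transformers import AutoProcessor, GroundingDinoForObjectDetection

processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
model = GroundingDinoForObjectDetection.from_pretrained("IDEA-Research/grounding-dino-tiny")

image = Image.open("scene.jpg")  # placeholder image path
# Open-set detection: categories are free-form text, lowercase and period-separated.
text = "a cat. a remote control."
inputs = processor(images=image, text=text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Convert logits and boxes into (label, score, box) detections above the chosen thresholds.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)[0]
for label, score, box in zip(results["labels"], results["scores"], results["boxes"]):
    print(label, round(score.item(), 3), box.tolist())
```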

T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

C Mou, X Wang, L Xie, Y Wu, J Zhang, Z Qi… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The impressive generative ability of large-scale text-to-image (T2I) models demonstrates a strong
capacity for learning complex structures and meaningful semantics. However, relying …
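A minimal sketch, assuming the diffusers T2I-Adapter integration and publicly released adapter weights; the checkpoint names, control image, and prompt are illustrative and not taken from the paper.

```python
# Minimal sketch (assumes the diffusers T2I-Adapter integration and a CUDA device).
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter
from diffusers.utils import load_image

# The lightweight adapter digests the control signal; the T2I backbone itself is left untouched.
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_canny_sd15v2", torch_dtype=torch.float16
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("https://example.com/canny_edges.png")  # placeholder control image
image = pipe("a futuristic car, studio lighting", image=edge_map).images[0]
image.save("t2i_adapter_out.png")
```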

Generalized decoding for pixel, image, and language

X Zou, ZY Dou, J Yang, Z Gan, L Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present X-Decoder, a generalized decoding model that can predict pixel-level
segmentation and language tokens seamlessly. X-Decoder takes as input two types of …

DreamBooth3D: Subject-driven text-to-3D generation

A Raj, S Kaza, B Poole, M Niemeyer… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present DreamBooth3D, an approach to personalize text-to-3D generative models from
as few as 3-6 casually captured images of a subject. Our approach combines recent …

InstantBooth: Personalized text-to-image generation without test-time finetuning

J Shi, W Xiong, Z Lin, HJ Jung - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recent advances in personalized image generation have enabled pre-trained text-to-image
models to learn new concepts from specific image sets. However, these methods often …

StableVideo: Text-driven consistency-aware diffusion video editing

W Chai, X Guo, G Wang, Y Lu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Diffusion-based methods can generate realistic images and videos, but they struggle to edit
existing objects in a video while preserving their geometry over time. This prevents diffusion …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This paper presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models with vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose …

Composer: Creative and controllable image synthesis with composable conditions

L Huang, D Chen, Y Liu, Y Shen, D Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent large-scale generative models trained on big data can synthesize impressive
images yet suffer from limited controllability. This work offers a new generation …

BoxDiff: Text-to-image synthesis with training-free box-constrained diffusion

J Xie, Y Li, Y Huang, H Liu, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-image diffusion models have demonstrated an astonishing capacity to
generate high-quality images. However, researchers have mainly studied ways of synthesizing …