[12] In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
Paper Review/etc
[Paper] https://arxiv.org/pdf/2504.20690 [Github] https://github.com/River-Zhang/ICEdit.git GitHub - River-Zhang/ICEdit: Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Training released! Surpasses GPT-4o in ID persistence! Official ComfyUI workflow release! Onl..
[11] Visual Instruction Tuning (LLaVA: Large Language and Vision Assistant)
Paper Review/etc
[Paper] https://arxiv.org/pdf/2304.08485 [Github] https://github.com/haotian-liu/LLaVA GitHub - haotian-liu/LLaVA: [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. Abstract: The problem with existing LLMs is that they cannot take images as input, so processing vision information..
[10] CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
Paper Review/etc
[Paper] https://openaccess.thecvf.com//content/ICCV2021/papers/Chen_CrossViT_Cross-Attention_Multi-Scale_Vision_Transformer_for_Image_Classification_ICCV_2021_paper.pdf [Github] https://github.com/IBM/CrossViT GitHub - IBM/CrossViT: Official implementation of CrossViT. https://arxiv.org/abs/2103.14899
[9] Supervised Contrastive Learning
Paper Review/etc
[Paper] https://arxiv.org/pdf/2004.11362.pdf [Github] https://github.com/HobbitLong/SupContrast