Effortless data labeling with AI support from Segment Anything and other awesome models.
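For a sense of the labeling primitive such tools build on, here is a minimal sketch of prompting Segment Anything with a single click point to get candidate masks (generic segment-anything usage; the checkpoint path, image path, and click coordinate are placeholders, not this project's own API):

```python
# Minimal point-prompted segmentation with Segment Anything
# (placeholder checkpoint/image paths and click coordinate).
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder checkpoint
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click (label=1) at a hypothetical pixel location.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,   # return several candidate masks with quality scores
)
best_mask = masks[np.argmax(scores)]  # boolean mask of shape (H, W)
```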
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
OpenMMLab Pre-training Toolbox and Benchmark
🥂 Gracefully face hCaptcha challenges with a multimodal large language model.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
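As a rough illustration of the underlying idea, here is a minimal sketch of frame-level CLIP retrieval with mean pooling, the simplest of the similarity variants studied in the paper (plain CLIP usage; the frame files and query text are placeholders, not the repository's actual pipeline):

```python
# Sketch: score a text query against a video represented by a few sampled
# frames, using mean-pooled CLIP frame embeddings (placeholder frame paths
# and query; not CLIP4Clip's own code).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

frame_paths = ["frame_00.jpg", "frame_01.jpg", "frame_02.jpg"]  # sampled frames
frames = torch.stack([preprocess(Image.open(p)) for p in frame_paths]).to(device)
text = clip.tokenize(["a person riding a bicycle"]).to(device)

with torch.no_grad():
    frame_emb = model.encode_image(frames)           # (num_frames, 512)
    video_emb = frame_emb.mean(dim=0, keepdim=True)  # mean pooling over frames
    text_emb = model.encode_text(text)               # (1, 512)

# Cosine similarity between the pooled video embedding and the query.
video_emb = video_emb / video_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
score = (video_emb @ text_emb.T).item()
print(f"text-video similarity: {score:.3f}")
```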
Official PyTorch implementation of "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
CLIP + FFT/DWT/RGB = text to image/video
Paddle Multimodal Integration and eXploration: supports mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
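For a rough picture of what per-frame feature extraction involves (independent of this toolkit's own CLI; a single-process sketch with a TIMM backbone and a hypothetical video path, not the repository's multi-GPU pipeline):

```python
# Sketch: pooled per-frame features from a video with a TIMM backbone
# (placeholder video path and backbone; not this toolkit's command line).
import cv2
import timm
import torch
from PIL import Image

model = timm.create_model("resnet50", pretrained=True, num_classes=0)  # num_classes=0 -> pooled features
model.eval()
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

cap = cv2.VideoCapture("video.mp4")  # placeholder input video
features = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)        # OpenCV decodes as BGR
    batch = transform(Image.fromarray(rgb)).unsqueeze(0)
    with torch.no_grad():
        features.append(model(batch))                   # (1, 2048) for resnet50
cap.release()

video_features = torch.cat(features)                    # (num_frames, 2048)
```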
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
LLM2CLIP significantly improves already state-of-the-art CLIP models.
Keras beit, caformer, CMT, CoAtNet, convnext, davit, dino, efficientdet, edgenext, efficientformer, efficientnet, eva, fasternet, fastervit, fastvit, flexivit, gcvit, ghostnet, gpvit, hornet, hiera, iformer, inceptionnext, lcnet, levit, maxvit, mobilevit, moganet, nat, nfnets, pvt, swin, tinynet, tinyvit, uniformer, volo, vanillanet, yolor, yolov7, yolov8, yolox, gpt2, llama2, alias kecam