Effortless data labeling with AI support from Segment Anything and other awesome models.
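For a sense of the labeling primitive such tools build on, here is a minimal sketch of prompting Segment Anything with a single click point to get candidate masks (generic segment-anything usage; the checkpoint path, image path, and click coordinate are placeholders, not this project's own API):

```python
# Minimal point-prompted segmentation with Segment Anything
# (placeholder checkpoint/image paths and click coordinate).
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder checkpoint
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click (label=1) at a hypothetical pixel location.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,   # return several candidate masks with quality scores
)
best_mask = masks[np.argmax(scores)]  # boolean mask of shape (H, W)
```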
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
OpenMMLab Pre-training Toolbox and Benchmark
🥂 Gracefully face hCaptcha challenges with a multimodal large language model.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
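As a rough illustration of the underlying idea, here is a minimal sketch of frame-level CLIP retrieval with mean pooling, the simplest of the similarity variants studied in the paper (plain CLIP usage; the frame files and query text are placeholders, not the repository's actual pipeline):

```python
# Sketch: score a text query against a video represented by a few sampled
# frames, using mean-pooled CLIP frame embeddings (placeholder frame paths
# and query; not CLIP4Clip's own code).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

frame_paths = ["frame_00.jpg", "frame_01.jpg", "frame_02.jpg"]  # sampled frames
frames = torch.stack([preprocess(Image.open(p)) for p in frame_paths]).to(device)
text = clip.tokenize(["a person riding a bicycle"]).to(device)

with torch.no_grad():
    frame_emb = model.encode_image(frames)           # (num_frames, 512)
    video_emb = frame_emb.mean(dim=0, keepdim=True)  # mean pooling over frames
    text_emb = model.encode_text(text)               # (1, 512)

# Cosine similarity between the pooled video embedding and the query.
video_emb = video_emb / video_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
score = (video_emb @ text_emb.T).item()
print(f"text-video similarity: {score:.3f}")
```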
Official PyTorch implementation of "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
CLIP + FFT/DWT/RGB = text to image/video
Paddle Multimodal Integration and eXploration: supports mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
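For a rough picture of what per-frame feature extraction involves (independent of this toolkit's own CLI; a single-process sketch with a TIMM backbone and a hypothetical video path, not the repository's multi-GPU pipeline):

```python
# Sketch: pooled per-frame features from a video with a TIMM backbone
# (placeholder video path and backbone; not this toolkit's command line).
import cv2
import timm
import torch
from PIL import Image

model = timm.create_model("resnet50", pretrained=True, num_classes=0)  # num_classes=0 -> pooled features
model.eval()
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

cap = cv2.VideoCapture("video.mp4")  # placeholder input video
features = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)        # OpenCV decodes as BGR
    batch = transform(Image.fromarray(rgb)).unsqueeze(0)
    with torch.no_grad():
        features.append(model(batch))                   # (1, 2048) for resnet50
cap.release()

video_features = torch.cat(features)                    # (num_frames, 2048)
```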
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
LLM2CLIP significantly improves already state-of-the-art CLIP models.
Keras beit, caformer, CMT, CoAtNet, convnext, davit, dino, efficientdet, edgenext, efficientformer, efficientnet, eva, fasternet, fastervit, fastvit, flexivit, gcvit, ghostnet, gpvit, hornet, hiera, iformer, inceptionnext, lcnet, levit, maxvit, mobilevit, moganet, nat, nfnets, pvt, swin, tinynet, tinyvit, uniformer, volo, vanillanet, yolor, yolov7, yolov8, yolox, gpt2, llama2, alias kecam