Here are 33 public repositories matching this topic.
Open-source, industrial-grade ASR models supporting Mandarin, Chinese dialects, and English. They achieve a new SOTA on public Mandarin ASR benchmarks while also offering outstanding singing-lyrics recognition.
Updated Sep 22, 2025 · Python
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
Updated May 8, 2025 · Python
Research code for the Multimodal-Cognition team at Ant Group.
Updated Oct 14, 2025 · Python
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
Updated Aug 8, 2025 · Python
Official repository for InfiGUI-G1. We introduce Adaptive Exploration Policy Optimization (AEPO) to overcome semantic alignment bottlenecks in GUI agents through efficient, guided exploration.
Updated Sep 4, 2025 · Python
[IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"
Updated Jun 16, 2025 · Python
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
Updated Nov 28, 2023 · Python
The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]
Updated Dec 28, 2024 · Python
Official repository of the paper: Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
[ACL 2024] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"
Updated Jun 10, 2024 · Jupyter Notebook
[NAACL 2025 Findings] Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding
Updated Jun 20, 2025 · Python
Official implementation of the paper "Efficient Test-Time Scaling for Small Vision-Language Models": test-time scaling via test-time augmentation.
Updated Oct 7, 2025 · Python
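The core idea of test-time scaling via test-time augmentation can be sketched generically: query a VLM on several augmented views of the same image and aggregate the answers by majority vote. The sketch below illustrates that general pattern only, not this paper's implementation; `ask_vlm` is a hypothetical stand-in for whatever VLM call you use.

```python
# Generic sketch of test-time augmentation for a VLM: answer the same
# question over several augmented views, then majority-vote the answers.
# ask_vlm is a hypothetical placeholder, NOT this paper's API.
from collections import Counter
from PIL import Image, ImageOps

def ask_vlm(image: Image.Image, question: str) -> str:
    # Stand-in for a real VLM call (HTTP request, local model, etc.).
    raise NotImplementedError("plug in your VLM client here")

def augmented_views(image: Image.Image) -> list[Image.Image]:
    w, h = image.size
    return [
        image,                                        # original
        ImageOps.mirror(image),                       # horizontal flip
        image.crop((w // 10, h // 10, w, h)),         # corner crop
        image.resize((int(w * 1.2), int(h * 1.2))),   # upscale
    ]

def answer_with_tta(image: Image.Image, question: str) -> str:
    answers = [ask_vlm(view, question) for view in augmented_views(image)]
    # Majority vote over the per-view answers.
    return Counter(answers).most_common(1)[0][0]
```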
Streamlit app to chat with images using Multi-modal LLMs.
Updated Mar 17, 2024 · Python
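A minimal image-chat loop in Streamlit might look like the sketch below, assuming an OpenAI-compatible vision endpoint; the model name and message layout are illustrative assumptions, not this repo's code.

```python
# Minimal sketch of a Streamlit image-chat app against an OpenAI-compatible
# vision API. Model name and prompt handling are assumptions, not this repo.
import base64
import streamlit as st
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

st.title("Chat with an image")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
question = st.chat_input("Ask something about the image")

if uploaded and question:
    st.image(uploaded)
    b64 = base64.b64encode(uploaded.getvalue()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    with st.chat_message("assistant"):
        st.write(response.choices[0].message.content)
```

Run it with `streamlit run app.py`; Streamlit reruns the script on each interaction, so the upload-then-ask flow needs no explicit event loop.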
Official implementation of ICML 2025 paper "Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach"
Updated May 27, 2025 · Python
Q-HEART: ECG Question Answering via Knowledge-Informed Multimodal LLMs (ECAI 2025)
Updated Aug 22, 2025 · Python
SpatialFusion-LM is a real-time spatial reasoning framework that combines neural depth, 3D reconstruction, and language-driven scene understanding.
Updated May 2, 2025 · Python
Medical report generation and VQA (adapting XrayGPT to any modality).
Updated Jun 28, 2025 · Python
LLaVA base model for use with Autodistill.
Updated Jan 24, 2024 · Python
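Autodistill base models share a common auto-labeling interface (an ontology mapping prompts to class names, plus a `label()` call over an image folder). The sketch below assumes the LLaVA plugin follows that interface; the `autodistill_llava` package and `LLaVA` class names are assumptions, not verified against this repo.

```python
# Sketch of the standard Autodistill auto-labeling flow with LLaVA as the
# base model. The autodistill_llava import is an ASSUMED plugin name; the
# CaptionOntology / label() pattern is the common Autodistill interface.
from autodistill.detection import CaptionOntology
from autodistill_llava import LLaVA  # assumed plugin module name

# Map natural-language prompts to the class names you want in the dataset.
base_model = LLaVA(ontology=CaptionOntology({
    "forklift": "forklift",
    "person": "person",
}))

# Auto-label a folder of images; Autodistill writes out an annotated
# dataset that a smaller target model (e.g. YOLOv8) can be trained on.
base_model.label(input_folder="./images", extension=".jpeg")
```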
Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
Updated Jul 2, 2025 · Python
A minimal, hackable Vision-Language Model built on Karpathy’s nanochat — add image understanding and multimodal chat for under $200 in compute.
Updated Nov 4, 2025 · Python