LLaVA: Improved Baselines with Visual Instruction Tuning

Description

LLaVA is a novel, end-to-end trained large multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding. It achieves impressive chat capabilities that mimic the spirit of the multimodal GPT-4 and sets a new state-of-the-art accuracy on Science QA.
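
For readers who want to try the model, the sketch below shows one way to chat with LLaVA through the Hugging Face transformers integration. The checkpoint id llava-hf/llava-1.5-7b-hf, the prompt template, and the image path are illustrative assumptions rather than details taken from the project page itself.

# Minimal sketch: image-grounded chat with LLaVA via Hugging Face transformers.
# The checkpoint id, prompt format, and image path below are assumptions for
# illustration, not details specified by the project description above.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # replace with any RGB image of interest
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))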

The project page (below) links to the research paper, GitHub repo, demo, and dataset.

Link