Abstract: Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation of their efficacy.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results