Vision-language model (4-bit weight quantization)

Chat using text alone, or text together with an image.