AI models include large language models (LLMs) and vision-language models (VLMs). OpenAI's ChatGPT became popular in December 2022. LLaVA (Large Language and Vision Assistant) was first published in 2023. In this research, I compare the description shown on the Unsplash website for each photo with the description generated by LLaVA:13b.
More advanced question answering about the images is outside the scope of this research. Even so, the examples suggest how much computer vision has improved.
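For readers who want to try something similar, here is a minimal sketch of how a description could be requested from LLaVA:13b through a locally running Ollama server. It is not necessarily the exact setup behind the outputs in this post. It assumes Ollama is listening on its default local port and that the model has already been downloaded. The prompt text and the function names `build_payload` and `describe_image` are my own placeholders, since the original prompt is not recorded here.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt="Describe this image.", model="llava:13b"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        # Hypothetical prompt; the prompt used for this post is not given.
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object instead of a stream of chunks.
        "stream": False,
    }


def describe_image(path, url=OLLAMA_URL):
    """Send one image file to the local model and return its text answer."""
    with open(path, "rb") as f:
        payload = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running and the model pulled (`ollama pull llava:13b`), calling `describe_image("photo.jpg")` on a downloaded photo (the file name is a placeholder) should return a paragraph of text. The wording will vary from run to run.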
Photo by Shelby Murphy Figueroa from Unsplash
Description from the Unsplash website :
a group of women standing next to each other holding flowers
Description from llava:13b :
The image features a group of seven women standing together
and holding flowers in their hands. They are positioned close
to each other, with some of the women wearing shorts as part
of their outfits. The women appear to be posing for a picture,
showcasing their colorful bouquets of flowers.
Description from the Unsplash website :
modern living room with white furniture and classy lights
Description from llava:13b :
The image shows a modern and stylish living room filled with
various white furniture. There is a couch situated in the middle
of the room, accompanied by several chairs arranged throughout the space.
A couple of tables can be seen, one near the seating area and
another closer to the edge of the frame.
Additionally, there are two vases placed within the room, with one positioned
more towards the left side and the other on the right side. The living room also
features a large window that adds natural light to the space, creating an
inviting atmosphere for relaxation and socialization.
Photo by Nick Karvounis from Unsplash
Description from the Unsplash website :
a boat is in the water near a restaurant
Description from llava:13b :
The scene depicts a beautiful beach with a large boat sailing near
the shore. The boat is docked close to the shoreline, creating an
inviting and picturesque atmosphere. Numerous people can be seen
scattered around the beach, enjoying the view and the sunny day.
Some of them are standing near the water's edge, while others are
situated further away in various spots across the scene.
In addition to the boat and people, there is a chair positioned
towards the middle of the image. The combination of the beach setting,
the large boat, and the presence of multiple individuals contributes
to creating a lively and captivating environment for both locals and visitors.
Photo by Kristjan Kotar from Unsplash
Description from the Unsplash website :
four person on mountain during daytime
Description from llava:13b :
The image depicts a group of five people standing on top of a rocky hill
or mountain, enjoying the panoramic view. They are all equipped with backpacks,
suggesting that they may be hikers or travelers exploring the area together.
The scene captures the essence of adventure and camaraderie among the group
members. The people seem to be taking a break from their journey or activity,
perhaps admiring the stunning surroundings or simply appreciating the moment.
Photo by Patrick Amoy from Unsplash
Description from the Unsplash website :
person holding a computer mouse in a room
Description from llava:13b :
The image depicts a room full of computers and people. There is a young man
sitting at a desk, working on a laptop. Another person can be seen sitting
nearby, also using a computer. Several other individuals are present in the
room, some of them engaged with their laptops as well.
Various elements fill the room, including multiple cups scattered around the
space and several chairs placed both close to and farther from the people.
Additionally, there is a clock on the wall above the desks where the
individuals are working. The scene suggests an environment like an office
or study area, with many people collaborating or focusing on their tasks.
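The comparisons above are made by eye. As a rough, optional way to put a number on them, the sketch below measures what fraction of the content words in an Unsplash caption also appear in the LLaVA description. This is my own illustrative measure, not part of the original comparison. The stop-word list and the function names `content_words` and `caption_coverage` are assumptions chosen for this example.

```python
import re

# Small hand-picked stop-word list (an assumption, not a standard list).
STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "is", "to", "and", "with",
    "next", "each", "other", "near", "during",
}


def content_words(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def caption_coverage(caption, description):
    """Fraction of the caption's content words that also occur in the description."""
    caption_words = content_words(caption)
    if not caption_words:
        return 0.0
    return len(caption_words & content_words(description)) / len(caption_words)


flowers = caption_coverage(
    "a group of women standing next to each other holding flowers",
    "The image features a group of seven women standing together "
    "and holding flowers in their hands.",
)
mountain = caption_coverage(
    "four person on mountain during daytime",
    "The image depicts a group of five people standing on top of a rocky hill "
    "or mountain, enjoying the panoramic view.",
)
print(flowers, mountain)
```

Exact word matching is crude: in the mountain example it does not treat "person" and "people" as the same word, so it understates the agreement. It also cannot say which description is right when the two disagree, such as four versus five hikers.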
LLaVA: Large Language and Vision Assistant https://llava-vl.github.io/
Ollama: Run LLaVA:13b locally https://ollama.ai/