
How to create a Multimodal Retail Recommendations system using Gemini Pro Vision

Komal Agrawal
1 min read · Jan 16, 2024

Gemini Pro Vision Model

Gemini Pro Vision is a multimodal large language model (LLM) developed by Google.

It is part of the Gemini family of models, which can handle tasks across several modalities, including text, images, and video.

In the video below, you will learn:

  • How to use the Gemini Pro Vision model for visual understanding
  • How to take multimodality into account when prompting the Gemini Pro Vision model
  • How the Gemini Pro Vision model can be used out of the box to build retail recommendation applications
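
As a rough illustration of the last two points, a multimodal prompt usually interleaves short text labels with images so the model can tell the images apart. The sketch below is not taken from the video: the function names, the instruction wording, and the `gs://` URIs are hypothetical, and the final conversion step assumes the Vertex AI Python SDK (`google-cloud-aiplatform`) is installed.

```python
from typing import List, Tuple

# A prompt item is ("text", some_text) or ("image", image_uri).
PromptItem = Tuple[str, str]


def build_recommendation_prompt(
    context_uri: str, candidate_uris: List[str]
) -> List[PromptItem]:
    """Interleave text labels and images for a retail recommendation prompt.

    Hypothetical helper: each candidate image gets a numbered label so the
    model can refer to it by name in its answer.
    """
    items: List[PromptItem] = [
        ("text", "Here is the customer's photo:"),
        ("image", context_uri),
        ("text", "Here are the candidate products:"),
    ]
    for index, uri in enumerate(candidate_uris, start=1):
        items.append(("text", f"product {index}:"))
        items.append(("image", uri))
    items.append(
        ("text", "For each product, say whether it fits the photo, then recommend one.")
    )
    return items


def to_vertex_parts(items: List[PromptItem]) -> list:
    """Convert prompt items to Vertex AI Part objects.

    Assumes JPEG images stored in Cloud Storage; the SDK is imported lazily
    so the prompt builder above can be used without it.
    """
    from vertexai.generative_models import Part

    parts = []
    for kind, value in items:
        if kind == "image":
            parts.append(Part.from_uri(value, mime_type="image/jpeg"))
        else:
            parts.append(Part.from_text(value))
    return parts
```

The list returned by `to_vertex_parts` could then be passed to the model's `generate_content` call. Keeping the prompt builder separate from the SDK makes it easy to inspect the prompt order before sending anything.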

Video Link: https://www.youtube.com/watch?v=mzMfPMV_xSk

Steps:

  • Task 1. Open Python Notebook and Install Packages
  • Task 2. Use the Gemini Pro Vision model
  • Task 3. Visual understanding with Gemini Pro Vision
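
The three tasks could look roughly like the following in a notebook cell. This is a minimal sketch, not the code from the video. It assumes the Vertex AI Python SDK, a Google Cloud project with Vertex AI enabled, and the model name `gemini-pro-vision` as published in early 2024 (it may since have been renamed or retired). `image_mime_type` and `ask_about_image` are hypothetical helper names.

```python
# Task 1 (run once in the notebook):
#   pip install --upgrade google-cloud-aiplatform

import posixpath

# Small lookup of common image extensions; extend as needed.
_MIME_BY_EXTENSION = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def image_mime_type(uri: str) -> str:
    """Return the MIME type for an image URI based on its file extension."""
    extension = posixpath.splitext(uri)[1].lower()
    if extension not in _MIME_BY_EXTENSION:
        raise ValueError(f"Unsupported image extension: {extension!r}")
    return _MIME_BY_EXTENSION[extension]


def ask_about_image(
    project_id: str,
    location: str,
    image_uri: str,
    question: str,
    model_name: str = "gemini-pro-vision",  # assumption: early-2024 model name
) -> str:
    """Send one image and one question to the model and return its text answer."""
    # Imported lazily so the helper above works without the SDK installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    # Task 2: initialise the SDK and load the model.
    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(model_name)

    # Task 3: visual understanding, image first and then the question.
    image = Part.from_uri(image_uri, mime_type=image_mime_type(image_uri))
    response = model.generate_content([image, question])
    return response.text
```

A call such as `ask_about_image("my-project", "us-central1", "gs://my-bucket/shelf.jpg", "What products are on this shelf?")` would need real project, region, and bucket values in place of these placeholders.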

Done!

If you want to know more, you can refer to the docs below.

Thank you :)

Written by Komal Agrawal

Test Engineer @HCLTech, GCP DevOps Certified, Reader & Writer
