Member-only story

How to create a Multimodal Retail Recommendations system using Gemini Pro Vision

1 min readJan 16, 2024

Gemini Pro Vision Model

Gemini Pro Vision is a powerful multimodal large language model (LLM) developed by Google AI.

It’s part of the larger Gemini family of models, which are known for their ability to handle a variety of tasks across different modalities, including text, images, and videos.

In the below video, you will learn how to

How to use the Gemini Pro Vision model to perform visual understanding
How to consider multimodality in prompting for the Gemini Pro Vision model
How the Gemini Pro Vision model can be used to create retail recommendation applications out-of-the-box

Video Link: https://www.youtube.com/watch?v=mzMfPMV_xSk

Steps:

Task 1. Open Python Notebook and Install Packages
Task 2. Use the Gemini Pro Vision model
Task 3. Visual understanding with Gemini Pro Vision

Done !!!

If you want to know more, you can refer to the below docs

Gemini - Google DeepMind

Gemini is built from the ground up for multimodality - reasoning seamlessly across image, video, audio, and code.

deepmind.google

Thank you :)

How to create a Multimodal Retail Recommendations system using Gemini Pro Vision

Gemini Pro Vision Model

Gemini - Google DeepMind

Gemini is built from the ground up for multimodality - reasoning seamlessly across image, video, audio, and code.

Written by Komal Agrawal

Responses (5)