Member-only story

How to use Gemini for a variety of multimodal use cases

1 min readFeb 27, 2024

Gemini

Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases.

The Gemini API gives you access to the Gemini Pro Vision and Gemini Pro models.

Vertex AI Gemini API

The Vertex AI Gemini API provides a unified interface for interacting with Gemini models. There are currently two models available in the Gemini API:

Gemini Pro model (gemini-pro): Designed to handle natural language tasks, multiturn text and code chat, and code generation.
Gemini Pro Vision model (gemini-pro-vision): Supports multimodal prompts. You can include text, images, and video in your prompt requests and get text or code responses.

In the below video, you will learn how to use Gemini for a variety of multimodal use cases

Detecting objects in photos
Understanding screens and interfaces
Understanding charts and diagrams
Recommendation of images based on user preferences
Generating a video description
Extracting highlights/messaging of a video

How to use Gemini for a variety of multimodal use cases

Gemini

Vertex AI Gemini API

Written by Komal Agrawal

Responses (2)