Member-only story
How to use Gemini for a variety of multimodal use cases
1 min readFeb 27, 2024
Gemini
Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases.
The Gemini API gives you access to the Gemini Pro Vision and Gemini Pro models.
Vertex AI Gemini API
The Vertex AI Gemini API provides a unified interface for interacting with Gemini models. There are currently two models available in the Gemini API:
- Gemini Pro model (
gemini-pro
): Designed to handle natural language tasks, multiturn text and code chat, and code generation. - Gemini Pro Vision model (
gemini-pro-vision
): Supports multimodal prompts. You can include text, images, and video in your prompt requests and get text or code responses.
In the below video, you will learn how to use Gemini for a variety of multimodal use cases
- Detecting objects in photos
- Understanding screens and interfaces
- Understanding charts and diagrams
- Recommendation of images based on user preferences
- Generating a video description
- Extracting highlights/messaging of a video