Google’s Gemini : Busy person’s guide.

Google’s Gemini : Busy person’s guide.

Gemini is Google’s answer to Open-ai’s gpt4 series. It is the best model of Google capable of multi-modality. It is built ground up for multimodality working fine across text , images , videos , audios and code. In many of the benchmarks across text , image , video and audio it has better numbers over GPT4.To be specific , in 30 out of 32 academic benchmarks , gemini – ultra exceeds the state of the art.It comes in three sizes

1) Gemini – Ultra : For higher complex tasks

2) Gemini – Pro : For wide range of tasks across.

3) Gemini – Nano : For on-device tasks

Let’s get straight to using gemini.

In the first cell , we are installing google-generativeai package and importing it.

In the next cell , we are creating gemini-pro model object and then in the following cell configuring the google api key. We can get the key in the below link.

Google API KEY link : htt ps://makersuite.google.com/

And then we can start interacting with the gemini-pro model , Here in example I asked him “What is the capital of India ? ” and it gave me a response New Delhi.

Now , let’s see using gemini-pro-vision model.

Here , we can see we are creating a gemini-pro-vision model object and then passing our prompt and image to the generate content function.Let’s see the image passed and it’s description generated.

Description generated for the image :
“The image shows Mount Rushmore, a famous mountain in South Dakota, USA. The mountain has the faces of four former US presidents carved into it: George Washington, Thomas Jefferson, Theodore Roosevelt, and Abraham Lincoln. The mountain is a popular tourist destination and is considered to be one of the most iconic landmarks in the United States.”

Quite good ha !!!

Hence , this was it for this blog. We quickly had a brief look at what google’s gemini is and then learned how to use it.

References :
https://blog.google/technology/ai/google-gemini-ai/?utm_source=gdm&utm_medium=referral#sundar-note

0 Comments