Google’s Gemini : Busy person’s guide.
Gemini is Google’s answer to Open-ai’s gpt4 series. It is the best model of Google capable of multi-modality. It is built ground up for multimodality working fine across text , images , videos , audios and code. In many of the benchmarks across text , image , video and audio it has better numbers over GPT4.To be specific , in 30 out of 32 academic benchmarks , gemini – ultra exceeds the state of the art.It comes in three sizes
1) Gemini – Ultra : For higher complex tasks
2) Gemini – Pro : For wide range of tasks across.
3) Gemini – Nano : For on-device tasks
Let’s get straight to using gemini.
In the first cell , we are installing google-generativeai package and importing it.
In the next cell , we are creating gemini-pro model object and then in the following cell configuring the google api key. We can get the key in the below link.
And then we can start interacting with the gemini-pro model , Here in example I asked him “What is the capital of India ? ” and it gave me a response New Delhi.
Now , let’s see using gemini-pro-vision model.
Here , we can see we are creating a gemini-pro-vision model object and then passing our prompt and image to the generate content function.Let’s see the image passed and it’s description generated.
Description generated for the image :
“The image shows Mount Rushmore, a famous mountain in South Dakota, USA. The mountain has the faces of four former US presidents carved into it: George Washington, Thomas Jefferson, Theodore Roosevelt, and Abraham Lincoln. The mountain is a popular tourist destination and is considered to be one of the most iconic landmarks in the United States.”
Quite good ha !!!
Hence , this was it for this blog. We quickly had a brief look at what google’s gemini is and then learned how to use it.