Author: Om Kamath

Gemini 1.5 Flash: Google's Response to GPT-4o?

The AI race has intensified, becoming a catch-up game between the big players in tech. The launch of GPT-4o just before Google I/O was no coincidence. GPT-4o's incredible capabilities in multimodality, or omnimodality to be precise, have made a significant impact on the generative AI competition. However, Google is not one to hold back. During Google I/O, they announced new variants of their Gemini and Gemma models, and among them, Gemini 1.5 Flash stands out as the most impactful. In this blog, we will explore the top features of Gemini 1.5 Flash and compare it against both Gemini 1.5 Pro and GPT-4o to determine which one is better.

Comparison of Gemini 1.5 Flash vs GPT-4o

Based on the benchmark scores released by Google, Gemini 1.5 Flash outperforms every other Google LLM on audio and is on par with the outgoing Gemini 1.5 Pro (Feb 2024) on the remaining benchmarks. While we would not recommend relying solely on benchmarks to assess any LLM, they are useful for quantifying performance differences and incremental upgrades.

Gemini 1.5 Flash Benchmarks

The elephant in the room is the cost of the Gemini 1.5 Flash. Compared to GPT-4o, the Gemini 1.5 Flash is much more affordable.

Price of Gemini

Price of GPT

Context Window

Just like the Gemini 1.5 Pro, the Flash comes with a context window of 1 million tokens, which is more than any of the OpenAI models and is one of the largest context windows for production-grade LLMs. A larger context window allows for more data comprehension and can improve third-party techniques such as RAG (Retrieval-Augmented Generation) for use cases with a large knowledge base by increasing the chunk size. Additionally, a larger context window allows more text generation, which is helpful in scenarios like writing articles, emails, and press releases.
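To make this concrete, here is a minimal sketch of passing a large document straight into a single prompt, using Google's `google-generativeai` Python SDK. The API key and file name are placeholders, and for very large inputs you would still want chunking or caching in practice.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

# With a 1M-token window, a large knowledge base can ride along in a single prompt.
with open("knowledge_base.txt", encoding="utf-8") as f:  # placeholder file
    knowledge = f.read()

response = model.generate_content(
    "Answer using only the context below.\n\n"
    f"Context:\n{knowledge}\n\n"
    "Question: What are the key findings?"
)
print(response.text)
```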

Multimodality

Gemini 1.5 Flash is multimodal. Multimodality allows context to be provided in the form of audio, video, documents, and more. Multimodal LLMs are more versatile and open the door to more applications of generative AI without any preprocessing required.

“Gemini 1.5 models are built to handle extremely long contexts; they have the ability to recall and reason over fine-grained information from up to at least 10M tokens. This scale is unprecedented among contemporary large language models (LLMs), and enables the processing of long-form mixed-modality inputs including entire collections of documents, multiple hours of video, and almost five days of audio.” — DeepMind Report

Dabbas = train coach in Hindi, demonstrating multimodal and multilingual performance.

Multimodality also allows us to use LLMs as substitutes for other specialized services, for example OCR or web scraping.

OCR on Gemini

Easily scrape data from web pages and transform it.
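As a rough sketch of that OCR-style use case (again with the `google-generativeai` SDK; the image file is a placeholder), a single multimodal call stands in for a dedicated OCR pipeline:

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# A text instruction plus an image in one call replaces a separate OCR service.
receipt = Image.open("receipt.png")  # placeholder image
response = model.generate_content(
    ["Extract all text from this image, preserving the line order.", receipt]
)
print(response.text)
```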

Speed

Gemini 1.5 Flash, as the name suggests, is designed to have an edge over other models in response time. In the web-scraping example above, Flash responds roughly 2.5 seconds sooner than Gemini 1.5 Pro, almost 40% quicker, making it the better choice for automation or any use case that requires low latency.

Speed on Gemini 1.5 Pro
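If you want to measure this yourself, a naive latency comparison might look like the sketch below; absolute numbers will vary with load, region, and prompt size, and the prompt here is a stand-in for the scraping example.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
prompt = "List the product names in this HTML: <ul><li>Widget</li><li>Gadget</li></ul>"

# Crude single-shot timing; repeat and average for anything rigorous.
for name in ("gemini-1.5-flash", "gemini-1.5-pro"):
    model = genai.GenerativeModel(name)
    start = time.perf_counter()
    model.generate_content(prompt)
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```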

Some interesting use-cases of Gemini 1.5 Flash

  • Summarizing videos
  • Writing code from a video
  • Automating gameplay

GPT-4o: OpenAI Unveils Its Latest Language Model, Available for Free to Users

GPT-4o

After a ton of speculation on social media and other forums about what OpenAI has in store for us, yesterday, OpenAI finally revealed their latest and most powerful LLM to date — GPT-4o (‘o’ for omni). In case you missed the launch event of GPT-4o, let’s go over the capabilities of GPT-4o and the features it offers.

Enhanced Audio, Text and Vision Capabilities

GPT-4 Turbo is a powerful model, but it comes with one drawback: latency. Compared to GPT-3.5 Turbo, GPT-4 Turbo is considerably slower. GPT-4o addresses this drawback and is 2x faster than GPT-4 Turbo. This opens up a broader spectrum of use cases integrating data from speech, text, and vision, taking it one step further from multi-modal to omni-modal. The main difference is that an omni-modal model handles all three modalities natively and can run them seamlessly in parallel.

These enhancements also enable the model to generate speech with improved voice modulation, the capability to understand sarcasm, and enhanced natural conversational abilities.
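For reference, a minimal request through OpenAI's Python SDK might look like the sketch below. The image URL is a placeholder, and note that at launch the public API exposed text and vision, with the new audio capabilities demoed in ChatGPT rather than the API.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart in one sentence."},
            # Placeholder URL; local images can be sent as base64 data URLs instead.
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```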

Reduced pricing and free access for ChatGPT users

Not only is GPT-4o more efficient and faster than the outgoing GPT-4 Turbo, it is also half the price on the API: US$5.00 per 1M input tokens and US$15.00 per 1M output tokens. The context window is 128k tokens, and the knowledge cutoff is October 2023.
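A quick back-of-the-envelope calculation shows what that pricing means per request; the token counts here are purely illustrative.

```python
# Back-of-the-envelope cost of a single GPT-4o call at launch pricing.
INPUT_USD_PER_M, OUTPUT_USD_PER_M = 5.00, 15.00

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# An illustrative 2,000-token prompt with a 500-token reply:
print(f"${cost_usd(2_000, 500):.4f}")  # -> $0.0175
```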

As a cherry on top, GPT-4o will be available to all ChatGPT users for free (ChatGPT Plus users get a 5x higher message cap for GPT-4o). Alongside this, OpenAI also unveiled the ChatGPT desktop app, which will let users tap GPT-4o's vision capabilities to read and comprehend the content displayed on the screen. Users will also be able to talk to ChatGPT through the desktop app.

GPT-4o Demo

OpenAI stated that they are rolling out access to GPT-4o in stages over the next few weeks, with ChatGPT Plus users receiving priority and early access to the model. We will understand the true potential of this model only once we get access to it in the coming weeks. Exciting times ahead!

Groq and Llama 3: A Game-Changing Duo

A couple of months ago, a new company named ‘Groq’ emerged seemingly out of nowhere, making a breakthrough in the AI industry. They provided a platform for developers to access LPUs as inferencing engines for LLMs, especially open-source ones like Llama, Mixtral, and Gemma. In this blog, let’s explore what makes Groq so special and delve into the marvel behind LPUs.

What is Groq?

“Groq is on a mission to set the standard for GenAI inference speed, helping real time AI applications come to life today.” — The Groq Website

Groq isn't a company that develops LLMs like GPT or Gemini. Instead, Groq focuses on the foundations these large language models operate on: the hardware. It serves as an 'inference engine.' Currently, most LLMs in the market run on traditional GPUs deployed on private servers or in the cloud. These GPUs, sourced from companies like Nvidia, are powerful but expensive, and their traditional architecture may not be optimally suited for LLM inferencing (though they remain powerful and preferred for training models).

The inference engine provided by Groq works on LPUs — Language Processing Units.

What is an LPU?

A Language Processing Unit is a chip designed specifically for LLMs. It is built on a unique architecture that combines aspects of CPUs and GPUs to transform the pace, predictability, performance and accuracy of AI solutions for LLMs.

Key attributes of an LPU system. Credits: Groq

An LPU system has as much or more compute than a graphics processing unit (GPU) and reduces the time spent per word calculated, allowing faster generation of text sequences.

Features of an LPU inference engine as listed on the Groq website:

  • Exceptional sequential performance
  • Single core architecture
  • Synchronous networking that is maintained even for large scale deployments
  • Ability to auto-compile LLMs of >50B parameters
  • Instant memory access
  • High accuracy that is maintained even at lower precision levels

Services provided by Groq:

  1. GroqCloud: LPUs on the cloud
  2. GroqRack: 42U rack with up to 64 interconnected chips
  3. GroqNode: 4U rack-ready scalable compute system featuring eight interconnected GroqCard™ accelerators
  4. GroqCard: A single chip in a standard PCIe Gen 4×16 form factor providing hassle-free server integration

“Unlike the CPU that was designed to do a completely different type of task than AI, or the GPU that was designed based on the CPU to do something kind of like AI by accident, or the TPU that modified the GPU to make it better for AI, Groq is from the ground up, first principles, a computer system for AI”— Daniel Warfield, Towards Data Science

To know more about how LPUs differ from GPUs, TPUs and CPUs, we recommend reading this comprehensive article written by Daniel Warfield for Towards Data Science.

What’s the point of Groq?

LLMs are incredibly powerful, capable of everything from parsing unstructured data to answering questions about the cuteness of cats. Their main drawback currently is response time: slow generation introduces significant latency when LLMs sit in backend processes. For example, fetching data from a database and displaying it in JSON format is currently much faster with traditional logic than by passing the data through an LLM for transformation. The advantage of LLMs, however, lies in their ability to understand and handle exceptions in the data.

With the incredible inference speed offered by Groq, this drawback of LLMs can be greatly reduced. This opens up better and wider use-cases for LLMs and reduces costs, as with an LPU, you’ll be able to deploy open-source models that are much cheaper to run with really quick response times.

Llama 3 on Groq

A couple of weeks ago, Meta unveiled their latest iteration of the already powerful and highly capable open-source LLM—Llama 3. Alongside the typical enhancements in speed, data comprehension, and token generation, two significant improvements stand out:

  1. Trained on a dataset 7 times larger than Llama 2, with 4 times more code.
  2. Doubled context length to 8,000 tokens.

Llama 2 was already a formidable open-source LLM, but with these two updates, the performance of Llama 3 is expected to rise significantly.

Llama 3 Benchmarks

To test Llama 3, you have the option to utilize Meta AI or the Groq playground. We’ll showcase the performance of Groq by testing it with Llama 3.

Groq Playground

Currently, the Groq playground offers free access to Gemma 7B, Llama 3 70B and 8B, and Mixtral 8x7B. The playground lets you adjust parameters such as temperature and maximum tokens, toggle streaming, and it also features a dedicated JSON mode for generating JSON output only.
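Outside the playground, the same knobs are available through Groq's OpenAI-compatible Python SDK. Here is a minimal streaming sketch; the model name is a real Groq model ID, but the prompt is illustrative.

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# The same knobs the playground exposes: model, temperature, max tokens, streaming.
stream = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "Explain what an LPU is in two sentences."}],
    temperature=0.5,
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```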

Only 402ms for inference at the rate of 901 tokens/s

Coming to what is, in my opinion, the most impactful application: data extraction and transformation.

Asking the model to extract useful information and return JSON using the JSON mode.

The extraction and transformation to JSON format was completed in less than half a second.
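Here is a hedged sketch of reproducing that extraction through the API using Groq's JSON mode. Note that JSON mode requires the word "JSON" to appear in the prompt and cannot be combined with streaming; the input text is made up.

```python
from groq import Groq

client = Groq()

# JSON mode guarantees syntactically valid JSON in the response.
completion = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{
        "role": "user",
        "content": 'Extract name, company and email as JSON from: '
                   '"Reach Jane Doe of Acme Corp at jane@acme.com."',
    }],
    response_format={"type": "json_object"},
)
print(completion.choices[0].message.content)  # e.g. {"name": "Jane Doe", ...}
```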

Conclusion

As demonstrated, Groq has emerged as a game-changer in the LLM landscape with its innovative LPU Inference Engine. The rapid transformation showcased here hints at the immense potential for accelerating AI applications. Looking ahead, one can only speculate about future innovations from Groq. Perhaps an Image Processing Unit could revolutionize image generation models, contributing to advances in AI video generation. It's an exciting future to anticipate.

Looking ahead, as LLM training becomes more efficient, the potential for having a personalized ChatGPT, fine-tuned with your data on your local device, becomes a tantalizing prospect. One platform that offers such capabilities is Cody, an intelligent AI assistant tailored to support businesses in various aspects. Much like ChatGPT, Cody can be trained on your business data, team, processes, and clients, using your unique knowledge base.

With Cody, businesses can harness the power of AI to create a personalized and intelligent assistant that caters specifically to their needs, making it a promising addition to the world of AI-driven business solutions.

Top 5 Free Open Source LLMs in 2024

LLMs are ubiquitous nowadays, needing no introduction. Whether you’re in tech or not, chances are you’ve encountered or are currently using some form of LLM on a daily basis. The most prominent LLMs at present include GPT from OpenAI, Claude from Anthropic, and Gemini from Google.

However, these popular LLMs often operate as abstract or black-box systems, raising concerns about data privacy and transparency. To address such issues, several open-source LLMs are available, allowing users to deploy them on private machines or servers with peace of mind.

Open source refers to software or products distributed with their source code freely available for inspection, modification, and distribution. This accessibility empowers users to understand, enhance, and contribute to the development of the software.

Here are some of the best open source LLMs currently available:

Llama 2

Llama 2 is an open-source LLM developed by Meta, offered free for commercial and research purposes. Llama 2 models are trained on two trillion tokens and boast double the context length of Llama 1.

The model’s parameters directly impact its ability to comprehend text, with larger models offering better performance at the cost of increased size and resource requirements.

Variants Available: 7B, 13B, and 70B parameters

Context Window: 4096 Tokens

Languages Supported: Performs best in English
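If you want to run Llama 2 locally rather than through a hosted API, a minimal sketch with Hugging Face `transformers` looks like this. It assumes you have accepted Meta's license for the gated repository and have enough GPU memory for the 7B chat variant; the prompt is a placeholder.

```python
# Requires: pip install transformers accelerate, plus access to Meta's gated repo.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",  # spread weights across the available GPU(s)
)
out = generator("Explain open-source LLMs in one sentence.", max_new_tokens=64)
print(out[0]["generated_text"])
```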

Mixtral 8x7B

Mixtral 8x7B, developed by Mistral AI, is an LLM containing 46.7B total parameters. Despite its size, it activates only a fraction of those parameters per token, so its inference speed and cost resemble those of a model one-third its size. This decoder-only Transformer Mixture of Experts (MoE) model significantly outperforms Llama 2 and GPT-3.5 in certain benchmarks.

Variants Available: Tiny, Small, Medium, and Large (Ranked from cost-efficient to high performance)

Context Window: 32000 Tokens (On Mistral Large)

Languages Supported: English, French, Spanish, German, Italian (On Mistral Large)

Falcon

Falcon, developed by the Technology Innovation Institute (TII) in Abu Dhabi, is another leading open source LLM. Following its launch, Falcon 40B held the #1 position on Hugging Face’s leaderboard for open source large language models (LLMs) for two months. With the 180B variant, TII further enhances the model’s knowledge and data comprehension abilities. Falcon 180B is a super-powerful language model trained on 3.5 trillion tokens.

Variants Available: Falcon 40B and Falcon 180B

Context Window: 4096 Tokens

Languages Supported: English, German, Spanish, French, with limited support for Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish.

BLOOM

BLOOM is an autoregressive Large Language Model (LLM) developed by the BigScience project. With 176B parameters, BLOOM excels at generating text continuations from prompts, having been trained on vast amounts of text data with industrial-scale computational resources.

Variants Available: bloom-560m, bloom-1b1, bloom-1b7, bloom-3b, bloom-7b1, bloom (176B)

Context Window: 2048 Tokens

Languages Supported: 46 natural languages (with varying amounts of data, from 30% for English to 0.00002% for Chi Tumbuka)

Gemma

Gemma, Google’s latest state-of-the-art open LLM, follows the success of Gemini. Gemma is a family of open-weights Large Language Models (LLM) by Google DeepMind, built on Gemini research and technology. While the model weights are freely accessible, specific terms of use, redistribution, and variant ownership may vary and might not be based on an open-source license.

Variants Available: Gemma 2B and Gemma 7B

Context Window: 8192 Tokens

Languages Supported: English

Conclusion

We at Cody prioritize a model-agnostic approach when it comes to LLMs, offering a platform that empowers you to build personalized bots tailored to your unique use-case. With a diverse range of LLM options available, you’re not restricted to a single provider, giving you the freedom to choose the best fit for your requirements.

Through Cody, businesses can leverage AI to develop intelligent assistants customized to their precise needs. This flexibility makes Cody a promising addition to the realm of AI-driven business solutions.

ChatGPT Killer? What Gemini 1.5 Means for Google’s AI Future

Google vs OpenAI: Is Google Winning?

After missing the mark with Bard in the AI hype train, Google recently unveiled their latest AI product, Gemini. As part of this launch, Bard has been rebranded as Gemini and now incorporates the new Gemini Pro LLM. Let’s delve deeper to grasp the extent of these changes.

What is Gemini AI?

Gemini represents Google’s newest Large Language Model (LLM), following the release of LaMDA and PaLM. Unlike its predecessors, Gemini is natively multimodal, capable of understanding text, images, speech, and code, and boasts enhanced comprehension and reasoning abilities.

Variants of Gemini AI

The Gemini AI consists of three Large Language Models:

  1. Gemini Nano: Optimized for on-device efficiency, delivering rapid AI solutions directly on your personal device.
  2. Gemini Pro: A versatile and scalable model, adept at tackling diverse tasks with robust performance. Accessible on the free version of the Gemini chat interface.
  3. Gemini Ultra: The pinnacle of the Gemini series, empowering complex problem-solving and advancing the frontiers of AI capabilities. Exclusive to subscribers of the Google One AI Premium Plan.

Gemini models were trained using TPUv5e and TPUv4 accelerators, depending on their size and configuration. Training Gemini Ultra used a large fleet of TPUv4 accelerators owned by Google across multiple data centers. This represents a significant increase in scale over the prior flagship model PaLM-2 and presented new infrastructure challenges.

Comparing Gemini With Other LLMs

Textual Understanding

Comparison of Gemini with other LLMs

Source: Google Deepmind

Image Understanding

Comparison of Gemini with other LLMs

Source: Google Deepmind

Read more about it here.

Benefits of Gemini

1. Seamless integration with all Google Apps

Gemini now seamlessly integrates with all Google Apps, including Maps, YouTube, Gmail, and more. To query a specific app, simply type '@' followed by the app name and then your query. While similar integrations are achievable on ChatGPT using GPTs and Plugins, they may not offer the same level of seamlessness as Gemini's native integrations.

Gemini Integration

Google’s renowned expertise in search engine technology undoubtedly extends to enhance Gemini’s web-browsing capabilities. Leveraging foundational strengths in search algorithms and indexing, Gemini offers users a seamless and efficient browsing experience.

2. Multimodal capabilities

Gemini now provides multimodal capabilities, including image understanding, on the Gemini chat interface at no extra cost. While its performance during testing was decent, it may not match the accuracy of GPT-4V. Nevertheless, given that it's free, we can't really complain, can we? 😉 There's a chance that Gemini Ultra may outperform GPT-4V based on the published metrics.

Gemini Multimodal

3. Free Access to Hobbyists and Students

For aspiring LLM developers looking to dive into the field but facing constraints with accessing GPT APIs due to costs, Google offers free access to the Gemini Pro 1.0 API. With this, you can make up to 60 queries per minute on Google AI Studio, a free web-based developer tool. Google AI Studio allows you to swiftly develop prompts and obtain an API key for app development. By signing into Google AI Studio with your Google account, you can take advantage of this free quota. It’s an excellent opportunity to kickstart your LLM journey and explore embeddings, vector databases, semantic search, and more.
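A simple way to stay within that free quota is to throttle your calls client-side. A rough sketch follows, using the `google-generativeai` SDK; the question list is a placeholder.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_KEY")  # free key from Google AI Studio
model = genai.GenerativeModel("gemini-pro")  # Gemini Pro 1.0

questions = ["What is an embedding?", "What is a vector database?"]  # placeholder batch
for q in questions:
    print(model.generate_content(q).text)
    time.sleep(1.1)  # just over 1s per call keeps you under the 60-queries-per-minute quota
```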

Google AI Studio

4. Value for Money

For $20 per month, users can access GPT-4 via ChatGPT Plus. Alternatively, for the same price, they can access Gemini Advanced with Gemini Ultra 1.0, which includes additional benefits such as 2TB of cloud storage and integration with Google Apps like Gmail and Docs. However, accessing Gemini Advanced requires a subscription to the Google One AI Premium Plan. Despite this requirement, it offers greater value for your money.

Google One Plans

Introducing a mid-tier plan with 500 GB of storage and access to Gemini Advanced between the Standard and Premium Plans would significantly enhance the accessibility of Gemini, especially for students and users with moderate storage requirements. Google, if you’re listening, please consider this suggestion.

What’s Next for Gemini?

Google DeepMind is continuously advancing the Gemini model, with Gemini 1.5 Pro rolled out just a week ago. In this updated variant, the context window has been expanded to 128,000 tokens. Additionally, a select group of developers and enterprise customers can experiment with even larger context windows of up to 1 million tokens through private previews on AI Studio and Vertex AI. To put this into perspective, a typical non-fiction book contains around 300,000 tokens; with Gemini 1.5 Pro's 1-million-token context window, users can include entire books in a single request, a remarkable advance over GPT-4's 128,000-token context window.

Amidst the saturation of LLMs in the AI industry, Google appears to have struck gold with its enhanced architecture, swift responses, and seamless integration within the Google ecosystem this time. It could indeed be a step in the right direction, keeping OpenAI and other competitors on their toes.

In this AI era, it is crucial for businesses to have well-trained employees, and incorporating AI into employee training can be a significant investment. If you are seeking AI solutions to train your employees, Cody is the right tool for you. Like ChatGPT and Gemini, Cody can be trained on your business data, team, processes, and clients, using your unique knowledge base. Cody is model-agnostic, making it easier for you to switch models as your requirements change.

With Cody, businesses can harness the power of AI to create a personalized and intelligent assistant that caters specifically to their needs, making it a promising addition to the world of AI-driven business solutions.

3 Compelling Reasons to Hire an AI Employee for Your Business

Revolutionize your workplace with AI

Why Your Business Needs An AI Employee Today

There’s no denying the transformative power of AI solutions like ChatGPT in modern workplaces. From streamlining email drafting to providing mental health support, ChatGPT is revolutionizing how we approach everyday tasks. However, it’s not without its limitations, such as a lack of customization to your specific business knowledge base. Enter Cody, your no-code, hassle-free solution for bringing the best of AI into your organization.

Let’s explore three ways AI can benefit your organization:

Training: From Static to Dynamic

Traditional training methods often involve static, pre-defined flows that are not only less engaging but also not necessarily tailored for your business needs. By leveraging AI, you can bring dynamism and interactivity to your employee training programs.

With Cody, it’s as simple as uploading your existing training documents—whether they’re PDFs or Word documents. Choose from pre-made bot templates or use the advanced bot builder to customize Cody’s personality to your liking. In just a few easy steps, you’ll have a personalized onboarding coach that caters to each employee’s needs, thereby enhancing the effectiveness and intuitiveness of your training programs.

Searching: Making Knowledge Accessible

What’s the point of having a well-documented business knowledge base if your employees spend ages sifting through data? AI-powered solutions like Cody transform the way information is accessed within your organization, functioning like an internal search engine.

Once your business knowledge is uploaded into Cody, any query made in natural language will be met with a precise, coherent response generated from your specific data. It’s like having a 24/7 human expert ready to address all your inquiries. Gone are the days of aimless searching through endless data.

Automating: Simplifying Workflows

Our latest update allows you to take automation to the next level. Cody now integrates seamlessly with Zapier, enabling you to construct AI-powered automated workflows that are not just efficient, but user-friendly too. By automating routine tasks, you’re freeing up your employees to focus on more meaningful work. And with Cody’s AI capabilities, the generated content is on par with what a human could produce, if not better.

Zapier is a tool that enables you to connect Cody with more than 5,000 apps, opening up a world of endless possibilities.

The Future Is Now, and It’s Cody

We’ve delved into the transformative power of AI in the workplace, focusing on its impact on training, searching, and automating workflows. With platforms like Cody, the future is not a distant reality; it’s happening here and now. The integration of AI offers not only streamlined operational efficiency but also a meaningful reduction in costs and an enhancement in employee satisfaction.

So why wait? Whether you’re a startup looking to scale or an established company aiming to modernize, now is the perfect time to embrace AI solutions. With compelling benefits and a proven track record, Cody offers a hassle-free, no-code option for those looking to take the leap into the future of work.

Don’t miss the opportunity to revolutionize your workplace dynamics. Click here to start your journey with Cody and discover a world of efficiency and innovation that you never thought possible.