Is Google's Gemini Changing Artificial Intelligence? A Student Reaction
OpenAI’s ChatGPT Is Released to the Public
In November 2022, OpenAI released ChatGPT to the public. Large language models (LLMs) give ChatGPT the ability to generate text of varying length, in many formats, and at varying levels of detail. As a computer engineer, I was fascinated by ChatGPT’s capabilities and I wanted to know more.
The “GPT” in ChatGPT stands for generative pre-trained transformer, a technology engineered by programmers at OpenAI. Since ChatGPT is still new to the public, I found that the most reliable way to learn how it works was through educational courses on artificial intelligence (AI). So, I enrolled in the most up-to-date AI courses I could find, and here is what I took.
I received my first introduction to AI development two summers ago, through LinkedIn Learning’s ‘Explore a Career in Machine Learning Engineering’ learning path (ECMLE), a LinkedIn service I have access to as a student. ECMLE taught me the basic concepts behind AI’s use in application development. Currently, I am using what I learned from ECMLE to build a Calorie Checker application. Naturally, my interest in AI grew, and that is when I decided to take my next course.
I noticed that a large part of ChatGPT’s success derives from its ability to interpret human ideas and emotions. But I didn’t quite understand the technology, so as generative models continued to surge in popularity, I took advantage of Google Cloud Skills Boost’s ‘Generative AI’ learning path to further my knowledge.
From there, I took courses on artificial intelligence and neural networks, machine-learning models loosely inspired by the human brain. Generative AI models like GPT-4 are neural networks designed and trained for specific purposes. These courses taught me how to use AI to solve human problems, even when contingencies such as unique needs, situations, and experiences are involved. I continue to educate myself each day, because I am constantly reminded that AI is always improving. I plan on staying ahead of developments in AI so that I can be a developer of the technology that is taking the world by storm.
Google’s Gemini Introduces Multimodal Technology
As finals week approaches for computer engineering majors, the campus at the University of Miami is quiet. During this time, I naturally gravitate toward news on tech, and recently artificial intelligence has been trending. During a study break, I stumbled on news that Google has released a new artificial intelligence model family, called Gemini.
Since Google’s I/O conference in May 2023, reports about Gemini’s specifications, capabilities, and release date have been unclear. Just two days ago, I got an update from the TLDR AI newsletter claiming that Gemini would not be released until January 2024. Since I only knew what the rumors were saying about Gemini, I dove into some research.
Google’s Gemini differs from OpenAI’s ChatGPT in the architecture of the neural network itself. Because the model is trained on multiple types of data at once, Gemini introduces a new “multimodal” form of data processing that interleaves textual, audio, and visual inputs, and it can produce interleaved images and text as output. This means that Gemini can interpret and respond to more kinds of input than a model built around text alone.
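To make the idea of “interleaved” input a little more concrete, here is a minimal Python sketch of a prompt that mixes text, image, and audio pieces in one ordered sequence. Everything in it (the `Segment` class, the `text` helper, the placeholder bytes) is my own hypothetical illustration; it is not Gemini’s API or its internal representation, just a way to picture the data.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal

# Hypothetical modality labels, chosen for illustration only.
Modality = Literal["text", "image", "audio"]


@dataclass(frozen=True)
class Segment:
    """One piece of a multimodal prompt (assumed structure, not Gemini's)."""
    modality: Modality
    payload: bytes  # raw content; text is stored as UTF-8 bytes


def text(s: str) -> Segment:
    """Convenience helper for building a text segment."""
    return Segment("text", s.encode("utf-8"))


def interleave_order(prompt: List[Segment]) -> List[str]:
    """Return the modalities in the order they appear in the prompt."""
    return [seg.modality for seg in prompt]


def modality_counts(prompt: List[Segment]) -> Dict[str, int]:
    """Count how many segments of each modality the prompt contains."""
    counts: Dict[str, int] = {}
    for seg in prompt:
        counts[seg.modality] = counts.get(seg.modality, 0) + 1
    return counts


# A single prompt that alternates between text and other media.
prompt = [
    text("What is shown in this picture?"),
    Segment("image", b"<placeholder image bytes>"),
    text("And what is being said in this clip?"),
    Segment("audio", b"<placeholder audio bytes>"),
]

print(interleave_order(prompt))
print(modality_counts(prompt))
```

The point of the sketch is only that order matters: the text, image, and audio pieces sit in one sequence rather than being sent to separate single-purpose models.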
Google’s Gemini family comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini Ultra is in direct competition with OpenAI’s GPT-4; Gemini Pro is optimized for performance and adaptability; and Gemini Nano is the ultra-efficient model currently incorporated in Google’s Pixel 8 Pro.
Training these models on Google’s TPUv5e and TPUv4 required massive innovations in algorithms, datasets, and infrastructure, which allowed these natively multimodal models to be trained jointly. Announced alongside the model family is the newest TPUv5p, which should enable further technological innovation in the future. Gemini answers the question of whether training on mixed inputs can produce a model with strong capabilities in each domain when compared with single-domain models.
“With an [accuracy] score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities” (The Keyword).
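For readers wondering what an accuracy score on a benchmark like MMLU actually measures, here is a small Python sketch. It assumes a multiple-choice setup where each question has one correct answer, and accuracy is simply the fraction answered correctly. The results in `toy_results` are made up for illustration; they are not Gemini’s answers or real MMLU questions, and the function names are my own.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each result is (subject, predicted choice, correct choice).
Result = Tuple[str, str, str]


def overall_accuracy(results: List[Result]) -> float:
    """Fraction of questions where the prediction matches the answer key."""
    if not results:
        raise ValueError("need at least one result")
    correct = sum(1 for _, predicted, gold in results if predicted == gold)
    return correct / len(results)


def accuracy_by_subject(results: List[Result]) -> Dict[str, float]:
    """Accuracy computed separately for each subject."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for subject, predicted, gold in results:
        totals[subject] += 1
        if predicted == gold:
            hits[subject] += 1
    return {subject: hits[subject] / totals[subject] for subject in totals}


# Made-up toy data: two subjects, two questions each.
toy_results = [
    ("math", "A", "A"),
    ("math", "C", "B"),
    ("law", "D", "D"),
    ("law", "B", "B"),
]

print(overall_accuracy(toy_results))
print(accuracy_by_subject(toy_results))
```

Breaking the score down by subject is useful because a single headline number can hide weak spots in individual areas.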
In the future, what sort of innovation can we expect from multimodal AI? Now that multiple forms of input can be artificially understood at once, we step closer to artificial general intelligence (AGI) territory. We as humans take in information through all five of our senses at once. The idea of a single AI that can take in and interpret more and more of the types of information that we learn to create and distribute virtually raises so many new questions.
How long will it be until we have AI that can interpret and produce scents? Or compare the textures of different materials through touch? The idea of interleaving multiple forms of information input and output at once, and the proof that such an idea can be artificially replicated in a competitive form, truly does mark the next level of artificial intelligence innovation.
Learning From Artificial Intelligence
In only a year, the world has witnessed rapid change in information processing technology. This is why I dedicate myself to learning something new about it every day. As AI improves and becomes a larger part of our daily lives, I envision a future that does not yet exist: a future that constantly changes as we continue to advance and explore the scientific limits of our universe. For now, I stick to learning the skills that will push the world to the next level.
Hey reader,
I asked Google Bard for suggestions about this blog post and it recommended that I provide a call to action and links to the work discussed. So, for more information about the new technology, click here for the technical report, here for the blog post, and here for videos and more information.
I would love it if you shared and interacted with this post, sent me an email, or connected with me on LinkedIn! Want to work together? Here’s my resume. I’m a student at the University of Miami and I would love to help you.