This week's release of OpenAI's GPT-4o has sent a wave of excitement through Click Creative Digital Agency. This groundbreaking update to the popular ChatGPT platform marks a significant shift in the accessibility and capabilities of large language models (LLMs).
GPT-4o is freely available to everyone, both online and through a dedicated desktop app. This democratisation of AI technology allows anyone, from students and hobbyists to entrepreneurs and established businesses, to utilise GPT-4o's advanced features. OpenAI, in a recent blog post, explained the "omni" in GPT-4o's name as signifying a step towards more natural human-computer interaction.
But what exactly makes GPT-4o so revolutionary?
The multimodal marvel that is ‘4o’ boasts the ability to "reason across audio, vision, and text in real-time" (OpenAI, 2024). This groundbreaking technology is being rolled out progressively to both free and paid users. For now, access for ‘Plus’ subscribers is limited to GPT-4o's text and image functionalities, with the highly anticipated voice and video features slated for a future release.
For web browser users of ChatGPT, accessing GPT-4o is a straightforward process. Simply log in and navigate to the drop-down menu in the top left corner. Users fortunate enough to receive the update will find GPT-4o as the default option, identified as OpenAI's "newest and most advanced model."
However, the rollout for mobile and desktop applications is currently proceeding at a more measured pace. As of this writing, GPT-4o remains unavailable on iOS or Android apps, and the recently launched Mac app is still in its early stages of deployment. For Windows users, a dedicated version is planned for "later this year" (OpenAI, 2024).
OpenAI's GPT-4o demo, as seen in the video below, showcased a glimpse into the future of AI interaction. The model's ability to engage in real-time conversational speech and vision-based interaction, allowing it to "see" and converse simultaneously, generated significant excitement. However, these functionalities will require a bit more time before widespread adoption.
Currently, developers like our Studio team have access to GPT-4o within the API as a text and vision model, which differs from the image-based capabilities available to free and paid users since launch.
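For readers curious what that API access looks like in practice, here is a minimal sketch of the kind of request a developer might build for OpenAI's chat endpoint, combining text and an image in a single GPT-4o message. The payload shape follows OpenAI's documented chat format; the helper function name and the image URL are our own illustrative placeholders.

```python
import json

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Build a chat request that pairs a text prompt with an image,
    using the "gpt-4o" model identifier."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                # A single user turn can mix text and image parts.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request(
    "What is shown in this image?",
    "https://example.com/photo.jpg",  # placeholder image URL
)
print(json.dumps(payload, indent=2))
```

Sending this payload (with an API key) would return GPT-4o's answer about the image, which is how text-and-vision workflows like our Studio team's are wired up.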
Regarding the much-anticipated voice features, OpenAI plans to "roll out a new alpha version of Voice Mode with GPT-4o within ChatGPT Plus in the coming weeks" (OpenAI, 2024). Additionally, they intend to "launch support for GPT-4o's new audio and video capabilities to a select group of trusted partners in the API in the coming weeks" (OpenAI, 2024).
This measured rollout strategy, with some of GPT-4o's most captivating features initially restricted to testers and developers among paid users, is entirely understandable. The complex technology powering OpenAI's demos likely necessitates significant processing power, and a wider launch may take time. Nonetheless, the unveiling of GPT-4o presents a significant leap forward in AI accessibility and paves the way for a future filled with enhanced human-computer interaction.
While there are still ethical considerations surrounding the use of LLMs, OpenAI's commitment to open access signifies a shift towards a future where everyone can benefit from the power of AI. As GPT-4o continues to develop, it will be fascinating to see what new possibilities it unlocks for the Click team here in Melbourne and for our clients.