You Can Now Talk to ChatGPT: Here’s How

OpenAI, a startup backed by Microsoft, has announced significant upgrades to its popular conversational AI tool, ChatGPT. The latest enhancements add voice and image capabilities, letting users speak with the chatbot and show it pictures.

Voice Conversations with ChatGPT

ChatGPT is gaining voice capabilities. Users will be able to hold spoken conversations with ChatGPT, which can respond in any of five different voices. These voices were created with the help of professional voice actors.

To achieve this, OpenAI has employed a new text-to-speech model, capable of generating remarkably human-like audio from plain text and short speech samples. This innovation opens doors to a wide range of creative and accessibility-focused applications.

Enabling voice conversations is straightforward. In the ChatGPT mobile app, open the "Settings" menu and select "New Features." From there, opt into voice conversations and choose a preferred voice from the available options. The voice feature is launching in beta and will initially be accessible only to users of the ChatGPT app.

Enhanced Image Recognition

ChatGPT can now also recognize images. Users can show images to ChatGPT and get responses and information about them. For example, while traveling, you could show ChatGPT a picture of a landmark to learn more about it, or photograph a problem with your smartphone to get help fixing it.

This image recognition feature uses the multimodal capabilities of GPT-3.5 and GPT-4. Users can upload one or more images and ask questions about them.

Collaborations and Availability

OpenAI is also working with other companies to put this new technology to use. Spotify, for instance, is working with OpenAI to use the voice technology behind ChatGPT for voice translation in podcasts, making content accessible to a wider audience.

The rollout of these features is set to begin within the next two weeks and will initially be available to Plus and Enterprise users. Voice conversations will be accessible on both Android and iOS platforms, while image recognition will be available on all platforms by default.
