ChatGPT has surprised users with answers so accurate and natural-sounding that it is difficult to tell whether the text was created by an AI.
OpenAI says its model, called ChatGPT, is a language model optimized for dialogue: it can interact conversationally, and its dialogue format also makes the following possible:
Answering follow-up questions
Admitting mistakes
Challenging incorrect premises
Rejecting inappropriate requests
The algorithms in this model are able to understand what you are asking, including variations in phrasing and word choice, and respond in an accurate, complete, and coherent manner.
How was ChatGPT created?
It is important to know that this kind of AI is trained on text: it is asked certain questions, given information, and corrected over time, so that the system is "trained" and improved to carry out the task for which it was created automatically. This is how all such AIs are "trained."
This ChatGPT model is built on a network with over 175 billion parameters. To improve its performance, OpenAI used human trainers in both supervised learning and reinforcement learning approaches (Reinforcement Learning from Human Feedback, or RLHF), using the same methods as InstructGPT, though with some differences in the data collection setup.
For supervised learning
The model was fed conversations in which human trainers played both sides, that is, both the user and an AI assistant.
To help them draft their responses, trainers were given access to model-written suggestions. This new dialogue dataset was then merged with the InstructGPT dataset, which was transformed into a dialogue format.
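To make the supervised step concrete, here is a minimal sketch (with a hypothetical data format, not OpenAI's actual one) of how trainer-written dialogues can be flattened into supervised fine-tuning examples: each assistant turn becomes a training target, conditioned on all of the turns that came before it.

```python
def dialogue_to_examples(turns):
    """turns: list of (role, text) tuples, roles 'user' or 'assistant'."""
    examples = []
    history = []
    for role, text in turns:
        if role == "assistant":
            # The model learns to produce `text` given the history so far.
            prompt = "\n".join(f"{r}: {t}" for r, t in history)
            examples.append((prompt, text))
        history.append((role, text))
    return examples

# A toy dialogue written by a trainer playing both sides:
demo = [
    ("user", "What is RLHF?"),
    ("assistant", "Reinforcement Learning from Human Feedback."),
    ("user", "Who ranks the answers?"),
    ("assistant", "Human trainers rank candidate responses."),
]
pairs = dialogue_to_examples(demo)
# pairs[0] targets the first assistant reply given one user turn;
# pairs[1] targets the second reply given the full three-turn history.
```

Note that later examples carry the whole conversation as context, which is what gives the model its dialogue format.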
For reinforcement learning
The human trainers first ranked responses that the model had generated in a previous conversation, i.e. two or more model responses ranked by quality. These rankings were then used to create "reward models."
That is, to collect this data, they took conversations that the AI trainers had had with the chatbot, randomly selected a model-written message, sampled several alternative completions, and had the AI trainers rank them.
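The rankings can be turned into a training signal with a pairwise ranking loss, a common choice for reward models of this kind (a sketch, not OpenAI's published code): the reward model should score the preferred ("chosen") response higher than the less-preferred ("rejected") one.

```python
import math

def pairwise_loss(reward_chosen, reward_rejected):
    # -log sigmoid(r_chosen - r_rejected): the loss is small when the
    # chosen response is scored well above the rejected one.
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# When the reward model already prefers the chosen answer, loss is low:
low = pairwise_loss(2.0, -1.0)
# When it wrongly prefers the rejected answer, the loss is much larger:
high = pairwise_loss(-1.0, 2.0)
assert low < high
```

Minimizing this loss over many ranked pairs teaches the reward model to reproduce the human trainers' preferences, which is what the reinforcement learning step then optimizes against.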
In addition, they ran several iterations of Proximal Policy Optimization (PPO), which allowed further fine-tuning of the model.
The models were trained in collaboration with Microsoft on its Azure supercomputing infrastructure.
InstructGPT or ChatGPT?
InstructGPT had already been demonstrated by the company and was presented as a model capable of following an instruction in a prompt and providing a detailed response.
That language model can be prompted to perform natural-language tasks with carefully crafted text, but it can produce false results and unreliable information.
Compared to that model, ChatGPT tries to reduce misleading or harmful answers. For example, the company's website compares both models' answers to a common trick question from a user: "Tell me about when Christopher Columbus came to the United States in 2015."