What is ChatGPT: An Easy Explanation for Non-Techies
ChatGPT is currently one of the hottest keywords on social networks, yet not everyone clearly understands what this AI program really is. Below, VietNamNet presents an article by security expert Nguyen Hong Phuc that explains ChatGPT in plain terms for readers unfamiliar with technology.
A simple understanding of ChatGPT
For a normal user, ChatGPT is simply a website where you chat with a virtual bot about all kinds of topics.
This bot was created by OpenAI, a company co-founded by Elon Musk and others in 2015 with the initial mission of "preventing the dangers of AI".
How is ChatGPT created?
ChatGPT is an artificial intelligence computer program. Technically, people often call it an AI model, meaning an artificial intelligence data model, but in the end it is still digital data running on a computer, so calling it a program is not wrong.
The term AI model has two parts: model (a data model) and AI (artificial intelligence). Taken literally, it means intelligence that comes from data: the more data there is, the more intelligence can emerge.
Creating an AI model involves the following steps: collecting data, selecting data, labeling the data for training, and training.
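The four steps above can be sketched as a toy pipeline. Every function name and all the data below are illustrative inventions for this article, not OpenAI's real code; "training" here is reduced to simple memorization.

```python
# A toy sketch of the four steps: collect, select, label, train.
# Every function and all data here are illustrative, not a real pipeline.

def collect_data():
    # Step 1: gather raw text from many sources (some of it unusable).
    return ["ChatGPT is a chatbot", "", "VietNamNet is a newspaper", "   "]

def select_data(raw_docs):
    # Step 2: keep only usable documents (here: drop blank ones).
    return [doc for doc in raw_docs if doc.strip()]

def label_data(docs):
    # Step 3: turn each document into a question-answer training pair.
    return [("What is %s?" % doc.split(" is ")[0], doc) for doc in docs]

def train(pairs):
    # Step 4: "training" in this toy just means memorizing the pairs.
    return dict(pairs)

model = train(label_data(select_data(collect_data())))
print(model["What is VietNamNet?"])  # prints the memorized answer
```

Real training adjusts billions of numeric weights rather than storing literal question-answer pairs, but the sequence of stages is the same.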
At its simplest, teaching an AI looks like this dialogue:
Question: What is your name?
Answer: My name is ChatGPT
Question: What is VietNamNet?
Answer: VietNamNet is an electronic newspaper in Vietnam.
Then we make the AI memorize this information (training) and save its memorized "brain" as an AI model (a model checkpoint). Later, to use it, we load that brain, with the memory containing the above information, back into a computer (inference). Ask the corresponding question and the AI recalls the knowledge it was taught and answers "exactly what it was taught".
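The train-save-load cycle described above can be mimicked in a deliberately tiny sketch. This is an assumption-laden toy: a real model checkpoint stores learned numeric weights, not literal question-answer pairs, and the file name here is made up.

```python
import json
import os
import tempfile

# "Training": this toy model just memorizes question-answer pairs.
model = {
    "What is your name?": "My name is ChatGPT",
    "What is VietNamNet?": "VietNamNet is an electronic newspaper in Vietnam.",
}

# Save the memorized "brain" to disk: the model checkpoint.
checkpoint = os.path.join(tempfile.gettempdir(), "toy_checkpoint.json")
with open(checkpoint, "w", encoding="utf-8") as f:
    json.dump(model, f)

# Later, at "inference" time: load the checkpoint back and answer from memory.
with open(checkpoint, encoding="utf-8") as f:
    brain = json.load(f)

print(brain["What is VietNamNet?"])  # answers exactly what it was taught
```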
In fact, over the past decades, AI has been specialized for many specific jobs: AI to support aircraft construction, AI to simulate combat, AI in games... but hardly any large company had invested in AI for language. It was not until 2017 that a technological breakthrough made AI training dramatically more effective, especially for language AI.
Language, specifically writing, is the achievement that created human civilization. Humans store their knowledge in writing, so to understand language (writing) is to understand human knowledge. This is the core idea behind linguistic AI. Before 2017, it was very difficult to make computers understand the meaning of a sentence.
So what happened in 2017?
In August 2017, scientists at Google Brain, Google's AI research unit since 2011, announced an algorithm called the Transformer (the algorithm shares its name with the famous Transformers film series).
The Transformer algorithm was a breakthrough, specifically in language AI training. Before it, anyone who wanted to teach an AI had to build a training dataset of question-answer pairs (labeled data) as described above, and the machine really only memorized those pairs without "understanding" the meaning of the sentences: the huge difference between rote learning and comprehension.
Put even more simply: after 2017, we just pour in as much text data as possible, and the computer automatically figures out what that data means instead of us having to tell it.
Quoted from Google's Transformer announcement document: "With transformers, computers can see the same patterns humans see."
Google released the detailed documentation of the Transformer algorithm publicly and made it open source, so the entire AI research community benefited from the invention. Among the beneficiaries was OpenAI, the company founded in 2015, which had no standout achievements until after 2017.
A few months after Google announced the Transformer, the first language AIs based on the new algorithm appeared in droves. In June 2018, OpenAI released its first Transformer-based AI, GPT-1, applying the algorithm very quickly, faster than Google itself.
GPT stands for Generative Pre-trained Transformer: a Transformer model that is pre-trained on text and then generates text.
GPT was created with the main purpose of generating words. You play a word-association game with it: you write a sentence, it reads that sentence and then, based on the knowledge stored in its memory, "generates words" to continue what you wrote.
For example:
You entered: Vietnam is
ChatGPT: Vietnam is a country located in Southeast Asia...
This is the seemingly "magical" part: you type a sentence to ChatGPT and it replies with a sentence. It is not actually answering you; it is playing the word-connection game, "generating words" to continue the meaning of the sentence you typed into the chat.
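The word-connection game can be illustrated with a toy next-word generator: count which word tends to follow which in a tiny corpus, then keep appending the most frequent follower. This bigram-counting trick is my simplification; real GPT models predict the next token with a huge neural network, not a frequency table, and the corpus below is invented.

```python
from collections import Counter, defaultdict

# A tiny "training" corpus (illustrative, not real training data).
corpus = ("Vietnam is a country located in Southeast Asia . "
          "Vietnam is a country with a long coastline .")
words = corpus.split()

# Count which word follows each word: a bigram frequency table.
followers = defaultdict(Counter)
for word, nxt in zip(words, words[1:]):
    followers[word][nxt] += 1

def continue_sentence(prompt, steps=5):
    # Repeatedly append the most frequent follower of the last word.
    out = prompt.split()
    for _ in range(steps):
        nexts = followers.get(out[-1])
        if not nexts:
            break
        out.append(nexts.most_common(1)[0][0])
    return " ".join(out)

print(continue_sentence("Vietnam is"))
```

Even this crude version shows the principle: the "answer" is just the statistically most plausible continuation of your sentence.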
GPT-1 is the first generation of ChatGPT. It is a fairly small AI, small in both size and complexity.
In the world of linguistic AI, complexity, which roughly corresponds to the AI's level of "intelligence", is measured by the number of parameters. Loosely, this reflects how many layers of meaning the AI can extract from all the texts used to teach it.
To train GPT, scientists at OpenAI collected a large amount of human-written text, mostly from Wikipedia, encyclopedias, and major public newspapers: somewhere around hundreds of GB and hundreds of millions of documents. After collecting, they cleaned and selected the content, then gave the documents to the AI to read, many, many times over. Each pass through that block of data revealed another layer of meaning behind the words; the more passes, the more layers.
Training AIs to such a deep understanding of human written language leads to a very serious problem that, to date, no AI scientist has solved: judging "true" versus "false". AI cannot understand what is "true" or what is "false".
AI can see many layers of meaning in a sentence, but it cannot "understand whether that meaning is right or wrong", because right and wrong are relative; even for humans they are fragile and contested, sometimes to the point of fighting over them.
Besides, the huge amount of text OpenAI's scientists collect for training is not all unbiased or "correct" by human social standards, because the volume of data is far beyond their ability to vet.
For example, they may collect texts saying the earth is round and also texts saying the earth is flat; the data contains both true and false information. As the AI reads and rereads those texts to find layers of meaning, it finds the "true" and the "false" meanings alike, but it has no consciousness with which to recognize which information is true and which is false. It simply memorizes everything, and when asked later it answers from that memory without distinguishing right from wrong.
Companies like Google, Facebook, IBM, and Microsoft have repeatedly announced breakthrough language AIs for answering human questions, only to withdraw them quickly; you can find articles about this in major newspapers. Mostly it was because the AI answered some questions with an unacceptably "wrong" bias by current human social standards: respect for gender, religion, and ethnicity, the accuracy of past events, truths that humans have agreed to be true...
Large companies all adhere to standards of information accuracy. They judged that AI could not solve the problem of perceiving right and wrong, so it was best not to release it publicly.
GPT-3 is the same: it also produces paragraphs that violate human standards of right and wrong, sometimes wrong to an unacceptable degree.
GPT-3 was on the way to becoming popular when the Covid-19 pandemic broke out globally. The epidemic situation became more and more tense from mid-2020, and the pandemic information completely overwhelmed information about GPT-3.
GPT-3 and OpenAI were then forgotten by the public until the end of 2022, when OpenAI decided to run a marketing program to see whether it could revive interest in language AI.
So they adapted GPT-3 into ChatGPT and made it easier to use. Instead of a website where people type in text, adjust parameters, and get back a paragraph of connected words, ChatGPT takes the form of a chat program with a box for entering questions; the AI still plays the same word-generation, word-connection game with each question, but its output reads as an answer.
To summarize ChatGPT's formula for success over the past month: a language AI trained deeply enough to generate meaningful sentences convincing enough for readers + the unethical daring of an AI technology company + a suitable UI/UX (chat) = ChatGPT.
(Expert Nguyen Hong Phuc)