ChatGPT: A Vague Copy of the Internet

Written by Lan 2022 Cohort

Recently, the AI language model ChatGPT has gone viral on the Internet. It can process a vast amount of information quickly and converse on topics ranging from science and technology to art and culture. It seems omnipotent: only two months after launch, it had more than 100 million active users.

However, its abilities have clear limits. If you ask GPT-3 (the large language model on which ChatGPT is based) to add or subtract a pair of numbers, it almost always gives the correct answer when the numbers have only two digits. But its accuracy falls as the numbers get larger, dropping to ten percent when they have five digits. Most of the correct answers GPT-3 gives cannot be found on the Internet, so it must be doing more than simple memorization. Yet, despite ingesting a wealth of information, it has not deduced the principles of arithmetic. GPT-3’s statistical analysis of arithmetic examples lets it produce a rough approximation of the real thing, but nothing more.
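To make the comparison above concrete, here is a minimal sketch of how accuracy per digit count could be measured. It does not call GPT-3 or any real model. The names `accuracy_by_digits`, `exact_adder`, and `lookup_adder` are hypothetical and exist only for this illustration; `lookup_adder` is a deliberately crude stand-in for pure memorization, not a description of how GPT-3 works.

```python
import random

def accuracy_by_digits(answer_fn, digits, trials=200, seed=0):
    """Fraction of random `digits`-digit addition problems answer_fn gets right."""
    rng = random.Random(seed)  # fixed seed so repeated runs use the same problems
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    correct = 0
    for _ in range(trials):
        a, b = rng.randint(low, high), rng.randint(low, high)
        if answer_fn(a, b) == a + b:
            correct += 1
    return correct / trials

def exact_adder(a, b):
    # Applies the rule of addition, so it works for numbers of any size.
    return a + b

# Hypothetical "memorized" sums: every pair of numbers below 100.
MEMORIZED_SUMS = {(a, b): a + b for a in range(100) for b in range(100)}

def lookup_adder(a, b):
    # Pure memorization: answers only problems it has seen, otherwise gives up.
    return MEMORIZED_SUMS.get((a, b))

if __name__ == "__main__":
    for digits in range(2, 6):
        print(digits,
              accuracy_by_digits(exact_adder, digits),
              accuracy_by_digits(lookup_adder, digits))
```

Under these assumptions, the rule-based adder stays at full accuracy for every digit count, while the lookup-based one is perfect on two-digit numbers and fails on everything longer. The behavior described above for GPT-3 sits between these two extremes: better than a lookup table, but short of the rule.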

So, can large language models replace traditional search engines? To have confidence in them, we need to know that they are not being fed propaganda and conspiracy theories. In other words, we need to know that ChatGPT is capturing the right parts of the Web. But even if a large language model includes only the information we want, there is still the problem of ambiguity. One kind of ambiguity is acceptable: restating information in different words. The other kind is outright fabrication, which is unacceptable when we are looking for facts.

The truth is that ChatGPT is probably already generating a significant share of the text posted on the Internet. With the rise of repackaging tools like ChatGPT, it is becoming harder to find what we are looking for online: the more text generated by large language models is posted on the Web, the more the Web becomes a vague version of itself.

If we lost access to the Internet for good and had to store a copy on a private server with limited space, a large language model like ChatGPT might be a good solution, provided it could be kept from making things up. But we still have access to the Internet. So, how useful is a vague copy when you still have the original?