The primary purpose of the Turing test is to assess how far a machine can display human-like intelligence. In the version studied by researchers at UC San Diego, a human interrogator converses with a "witness," who may be either a human or an AI agent, and asks a series of questions to decide whether the witness is human. The researchers used this test to evaluate whether large language models such as GPT-4 can produce responses that are indistinguishable from those written by humans.
In the two-player game designed by Jones and his colleagues, the interrogator questions the witness for up to five minutes and then judges whether the witness is a human or an AI. Within those five minutes, participants could talk about anything they wanted, although a filter blocked abusive messages. The researchers deployed three AI systems as potential witnesses: GPT-4 and GPT-3.5, both large language models (LLMs), and ELIZA, a rule-based chatbot from the 1960s.
The researchers at UC San Diego ran these experiments as an online version of this two-player game. They found that interrogators could often identify ELIZA and GPT-3.5 as machines, but when the witness was GPT-4, their judgments of whether it was a human or a machine were no better than random chance.