How I Got ChatGPT to Write Complete Programs

Large language models (LLMs) have recently taken the world by storm. You are probably most familiar with ChatGPT, an artificial-intelligence (AI) chatbot developed by OpenAI and built on OpenAI's own GPT family of language models; the latest version is GPT-4.


With every new version, the language models are able to accomplish more complex tasks. For example, they can summarize long texts, write new ones from complex requirements, write code, give suggestions on almost any topic, and now even help with your taxes.


It’s worth noting that these language models can and do make mistakes. They are trained on imperfect data, the model may recall some details incorrectly (similar to JPEG compression artifacts), and the instructions given by the user may be ambiguous or misleading. Each of these can lead to incorrect output; however, in my experience, if the user specifies the task well, mistakes are already quite rare.

The Idea: Empowering ChatGPT

The official GPT-4 launch video included a demo of writing code and suggesting changes to the code, or to the development environment, in order to make something work. This is where I got an idea.


“What if ChatGPT was given the power to make these changes automatically?”


If ChatGPT had indirect access to the developer’s computer, it could perform these tasks by itself, without the user having to carry out its suggestions manually. But how can it do so, if it’s “just a large language model”?


We could write a small program that sends requests to ChatGPT, reads what ChatGPT says, and executes it. The same program can then report back to ChatGPT what happened as a result.
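
To make this concrete, here is a minimal sketch of such a loop. It is only an illustration, not the tool described below: it assumes the openai Python library and a much simpler, hypothetical protocol in which the model replies with exactly one shell command per turn (or DONE when finished).

import subprocess
import openai  # the library reads OPENAI_API_KEY from the environment

# Hypothetical protocol: the model answers with one shell command per turn.
messages = [
    {"role": "system",
     "content": "You control a developer machine. Reply with exactly one "
                "shell command to run next, or DONE when the task is finished."},
    {"role": "user", "content": "Create a git repository with a README in ./demo."},
]

for _ in range(20):  # safety cap on the number of round trips
    reply = openai.ChatCompletion.create(model="gpt-4", messages=messages) \
                  .choices[0].message["content"].strip()
    messages.append({"role": "assistant", "content": reply})
    if reply == "DONE":
        break
    # Execute what the model asked for, then report the result back to it.
    result = subprocess.run(reply, shell=True, capture_output=True, text=True)
    messages.append({"role": "user",
                     "content": f"exit code {result.returncode}\n"
                                f"{result.stdout}{result.stderr}"})

A single shell command per turn is too restrictive in practice, which is why the actual experiment uses a richer set of actions, described next.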

The Experiment: Creating a Proof Of Concept

I wrote detailed instructions to GPT-4 about the syntax for the following actions:


  • Terminal - run a specific command in the command-line prompt (i.e., the terminal). This action is very powerful; for example, it can install dependencies.

  • CreateFile - create a new file with specific contents. This action could be simulated through the Terminal action, but doing so requires additional escaping, which makes the task unnecessarily difficult for GPT.

  • ReplaceFile - update the contents of an existing file.

  • ReplaceLine - replace the contents of a specific line in an existing file.

  • EndOfActions - signify that all actions are complete.


Then I implemented a command-line tool that parses these actions according to the specified syntax and executes them on the developer’s machine.
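
To give an idea of the execution side, here is a rough sketch. The syntax assumed here (an action name, an argument, and an optional multi-line body) is a simplified stand-in for illustration, not the exact format the tool uses.

import subprocess
from pathlib import Path

def execute_action(name, arg, body=""):
    """Execute one parsed action and return a textual result to send back to GPT."""
    if name == "Terminal":
        # Arbitrary shell command; its output is fed back to the model.
        result = subprocess.run(arg, shell=True, capture_output=True, text=True)
        return f"exit code {result.returncode}\n{result.stdout}{result.stderr}"
    if name in ("CreateFile", "ReplaceFile"):
        # Write the body verbatim, so GPT never has to shell-escape file contents.
        Path(arg).write_text(body)
        return f"Wrote {len(body)} characters to {arg}"
    if name == "ReplaceLine":
        # arg is assumed to look like "path:line_number"; body is the new line.
        path, line_no = arg.rsplit(":", 1)
        lines = Path(path).read_text().splitlines()
        lines[int(line_no) - 1] = body.rstrip("\n")
        Path(path).write_text("\n".join(lines) + "\n")
        return f"Replaced line {line_no} of {path}"
    if name == "EndOfActions":
        return None  # the caller stops executing and hands control back to the user
    return f"Unknown action: {name}"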

The Demo: A HackerNews Story Summarizer

Here is a quick demonstration:



The video above shows how I describe a relatively simple program to the tool, using the following prompt:


Write a command line program called HackerNews Summarizer, which fetches the contents of the top 3 stories on HackerNews, and summarizes the contents. When the program gets the URL of a story through HackerNews API, it should fetch the HTML from the URL, convert it to text only, and then summarize the first 5000 characters of this text into a single paragraph using OpenAI's text completion with model "text-davinci-003". Quote the text that you pass to OpenAI using the triple backticks (```) syntax. For each story, the program prints its headline, its URL and its summary. Assume that the OpenAI API key is already available through the environment variable OPENAI_API_KEY.


The tool then sets up the environment, installs the appropriate dependencies, writes the source files for the target program, runs the program, and checks that the output looks reasonable. Here is the source code it wrote:

import os
import requests
from bs4 import BeautifulSoup
import openai

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
HN_API_BASE = "https://hacker-news.firebaseio.com/v0"

def get_top_stories():
    response = requests.get(f"{HN_API_BASE}/topstories.json")
    return response.json()

def get_story_details(story_id):
    response = requests.get(f"{HN_API_BASE}/item/{story_id}.json")
    return response.json()

def fetch_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    return soup.get_text()

def summarize_content(content):
    openai.api_key = OPENAI_API_KEY
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=f"Summarize the following content in a single paragraph:\n```{content[:5000]}```\n",
        max_tokens=100,
        n=1,
        temperature=0.5,
    )
    return response.choices[0].text.strip()

def main():
    story_ids = get_top_stories()[:3]

    for story_id in story_ids:
        story = get_story_details(story_id)
        if "url" in story:
            content = fetch_content(story["url"])
            summary = summarize_content(content)
            print(f"Title: {story['title']}\nURL: {story['url']}\nSummary: {summary}\n")

if __name__ == "__main__":
    main()

It does all of this in roughly 1m30s (parts of the video where the tool is waiting for ChatGPT to respond are cut out).

Conclusion

Currently, when tools like GitHub Copilot (which also uses GPT) suggest a few lines of code, the developer checks if those lines look reasonable. This can already greatly increase developer productivity.


With the approach described in this post, the idea is for a large language model to iterate on a larger task itself; the developer ideally only needs to check the final result. The proof of concept shown above is still quite limited, but it shows potential.


If you have any suggestions on what to try next, or would simply like to stay up to date with my work, follow/DM me on Twitter or LinkedIn.
