Creating Applications with ChatGPT, LLMs and Generative AI - Overview
I am interested in learning about ChatGPT, LLMs, and Generative AI. What is the history of these technologies, and how can you work with them?
Introductions to ChatGPT, LLMs, and Generative AI
OpenAI launched ChatGPT on November 30, 2022, immediately catching the attention of diverse audiences with its ability to create a humanlike conversational dialog and write sophisticated programming code. Indeed, ChatGPT grew from 0 to 1 million users in a matter of days and currently has more than 100 million users.
ChatGPT is an artificial intelligence (AI) chatbot that uses Natural Language Processing (NLP) to create conversations that mimic human interactions. AI practitioners refer to models like ChatGPT as Large Language Models (LLMs). LLMs are advanced machine learning models, with billions of parameters, trained on massive amounts of text data, that can both understand and generate language. These models produce textual outputs that can be short or long, in different writing styles, structures, and tones, and they can also generate computer code. It is important to note that the quality and accuracy of the output varies.
Who Created ChatGPT?
ChatGPT was created by OpenAI, a research organization founded in December 2015 by a group of tech luminaries, including Elon Musk and Sam Altman. OpenAI's stated mission was to ensure that artificial general intelligence (AGI) benefits all of humanity by building safe and beneficial AI while actively cooperating with other research and policy institutions.
In 2019, OpenAI transitioned from a nonprofit into a privately held, capped-profit company. OpenAI has a commercial partnership with Microsoft and various investors, including Khosla Ventures and Reid Hoffman. A subset of its models is hosted by Microsoft as a commercial offering under the moniker of Azure OpenAI.
How Does ChatGPT Work?
The GPT part of ChatGPT stands for Generative Pre-trained Transformer. The "Pre-trained" part of this definition is significant. To understand the meaning of the term, let's discuss the different training models for AI.
When AI models were first developed, they mainly used "supervised learning" to train the underlying algorithms. Supervised learning depends on manually labeled data; an example would be a database of chest X-rays, each paired with a label that says "Cancer" or "No Cancer." By processing thousands of images, the algorithm can be trained to detect cancer in patients. While effective for certain use cases, this type of training data is very time-consuming and expensive to produce. There is simply not enough suitable labeled data in existence to train LLMs.
GPT uses generative pre-training, where it is provided with basic guidelines and then exposed to vast amounts of unlabeled data from the internet. It processes this data in an unsupervised mode, forming its own understanding of textual patterns and rules. To ensure its responses are more consistent and suitable, GPT undergoes further "fine-tuning," often incorporating supervised learning methods.
ChatGPT operates using a deep learning neural network inspired by the human brain's structure, enabling it to recognize patterns in text data and predict subsequent text in sentences. A critical component of its design is the transformer architecture, introduced in a 2017 research paper, which has been pivotal to the surge in AI model advancements. This architecture not only improved the quality of AI models but also made them faster and more cost-effective to develop.
At the heart of the transformer model is the "self-attention" mechanism. Unlike older recurrent neural networks (RNNs) that read text sequentially from left to right, transformers analyze all words in a sentence simultaneously, comparing each word to the others, allowing them to focus on the most pertinent words regardless of their position. This process is executed in parallel on modern computing equipment. Rather than processing words, transformers work with "tokens" — segments of text represented as vectors. The spatial proximity of two token-vectors indicates their relatedness. Moreover, attention is also depicted as a vector, enabling transformer-based networks to recall significant details from previous parts of a paragraph.
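To make the self-attention idea concrete, here is a minimal pure-Python sketch of scaled dot-product attention over a list of token vectors. It is an illustration only: real transformers apply learned query/key/value projections and multiple attention heads, which are omitted here for clarity.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Toy scaled dot-product self-attention.

    Every token vector attends to every other token vector: the output
    for a token is a weighted average of all the vectors, where the
    weights come from dot-product similarity (scaled by sqrt of the
    vector dimension, as in the 2017 transformer paper).
    """
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(d)])
    return outputs
```

Because every token's scores are independent of the others, the per-token loop parallelizes naturally on GPUs, which is why transformers train so much faster than sequential RNNs.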
To understand the remaining articles in this series, it is important to know how text is understood by AI models. As mentioned above, models work with tokens. Tokens are common sequences of characters found in text. On average, a token is about 4 characters long. You can use the OpenAI Tokenizer page to see how text translates into tokens.
GPT-3 was trained on about 500 billion tokens. That amount of data allows the model to better assign meaning and predict plausible follow-up text by mapping the tokens in a multi-dimensional vector space. Simple words map to a single token, while longer or more complex words map to multiple tokens.
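The "about 4 characters per token" rule of thumb is handy for back-of-the-envelope estimates of prompt size and API cost. Here is a tiny sketch of that heuristic; a real tokenizer (such as OpenAI's tiktoken library) splits text into learned subword units, so actual counts will differ:

```python
def estimate_tokens(text):
    """Rough token-count estimate using the ~4 characters-per-token
    rule of thumb. This is only an approximation; real tokenizers
    produce different counts depending on the words involved."""
    return max(1, round(len(text) / 4))

# Example: a 13-character sentence comes out to roughly 3 tokens.
```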
GPT-3.5 and GPT-4 were trained on books, articles, and other documents covering a wide range of topics, styles, and genres, together with a large portion of the public Internet up to the models' training cutoff dates. As an example, ChatGPT is exceptionally good at writing sonnets in the style of Shakespeare because it has been trained on his complete works.
Azure OpenAI
Azure OpenAI hosts a subset of the OpenAI models in a secure, enterprise-grade Azure environment. Microsoft co-develops the API with OpenAI, ensuring compatibility and a smooth transition from one to the other. With Azure OpenAI, customers can leverage the security capabilities of Microsoft Azure while running the same models as OpenAI. Azure OpenAI offers enterprise-level security features such as private networking, regional availability, and responsible AI content filtering to enable you to build AI-based enterprise applications.
OpenAI has a large set of models with different capabilities and price points. Note: You can also customize the OpenAI models with a process called fine-tuning.
Below is a list of different models:
- GPT-4 is a large multimodal model. It accepts text inputs and emits text outputs, and additional modes, such as speech and image inputs and outputs, have recently been added. It has broader knowledge and advanced reasoning capabilities, making it the most advanced model in the OpenAI family.
- GPT-3.5 models can understand and generate natural language or code. The most capable and cost-effective model in this family is gpt-3.5-turbo, which is the model we will use most throughout this article series.
- The GPT base models can understand and generate natural language and code but are not trained to follow instructions. They are designed to replace the original GPT-3 base models and use the legacy Completions API (more on the different OpenAI APIs in the next section).
- DALL·E is a model that can create realistic images and art from a natural language description.
- Whisper is a general-purpose speech recognition model. It powers the recently added speech capabilities in GPT-4.
- Embeddings are numerical representations of text that can be used to measure the similarity between two pieces of text. We will use embeddings when we implement reasoning over PDFs and other documents in article five.
- The primary function of the Moderation models is to check whether content complies with OpenAI's usage policies.
- The GPT-3 family was the first set of models to introduce the capability to understand and generate natural language. It is now considered legacy.
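Because embeddings place related text close together in vector space, "similarity between two pieces of text" reduces to a distance calculation between their vectors. A common choice is cosine similarity, sketched below in pure Python (real applications would call an embedding model to produce the vectors first):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.

    Returns a value near 1.0 when the vectors point the same way
    (very similar text) and near 0.0 when they are unrelated.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```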
While you can use the ChatGPT chatbot at OpenAI's site, the real power of the models is available through its underlying API. In this article series, you begin with getting an API key from OpenAI and trying out some simple conversations by leveraging OpenAI's Completion and ChatCompletion API endpoints.
This article series will leverage both the OpenAI APIs and the Azure OpenAI SDKs. I will first show you how to sign up for each service and how to start consuming both the OpenAI and Azure OpenAI endpoints. Next, I will introduce you to the widely used LangChain library, which provides a powerful abstraction on top of not just OpenAI and Azure OpenAI but also other LLM providers. We will continue to leverage the capabilities of LangChain throughout this article series.
This series is spread across eight distinct articles.
The first article will set the reader up on both the OpenAI and Azure OpenAI platforms. You will create a local developer environment based on Jupyter notebooks. Next, you will look at the different OpenAI APIs and create a simple example for both platforms. You will see how to use the temperature parameter to control randomness in the generated output and how the max_tokens parameter controls the length of the generated output.
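As a preview of those two parameters, the sketch below assembles the JSON body of a Chat Completions request without actually calling the API. The model name and default values are illustrative assumptions; the first article covers signing up and making real calls.

```python
def build_chat_request(prompt, temperature=0.7, max_tokens=256):
    """Assemble the JSON body for a Chat Completions request (sketch).

    temperature: 0.0 gives near-deterministic output; higher values
    make the output more random and creative.
    max_tokens: caps the length of the generated reply.
    The model name below is an assumption for illustration only.
    """
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```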
In the second article, you will dive deeper into LLMs and their capabilities. We will introduce the LangChain framework, and you will learn how to create powerful Prompt Templates, enabling prompt reuse. ChatGPT and other LLMs can produce text and other formats such as JSON, XML, etc. You will use LangChain Output Parsers to create outputs in these various formats.
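The core idea behind a prompt template is simple enough to sketch in a few lines: a reusable prompt string with named placeholders filled in at call time. This stand-in only mimics the concept; LangChain's actual PromptTemplate class adds input validation and composability on top.

```python
class PromptTemplate:
    """Minimal stand-in for the prompt-template concept: a reusable
    prompt with named placeholders filled in when the template is used."""

    def __init__(self, template):
        self.template = template

    def format(self, **kwargs):
        # Substitute the named placeholders into the template string.
        return self.template.format(**kwargs)
```

Reuse is the point: the same template can be formatted with different inputs across many requests, keeping the prompt wording consistent.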
The third article begins with creating LangChain Chains. While using an LLM in isolation works for only the simplest of applications, more complex applications require chaining LLM calls together. Chains allow us to combine multiple components to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a Prompt Template, and then passes the formatted prompt to an LLM. We can build more complex chains, such as Router Chains, by combining multiple chains. We will also introduce the Memory classes to maintain state in your application.
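The example chain described above, format user input with a template, then pass the result to an LLM, can be sketched as a simple function composition. The `echo_llm` stub stands in for a real model call and exists only for illustration:

```python
def make_chain(template, llm):
    """Compose a prompt template (a format string) with an LLM call
    into a single callable, mirroring the idea behind an LLM chain."""
    def chain(**kwargs):
        prompt = template.format(**kwargs)  # step 1: fill the template
        return llm(prompt)                  # step 2: send to the "LLM"
    return chain

# A stub "LLM" for demonstration; a real chain would call an actual model.
def echo_llm(prompt):
    return f"LLM received: {prompt}"
```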
In article four, we start using LangChain Agents. An agent is a component that has access to an LLM and a suite of tools and can decide which tool(s) to use based on the user's input. A tool can be anything useful to the task at hand, such as searching the Web, calling a Weather API, scanning a document, etc.
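The following toy sketch shows the dispatch idea in its crudest form: pick a tool by keyword and run it. A real LangChain agent asks the LLM itself to choose the tool and can chain several tool calls; the tools here are stubs invented for illustration.

```python
def run_agent(user_input, tools):
    """Toy agent loop: run the first tool whose keyword appears in the
    input. Real agents let the LLM reason about which tool to invoke."""
    for keyword, tool in tools.items():
        if keyword in user_input.lower():
            return tool(user_input)
    return "No suitable tool found."

# Stub tools standing in for a Weather API and a web search.
tools = {
    "weather": lambda q: "Sunny, 22 degrees (stub weather tool)",
    "search": lambda q: "Top result: ... (stub web-search tool)",
}
```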
In article five, we will work with LangChain Embedding Models. Embeddings are used to transform a piece of text into a numerical vector representation. You can then store these representations in a vector database such as Pinecone. You will learn how to perform semantic similarity searches to look for pieces of text that are most similar. This is how you can search documents, PDFs, etc. You chunk up a document, transform each chunk into an embedding, store the embedding in Pinecone, and then perform a similarity search.
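The chunk-embed-search workflow just described can be sketched end to end with two small helpers. This is a conceptual sketch only: the chunking is character-based (production systems usually split on sentence or token boundaries with overlap), and the "embeddings" in the test are hand-made vectors rather than the output of a real embedding model or a Pinecone index.

```python
import math

def chunk_text(text, size=100):
    """Split a document into fixed-size character chunks (a crude
    stand-in for token- or sentence-aware chunking)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks whose embedding vectors are
    closest to the query embedding by cosine similarity, which is the
    kind of lookup a vector database performs."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cos(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The returned indices identify which chunks to feed back to the LLM as context when answering a question about the document.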
In article six, now that we have covered the core LLM and LangChain concepts, we are ready to build applications. Our first application will be a chatbot that can answer questions about a specific application domain. This example will illustrate how to integrate external data sources with our LLM/LangChain environment.
In article seven, we will build a system that can ingest documents and allow the reader to reason over their content using embeddings stored in a Pinecone index.
In the final article, we will build a LangChain agent to solve specific math and reasoning puzzles. Our agent will leverage the LLMMathChain to tackle complex word math problems seamlessly.
- Stay tuned for the upcoming articles in this series.
- Read more artificial intelligence-related articles.
- OpenAI API Introduction
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
Article Last Updated: 2023-10-25