From Podman AI Lab to OpenShift AI

Learn how to rapidly prototype AI applications from your local environment with Podman AI Lab, add knowledge and capabilities to a large language model (LLM) using retrieval augmented generation (RAG), and use the open source technologies on Red Hat OpenShift AI to deploy, serve, and integrate generative AI into your application.


Podman AI Lab is a great way to get started with running and testing LLMs locally. This section shows you how to use the Services, Playgrounds, and Recipes Catalog features in Podman AI Lab.

To get the full benefit from this lesson, you need to:

  • Install Podman Desktop and the Podman AI Lab extension: follow the installation instructions for Podman Desktop and the Podman AI Lab extension in the article Podman AI Lab - Getting Started. That article also gives a great overview of the features in Podman AI Lab.

In this lesson, you will:

  • Download, run, and test LLMs in Podman AI Lab.  
  • Find out how easy it is to start a chatbot recipe in Podman AI Lab with the downloaded model.

High-level architecture

Figure 1 depicts the transition from Podman AI Lab to OpenShift AI.

Figure 1: Architecture showing how a model and chatbot application from Podman AI Lab are deployed to OpenShift AI and OpenShift.

The workflow from local LLM to OpenShift AI is as follows:

  1. An LLM is downloaded through Podman AI Lab.
  2. A chatbot recipe is started in Podman AI Lab with the downloaded model.
  3. The chatbot recipe code from Podman AI Lab is updated in VS Code with LangChain to connect to the Elasticsearch vector database and OpenShift AI model serving inference endpoint.
  4. An ingestion notebook is run in OpenShift AI to add data to the Elasticsearch vector database.
  5. The LLM downloaded from Podman AI Lab is deployed to OpenShift AI on a custom serving runtime.
  6. The updated chatbot with LangChain is built as a container and deployed to OpenShift.
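Step 3 above replaces the recipe's direct model call with a retrieval-augmented one: retrieved documents from Elasticsearch are stitched into the prompt before it is sent to the model. In the real chatbot this is handled by a LangChain chain; the sketch below shows the idea in plain Python, and all names in it are illustrative rather than the recipe's actual code.

```python
def build_rag_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble a retrieval-augmented prompt for the LLM.

    In the actual application, a LangChain chain queries the
    Elasticsearch vector database to produce `retrieved_docs`;
    here we only show the resulting prompt shape.
    """
    context = "\n\n".join(retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The key design point is that the model itself is unchanged: new knowledge reaches it only through the context section of each prompt.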

Download the model

We will download and use TheBloke/Mistral-7B-Instruct-v0.2-GGUF. This model is a quantized (smaller) version of the full Mistral-7B-Instruct-v0.2. The smaller model allows us to run inference on CPUs if GPUs are not an option.
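A rough back-of-the-envelope calculation shows why quantization shrinks the download to roughly 4 GB. This is illustrative arithmetic only: real GGUF files mix quantization levels and include metadata, so actual sizes vary.

```python
def approx_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Estimate a model file's size in GB from its parameter count
    and the average number of bits used per weight."""
    return n_params * bits_per_weight / 8 / 1e9

# Mistral 7B at full 16-bit precision vs. a ~4.5-bit quantized GGUF.
full_size = approx_model_size_gb(7.2e9, 16)    # roughly 14 GB
quant_size = approx_model_size_gb(7.2e9, 4.5)  # roughly 4 GB
```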

  1. Go to the AI Lab extension and select Catalog under Models (Figure 2). 

    Podman AI Lab -> MODELS -> Catalog highlighted.
    Figure 2: Select Catalog.
  2. If you haven't already, download the TheBloke/Mistral-7B-Instruct-v0.2-GGUF model (Figure 3). The model is around 4GB so it might take some time. 

    Podman AI Lab Models screen with TheBloke/Mistral-7B-Instruct-v0.2-GGUF highlighted.
    Figure 3: Download Model.

    Podman AI Lab allows you to get started quickly with downloaded models through Services, Playgrounds, and the Recipes Catalog. 

    The Services section lets you create a model service endpoint for models you've downloaded. Client code is provided in multiple formats (cURL by default) so you can quickly start sending requests to the model service endpoint. See Figure 4.

    Podman AI Lab -> Services -> Models and Client Code sections as well as cURL drop down highlighted.
    Figure 4: Podman AI Lab Services.

    The Playgrounds area allows you to define system prompts and experiment with different settings like temperature, max tokens, and top-p, as shown in Figure 5.

    Podman AI Lab -> Playgrounds -> User prompt, Settings, and prompt input text highlighted.
    Figure 5: Podman AI Lab Playgrounds.

    The Recipes Catalog contains demo applications for natural language processing (NLP), computer vision, and audio. We'll be using the ChatBot recipe demo code in this example.

  3. Create the Chatbot: make sure to select TheBloke/Mistral-7B-Instruct-v0.2-GGUF as your model and then click the Start AI App button (Figure 6).

    Podman AI Lab -> Recipes Catalog -> ChatBot -> Start AI App highlighted.
    Figure 6: Start the chatbot.
  4. After the chatbot has started, open it up to test it out (Figure 7).

    Chatbot UI running in a web browser.
    Figure 7: Chatbot UI.
  5. At the bottom of the AI App Details section, you'll see an Open in VSCode button (Figure 8). Clicking it opens all of the code that is running your chatbot. Later, we'll modify that code to connect LangChain, the TheBloke/Mistral-7B-Instruct-v0.2-GGUF model, and the Elasticsearch vector database. 

    Podman AI Lab -> Recipes Catalog -> ChatBot -> Model, Open in VSCode, and Open AI App highlighted.
    Figure 8: Podman AI Chatbot Recipe.
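The Services client code and the Playground settings come together in a single request: the model service exposes an OpenAI-compatible chat completions API, and temperature, max tokens, and top-p are fields of the request body. The sketch below is a hedged example, not Podman AI Lab's generated client code; in particular, the port in the URL is assigned by Podman AI Lab when the service starts, so the value shown is a placeholder you would replace with the one in your Services tab.

```python
import json
import urllib.request


def build_chat_request(prompt: str, temperature: float = 0.7,
                       max_tokens: int = 256, top_p: float = 0.9) -> dict:
    """Build an OpenAI-style chat completion payload using the same
    settings the Playground exposes (temperature, max tokens, top-p)."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
    }


if __name__ == "__main__":
    # Example URL: substitute the port shown for your model service.
    url = "http://localhost:35000/v1/chat/completions"
    payload = build_chat_request("What is Podman AI Lab?")
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```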

You’ve now installed Podman Desktop and Podman AI Lab, and downloaded and run an LLM locally. In the next section, you’ll install OpenShift AI and an Elasticsearch vector database, then ingest some data into the vector database.
