Diving Deeper with large language models and Node.js

In this learning path, we dig deeper into using large language models (LLMs) with Node.js by looking at Ollama, LlamaIndex, function calling, agents, and observability with OpenTelemetry.


In order to get full benefit from taking this lesson, you need:

  • An environment where you can install and run Node.js.
  • An environment where you can install and run Ollama.
  • A Git client.

In this lesson, you will:

  • Install Node.js.
  • Install Ollama.
  • Clone the ai-experimentation repository to get the sample code for the lessons.
  • Run the final example from the first learning path, updated to use an LLM running under Ollama.

Setting up the environment

If you don’t already have Node.js installed, install it using one of the methods outlined on the nodejs.org download page.

Clone the ai-experimentation repository with:

git clone https://github.com/mhdawson/ai-experimentation

Download and install Ollama. Supported platforms include macOS, Linux, and Windows. You can find the download and install instructions here.

An introduction to Ollama

In the first learning path, we ran an LLM locally using node-llama-cpp. Through the magic of Node.js addons, along with node-addon-api (which we help maintain, which is cool), it loaded the LLM into the same process as the Node.js application being run. This was a fast and easy way to get started because it avoided installing and starting a separate application to run the LLM. However, in most cases we don’t want an LLM running in each of our Node.js processes, both because of the potential memory use and because of the additional time needed to load the LLM each time we start our Node.js application.

Enter Ollama. Ollama is a tool that lets you easily spin up a process that serves an LLM over a TCP port. In addition, it provides a command-line tool to download LLMs. It supports Linux, Windows, and macOS and is already set up to leverage a GPU if one is available. We installed it on a Windows machine because that is where the GPU we had available was installed.

Take a look at the Ollama help by running ollama without any arguments:

C:\Users\user1>ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

To stick with an LLM similar to the one we used in the first learning path, we will use mistral. Run the following command to pull the default mistral LLM. It may take a few minutes depending on your connection speed, as the LLM file is 4.1 GB in size.

C:\Users\user1>ollama pull mistral

You should now be able to see the LLM in the list of LLMs available:

C:\Users\user1>ollama list
NAME            ID              SIZE    MODIFIED
mistral:latest  2ae6f6dd7a3d    4.1 GB  About a minute ago

Depending on your operating system, Ollama may automatically start and be ready to serve LLMs after you pull an LLM. This was the case for us running on Windows. If not, start Ollama with:

C:\Users\user1>ollama serve

What’s nice about Ollama is that, in addition to serving the LLM (by default on localhost and port 11434), it also manages the LLMs so that they stay in memory while being used and are unloaded when not in use. You can see the running LLMs with ollama ps:

C:\Users\user1>ollama ps
NAME    ID      SIZE    PROCESSOR       UNTIL

Since you have not yet run the Node.js program that uses the LLM, no LLMs are running/loaded. After running a program that uses the LLM, you will see that an LLM is running:

C:\Users\user1>ollama ps
NAME            ID              SIZE    PROCESSOR       UNTIL
mistral:latest  2ae6f6dd7a3d    5.1 GB  100% GPU        4 minutes from now

It also tells you how long the LLM will remain loaded if it is not used. This behavior is particularly useful if your application starts and stops frequently, avoiding the overhead of loading the LLM each time, while still releasing the resources used by a running LLM once it has not been used for a while.
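
Because Ollama serves the LLM over plain HTTP, you can also talk to it without any framework at all. The following is a minimal sketch (not part of the sample repository; the file name is just for illustration) that calls Ollama's /api/generate endpoint with fetch, assuming the default localhost:11434 address, Node.js 18 or later for the built-in fetch, and the mistral LLM pulled above:

// ollama-fetch.mjs (hypothetical file name)
// Minimal sketch: call the Ollama REST API directly with fetch.
// Assumes Ollama is serving on the default address (localhost:11434)
// and the mistral LLM has been pulled.
const response = await fetch("http://127.0.0.1:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "mistral",
    prompt: "Should I use npm to start a node.js application",
    stream: false, // return a single JSON response instead of a stream
  }),
});
const result = await response.json();
console.log(result.response);

After running something like this, ollama ps will show the mistral LLM loaded, and it will stay loaded for a few minutes in case another request comes in.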

As you can see, it was quite easy to download an LLM and make it available for use locally. One last bit of configuration we did was to allow Ollama to be accessed remotely. We did this because, while we ran Ollama on the Windows machine with the GPU, we wanted to run our Node.js programs from a Linux machine and later from an OpenShift cluster. The IP addresses on which Ollama listens can be configured by setting the OLLAMA_HOST environment variable. We set it as follows to allow access from all of the interfaces on the machine:

OLLAMA_HOST=0.0.0.0
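
As a sketch of the idea (rather than the exact steps we followed), on Linux or macOS you could export the variable before starting the server; on Windows you can set it as a system environment variable instead:

export OLLAMA_HOST=0.0.0.0
ollama serve

Keep in mind that binding to 0.0.0.0 exposes the Ollama port on every interface of the machine, so only do this on a network you trust.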

Ollama has other features and functions, but we’ve covered what we need for our examples, so we’ll leave it up to you to explore the ones you find interesting.

Running the basic LangChain.js example with Ollama

While we said we’d move on to exploring other libraries like LlamaIndex.ts, we’ll first show that, just as a LangChain.js-based application can easily switch between node-llama-cpp, OpenAI, and Red Hat OpenShift AI, it’s easy to switch to accessing an LLM served by Ollama.

Start by changing into the lesson-1 directory:

cd lesson-1

In that directory you will find a file called langchainjs-ollama.mjs. If you went through the first learning path, you will recognize the `getModel` function, which we have extended with an option for Ollama:

  } else if (type === 'ollama') {
    ////////////////////////////////
    // Connect to ollama endpoint
    const { Ollama } = await import("@langchain/community/llms/ollama");
    model = new Ollama({
      baseUrl: "http://10.1.1.39:11434", // Default value
      model: "mistral", // Default value
    });
  };

You can see that we specify the baseUrl as the remote machine where we are running Ollama and ask for the mistral LLM that we pulled in the earlier step. If you are running the examples on the same machine on which you ran Ollama, you can change 10.1.1.39 to 127.0.0.1 or remove the baseUrl setting completely.

Note: you will need to update each of the examples to either remove the baseUrl line if you are running Ollama locally, or update it to point to the IP address of the machine where Ollama is running.
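
If you want to quickly confirm the connection details before running the full example, you can create the model object on its own and invoke it directly. The following is a minimal sketch (the file name is just for illustration, not part of the repository), assuming the same @langchain/community package the sample uses and an Ollama instance reachable at the configured baseUrl:

// check-ollama.mjs (hypothetical file name)
// Minimal connectivity check: create the LangChain.js Ollama model directly
// and send it a single prompt.
import { Ollama } from "@langchain/community/llms/ollama";

const model = new Ollama({
  baseUrl: "http://127.0.0.1:11434", // point this at your Ollama instance
  model: "mistral",
});

// LangChain.js models are Runnables, so invoke() takes a prompt string
// and resolves to the generated text.
const answer = await model.invoke("Say hello in one short sentence.");
console.log(answer);

If this prints an answer, the examples that follow should be able to reach your Ollama instance as well.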

The application itself simply asks the LLM the question “Should I use npm to start a Node.js application”:

import { ChatPromptTemplate } from "@langchain/core/prompts";
import path from "path";
import {fileURLToPath} from "url";

////////////////////////////////
// GET THE MODEL
const model = await getModel('ollama', 0.9);
//const model = await getModel('llama-cpp', 0.9);
//const model = await getModel('openAI', 0.9);
//const model = await getModel('Openshift.ai', 0.9);

////////////////////////////////
// CREATE CHAIN
const prompt =
  ChatPromptTemplate.fromTemplate(`Answer the following question if you don't know the answer say so:

Question: {input}`);

const chain = prompt.pipe(model);

////////////////////////////////
// ASK QUESTION
console.log(new Date());
let result = await chain.invoke({
  input: "Should I use npm to start a node.js application",
});
console.log(result);
console.log(new Date());

Install the packages required for the application with:

npm install

And then run the application with:

node langchainjs-ollama.mjs

When we ran the application, the answer was as follows (of course, it can be different every time you run with an LLM):

Loading model - Fri Jun 07 2024 16:43:44 GMT-0400 (Eastern Daylight Saving Time)
2024-06-07T20:43:44.302Z
 It is common and generally acceptable to use npm (Node Package Manager) when starting a Node.js application. npm makes it easy to install, manage, and share packages among projects. However, it's essential to understand that while npm comes packaged with Node.js, there are alternative package managers available for Node.js, such as Yarn or pnpm. It's always a good idea to explore the different tools available and choose the one that best fits your needs and workflow.
2024-06-07T20:43:46.796Z

We just ran our first Node.js application using a large language model served by Ollama!

Note: to keep things simple, we’ve not included the Retrieval Augmented Generation (RAG) that we used in the first learning path and, therefore, we get an answer that is not what we’d recommend based on the Node.js Reference Architecture. If you want to learn how to fix that, you can go back to the first learning path to see how we did it.
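
If you’d like a reminder of the shape of that change, the following is a hypothetical sketch of adding a retriever to the chain with LangChain.js. The helpers shown (MemoryVectorStore, OllamaEmbeddings, createStuffDocumentsChain, createRetrievalChain), the inline document, and the file name are illustrative assumptions that depend on your installed LangChain.js version; they are not the exact code from the first learning path:

// rag-sketch.mjs (hypothetical file name)
// Hypothetical sketch: augment the question with retrieved context (RAG).
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { Document } from "@langchain/core/documents";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";
import { Ollama } from "@langchain/community/llms/ollama";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";

const model = new Ollama({ baseUrl: "http://127.0.0.1:11434", model: "mistral" });

// In the first learning path the documents came from the Node.js Reference
// Architecture; a single inline document stands in for that here.
const docs = [
  new Document({
    pageContent:
      "The Node.js Reference Architecture recommends avoiding npm to start applications; use node directly instead.",
  }),
];

// Index the documents so relevant ones can be retrieved for each question.
const vectorStore = await MemoryVectorStore.fromDocuments(
  docs,
  new OllamaEmbeddings({ baseUrl: "http://127.0.0.1:11434", model: "mistral" }),
);

const prompt = ChatPromptTemplate.fromTemplate(
  `Answer the question based only on the following context:
{context}

Question: {input}`,
);

const chain = await createRetrievalChain({
  retriever: vectorStore.asRetriever(),
  combineDocsChain: await createStuffDocumentsChain({ llm: model, prompt }),
});

const result = await chain.invoke({
  input: "Should I use npm to start a node.js application",
});
console.log(result.answer);

With the retrieved context stuffed into the prompt, the answer is steered toward the recommendation in the documents rather than the LLM’s general knowledge.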
