Diving Deeper with large language models and Node.js

In this learning path we dig deeper into using large language models (LLMs) with Node.js by looking at Ollama, LlamaIndex, function calling, agents, and observability with OpenTelemetry.

In order to get the full benefit from this lesson, you need:

  • The environment set up in earlier lessons.

In this lesson, you will:

  • Explore how to provide JavaScript functions that can be called by an LLM to help the LLM answer questions.
  • Run an application and observe the function calling behavior with custom built prompts.
  • Run an application and observe the function calling behavior of a ReAct agent built with LlamaIndex.TS.

Setting up the environment

Start by changing into the lesson-3 directory.

cd ../lesson-3

Install the packages used in this lesson by running:

npm install

Providing Functions to an LLM in JavaScript

Without help, LLMs are limited to the knowledge they acquired during training. Approaches like Retrieval Augmented Generation (RAG) address this by including additional, relevant information in the request when a question is asked.

Function calling also works to provide additional knowledge. A significant difference, however, is that the LLM decides what additional information is needed and attempts to use the tools (which are represented by functions) that have been provided to it in order to get that information.

For example, an LLM may be provided with the ability to call two functions: one that returns the current weather and one that returns the user's favorite color. Given a specific question, the LLM may call one or both of those functions in order to answer it.

Libraries like Langchain and LlamaIndex provide abstractions that make it easier to build an application that leverages function calling. However, to better understand how function calling works, we will start out by using custom prompts to explore how functions are provided to an LLM and how they are called.

Interaction with LLMs today is generally through a request/response model, where the request includes any history or additional context the LLM can use to answer the question. If you remember the Retrieval Augmented Generation (RAG) example from part one, that meant finding the most relevant parts of the Node.js Reference Architecture and including them in the context sent with the request.

In the case of function calling, the additional context is the list of functions the LLM can use and any instructions related to using them. To understand what this looks like, we will explore llamaindex-function-ollama.mjs. This example extends the example from the previous lesson with a more elaborate prompt for the request. The first part tells the LLM about the functions that are available:

You are a helpful research assistant but do not mention that. The following functions are available for you to fetch further data to answer user questions, if relevant:
[{
    "function": "favoriteColorTool",
    "description": "returns the favorite color for person given their City and Country",
    "arguments": [
        {
            "name": "city",
            "type": "string",
            "description": "the city for the person"
        },
        {
            "name": "country",
            "type": "string",
            "description": "the country for the person"
        }
    ]
}]

To call a function respond - immediately and only - with a JSON object of the following format:
{
    "function": "function_name",
    "arguments": {
        "argument1": "argument_value",
        "argument2": "argument_value"
    }
}

The specific format used to describe functions and the instructions that get the LLM to call a function may vary between LLMs, but given that JSON is often the format used, it’s likely that for many LLMs it will look somewhat similar to the above.

Functions are shared as an array, in which each entry includes a name and description for the function along with an array of the arguments the function accepts. This is the information the LLM uses to decide when and which function to call. It is important to provide enough detail about both the functions and their parameters so that the LLM can figure out when to call a function and what arguments to pass.
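
For instance, if we also wanted to expose the weather function mentioned earlier, a second entry could be added to the same array in the same shape. The entry below is made up for illustration and is not part of the lesson code:

{
    "function": "currentWeatherTool",
    "description": "returns the current weather for a given city",
    "arguments": [
        {
            "name": "city",
            "type": "string",
            "description": "the city to get the weather for"
        }
    ]
}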

The second part of the prompt includes some of our attempts to influence the behavior of the LLM so that we get better and more accurate responses:

Only use one of these tools if it is relevant to the question.

When using the favoriteColorTool if you do not have the users city and Country ask for it first. Do not guess the users city.

Do not mention any tools

Do not show JSON when asking user for a city or country


${query}`
return { message: input };
}

At this point we’ll share that we found the behavior when calling functions quite inconsistent. Despite our efforts to guide the LLM in how to respond when using functions, the current level of maturity is enough to explore what might be possible, but using function calling in a real application would come with a number of challenges. For example, despite telling the LLM not to guess the city for a user, the function always seems to be called (if it is called at all) before the user is asked for their city. Similarly, JSON was occasionally shown to the user when asking for their city or country.

Excluding the logic to call the functions requested by the LLM, the rest of the application is an array of questions to be asked in sequence. We used this approach because it makes it easy to repeatedly ask the same series of questions and observe how consistent and repeatable the behavior is.

const questions = ['What is my favorite color?',
                   'My city is Ottawa',
                   'My country is Canada',
                   'I moved to Montreal. What is my favorite color now?',
                   'My city is Montreal and my country is Canada',
                  ];

for (let i = 0; i < questions.length; i++) {
  console.log('QUESTION: ' + questions[i]);
  let response = await chatEngine.chat(getQuery(questions[i]));
  console.log('  RESPONSE: ' + (await handleResponse(chatEngine, response)).response);
}

While this occasionally means that the next question after an LLM's response does not make sense, it works well enough for you to run the questions a number of times and observe the behavior.

An LLM calling a function provided in JavaScript

In the previous section we covered how your program tells the LLM about the functions it can call, but how does the LLM actually call them? That depends on the instructions given in the request. Recall this part of our custom prompt:

To call a function respond - immediately and only - with a JSON object of the following format:
{
    "function": "function_name",
    "arguments": {
        "argument1": "argument_value",
        "argument2": "argument_value"
    }
}

The LLM should respond with JSON in the format specified and without any other text. The ability of the LLM to do this depends on its training, with LLMs being trained to support function calling in specific ways/formats. In our case the Mistral LLM seems reasonably good at requesting that functions be called using that format. As an example, it often returns the following before it knows the city and country for the user:

 {
    "function": "favoriteColorTool",
    "arguments": {
        "city": "Your City",
        "country": "Your Country"
    }
}

At this point it's up to our application to recognize the request to run a function. Since the prompt asks for JSON, it's quite easy to parse in JavaScript, and we used simple logic that treats a response as a function request if it parses as JSON. If the response from the LLM is a function call request, the application calls the function and then sends the function's result back to the LLM in a new request. That was implemented as follows:

import { inspect } from 'node:util';

function getFavoriteColor(city, country) {
  if ((city === 'Ottawa') && (country === 'Canada')) {
    return 'the favoriteColorTool returned black';
  } else if ((city === 'Montreal') && (country === 'Canada')) {
    return 'the favoriteColorTool returned red';
  } else {
    return `the favoriteColorTool returned The city or country
            was not valid, please ask the user for them`;
  }
}

async function handleResponse(chatEngine, response) {
  try {
    const functionRequest = JSON.parse(response.response);
    if (functionRequest.function === 'favoriteColorTool') {
      // log the function call so that we see when they are called
      console.log('  FUNCTION CALLED WITH: ' + inspect(functionRequest.arguments));

      // call the function requested
      const favColor = getFavoriteColor(functionRequest.arguments.city,
                                        functionRequest.arguments.country);

      // send the response to the chat engine
      return (handleResponse(chatEngine,
        await chatEngine.chat({message: favColor})));
    } else if (functionRequest.function === 'sendMessage') {
      // LLM sometimes asked to send a message to the user
      return { response: functionRequest.arguments.message };
    } else {
        return (handleResponse(chatEngine,
          await chatEngine.chat({message: 'that function is not available'})));
    }
  }
  catch {
    // not a function request so just return to the user
    return response;
  }
}

One thing you'll notice is the wording of the responses from the function. We found that it was good to provide as much context about the answer as possible and, in the case of an error, to even include instructions to the LLM on what to do next. These influence how the LLM handles the error. For example, telling it to ask the user for the city and country greatly increased the chances that it would do so.

The other thing you may notice is that handleResponse() calls itself recursively. This is because the response to a request containing the result of a function call may itself be another function call. In a real-world application we'd want a more complete state machine to handle all of the possible cases.
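
As a rough illustration, here is a minimal sketch (our own, not part of the lesson code) of how the same logic could be written as a bounded loop instead of unbounded recursion. The limit of five rounds and the fallback message are arbitrary, and the sendMessage handling from the real example is omitted for brevity:

const MAX_TOOL_ROUNDS = 5; // arbitrary limit for this sketch

async function handleResponseWithLimit(chatEngine, response) {
  for (let round = 0; round < MAX_TOOL_ROUNDS; round++) {
    let functionRequest;
    try {
      functionRequest = JSON.parse(response.response);
    } catch {
      // not a function request, so return the answer to the user
      return response;
    }

    if (functionRequest.function === 'favoriteColorTool') {
      // call the requested function and send its result back to the LLM
      const favColor = getFavoriteColor(functionRequest.arguments.city,
                                        functionRequest.arguments.country);
      response = await chatEngine.chat({ message: favColor });
    } else {
      response = await chatEngine.chat({ message: 'that function is not available' });
    }
  }

  // too many rounds of function calls, give up with a canned answer
  return { response: 'Sorry, I was not able to answer that question.' };
}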

Interestingly, the function “sendMessage” is not a function that we’ve told the LLM it can call; however, we saw the LLM making this request a number of times, and the flow worked better when the message provided with that request was passed on to the user, so we’ve included handling for it.

We hope you now have a good understanding of how function calling works with LLMs. You should also be starting to see why you would want to use a library to simplify how you provide functions and how the functions you provide are called. We’ll look at doing that with LlamaIndex.TS in a later section.

Running the function calling large language model Node.js example

Now let's run the example and observe the results. From the lesson-3 directory, run the program with:

node llamaindex-function-ollama.mjs

In a perfect world you would see the following:

[user1@fedora lesson-3]$ node llamaindex-function-ollama.mjs 
QUESTION: What is my favorite color?
  FUNCTION CALLED WITH: { city: 'Your City', country: 'Your Country' }
  RESPONSE: What is your city and country?
QUESTION: My city is Ottawa
  FUNCTION CALLED WITH: { city: 'Ottawa', country: 'Canada' }
  RESPONSE: Your favorite color is black.
QUESTION: My country is Canada
  FUNCTION CALLED WITH: { city: 'Ottawa', country: 'Canada' }
  RESPONSE: Your favorite color is black.
QUESTION: I moved to Montreal. What is my favorite color now?
  FUNCTION CALLED WITH: { city: 'Montreal', country: 'Canada' }
  RESPONSE: Your favorite color is red.
QUESTION: My city is Montreal and my country is Canada
  RESPONSE: The favorite color of someone from Montreal, Canada is red.

And often you will see this, or some variant of it! We never managed to get the LLM to ask for the city and country first, so we see that it tries to call the function with ‘Your City’ and ‘Your Country’. The function tells the LLM this is invalid and to ask the user for their city and country, and the LLM does so.

The user responds with only the city (Ottawa), but the LLM manages to fill in the correct country, calls the function with ‘Ottawa’ and ‘Canada’, and returns the correct answer, which is ‘black’.

We then provide the country anyway and it still gets the answer right, although sometimes the LLM asks for the city again.

Now we tell the LLM that we moved to Montreal and it correctly calls the function with ‘Montreal’ and ‘Canada’ and gives us the new answer which is ‘red’.

Finally, we give both the city and country and it answers correctly once again.

Unfortunately, we say “in a perfect world” because although you will often see this flow, there are many runs where the LLM fails to call the function, simply guesses the favorite color, or gets stuck in a loop calling the function.

One common failure with our current program is that the LLM often includes additional text along with the request to call a function, in contravention of the instructions. For example:

QUESTION: My city is Montreal and my country is Canada
  RESPONSE: Based on your provided information, here's the function call to find out your favorite color in Montreal:

{
    "function": "favoriteColorTool",
    "arguments": {
        "city": "Montreal",
        "country": "Canada"
    }
}

With our limited implementation this ends up being output to the user instead of resulting in a function call, because we don’t expect anything other than the JSON in a request to run a function. This is another illustration that there are a lot of subtleties in the interaction with LLMs when calling functions, and that using a library to handle them will make things a lot easier.
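
For example, here is a minimal sketch (our own illustration, not part of the lesson code) of a more forgiving parser that looks for a JSON object embedded in surrounding text before giving up:

// try to find a function request even when the LLM wraps it in extra text
function extractFunctionRequest(text) {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) {
    return null; // no JSON object found, treat the text as a normal answer
  }
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null; // the text between the braces was not valid JSON
  }
}

Handling quirks like this, along with retries on bad output, is exactly the kind of plumbing a library can take care of for you.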

Go ahead and run the example a number of times and you will see the variation between successful runs and complete failures. Sometimes the LLM is on a roll, follows the successful flow several times in a row, and answers well; other times it guesses nonsense :(.

When asking for our favorite color it's not too bad if the answer is wrong, but imagine if the function was meant to get your bank balance and the LLM returned a guess!

Function calling with LlamaIndex.TS and Node.js

The last section gave us a good understanding of how function calling works, but we don’t necessarily want to have to define the prompts and handle all of the function calling infrastructure ourselves. This is where a library like LlamaIndex.TS can help.

As mentioned earlier, the manner in which functions are defined and passed to LLMs can vary based both on the LLM and on how the LLM is run (for example, using Ollama or something else). Libraries aim to provide a higher-level abstraction so that your application does not have to worry about these differences.

This is the definition of a slimmed-down version of the favorite color tool that we used in the previous sections. The full source code is in llamaindex-agent-ollama.mjs:

import { inspect } from 'node:util';
import { FunctionTool } from 'llamaindex';

const invalidInfoMessage = { message: 'The city or country was invalid, please ask for the city and country' };
const getFavoriteColor = (info) => {
  console.log(' FUNCTION CALLED WITH: ' + inspect(info));
  if (!info || !(info.city && info.country)) {
    return invalidInfoMessage;
  }

  // return the favorite color based on city and country
  if ((info.city !== 'Ottawa') || (info.country !== 'Canada')) {
    return invalidInfoMessage;
  }

  return {answer: 'black'};
}

const tools = [
    FunctionTool.from(
        getFavoriteColor,
        {
            name: 'favoriteColorTool',
            description: 'return the favorite color for a person based on their city and country',
            parameters: {
                type: 'object',
                properties: {
                    city: {
                        type: 'string',
                        description: 'city'
                    },
                    country: {
                        type: 'string',
                        description: 'country'
                    },
                },
                required: ['city', 'country']
            }
        }
    )
]

This should look similar to how we defined functions earlier; however, additional information is provided about the types of the parameters, and parameters can also be marked as required or not. While the manner in which functions are bound into the API differs across libraries, the way the function itself is described is quite similar.
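
As an illustration, a second tool like the weather example mentioned earlier could be defined in the same way and added to the tools array. The tool name, description, and implementation below are made up for this sketch and are not part of the lesson code:

// a hypothetical weather tool defined in the same style as favoriteColorTool
const getCurrentWeather = ({ city }) => {
  // a real implementation would call a weather service here
  return { weather: `The weather in ${city} is not available in this sketch` };
};

const weatherTool = FunctionTool.from(
    getCurrentWeather,
    {
        name: 'currentWeatherTool',
        description: 'return the current weather for a city',
        parameters: {
            type: 'object',
            properties: {
                city: {
                    type: 'string',
                    description: 'the city to get the weather for'
                },
            },
            required: ['city']
        }
    }
);

// tools.push(weatherTool); // would make both tools available to the agent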

How are these functions used in LlamaIndex.TS? They are called by agents. We will dive into that in the next section.

AI Agents with LlamaIndex.TS and Node.js

The ability for an LLM to call functions is often linked to the concept of agents. Agents add the ability to decompose a question, potentially call several tools, and create next actions while figuring out how to answer the question asked.

LlamaIndex.TS currently supports three types of agents: OpenAI, Anthropic, and ReAct. Only the last one, ReAct, seems to be usable with a local LLM, so that is what we used in our example. If you want to learn more about ReAct agents, they were introduced in this paper.

The code to use the agent with the functions defined in the previous section is as follows (the full source code is in llamaindex-agent-ollama.mjs):

function getQuery(query) {
  return { message: `When asked for a favorite color always call favoriteColorTool.
                     A city and country is needed to get the favorite color.
                     When asked for a humans favorite color, if you don't know their city ask for it.
                     Answer the following question: ${query}` };
}

const agent = new ReActAgent({ tools });

const questions = ['What is my favorite color? I live in Ottawa, Canada',
                   'My city is Ottawa',
                   'My country is Canada' ];

for (let i = 0; i < questions.length; i++) {
  console.log('QUESTION: ' + questions[i]);
  let response = await agent.chat(getQuery(questions[i]));
  console.log('  RESPONSE: ' + response.response.message.content);
}

As you can see, the agent handles figuring out when a response is a function call and calling the function when necessary, as well as providing a higher level of abstraction. When we tried our earlier custom-prompt example with OpenAI it did not work properly, likely because of differences between the LLMs, while the agent example worked once we switched the LLM and agent over to the OpenAI versions supported by LlamaIndex.TS. This shows the value of a higher-level API that works with both.
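
For reference, the switch looked roughly like the following sketch. It assumes the installed llamaindex package exports OpenAIAgent and that an OPENAI_API_KEY environment variable is set; check the documentation for the version you have installed:

// a rough sketch of using the OpenAI agent with the same tools array
import { OpenAIAgent } from 'llamaindex';

const agent = new OpenAIAgent({ tools });

// the shape of the response object can differ between agent types,
// so log the whole thing while experimenting
const response = await agent.chat({ message: 'What is my favorite color? I live in Ottawa, Canada' });
console.log(response);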

We mentioned earlier that function calling seems to be at an early level of maturity. The documentation on using it with JavaScript is sparse across all of the libraries, and with LlamaIndex.TS we often got exceptions when running the example because it failed to parse the response from the LLM. To be able to experiment, we patched the implementation of the ReAct agent to improve the behavior we saw. A copy of the patched version is in the lesson-3 directory.

To run the example, first copy our patched version into the node_modules directory with:

cp react.js node_modules/llamaindex/dist/agent/react.js

Then run the example with:

node llamaindex-agent-ollama.mjs 

You will notice that the responses are still quite inconsistent, with a typical run being:

[user1@fedora lesson-3]$ node llamaindex-agent-ollama.mjs 
QUESTION: What is my favorite color? I live in Ottawa, Canada
  RESPONSE: Your favorite color is red.
QUESTION: My city is Ottawa
 FUNCTION CALLED WITH: { city: 'Ottawa', country: 'Canada' }
  RESPONSE: In the city of Ottawa, Canada, my favorite color is black.
QUESTION: My country is Canada
  RESPONSE: Your favorite color is Blue according to the provided information.

In this case we provide the city and country right at the start to make it easier, but the LLM still just guesses without calling the function for the first question.

When we repeat the city it then correctly calls the function with the city and country and returns the right answer.

When we repeat the country, however, it simply provides another guess without calling the function and the guess does not even match the answer given to the two earlier questions. 

Switching to the OpenAI agent and LLM seemed to provide more consistent results, but it still seems like early days with respect to function calling and its use by agents.

Now run the example a number of times and observe the output to get a feel for how the agent calls the function and the level of variation run-to-run. You can also edit the list of questions to see how that affects the flow.
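
For example (a made-up variation, not in the lesson code), you could drop the country from the questions to see whether the agent asks for it:

const questions = ['What is my favorite color? I live in Ottawa',
                   'My country is Canada' ];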
