Getting to know LangChain.js
In this lesson, you will learn about some of the key APIs in LangChain.js and run a simple application that interacts with a running model using Node.js.
In order to get full benefit from taking this lesson, you need:
- An environment where you can install and run Node.js.
- A Git client.
In this lesson, you will:
- Install Node.js.
- Clone the ai-experimentation repository to get the sample lessons.
- Explore the key LangChain.js APIs that are used in the basic example.
- Run the example to send questions to the running model and display the responses.
Set up the environment
If you don’t already have Node.js installed, install it using one of the methods outlined on the Nodejs.org download page.
- Clone the ai-experimentation repository with: git clone https://github.com/mhdawson/ai-experimentation
- Change into the ai-experimentation/lesson-1-2 directory with: cd ai-experimentation/lesson-1-2
- Create a directory called models.
- Download the mistral-7b-instruct-v0.1.Q5_K_M.gguf model from HuggingFace and put it into the models directory (an example download command follows this list). This might take a few minutes, as the model is over 5GB in size.
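One way to fetch the model from the command line, assuming the file is still published under TheBloke's Mistral-7B-Instruct-v0.1-GGUF repository on Hugging Face (check that repository for the exact path), is: curl -L -o models/mistral-7b-instruct-v0.1.Q5_K_M.gguf https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q5_K_M.gguf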
Explore a basic LangChain.js example
We will start by working through the contents of langchainjs-basic.mjs.
The first thing we need to do is to load the model:
////////////////////////////////
// GET THE MODEL
// (path and fileURLToPath are imported at the top of langchainjs-basic.mjs)
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const modelPath = path.join(__dirname, "models", "mistral-7b-instruct-v0.1.Q5_K_M.gguf");
const { LlamaCpp } = await import("@langchain/community/llms/llama_cpp");
const model = await new LlamaCpp({ modelPath: modelPath });
This introduces the first LangChain.js API, which is for models. Instances of the models API let you load models using different back ends and then access them through a common API.
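For example, the same abstraction lets you point the rest of the code at a hosted model, with only the model creation changing. The following is a minimal sketch (not part of the lesson's repository) that assumes the separate @langchain/openai package is installed and an OpenAI-compatible API key is available:

import { OpenAI } from "@langchain/openai";

// Hypothetical alternative back end: the chain code shown later in the lesson
// works the same regardless of which model instance is passed to pipe().
const remoteModel = new OpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY, // assumed to be set in your environment
  temperature: 0,
});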
The example loads the model file that we downloaded earlier and stored in the models directory. We are using node-llama-cpp to load the model into the same process running Node.js. We won't do that in production, where we will most likely be accessing an external model, but it's great for getting started quickly. The magic of Node.js addons, along with node-addon-api (which we help maintain, which is cool), means that when you run npm install, the node-llama-cpp shared library needed to run the model (there are pre-built binaries for Linux, Windows, and Mac OS X) is either installed or compiled if necessary. If you want to learn more about Node.js addons or node-addon-api, check out the video Building native addons for Node.js (and more JavaScript engines) like it's 2023.

The next step is to create a "chain" (it is called LangChain.js, after all):
////////////////////////////////
// CREATE CHAIN
// (ChatPromptTemplate is imported from LangChain.js at the top of the file)
const prompt = ChatPromptTemplate.fromTemplate(
  `Answer the following question if you don't know the answer say so: Question: {input}`);
const chain = prompt.pipe(model);
This introduces the next two LangChain.js APIs: prompts and chains. A prompt represents the question and related context that you are sending to the model, and a chain represents the sequence of steps used to build that question and context and pass them to the model.
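As a small aside (not part of the lesson's script), chains can be composed of more than two steps. For instance, an output parser could be piped onto the end of the chain; with the text model used here the output is already a string, so this is mostly illustrative, but it becomes useful with chat models that return message objects. This sketch assumes the @langchain/core package that LangChain.js depends on is available:

import { StringOutputParser } from "@langchain/core/output_parsers";

// prompt -> model -> parser: each step's output becomes the next step's input
const chainWithParser = prompt.pipe(model).pipe(new StringOutputParser());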
Finally, we can start asking questions:
////////////////////////////////
// ASK QUESTION
console.log(new Date());
let result = await chain.invoke({
  input: "Should I use npm to start a node.js application",
});
console.log(result);
console.log(new Date());
In this step, we invoke the chain with the input, which asks if we should use npm to start a Node.js application.
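If you want to see the exact text the template produces for a given input before it reaches the model, one quick way (hypothetical, not in the lesson's script) is to format the prompt on its own:

// Print the fully rendered prompt for a given input without invoking the model.
console.log(await prompt.format({
  input: "Should I use npm to start a node.js application",
}));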
At this point you might be wondering: Is that really all you need to run a model locally and ask questions? Surprisingly, yes, although it might be a bit slow. In our case, it took about 25 seconds to answer the question on a Ryzen 5700X with lots of memory.
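If the wait feels long, chains also expose a stream() method. How fine-grained the chunks are depends on the back end (the local llama.cpp model may return the whole answer in a single chunk), but the following hedged sketch shows the pattern:

// Stream the response so output can be displayed as it is generated,
// instead of waiting for the complete answer.
const stream = await chain.stream({
  input: "Should I use npm to start a node.js application",
});
for await (const chunk of stream) {
  process.stdout.write(chunk);
}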
We've kept the example as simple as possible, and as a script instead of bundling it into an HTTP service. Our take is that creating an HTTP-based UI that takes input and displays a response is something Node.js developers will already know how to do. The interesting part is what you need to do behind the scenes to talk to the large language model.
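For reference, wrapping the chain in a service only takes a few extra lines. The following is a rough sketch using the built-in Node.js http module (not part of the lesson's repository), assuming the chain has already been created as shown above:

import http from "node:http";

// Minimal HTTP wrapper: POST the question as the request body,
// respond with the model's answer.
http.createServer(async (req, res) => {
  let question = "";
  for await (const chunk of req) {
    question += chunk;
  }
  const result = await chain.invoke({ input: question });
  res.end(result);
}).listen(8080);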
Run the basic LangChain.js example
To run the example:
- Run npm install.
- Run node langchainjs-basic.mjs.

When the example runs, first you'll see:
[user1@fedora lesson-1-2]$ node langchainjs-basic.mjs
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /home/user1/src/learning-path/ai-experimentation/lesson-1-2/models/mistral-7b-instruct-v0.1.Q5_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 17
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q5_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)
llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.1
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: CPU buffer size = 4892.99 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 512.00 MiB
llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB
llama_new_context_with_model: CPU input buffer size = 17.04 MiB
llama_new_context_with_model: CPU compute buffer size = 288.00 MiB
llama_new_context_with_model: graph splits (measure): 1
2024-03-11T22:06:38.328Z
That's the model being loaded and llama-cpp printing out some information about the model. Next you'll see it pause for a while (about 25 seconds in our case), and then you should see an answer something like this:
2024-03-11T22:08:23.372Z
Assistant: Yes, you should use npm to start a Node.js application. NPM (Node Package Manager) is the default package manager for Node.js and it provides a centralized repository of packages that can be used in your applications. It also allows you to manage dependencies between packages and automate tasks such as testing and deployment. If you are new to Node.js, I would recommend using npm to get started with your application development.
2024-03-11T22:08:45.774Z
If you've read the Node.js Reference Architecture, you'll know that this is not necessarily the answer we'd like people to get (we will revisit that later), but it is not unexpected based on common practice.
Conclusion
In this lesson, we introduced the basic LangChain.js APIs, including those for models, prompts, and chains; worked through a simple example that uses those APIs; and ran the example using a local model.
We'll build on this in the following lessons by:
- Speeding things up if you have a GPU.
- Building a more complex example that supports retrieval-augmented generation.
- Showing how LangChain.js makes it easy to develop, experiment, and test in one environment while being able to easily deploy to another environment with minimal changes to your application.