<h1 align="center"><img src="https://rivet.ironcladapp.com/img/logo-banner-wide.png" alt="Rivet Logo"></h1>

<div align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
    <img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
  </picture>
</div>

# Rivet Ollama Plugin
The Rivet Ollama Plugin is a plugin for Rivet that lets you use Ollama to run and chat with LLMs locally and easily. It adds the following nodes:
- Ollama Chat
- Ollama Embedding
- Get Ollama Model
- List Ollama Models
- Pull Model to Ollama
## Running Ollama
To run Ollama so that Rivet's default browser executor can communicate with it, you will want to start it with the following command:

```bash
OLLAMA_ORIGINS=* ollama serve
```

If you are using the node executor, you can omit the `OLLAMA_ORIGINS` environment variable.
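To confirm the server is reachable at the expected address, you can query Ollama's `/api/tags` endpoint (the same endpoint that lists installed models). A minimal sketch, assuming the default host:

```ts
// Quick reachability check against Ollama's /api/tags endpoint (lists installed models).
// Assumes the default host; adjust if you changed OLLAMA_HOST or the port.
const res = await fetch("http://localhost:11434/api/tags");
if (!res.ok) throw new Error(`Ollama is not reachable: HTTP ${res.status}`);
const { models } = (await res.json()) as { models: { name: string }[] };
console.log(models.map((m) => m.name));
```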
## Using the plugin

### In Rivet
To use this plugin in Rivet:
- Open the plugins overlay at the top of the screen.
- Search for "rivet-plugin-ollama"
- Click the "Add" button to install the plugin into your current project.
### In the SDK

1. Import the plugin and Rivet into your project:

   ```ts
   import * as Rivet from "@ironclad/rivet-node";
   import RivetPluginOllama from "rivet-plugin-ollama";
   ```

2. Initialize the plugin and register its nodes with the `globalRivetNodeRegistry`:

   ```ts
   Rivet.globalRivetNodeRegistry.registerPlugin(RivetPluginOllama(Rivet));
   ```

   (You may also use your own node registry if you wish, instead of the global one.)

3. The nodes will now work when run with `runGraphInFile` or `createProcessor`, as in the sketch below.
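Putting it together, a minimal end-to-end sketch; the project path and graph name are placeholders, and the options passed to `runGraphInFile` are assumed to follow `RunGraphOptions`:

```ts
import * as Rivet from "@ironclad/rivet-node";
import RivetPluginOllama from "rivet-plugin-ollama";

// Register the plugin's nodes before loading any graph that uses them.
Rivet.globalRivetNodeRegistry.registerPlugin(RivetPluginOllama(Rivet));

// Hypothetical project path and graph name -- substitute your own.
const result = await Rivet.runGraphInFile("./my-project.rivet-project", {
  graph: "Main Graph",
  pluginSettings: {
    ollama: { host: "http://localhost:11434" },
  },
});

console.log(result);
```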
## Configuration

### In Rivet
By default, the plugin will attempt to connect to Ollama at `http://localhost:11434`. If you would like to change this, you can open the Settings window, navigate to the Plugins area, and you will see a Host setting for Ollama. You can change this to the URL of your Ollama instance. For some users it works using `http://127.0.0.1:11434` instead.
### In the SDK

When using the SDK, you can pass a `host` option to the plugin to configure the host. Using `createProcessor` or `runGraphInFile`, pass it in via `pluginSettings` in `RunGraphOptions`:
```ts
await createProcessor(project, {
  ...etc,
  pluginSettings: {
    ollama: {
      host: "http://localhost:11434",
    },
  },
});
```
## Nodes

### Ollama Chat

The main node of the plugin. Functions similarly to the Chat node built into Rivet. Uses the `/api/chat` route.
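For reference, a hand-written request to the same `/api/chat` route looks roughly like the sketch below; the node builds the equivalent body from its inputs and editor settings, and the model name here is only an example:

```ts
// Illustrative only: the shape of a non-streaming request to Ollama's /api/chat route.
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  body: JSON.stringify({
    model: "llama2", // example model name
    stream: false,
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello!" },
    ],
  }),
});
const { message } = await response.json();
console.log(message.content); // the assistant's reply text
```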
#### Inputs
Title | Data Type | Description | Default Value | Notes |
---|---|---|---|---|
System Prompt | string | The system prompt to prepend to the messages list. | (none) | Optional. |
Messages | chat-message[] | The chat messages to use as the prompt for the LLM. | (none) | Chat messages are converted to the OpenAI message format using "role" and "content" keys. |
#### Outputs
Title | Data Type | Description | Notes |
---|---|---|---|
Output | string | The response text from the LLM. | |
Messages Sent | chat-message[] | The messages that were sent to Ollama. | |
All Messages | chat-message[] | All messages, including the reply from the LLM. |
#### Editor Settings
Setting | Description | Default Value | Use Input Toggle | Input Data Type |
---|---|---|---|---|
Model | The name of the LLM model to use in Ollama. | (Empty) | Yes | string |
Prompt Format | The way to format chat messages for the prompt being sent to the Ollama model. Raw means no formatting is applied. Llama 2 Instruct follows the Llama 2 prompt format. | Llama 2 Instruct | No | N/A |
JSON Mode | Activates JSON output mode. | false | Yes | boolean |
Parameters Group | ||||
Mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | (unset) | Yes | number |
Mirostat Eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | (unset) | Yes | number |
Mirostat Tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | (unset) | Yes | number |
Num Ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | (unset) | Yes | number |
Num GQA | The number of GQA groups in the transformer layer. Required for some models, for example it is 8 for llama2:70b | (unset) | Yes | number |
Num GPUs | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable metal support, 0 to disable. | (unset) | Yes | number |
Num Threads | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | (unset) | Yes | number |
Repeat Last N | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | (unset) | Yes | number |
Repeat Penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | (unset) | Yes | number |
Temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | (unset) | Yes | number |
Seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | (unset) | Yes | number |
Stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. | (unset) | Yes | string |
TFS Z | Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1) | (unset) | Yes | number |
Num Predict | Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | (unset) | Yes | number |
Top K | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | (unset) | Yes | number |
Top P | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | (unset) | Yes | number |
Additional Parameters | Additional parameters to pass to Ollama. Numbers will be parsed and sent as numbers; otherwise values are sent as strings. See the Ollama documentation for all supported parameters, and the example after this table. | (none) | Yes | object |
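As an illustration of the Additional Parameters input, the object below passes two extra options through to Ollama. The parameter names are only examples; see Ollama's documentation for the full list:

```ts
// Example Additional Parameters value. Keys follow Ollama's option names;
// values that parse as numbers are sent as numbers, everything else as strings.
const additionalParameters = {
  num_keep: "5",    // parsed and sent as the number 5
  num_batch: "512", // parsed and sent as the number 512
};
```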
### Ollama Embedding
Embedding models are models that are trained specifically to generate vector embeddings: long arrays of numbers that represent semantic meaning for a given sequence of text. The resulting vector embedding arrays can then be stored in a database, which will compare them as a way to search for data that is similar in meaning.
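As a concrete example of how such embeddings are compared, the helper below (not part of the plugin) computes the cosine similarity between two embedding vectors; values closer to 1 indicate closer meaning:

```ts
// Cosine similarity between two embedding vectors of equal length.
// Illustration only -- the plugin produces the vectors; comparing them is up to you.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```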
#### Inputs
See Editor Settings for all possible inputs.
#### Outputs
Title | Data Type | Description | Notes |
---|---|---|---|
Embedding | vector | Array of numbers that represent semantic meaning for a given sequence of text. |
#### Editor Settings
Setting | Description | Default Value | Use Input Toggle | Input Data Type |
---|---|---|---|---|
Model Name | The name of the model to use to generate the embedding. | (Empty) | Yes (default off) | string |
Text | The text to embed. | (Empty) | Yes (default off) | string |
### Ollama Generate

Previously the main node of the plugin. Allows you to send prompts to Ollama and receive responses from the installed LLMs, with deep customization options including custom prompt formats. Uses the `/api/generate` route.
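As a rough illustration of the "Llama 2 Instruct" prompt format referenced in the settings below, a system-plus-user turn is wrapped in Llama 2's `[INST]`/`<<SYS>>` markers before being sent as a single prompt string. This is a sketch only; the plugin's exact whitespace and token handling may differ:

```ts
// Sketch of Llama 2 Instruct formatting for one system + user turn.
// The plugin's exact formatting may differ; the "Raw" setting skips this step entirely.
function formatLlama2Instruct(systemPrompt: string, userMessage: string): string {
  return `[INST] <<SYS>>\n${systemPrompt}\n<</SYS>>\n\n${userMessage} [/INST]`;
}
```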
#### Inputs
Title | Data Type | Description | Default Value | Notes |
---|---|---|---|---|
System Prompt | string | The system prompt to prepend to the messages list. | (none) | Optional. |
Messages | chat-message[] | The chat messages to use as the prompt for the LLM. | (none) | Chat messages are converted to a prompt in Ollama based on the "Prompt Format" editor setting. If "Raw" is selected, no formatting is performed on the chat messages, and you are expected to have already formatted them in your Rivet graphs. |
Additional inputs are available via toggles in the editor.
#### Outputs
Title | Data Type | Description | Notes |
---|---|---|---|
Output | string | The response text from the LLM. | |
Prompt | string | The full prompt, with formatting, that was sent to Ollama. | |
Messages Sent | chat-message[] | The messages that were sent to Ollama. | |
All Messages | chat-message[] | All messages, including the reply from the LLM. | |
Total Duration | number | Time spent generating the response. | Only available if the "Advanced Outputs" toggle is enabled. |
Load Duration | number | Time spent in nanoseconds loading the model. | Only available if the "Advanced Outputs" toggle is enabled. |
Sample Count | number | Number of samples generated. | Only available if the "Advanced Outputs" toggle is enabled. |
Sample Duration | number | Time spent in nanoseconds generating samples. | Only available if the "Advanced Outputs" toggle is enabled. |
Prompt Eval Count | number | Number of tokens in the prompt. | Only available if the "Advanced Outputs" toggle is enabled. |
Prompt Eval Duration | number | Time spent in nanoseconds evaluating the prompt. | Only available if the "Advanced Outputs" toggle is enabled. |
Eval Count | number | Number of tokens in the response. | Only available if the "Advanced Outputs" toggle is enabled. |
Eval Duration | number | Time spent in nanoseconds evaluating the response. | Only available if the "Advanced Outputs" toggle is enabled. |
Tokens Per Second | number | Number of tokens generated per second (see the note after this table). | Only available if the "Advanced Outputs" toggle is enabled. |
Parameters | object | The parameters used to generate the response. | Only available if the "Advanced Outputs" toggle is enabled. |
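The duration outputs are reported in nanoseconds, so the Tokens Per Second output can be reproduced from Eval Count and Eval Duration. A small sketch of that arithmetic with made-up numbers:

```ts
// Reproducing Tokens Per Second from the advanced outputs (durations are in nanoseconds).
const evalCount = 120;                // example: tokens in the response
const evalDurationNs = 2_500_000_000; // example: 2.5 seconds
const tokensPerSecond = evalCount / (evalDurationNs / 1e9); // 48 tokens per second
```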
#### Editor Settings
Setting | Description | Default Value | Use Input Toggle | Input Data Type |
---|---|---|---|---|
Model | The name of the LLM model to use in Ollama. | (Empty) | Yes | string |
Prompt Format | The way to format chat messages for the prompt being sent to the Ollama model. Raw means no formatting is applied. Llama 2 Instruct follows the Llama 2 prompt format. | Llama 2 Instruct | No | N/A |
JSON Mode | Activates JSON output mode. | false | Yes | boolean |
Advanced Outputs | Add additional outputs with detailed information about the Ollama execution. | No | No | N/A |
Parameters Group | ||||
Mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | (unset) | Yes | number |
Mirostat Eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | (unset) | Yes | number |
Mirostat Tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | (unset) | Yes | number |
Num Ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | (unset) | Yes | number |
Num GQA | The number of GQA groups in the transformer layer. Required for some models, for example it is 8 for llama2:70b | (unset) | Yes | number |
Num GPUs | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable metal support, 0 to disable. | (unset) | Yes | number |
Num Threads | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). | (unset) | Yes | number |
Repeat Last N | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | (unset) | Yes | number |
Repeat Penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | (unset) | Yes | number |
Temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | (unset) | Yes | number |
Seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | (unset) | Yes | number |
Stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. | (unset) | Yes | string |
TFS Z | Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1) | (unset) | Yes | number |
Num Predict | Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | (unset) | Yes | number |
Top K | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | (unset) | Yes | number |
Top P | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | (unset) | Yes | number |
Additional Parameters | Additional parameters to pass to Ollama. Numbers will be parsed and sent as numbers; otherwise values are sent as strings. See the Ollama documentation for all supported parameters. | (none) | Yes | object |
### List Ollama Models
Lists the models installed in Ollama.
#### Inputs
This node has no inputs.
#### Outputs
Title | Data Type | Description | Notes |
---|---|---|---|
Model Names | string[] | The names of the models installed in Ollama. |
#### Editor Settings
This node has no editor settings.
### Get Ollama Model
Gets the model with the given name from Ollama.
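This appears to correspond to Ollama's `/api/show` route, whose response carries the license, Modelfile, parameters, and template listed under Outputs below. A hand-written request to that route looks roughly like this (the model name is an example):

```ts
// Illustrative request to Ollama's /api/show route, which returns model metadata.
const res = await fetch("http://localhost:11434/api/show", {
  method: "POST",
  body: JSON.stringify({ name: "llama2" }), // example model name
});
const { license, modelfile, parameters, template } = await res.json();
console.log(template);
```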
#### Inputs
See Editor Settings for all possible inputs.
#### Outputs
Title | Data Type | Description | Notes |
---|---|---|---|
License | string | Contents of the license block of the model. | |
Modelfile | string | The Ollama modelfile for the model. | |
Parameters | string | The parameters for the model. | |
Template | string | The template for the model. |
#### Editor Settings
Setting | Description | Default Value | Use Input Toggle | Input Data Type |
---|---|---|---|---|
Model Name | The name of the model to get. | (Empty) | Yes (default on) | string |
### Pull Model to Ollama
Downloads a model from the Ollama library to the Ollama server.
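For reference, pulling a model directly against Ollama's API goes through the `/api/pull` route; the node performs the equivalent call for you. A rough non-streaming sketch (the model name is an example):

```ts
// Illustrative non-streaming request to Ollama's /api/pull route.
const res = await fetch("http://localhost:11434/api/pull", {
  method: "POST",
  body: JSON.stringify({ name: "llama2", stream: false }), // example model name
});
const { status } = await res.json();
console.log(status); // "success" once the pull completes
```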
#### Inputs
See Editor Settings for all possible inputs.
#### Outputs
Title | Data Type | Description | Notes |
---|---|---|---|
Model Name | string | The name of the model that was pulled. |
#### Editor Settings
Setting | Description | Default Value | Use Input Toggle | Input Data Type |
---|---|---|---|---|
Model Name | The name of the model to pull. | (Empty) | Yes (default on) | string |
Insecure | Allow insecure connections to the library. Only use this if you are pulling from your own library during development. | No | No | N/A |
## Local Development

- Run `yarn dev` to start the compiler and bundler in watch mode. This will automatically recompile and rebundle your changes into the `dist` folder, and will also copy the bundled files into the plugin install directory.
- After each change, you must restart Rivet to see the changes.