Interface description
api.chat.v1beta.generate

Introduction

This document deals with the Mindbreeze Web API for generating chat completions using RAG pipelines.

Generate requests are sent as HTTP POST requests to a client service. The path for Generate requests is the following:

<Client Service>/api/chat/v1beta/generate

A JSON document describing the Generate request is sent in the body of the HTTP request. The structure of this JSON document is described in the section "Request Fields".

Events (stream) sent by the server are also received as a response. The format is described in the chapter "Response Fields".

An OpenAPI specification of the API is also available. More detailed instructions can be found here: OpenAPI Interface Description.

Request Fields [ConversationInput]

Field	Description
id (optional)	Identifier of the request (optional). This ID is shown in app.telemetry. Type: String
inputs	Input text for the Generate request. Typ: String
stream	Controls whether the generation is streamed. Type: Boolean Default setting: true
model_id	ID of the RAG pipeline to be used. Type: String
pipeline_id	Alias for model_id. Type: String
pipeline_key	The key to a pipeline. For more information, see here. Type: String
messages	Conversation history (optional). See the chapter messages [List].
parameters	Generation parameters (optional). See the chapter parameters.
prompt_dictionary	Additional values of the prompt template (optional). See the chapter prompt dictionary.
retrieval_options	Additional search restrictions (optional). See the chapter retrieval_options.
generation_options	Additional setting options for the generation (optional). For more information, see the chapter generation_options.

{

"inputs": "Who is the CEO of Mindbreeze?",

"stream": false,

"model_id": "3a0e8612-a24f-4b16-93cc-aa6307d0c62b",

"retrieval_options": {

"constraint": {

"unparsed": "title:management"

}

messages [List]

The API is stateless. To use the Generate Endpoint for chat applications, a chat history can optionally be specified as a list of messages. The "Use chat history" option must also be activated in the pipeline for this.

Option	Description
from	Sender of the message ("user" or "assistant"). Typ: String
id	Message identifier (optional). Typ: String
content	Text content of the message. Typ: String
content_processed	Completed prompt template with search results (optional). Typ: String

[

{

"from": "user",

"content": "Who is the CEO of Mindbreeze?",

"content_processed": "Given the following extracted parts of ..."

{

"from": "assistant",

"content": "Daniel Fallmann is the CEO of Mindbreeze"

}

]

parameters

Optionally, the generation parameters of the pipeline used can be overwritten.

Parameter	Description
temperature	Overwrites "Randomness of the response (temperature)" Controls the randomness of the generated response (0 - 100%). Higher values make the output more creative, while lower values make it more targeted and deterministic. Typ: Integer
max_new_tokens	Overwrites "Maximum response length (tokens)" Limits the number of tokens generated (100 tokens ~ 75 words; depending on the tokenizer). Typ: Integer
details	Adds more detailed information about the individual tokens to the response in addition to the generated text. Typ: Boolean Hint: Is only relevant if it is not streamed.
retrieval_details	Adds more detailed information about the retrieved answers to the response in addition to the generated text. Typ: Boolean

{

"temperature": 5,

"max_new_tokens": 500

"details": true,

"retrieval_details": true

}

optional_parameters

This can be used to transfer optional parameters that are not supported by all LLMs.

Name

Description

Supported LLM protocols

do_sample

If do_sample is set to false, the text generation is deterministic. The model always selects the token with the highest probability (logits value). This setting is recommended for clearly defined and predictable tasks.

If do_sample is set to true, the selection of the next tokens is stochastic, based on the probability distributions calculated by the model. This enables more creative and diverse outputs.

Type: Boolean

InSpire LLM

truncate

Reduces the number of tokens to the specified size. Defining this setting is useful for maintaining the context size for LLMs. The configuration of this setting is recommended, if it can be assumed that very long prompts are used, which could possibly exceed the context length of the LLM.

Type: Integer

Attention: Information may be lost if the „truncate“ parameter is used.

InSpire LLM

{
"do_sample": true,
"truncate": 8000
}

prompt_dictionary

Optionally, key-value pairs can be specified to fill in placeholders in the prompt template of the pipeline used. To overwrite default placeholders, the setting "Allow overwriting of system prompt template variables" must be activated in the pipeline.

"prompt_dictionary": {

"question": "Tell me about Mindbreeze",

"answer": "Mindbreeze is fast!"

}

retrieval_options

Optionally, the retrieval settings of the pipeline used can be overwritten.

Option	Description
constraint	Query expression that is used for retrieval in addition to the search restriction configured in the pipeline. Typ: String
search_request	Extends the search query that is used for the retrieval. Fields that are not present in the search query in the pipeline are added. To allow fields to be overwritten, the setting "Allow overwriting of search query template" must be activated in the pipeline. Typ: Object For more information, see api.v2.search Interface Description - Fields in the search query.
use_inputs	Controls whether the input text (inputs) is used as a query for retrieval. Default: true Typ: Boolean
skip_retrieval	Skips the retrieval part. This setting is helpful if you want to generate answers without an additional context or if you want to specify the answers yourself. Default value: false Typ: Boolean For more information, see the setting “answers” in the chapter generation_options.

"retrieval_options": {

"constraint": {

"unparsed": "title:management"

"search_request": {

"term": "mind"

"use_inputs ": false

}

generation_options

Optionally, the generation settings of the pipeline that is being used can be overwritten:

Option	Description
prompt_dictionary	This „prompt dictionary“ has priority over the other prompt_dictionary. For more information, see the chapter prompt_dictionary
llm_selector	This setting can be used to select an LLM for the generation via the name or the family. This is only possible if no pipeline has been specified with a model_id. The values of the individual LLMs can be found via the /data interface.
answers	With this setting, you can specify the answers yourself, provided that the retrieval has been deactivated with skip_retrieval in the retrieval_options. Type: List[Answer] For more information, see api.v2.search Interface Description - answers.
message_templates	With this setting, the messages to be sent to the LLM can be specified very precisely. For more information, see the chapter message_templates [List].

"generation_options": {
   "prompt_dictionary": {
      "company": "Mindbreeze"
   },
   "llm_selector": {
      "family": "Meta Llama 3 Instruct"

}

message_templates [List]

Option

Description

role

Defines the role of the conversation participant of this message.

Possible values are

system
user
assistant

Type: String

content

The content of the message, which can consist of several parts.

For more information, see the chapter content [List].

{
   "role": "user",
   "content": [
      {
         "type": "text",
         "text": "Who is the CEO of Mindbreeze?"
      }
   ]
}

content [List]

Option

Description

type

Defines the type of this content.

Possible values are:

text

text/fstring-template

Type: String

text

Defines the actual content.

If the setting type has the value text/fstring-template, placeholders from the setting prompt_dictionary or the standard placeholders (summaries and question) can be used here.

Type: String

{
"type": "text/fstring-template",
"text": "You are a helpful AI assistant. Please answer the question with the context below:\n{summaries}"

}

Response Fields

The structure depends on the "stream" field in the request.

stream (Request)	Response structure
true (Default)	TokenStreamEvent
false	GeneratTextResponse

TokenStreamEvent

When streaming, only the last TokenStreamEvent contains the complete generated text.

data: {"token": {"text": " Daniel", "logprob": -0.055236816, "id": 4173}}

data: {"token": {"text": " ", "logprob": -0.0005774498, "id": 32106}}

data: {"token": {"text": " case", "logprob": -7.176399e-05, "id": 2589}}

...

"data": {

"token":{

"text":"</s>",

"logprob":-0.22509766,

"id":1,

"special":true

"generated_text": "Daniel Fallmann is the CEO of Mindbreeze.\n\nRetrieved Sources: ...",

"details":{

"finish_reason": "eos_token",

"generated_tokens":19.0,

"seed":null

"content_processed": "Given the following ..."

}

token

Contains information on the generated token.

Information	Description
text	Text content of the token. Type: String
logprob	Logarithmic probability of the token. Type: Float ]-inf,0]
id	Identification of the token in relation to its context Type: Integer
special	Token has a special meaning (e.g. end-of-sequence). Type: Boolean

"token": {

}, "text": "mind",

"logprob": -0.0029792786,

"id": 1,

"special": false

}

generated_text [String]

The complete generated text, i.e. all streamed tokens except special tokens.

"generated_text": "The CEO of Mindbreeze is...",

details

Option	Description
finish_reason	Reason for completing the token generation (e.g. "eos_token"). Type: String
generated_tokens	Number of tokens generated. Type: Float
seed	When generating used Seed. Type: String \| null

"details": {

}, "finish_reason": "eos_token",

"generated_tokens": 51.0,

"seed": null

}

content_processed [String]

Contains the text that was sent to the LLM as input for generation (prompt). The text is generated using the prompt template of the pipeline used.

"content_processed": "Given the following extracted parts of ..."

retrieval_details

Contains information on the received answers from the search. Only present if the parameter “retrieval_details” was sent with “true” in the request.

Option

Description

answers

The list of answers from the search.

Type: List[Answer]

For more information, see api.v2.search Interface Description - answers.

GenerateTextResponse

Without streaming, only the generated text is returned, without additional information as with streaming.

{

"generated_text": "Daniel Fallmann is the CEO…",

"details": {

"generated_tokens": 19,

"tokens": [

{

"text": "Daniel",

"logprob": -0.055236816

{

"text": " ",

"logprob": -0.0005774498

{

"text": "Fall",

"logprob": -7.176399e-05

…

{

"text": "</s>",

"logprob": -0.22509766,

"special":true

}

]

"retrieval_details" {

"answers": [

{

"score": 0.7742171883583069,

"text": {

"text": "Management \nDaniel Fallmann Founder & CEO\nDaniel Fallmann founded Mindbreeze in 2005 and as its CEO he is a living example of high quality and innovation standards.",

"context_after": " From the company’s very beginning, Fallmann, together with his team, laid the foundation for the highly scalable and intelligent Mindbreeze InSpire appliance.",

"text_start_pos": 0,

"text_end_pos": 164

"property_name": "content",

"properties": [

{

"id": "extension",

"name": "Extension",

"data": [

{

"value": {

"str": "html"

}

]

...

]

}

]

}

generated_text [String]

The complete generated text, i.e. all streamed tokens except special tokens.

"generated_text": "The CEO of Mindbreeze is…",

details

Contains information about the generated text. Only present if the "details" parameter was sent with "true" in the request.

Option

Description

generated_tokens

Number of tokens generated.

Type: Integer

tokens

The generated tokens.

Type: Array[Token]

token

Contains information on the generated token.

Information

Description

text

Text content of the token.

Type: String

logprob

Logarithmic probability of the token.

Type: Float ]-inf,0]

retrieval_details

Contains information on the received answers from the search. Only present if the parameter “retrieval_details” was sent with “true” in the request.

Option

Description

answers

The list of answers from the search.

Type: List[Answer]

For more information, see api.v2.search Interface Description - answers.

Interface description
api.chat.v1beta.generate

Introduction

Request Fields [ConversationInput]

messages [List]

parameters

optional_parameters

prompt_dictionary

retrieval_options

generation_options

message_templates [List]

content [List]

Response Fields

TokenStreamEvent

token

generated_text [String]

details

content_processed [String]

retrieval_details

GenerateTextResponse

generated_text [String]

details

token

retrieval_details

Download PDF

Download PDF

{{{i18n.refineSearch}}}

Interface description api.chat.v1beta.generate

Introduction

Request Fields [ConversationInput]

messages [List]

parameters

optional_parameters

prompt_dictionary

retrieval_options

generation_options

message_templates [List]

content [List]

Response Fields

TokenStreamEvent

token

generated_text [String]

details

content_processed [String]

retrieval_details

GenerateTextResponse

generated_text [String]

details

token

retrieval_details

Download PDF

Download PDF

Interface description
api.chat.v1beta.generate