Copyright ©
Mindbreeze GmbH, A-4020 Linz, 06.12.2024.
All rights reserved. All hardware and software names used are trade names and/or trademarks of the respective manufacturers.
These documents are strictly confidential. The transmission and presentation of these documents alone does not establish any rights to our software, our services and service results or other protected rights.
Passing on, publication or duplication is not permitted.
For reasons of easier readability, gender-specific differentiation, e.g. users, is not used. Corresponding terms apply to both genders in the interests of equal treatment.
This document deals with the Mindbreeze Web API for generating chat completions using RAG pipelines.
Generate requests are sent as HTTP POST requests to a client service. The path for Generate requests is the following:
<Client Service>/api/chat/v1beta/generate
A JSON document describing the Generate request is sent in the body of the HTTP request. The structure of this JSON document is described in the section "Request Fields".
The response is returned either as a stream of events sent by the server or, if streaming is disabled, as a single JSON document. The format is described in the chapter "Response Fields".
An OpenAPI specification of the API is also available. More detailed instructions can be found here: OpenAPI Interface Description.
id | Identifier of the request (optional). This ID is shown in app.telemetry. Type: String |
inputs | Input text for the Generate request. Type: String |
stream | Controls whether the generation is streamed. Default: true Type: Boolean |
model_id | ID of the RAG pipeline to be used. Type: String |
messages | Conversation history (optional). See the chapter messages [List]. |
parameters | Generation parameters (optional). See the chapter parameters. |
prompt_dictionary | Additional values of the prompt template (optional). See the chapter prompt dictionary. |
retrieval_options | Additional search restrictions (optional). See the chapter retrieval_options. |
generation_options | Additional setting options for the generation (optional). For more information, see the chapter generation_options. |
{
"inputs": "Who is the CEO of Mindbreeze?",
"stream": false,
"model_id": "3a0e8612-a24f-4b16-93cc-aa6307d0c62b",
"retrieval_options": {
"constraint": {
"unparsed": "title:management"
}
}
}
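As a minimal sketch of how such a request might be sent from a client, the following Python snippet builds the POST request with the standard library. The base URL `https://search.example.com` and the helper name `build_generate_request` are hypothetical and not part of the API; authentication and TLS settings depend on the deployment and are omitted here.

```python
import json
import urllib.request

# Path taken from this document; the base URL passed in is a placeholder.
GENERATE_PATH = "/api/chat/v1beta/generate"


def build_generate_request(client_service_url, inputs, model_id,
                           stream=False, constraint=None):
    """Build (but do not send) an HTTP POST request for the Generate endpoint."""
    payload = {"inputs": inputs, "stream": stream, "model_id": model_id}
    if constraint is not None:
        # Additional query expression, as in the JSON example above.
        payload["retrieval_options"] = {"constraint": {"unparsed": constraint}}
    return urllib.request.Request(
        client_service_url.rstrip("/") + GENERATE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "https://search.example.com",  # hypothetical client service URL
    "Who is the CEO of Mindbreeze?",
    "3a0e8612-a24f-4b16-93cc-aa6307d0c62b",
    constraint="title:management",
)

# To actually send it (requires a reachable client service):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call is made.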
The API is stateless. To use the Generate Endpoint for chat applications, a chat history can optionally be specified as a list of messages. The "Use chat history" option must also be activated in the pipeline for this.
from | Sender of the message ("user" or "assistant"). Type: String |
id | Message identifier (optional). Type: String |
content | Text content of the message. Type: String |
content_processed | Completed prompt template with search results (optional). Type: String |
[
{
"from": "user",
"content": "Who is the CEO of Mindbreeze?",
"content_processed": "Given the following extracted parts of ..."
},
{
"from": "assistant",
"content": "Daniel Fallmann is the CEO of Mindbreeze"
}
]
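Because the API is stateless, the client has to keep the conversation and send it again with every request. The following sketch shows one possible way to do that; the helper names `append_turn` and `build_follow_up` and the follow-up question are hypothetical.

```python
def append_turn(messages, user_text, assistant_text, content_processed=None):
    """Return a new history list extended by one user/assistant exchange."""
    user_message = {"from": "user", "content": user_text}
    if content_processed is not None:
        # Optional: the completed prompt template returned by a previous response.
        user_message["content_processed"] = content_processed
    assistant_message = {"from": "assistant", "content": assistant_text}
    return messages + [user_message, assistant_message]


def build_follow_up(model_id, messages, next_question):
    """Build a request body that carries the previous conversation."""
    return {"inputs": next_question, "model_id": model_id, "messages": messages}


history = append_turn(
    [],
    "Who is the CEO of Mindbreeze?",
    "Daniel Fallmann is the CEO of Mindbreeze",
)
follow_up = build_follow_up(
    "3a0e8612-a24f-4b16-93cc-aa6307d0c62b",
    history,
    "Where is the company located?",  # illustrative follow-up question
)
```

The history is only taken into account if the "Use chat history" option is activated in the pipeline, as described above.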
Optionally, the generation parameters of the pipeline used can be overwritten.
temperature | Overwrites "Randomness of the response (temperature)". Controls the randomness of the generated response (0 - 100%). Higher values make the output more creative, while lower values make it more targeted and deterministic. Type: Integer |
max_new_tokens | Overwrites "Maximum response length (tokens)". Limits the number of tokens generated (100 tokens ~ 75 words, depending on the tokenizer). Type: Integer |
details | Adds more detailed information about the individual tokens to the response in addition to the generated text. Type: Boolean Note: Only relevant if the response is not streamed. |
retrieval_details | Adds more detailed information about the retrieved answers to the response in addition to the generated text. Type: Boolean |
{
"temperature": 5,
"max_new_tokens": 500,
"details": true,
"retrieval_details": true
}
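The rule of thumb for max_new_tokens (100 tokens ~ 75 words) can be turned into a rough estimate. The sketch below is only an approximation under that ratio, which varies by tokenizer; the helper names `estimate_max_new_tokens` and `make_parameters` are hypothetical.

```python
def estimate_max_new_tokens(target_words):
    """Estimate a token budget from a word count using 100 tokens ~ 75 words."""
    if target_words < 0:
        raise ValueError("target_words must not be negative")
    # Integer ceiling division: tokens = ceil(words * 100 / 75)
    return -(-target_words * 100 // 75)


def make_parameters(temperature, target_words):
    """Build a parameters object; temperature is a percentage from 0 to 100."""
    if not 0 <= temperature <= 100:
        raise ValueError("temperature must be between 0 and 100")
    return {
        "temperature": temperature,
        "max_new_tokens": estimate_max_new_tokens(target_words),
    }


parameters = make_parameters(5, 375)
```

Rounding up errs on the side of a slightly larger budget so that a response of the intended length is less likely to be cut off.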
This can be used to transfer optional parameters that are not supported by all LLMs.
Name | Description | Supported LLM protocols |
do_sample | If do_sample is set to false, the text generation is deterministic. The model always selects the token with the highest probability (logits value). This setting is recommended for clearly defined and predictable tasks. If do_sample is set to true, the selection of the next tokens is stochastic, based on the probability distributions calculated by the model. This enables more creative and diverse outputs. Type: Boolean | InSpire LLM |
truncate | Truncate input tokens to the given size. Type: Integer | InSpire LLM |
{
"do_sample": true,
"truncate": 8000
}
Optionally, key-value pairs can be specified to fill in placeholders in the prompt template of the pipeline used. To overwrite default placeholders, the setting "Allow overwriting of system prompt template variables" must be activated in the pipeline.
"prompt_dictionary": {
"question": "Tell me about Mindbreeze",
"answer": "Mindbreeze is fast!"
}
Optionally, the retrieval settings of the pipeline used can be overwritten.
constraint | Query expression that is used for retrieval in addition to the search restriction configured in the pipeline. Type: String |
search_request | Extends the search query that is used for the retrieval. Fields that are not present in the search query in the pipeline are added. To allow fields to be overwritten, the setting "Allow overwriting of search query template" must be activated in the pipeline. Type: Object For more information, see api.v2.search Interface Description - Fields in the search query. |
use_inputs | Controls whether the input text (inputs) is used as a query for retrieval. Default: true Type: Boolean |
skip_retrieval | Skips the retrieval part. This setting is helpful if you want to generate answers without an additional context or if you want to specify the answers yourself. Default value: false Type: Boolean For more information, see the setting “answers” in the chapter generation_options. |
"retrieval_options": {
"constraint": {
"unparsed": "title:management"
},
"search_request": {
"term": "mind"
},
"use_inputs": false
}
Optionally, the generation settings of the pipeline used can be overwritten:
prompt_dictionary | This prompt dictionary takes priority over the prompt_dictionary specified at the top level of the request. For more information, see the chapter prompt_dictionary. |
llm_selector | This setting can be used to select an LLM for the generation via the name or the family. This is only possible if no pipeline has been specified with a model_id. The values of the individual LLMs can be found via the /data interface. |
answers | With this setting, you can specify the answers yourself, provided that the retrieval has been deactivated with skip_retrieval in the retrieval_options. Type: List[Answer] For more information, see api.v2.search Interface Description - Answer. |
message_templates | With this setting, the messages to be sent to the LLM can be specified very precisely. For more information, see the chapter message_templates [List]. |
"generation_options": {
"prompt_dictionary": {
"company": "Mindbreeze"
},
"llm_selector": {
"family": "Meta Llama 3 Instruct"
}
}
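Two of the settings above depend on other fields: llm_selector is only possible without a model_id, and answers requires skip_retrieval. A client could check these conditions before sending, as in the following sketch. The function `check_request` is hypothetical, and the server remains the authority on what is accepted.

```python
def check_request(request):
    """Return a list of problems found in a Generate request dictionary."""
    problems = []
    generation = request.get("generation_options", {})
    retrieval = request.get("retrieval_options", {})
    if "llm_selector" in generation and "model_id" in request:
        problems.append("llm_selector is only possible if no model_id is specified")
    if "answers" in generation and not retrieval.get("skip_retrieval", False):
        problems.append("answers requires skip_retrieval in retrieval_options")
    return problems


valid_request = {
    "inputs": "Who is the CEO of Mindbreeze?",
    "generation_options": {
        "prompt_dictionary": {"company": "Mindbreeze"},
        "llm_selector": {"family": "Meta Llama 3 Instruct"},
    },
}
conflicting_request = dict(valid_request, model_id="3a0e8612-a24f-4b16-93cc-aa6307d0c62b")

valid_problems = check_request(valid_request)
conflicting_problems = check_request(conflicting_request)
```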
role | Defines the role of the conversation participant of this message. Possible values are
Type: String |
content | The content of the message, which can consist of several parts. For more information, see the chapter content [List]. |
{
"role": "user",
"content": [
{
"type": "text",
"text": "Who is the CEO of Mindbreeze?"
}
]
}
type | Defines the type of this content. Possible values are: text, text/fstring-template. Type: String |
text | Defines the actual content. If the setting type has the value text/fstring-template, placeholders from the setting prompt_dictionary or the standard placeholders (summaries and question) can be used here. Type: String |
{
"type": "text/fstring-template",
"text": "You are a helpful AI assistant. Please answer the question with the context below:\n{summaries}"
}
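To illustrate how a text/fstring-template might be filled, the following sketch assumes that placeholders behave like Python `str.format` fields. The name of the type suggests this, but this document does not specify the exact substitution rules, so the function `render_template` is purely illustrative and not the server's implementation.

```python
def render_template(text, prompt_dictionary, summaries, question):
    """Fill {placeholders} in a text/fstring-template content part (illustrative)."""
    values = dict(prompt_dictionary)
    # Assumption: the standard placeholders win over custom values, mirroring the
    # default in which overwriting standard placeholders must be enabled explicitly.
    values.update({"summaries": summaries, "question": question})
    return text.format_map(values)


template = (
    "You are a helpful AI assistant. "
    "Please answer the question with the context below:\n{summaries}"
)
prompt = render_template(
    template,
    {"company": "Mindbreeze"},
    summaries="Mindbreeze is fast!",
    question="Tell me about Mindbreeze",
)
```

With `str.format` semantics, literal curly braces in a template would have to be doubled; whether the same applies on the server side is not stated in this document.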
The structure depends on the "stream" field in the request.
stream (Request) | Response structure |
true (Default) | TokenStreamEvent |
false | GenerateTextResponse |
When streaming, only the last TokenStreamEvent contains the complete generated text.
data: {"token": {"text": " Daniel", "logprob": -0.055236816, "id": 4173}}
data: {"token": {"text": " ", "logprob": -0.0005774498, "id": 32106}}
data: {"token": {"text": " case", "logprob": -7.176399e-05, "id": 2589}}
...
data: {
"token":{
"text":"</s>",
"logprob":-0.22509766,
"id":1,
"special":true
},
"generated_text": "Daniel Fallmann is the CEO of Mindbreeze.\n\nRetrieved Sources: ...",
"details":{
"finish_reason": "eos_token",
"generated_tokens":19.0,
"seed":null
},
"content_processed": "Given the following ..."
}
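A client consuming the stream could accumulate the token texts and pick up the final event roughly as follows. This is a sketch that assumes each event arrives on a single line prefixed with `data:`, as in the first lines of the example above; the function name `collect_stream` and the sample lines are hypothetical.

```python
import json


def collect_stream(lines):
    """Accumulate token texts from 'data: {...}' lines.

    Returns the concatenated non-special token texts and the final event
    (the one that carries "generated_text"), or None if there was none.
    """
    parts = []
    final_event = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data line
        event = json.loads(line[len("data:"):])
        token = event.get("token", {})
        if not token.get("special", False):
            parts.append(token.get("text", ""))
        if "generated_text" in event:
            final_event = event
    return "".join(parts), final_event


sample_lines = [
    'data: {"token": {"text": " Daniel", "logprob": -0.055236816, "id": 4173}}',
    "",
    'data: {"token": {"text": " Fallmann", "logprob": -0.01, "id": 7}}',
    'data: {"token": {"text": "</s>", "logprob": -0.22509766, "id": 1, "special": true},'
    ' "generated_text": "Daniel Fallmann"}',
]
text, final_event = collect_stream(sample_lines)
```

In practice the `generated_text` of the final event is the more reliable source for the complete answer; the accumulated token text is mainly useful for displaying partial output while the stream is still running.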
Contains information on the generated token.
text | Text content of the token. Type: String |
logprob | Logarithmic probability of the token. Type: Float ]-inf,0] |
id | ID of the token in relation to its context. Type: Integer |
special | Token has a special meaning (e.g. end-of-sequence). Type: Boolean |
"token": {
"text": "mind",
"logprob": -0.0029792786,
"id": 1,
"special": false
}
The complete generated text, i.e. all streamed tokens except special tokens.
"generated_text": "The CEO of Mindbreeze is...",
finish_reason | Reason for completing the token generation (e.g. "eos_token"). Type: String |
generated_tokens | Number of tokens generated. Type: Float |
seed | Seed used for the generation. Type: String or null |
"details": {
"finish_reason": "eos_token",
"generated_tokens": 51.0,
"seed": null
}
Contains the text that was sent to the LLM as input for generation (prompt). The text is generated using the prompt template of the pipeline used.
"content_processed": "Given the following extracted parts of ..."
Without streaming, the response consists of a single JSON document with the generated text. Additional information is only included when requested, for example through the "details" parameter.
{
"generated_text": "Daniel Fallmann is the CEO…",
"details": {
"generated_tokens": 19,
"tokens": [
{
"text": "Daniel",
"logprob": -0.055236816
},
{
"text": " ",
"logprob": -0.0005774498
},
{
"text": "Fall",
"logprob": -7.176399e-05
},
…
{
"text": "</s>",
"logprob": -0.22509766,
"special":true
}
]
}
}
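Since each token carries a logarithmic probability, the values can be summed to obtain the log probability of the whole generated sequence, and `exp` converts that sum back into a probability. The following sketch shows this for a response with details; the function name `sequence_logprob` is hypothetical, and skipping special tokens is a choice made here, not something the API prescribes.

```python
import math


def sequence_logprob(response):
    """Sum the logprob values of all non-special tokens in a response."""
    tokens = response.get("details", {}).get("tokens", [])
    return sum(t["logprob"] for t in tokens if not t.get("special", False))


# Shortened sample in the shape of the response above.
response = {
    "generated_text": "Daniel Fall",
    "details": {
        "generated_tokens": 4,
        "tokens": [
            {"text": "Daniel", "logprob": -0.055236816},
            {"text": " ", "logprob": -0.0005774498},
            {"text": "Fall", "logprob": -7.176399e-05},
            {"text": "</s>", "logprob": -0.22509766, "special": True},
        ],
    },
}

total_logprob = sequence_logprob(response)
# Probability of the sequence under the model: exp(sum of log probabilities).
probability = math.exp(total_logprob)
```

Working with the sum of log probabilities rather than multiplying probabilities avoids numerical underflow for long responses.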
The complete generated text, i.e. all generated tokens except special tokens.
"generated_text": "The CEO of Mindbreeze is…",
Contains information about the generated text. Only present if the "details" parameter was sent with "true" in the request.
generated_tokens | Number of tokens generated. Type: Integer |
tokens | The generated tokens. Type: Array[Token] |
Contains information on the generated token.
text | Text content of the token. Type: String |
logprob | Logarithmic probability of the token. Type: Float ]-inf,0] |