Copyright ©
Mindbreeze GmbH, A-4020 Linz, 09.04.2024.
All rights reserved. All hardware and software names used are trade names and/or trademarks of the respective manufacturers.
These documents are strictly confidential. The transmission and presentation of these documents alone does not establish any rights to our software, our services and service results or other protected rights.
Passing on, publication or duplication is not permitted.
For easier readability, gender-specific differentiation (e.g. for "users") is not used; in the interest of equal treatment, the terms used apply to all genders.
This document deals with the Mindbreeze Web API for generating chat completions using RAG pipelines.
Generate requests are sent as HTTP POST requests to a client service. The path for Generate requests is the following:
<Client Service>/api/chat/v1beta/generate
A JSON document describing the Generate request is sent in the body of the HTTP request. The structure of this JSON document is described in the section "Request Fields".
Events (stream) sent by the server are also received as a response. The format is described in the chapter "Response Fields".
id | Identifier of the request (optional). | Type: String |
inputs | Input text for the Generate request. | Type: String |
stream | Controls whether the generation is streamed. Default: true | Type: Boolean |
model_id | ID of the RAG pipeline to be used. | Type: String |
{
  "inputs": "Who is the CEO of Mindbreeze?",
  "stream": false,
  "model_id": "3a0e8612-a24f-4b16-93cc-aa6307d0c62b",
  "retrieval_options": {
    "constraint": {
      "unparsed": "title:management"
    }
  }
}
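As a sketch, the request body above can be assembled programmatically and serialized for the HTTP POST (the client-service URL is a placeholder; the actual network call is only described in a comment):

```python
import json

CLIENT_SERVICE = "https://client.example.com"  # placeholder, replace with your Client Service URL
GENERATE_PATH = "/api/chat/v1beta/generate"

def build_generate_request(inputs, model_id, stream=False, retrieval_options=None):
    """Assemble the JSON body for a Generate request."""
    body = {"inputs": inputs, "stream": stream, "model_id": model_id}
    if retrieval_options is not None:
        body["retrieval_options"] = retrieval_options
    return body

body = build_generate_request(
    "Who is the CEO of Mindbreeze?",
    "3a0e8612-a24f-4b16-93cc-aa6307d0c62b",
    retrieval_options={"constraint": {"unparsed": "title:management"}},
)
payload = json.dumps(body)
# payload would be sent as an HTTP POST to CLIENT_SERVICE + GENERATE_PATH
# (e.g. with urllib.request or requests; the network call is omitted here).
```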
The API is stateless. To use the Generate endpoint for chat applications, a chat history can optionally be supplied as a list of messages. The "Use chat history" option must also be activated in the pipeline for this.
from | Sender of the message ("user" or "assistant"). | Type: String |
id | Identifier of the message (optional). | Type: String |
content | Text content of the message. | Type: String |
content_processed | Completed prompt template with search results (optional). | Type: String |
[
  {
    "from": "user",
    "content": "Who is the CEO of Mindbreeze?",
    "content_processed": "Given the following extracted parts of ..."
  },
  {
    "from": "assistant",
    "content": "Daniel Fallmann is the CEO of Mindbreeze"
  }
]
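Because the API is stateless, the history must travel with every request. A minimal sketch of assembling such a request follows; note that the name of the request field carrying the history ("messages") is an assumption, as the field name is not stated above:

```python
def build_chat_request(inputs, model_id, history):
    """Build a Generate request that carries previous chat turns.

    NOTE: the history field name "messages" is an assumption; check the
    pipeline documentation for the exact field name.
    """
    for msg in history:
        # only these two senders are defined for the "from" field
        assert msg["from"] in ("user", "assistant")
    return {"inputs": inputs, "model_id": model_id, "messages": list(history)}

history = [
    {"from": "user", "content": "Who is the CEO of Mindbreeze?"},
    {"from": "assistant", "content": "Daniel Fallmann is the CEO of Mindbreeze"},
]
request = build_chat_request(
    "What products does the company offer?",
    "3a0e8612-a24f-4b16-93cc-aa6307d0c62b",
    history,
)
```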
Optionally, the generation parameters of the pipeline used can be overwritten.
temperature | Overwrites "Randomness of the response (temperature)". Controls the randomness of the generated response (0 - 100 %). Higher values make the output more creative, lower values more focused and deterministic. | Type: Integer |
max_new_tokens | Overwrites "Maximum response length (tokens)". Limits the number of tokens generated (100 tokens ~ 75 words, depending on the tokenizer). | Type: Integer |
{
  "temperature": 5,
  "max_new_tokens": 500
}
Optionally, key-value pairs can be specified to fill in placeholders in the prompt template of the pipeline used. To overwrite default placeholders, the setting "Allow overwriting of system prompt template variables" must be activated in the pipeline.
"prompt_dictionary": {
  "question": "Tell me about Mindbreeze",
  "answer": "Mindbreeze is fast!"
}
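Placeholder values are plain string key-value pairs; a sketch of attaching them to a request body (the validation is an assumption added for illustration):

```python
def with_prompt_dictionary(body, placeholders):
    """Return a copy of the request body with prompt-template placeholders attached."""
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in placeholders.items()):
        raise TypeError("prompt_dictionary expects string keys and string values")
    body = dict(body)
    body["prompt_dictionary"] = dict(placeholders)
    return body

body = with_prompt_dictionary(
    {"inputs": "Tell me about Mindbreeze", "model_id": "my-pipeline-id"},  # placeholder id
    {"question": "Tell me about Mindbreeze", "answer": "Mindbreeze is fast!"},
)
```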
Optionally, the retrieval settings of the pipeline used can be overwritten.
constraint | Query expression that is used for retrieval in addition to the search restriction configured in the pipeline. | Type: Object |
search_request | Extends the search query that is used for the retrieval. Fields that are not present in the pipeline's search query are added. To allow fields to be overwritten, the setting "Allow overwriting of search query template" must be activated in the pipeline. | Type: Object |
use_inputs | Controls whether the input text (inputs) is used as a query for retrieval. Default: true | Type: Boolean |
"retrieval_options": {
  "constraint": {
    "unparsed": "title:management"
  },
  "search_request": {
    "term": "mind"
  },
  "use_inputs": false
}
The structure depends on the "stream" field in the request.
stream (Request) | Response structure |
true (default) | TokenStreamEvent |
false | GeneratedTextResponse |
When streaming, only the last TokenStreamEvent contains the complete generated text.
data: {"token": {"text": " Daniel", "logprob": -0.055236816, "id": 4173}}
data: {"token": {"text": " ", "logprob": -0.0005774498, "id": 32106}}
data: {"token": {"text": " case", "logprob": -7.176399e-05, "id": 2589}}
...
data: {
  "token": {
    "text": "</s>",
    "logprob": -0.22509766,
    "id": 1,
    "special": true
  },
  "generated_text": "Daniel Fallmann is the CEO of Mindbreeze.\n\nRetrieved Sources: ...",
  "details": {
    "finish_reason": "eos_token",
    "generated_tokens": 19.0,
    "seed": null
  },
  "content_processed": "Given the following ..."
}
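A minimal sketch of consuming such a stream on the client side: each `data:` line is parsed as JSON, non-special token texts are collected, and the complete text is taken from the final event (which is the only one carrying generated_text):

```python
import json

def parse_token_stream(lines):
    """Parse "data:" lines of a streaming Generate response.

    Returns (tokens, generated_text): the streamed token texts with special
    tokens excluded, and the complete text carried by the final event.
    """
    tokens, generated_text = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive or comment lines
        event = json.loads(line[len("data:"):])
        token = event.get("token", {})
        if not token.get("special", False):
            tokens.append(token.get("text", ""))
        if event.get("generated_text") is not None:
            generated_text = event["generated_text"]
    return tokens, generated_text

sample = [
    'data: {"token": {"text": " Daniel", "logprob": -0.055236816, "id": 4173}}',
    'data: {"token": {"text": "</s>", "logprob": -0.22509766, "id": 1, "special": true},'
    ' "generated_text": "Daniel Fallmann is the CEO of Mindbreeze."}',
]
tokens, text = parse_token_stream(sample)
```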
Contains information on the generated token.
text | Text content of the token. | Type: String |
logprob | Logarithmic probability of the token. | Type: Float ]-inf, 0] |
id | Identifier of the token in relation to its context. | Type: Integer |
special | The token has a special meaning (e.g. end-of-sequence). | Type: Boolean |
"token": {
  "text": "mind",
  "logprob": -0.0029792786,
  "id": 1,
  "special": false
}
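Assuming logprob is the natural logarithm of the token's probability (the usual convention), the probability itself can be recovered with `math.exp`; a logprob close to 0 means the model was nearly certain of the token:

```python
import math

logprob = -0.0029792786  # from the token example above
probability = math.exp(logprob)  # just under 1.0, i.e. a near-certain token
```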
The complete generated text, i.e. all streamed tokens except special tokens.
"generated_text": "The CEO of Mindbreeze is...",
finish_reason | Reason why token generation finished (e.g. "eos_token"). | Type: String |
generated_tokens | Number of tokens generated. | Type: Float |
seed | Seed used for the generation. | Type: String | null |
"details": {
  "finish_reason": "eos_token",
  "generated_tokens": 51.0,
  "seed": null
}
Contains the text that was sent to the LLM as input for generation (prompt). The text is generated using the prompt template of the pipeline used.
"content_processed": "Given the following extracted parts of ..."
Without streaming, only the generated text is returned, without the additional information provided when streaming.
{
  "generated_text": "Daniel Fallmann is the CEO…"
}
generated_text | Generated text. | Type: String |