Copyright © Mindbreeze GmbH, A-4020 Linz, 2024.
All rights reserved. All hardware and software names used are registered trade names and/or registered trademarks of the respective manufacturers.
These documents are highly confidential. No rights to our software or our professional services, or results of our professional services, or other protected rights can be based on the handing over and presentation of these documents. Distribution, publication or duplication is not permitted.
Mindbreeze provides language detection for documents using the LanguageDetector ItemTransformer plugin.
To use language detection, the LanguageDetector has to be added to your Mindbreeze installation by loading the corresponding plugin (the Item Transformation Services are included in the package “Mindbreeze Item Transformation Plugins”). Install the plugin using the manager UI.
The plugin also has to be included in your Mindbreeze license.
The LanguageDetector plugin can be used not only as an item transformation service, but also as a separate service. This can provide performance advantages for large installations with multiple indices, since a single LanguageDetector service is operated for all indices instead of one instance per index.
To run the LanguageDetector plugin as a standalone service, install the MetadataTransformationService-<version>.zip plugin. Add a new service in the "Indices" tab in the "Services" section and select "ItemTransformationServicePlugin.LanguageDetector". In the settings of the new service, set a "Display Name" and set the "Bind port" to a free TCP port. Configure the remaining settings as described in the section "Configuration". Finally, switch to the "Indices" section in the "Indices" tab, add an Item Transformation Service to the respective index, and reference the created service.
The following table lists the language profiles supported by the LanguageDetector. The listed languages can be used in the configuration of the LanguageDetector ("Included Languages" option). The option "Short Text Algorithm Text Length" defines the text lengths for which the long or the short text profile of the respective language is selected. For more information on configuration, see below.
Abbreviation | Language | Long text profile available | Short text profile available |
af | Afrikaans | X | |
an | Aragonese | X | |
ar | Arabic | X | |
ast | Asturian | X | |
be | Belarusian | X | |
br | Breton | X | |
ca | Catalan | X | |
bg | Bulgarian | X | |
bn | Bengali | X | |
cs | Czech | X | X |
cy | Welsh | X | |
da | Danish | X | X |
de | German | X | X |
el | Greek | X | |
en | English | X | X |
es | Spanish | X | X |
et | Estonian | X | |
eu | Basque | X | |
fa | Persian | X | |
fi | Finnish | X | X |
fr | French | X | X |
ga | Irish | X | |
gl | Galician | X | |
gu | Gujarati | X | |
he | Hebrew | X | |
hi | Hindi | X | |
hr | Croatian | X | |
ht | Haitian | X | |
hu | Hungarian | X | |
id | Indonesian | X | X |
is | Icelandic | X | |
it | Italian | X | X |
ja | Japanese | X | |
km | Khmer | X | |
kn | Kannada | X | |
ko | Korean | X | |
lt | Lithuanian | X | |
lv | Latvian | X | |
mk | Macedonian | X | |
ml | Malayalam | X | |
mr | Marathi | X | |
ms | Malay | X | |
mt | Maltese | X | |
ne | Nepali | X | |
nl | Dutch | X | X |
no | Norwegian | X | X |
oc | Occitan | X | |
pa | Punjabi | X | |
pl | Polish | X | X |
pt | Portuguese | X | X |
ro | Romanian | X | X |
ru | Russian | X | |
sk | Slovak | X | |
sl | Slovene | X | |
so | Somali | X | |
sq | Albanian | X | |
sr | Serbian | X | |
sv | Swedish | X | X |
sw | Swahili | X | |
ta | Tamil | X | |
te | Telugu | X | |
th | Thai | X | |
tl | Tagalog | X | |
tr | Turkish | X | X |
uk | Ukrainian | X | |
ur | Urdu | X | |
vi | Vietnamese | X | X |
yi | Yiddish | X | |
zh-cn | Simplified Chinese | X | |
zh-tw | Traditional Chinese | X | |
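The relationship between the table above and the "Short Text Algorithm Text Length" option can be illustrated with a small sketch. This is not the plugin's implementation. The function name `select_profile_kind`, the strict "shorter than" comparison, the fallback to the long text profile for languages without a short text profile, and the threshold value used in the example are all assumptions made for illustration only. The set of languages with a short text profile is taken from the table.

```python
# Illustrative sketch only; not the LanguageDetector's actual code.
# Languages that the table marks as having a short text profile.
SHORT_TEXT_PROFILE_LANGUAGES = frozenset({
    "cs", "da", "de", "en", "es", "fi", "fr", "id", "it",
    "nl", "no", "pl", "pt", "ro", "sv", "tr", "vi",
})


def select_profile_kind(language: str, text_length: int,
                        short_text_length: int) -> str:
    """Return "short" or "long" for the given language and text length.

    Assumption: texts shorter than `short_text_length` use the short text
    profile if the language has one; everything else uses the long text
    profile. The real plugin may compare or fall back differently.
    """
    if text_length < 0 or short_text_length < 0:
        raise ValueError("lengths must be non-negative")
    is_short_text = text_length < short_text_length
    if is_short_text and language in SHORT_TEXT_PROFILE_LANGUAGES:
        return "short"
    return "long"


if __name__ == "__main__":
    # Hypothetical threshold of 50 characters, chosen only for this example.
    for lang, length in [("de", 20), ("de", 200), ("ru", 20)]:
        print(lang, length, select_profile_kind(lang, length, 50))
```

Under these assumptions, a 20-character German text would be matched against the short text profile, while a Russian text of the same length would still use the long text profile, because the table lists no short text profile for Russian.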