Installation and Configuration
Mindbreeze GmbH, A-4020 Linz, 2021.
All rights reserved. All hardware and software names used are registered trade names and/or registered trademarks of the respective manufacturers.
These documents are highly confidential. No rights to our software or our professional services, or results of our professional services, or other protected rights can be based on the handing over and presentation of these documents.
Distribution, publication or duplication is not permitted.
The term ‘user’ is used in a gender-neutral sense throughout the document.
Before installing the Documentum Connector, ensure that the Mindbreeze Server is already installed and that the connector is included in the Mindbreeze license.
Needed Rights for Crawling
The Documentum Connector allows you to index and search the content of a Documentum repository.
The following requirements must be met before configuring the Documentum Connector:
- Superuser name and password
Configuration of Mindbreeze
Click on the “Indices” tab and then on the “Add new index” symbol to create a new index.
Enter the index path, e.g. “/data/Indices/documentum”. Change the Display Name of the Index Service and the related Filter Service if necessary.
Add a new data source with the symbol “Add new custom source” at the bottom right.
Configuration of Data Source
- Superuser: user name of the superuser.
- Password: password of the superuser.
- Repository Name: the repository name.
- Webtop URL: the URL of Webtop, e.g. http://documentum.mycompany.com:9080/webtop/
- DFC Properties File: path to the DFC properties file. Place the file in the config subfolder of dfc.data.dir and verify that the required properties are configured (see the dfc.properties file on the Documentum server).
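Before entering the path in the data source, it can help to check that the properties file contains the expected entries. The following is a minimal sketch, not part of the connector: `dfc.docbroker.host[0]` and `dfc.docbroker.port[0]` are standard DFC settings, but treating exactly these two keys as the required set is an assumption, and the helper names and sample values are hypothetical.

```python
# Hypothetical helper: report which expected keys are missing or empty
# in the text of a dfc.properties file.
# ASSUMPTION: the key set below is illustrative only; take the real set
# from the dfc.properties file on the Documentum server.
ASSUMED_REQUIRED_KEYS = ("dfc.docbroker.host[0]", "dfc.docbroker.port[0]")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and #/! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(text, required=ASSUMED_REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    props = parse_properties(text)
    return [key for key in required if not props.get(key)]


# Sample content with the port entry left out on purpose.
sample = """
# sample values, not a real server
dfc.docbroker.host[0]=documentum.mycompany.com
"""
print(missing_keys(sample))
```

An empty result means every key in the assumed set is present with a non-empty value. The parser only handles plain key=value lines, which is enough for a quick check.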
Notes on deleting documents
By default, documents deleted in Documentum are automatically removed from the index using the Documentum Audit Trail. If the Audit Trail is not available (e.g. due to missing access rights), the following options can be used to remove deleted documents from the index:
- Trash Bin: A certain folder can be used as a trash bin in Documentum. Documents that are moved to the trash bin will be deleted from the index during the next crawl run. See the "Trash Bin Path Pattern" option.
- Delete Not Existing Documents: Periodically the index is compared with the Documentum database to detect deleted documents. See the "Delete Not Existing Documents Schedule" option.
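The trash bin behaviour can be pictured as a path test applied to each document. The sketch below is illustrative only: the connector evaluates a Java regular expression, whereas this example uses Python's `re` module as a stand-in. The pattern, the folder paths, and the choice to require a match on the whole path are all assumptions.

```python
import re

# Hypothetical pattern: the folder /Cabinet/Trash and everything below it.
TRASH_BIN_PATTERN = re.compile(r"/Cabinet/Trash(/.*)?")


def is_in_trash_bin(path):
    """True if the path lies in the trash bin folder.

    ASSUMPTION: the whole path has to match the pattern; the connector
    may apply its Java regular expression differently.
    """
    return TRASH_BIN_PATTERN.fullmatch(path) is not None


def split_by_trash(paths):
    """Separate paths to keep in the index from paths to remove."""
    keep = [p for p in paths if not is_in_trash_bin(p)]
    remove = [p for p in paths if is_in_trash_bin(p)]
    return keep, remove


keep, remove = split_by_trash([
    "/Cabinet/Reports/q3.doc",
    "/Cabinet/Trash/old.doc",
    "/Cabinet/Trashcan/notes.doc",  # different folder, not the trash bin
])
print("keep:", keep)
print("remove:", remove)
```

Anchoring the pattern on the folder boundary avoids treating similarly named folders such as `/Cabinet/Trashcan` as the trash bin.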
The data to be crawled can be limited, for instance to particular object types or to documents with certain properties.
- Object Type: defines the root object type, for example dm_document.
  - DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document')
  - If this field is empty, dm_sysobject is used as r_object_type.
  - (DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_sysobject'))
- Additional Object Type: enables crawling of additional object types, for example custom_document.
  - DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document' OR r_object_type='custom_document')
- Index Constraint (DQL): restricts crawling to documents with certain properties, for example documents modified after 2012-10-01 08:00:00.
  - DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document' OR r_object_type='custom_document') AND (r_modify_date > date('2012-10-01 08:00:00','yyyy-mm-dd hh:mi:ss'))
- Trash Bin Path Pattern: Defines a path to a directory using a regular expression (Java). Documents in this path are not indexed. Existing documents are removed from the index if they are moved to this path.
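The way the Object Type, Additional Object Type, and Index Constraint settings combine into one query can be sketched as follows. This is an illustration that reproduces the DQL examples above; `build_crawl_dql` is a hypothetical name and not the connector's actual implementation. The values are inserted into the query text without escaping.

```python
def build_crawl_dql(object_type="", additional_types=(), index_constraint=""):
    """Assemble a crawl query from the three constraint settings."""
    # An empty Object Type falls back to dm_sysobject, as described above.
    types = [object_type or "dm_sysobject", *additional_types]
    type_clause = " OR ".join(f"r_object_type='{t}'" for t in types)
    dql = f"SELECT * FROM dm_sysobject WHERE ({type_clause})"
    # The Index Constraint is appended as a further AND condition.
    if index_constraint:
        dql += f" AND ({index_constraint})"
    return dql


print(build_crawl_dql())
print(build_crawl_dql("dm_document", ["custom_document"]))
print(build_crawl_dql(
    "dm_document",
    ["custom_document"],
    "r_modify_date > date('2012-10-01 08:00:00','yyyy-mm-dd hh:mi:ss')",
))
```

Each setting only adds to the WHERE clause, so the simplest configuration (all fields empty) crawls every dm_sysobject.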
Crawler Performance Settings
- Batch Size: the number of documents that are sent to the index before the connector state (checkpoint) is persisted. For example, if Batch Size is 500, the following DQL query is used:
  - DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document') ORDER BY r_modify_date, r_object_id ENABLE (return_top 500)
- Number of Threads: the number of threads that crawl documents in parallel. All documents are partitioned according to their IDs. For example, one thread crawls all documents that have IDs ending with ‘1’.
  - DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document') AND (r_object_id LIKE '%1')
- Synchronize with Index on Startup: the crawler persists its state periodically and resumes document traversal from that state. If some documents were not indexed correctly because of transport or filter errors, this option can be used to synchronize the index on startup.
- Disable Query for Deleted Documents: When selected, deleted documents are not removed from the index. For example, if the user does not have permission to access the audit trail, this setting should be selected to avoid errors during crawling.
- Delete Not Existing Documents Schedule: If configured, the current index is compared with the Documentum database at the scheduled times, and documents deleted in Documentum are also removed from the index. The format is a cron expression (Quartz). Example: 0 0 22 1/1 * ? * (Daily at 22:00) (Default value: not set)
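The Batch Size and Number of Threads settings can be pictured together with a small sketch. It assumes that object IDs end in a hexadecimal character and that the 16 possible endings are distributed round-robin across the threads. Both points, as well as the combination of the ID filter and the batch limit in one query, are assumptions for illustration; the function names are hypothetical.

```python
HEX_DIGITS = "0123456789abcdef"


def partition_suffixes(num_threads):
    """Distribute the possible last ID characters over the threads.

    ASSUMPTION: round-robin distribution; the connector's real
    partitioning scheme is not documented here.
    """
    if not 1 <= num_threads <= len(HEX_DIGITS):
        raise ValueError("num_threads must be between 1 and 16")
    return [HEX_DIGITS[i::num_threads] for i in range(num_threads)]


def thread_dql(suffixes, batch_size, object_type="dm_document"):
    """Build the batch query for one thread's share of the IDs."""
    id_clause = " OR ".join(f"r_object_id LIKE '%{s}'" for s in suffixes)
    return (
        f"SELECT * FROM dm_sysobject WHERE (r_object_type='{object_type}') "
        f"AND ({id_clause}) "
        f"ORDER BY r_modify_date, r_object_id ENABLE (return_top {batch_size})"
    )


for suffixes in partition_suffixes(4):
    print(thread_dql(suffixes, 500))
```

With 16 threads each thread handles exactly one ending, which matches the single-character LIKE example above. With fewer threads each query combines several endings with OR.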
To keep the ACLs of indexed documents up to date, the dm_save, dm_destroy, and dm_saveasnew events for the dm_acl object type are audited (see the audit management reference for Documentum). The crawler searches the entries in the dm_audittrail_acl table for these events at each crawl run.
- Disable Query For Modified ACLs: Disables the detection of changed ACLs. This means that no queries are performed to find the changed ACLs. If this option is selected, the crawler must be restarted to perform ACL updates.
- Disable Processing ACL Updates: Disables the processing of changed ACLs. This means that no further queries are performed to locate the documents concerned. If this option is selected, the crawler must be restarted to perform ACL updates. For example, if the user does not have permission to access the audit trail, this setting should be selected to avoid errors during crawling.
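The lookup of changed ACLs described above might resemble the query built in the following sketch. The connector's actual query is not shown in this document. The attribute names `event_name`, `time_stamp`, and `audited_obj_id` are assumed from the standard dm_audittrail type, and filtering by the time of the last crawl run is also an assumption.

```python
from datetime import datetime

# The three audited events for the dm_acl object type named above.
ACL_EVENTS = ("dm_save", "dm_destroy", "dm_saveasnew")


def acl_audit_dql(last_crawl):
    """Build a query for ACL events recorded after the last crawl run.

    ASSUMPTION: attribute names follow the standard dm_audittrail type;
    the time filter is illustrative.
    """
    events = ", ".join(f"'{e}'" for e in ACL_EVENTS)
    stamp = last_crawl.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "SELECT audited_obj_id FROM dm_audittrail_acl "
        f"WHERE event_name IN ({events}) "
        f"AND time_stamp > date('{stamp}','yyyy-mm-dd hh:mi:ss')"
    )


print(acl_audit_dql(datetime(2012, 10, 1, 8, 0, 0)))
```

A query of this kind requires read access to the audit trail, which is why the two options above exist for users without that permission.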
Audit Trail Clean-up
The crawler detects deleted documents by tracking the events configured in “Audit Trail Event Type (DQL)” in the audit trail (dm_audittrail). If the user provided in the “Documentum Connection” section does not have access rights to the audit trail, another user can be configured here.
If required, the ACLs of the documents can be overwritten statically. To do this, activate the "Advanced Settings". In the section "Authorization Settings" the functionality is activated with "Enable Static Access Rules". "Access Check Principal" determines the name of the authorized or unauthorized principal. The "Access Check Action" determines whether the principal is authorized or not.
Note: If the rules are changed and documents already exist in the index, the option "Synchronize with Index on Startup" must be activated in the "Crawler Performance Settings" so that the changes are applied when the crawler is started.