Documentum Connector
Installation and Configuration

Installation

Before installing the Documentum Connector, ensure that the Mindbreeze Server is already installed and that the connector is included in the Mindbreeze license.

Needed Rights for Crawling

The Documentum Connector allows you to index and search a Documentum repository.

The following requirements must be met before configuring the Documentum Connector:

Superuser name and password

Configuration of Mindbreeze

Click on the “Indices” tab and then on the “Add new index” symbol to create a new index.

Enter the index path, e.g. “/data/Indices/documentum”. Change the Display Name of the Index Service and the related Filter Service if necessary.

Add a new data source with the symbol “Add new custom source” at the bottom right.

Configuration of Data Source

Documentum Connection

Superuser: user name of the superuser.
Password: password of the superuser.
Repository Name: the repository name.
Webtop URL: the URL to webtop e.g. http://documentum.myorganization.com:9080/webtop/
DFC Properties File: path to the DFC properties file. Place the file in the config subfolder of dfc.data.dir. Verify that the following properties are configured (see the dfc.properties file on the Documentum server):
- dfc.data.dir=/data/documentum
- dfc.docbroker.host[0]=documentum.myorganization.com
- dfc.docbroker.port[0]=1489
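
The check on these three properties can be automated before the data source is saved. The following is a minimal sketch, not part of the connector: the helper names parse_properties and missing_keys are hypothetical, and the parser only handles plain key=value lines (it does not cover every feature of the Java properties format, such as ":" separators or line continuations).

```python
# Keys the documentation asks you to verify in dfc.properties.
REQUIRED_KEYS = ("dfc.data.dir", "dfc.docbroker.host[0]", "dfc.docbroker.port[0]")


def parse_properties(text):
    """Parse simple key=value lines; blank lines and '#'/'!' comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


sample = """\
dfc.data.dir=/data/documentum
dfc.docbroker.host[0]=documentum.myorganization.com
dfc.docbroker.port[0]=1489
"""
print(missing_keys(parse_properties(sample)))  # prints []
```

To check a real file, read its content (for example with open(path).read()) and pass it to parse_properties; a non-empty result lists the properties that still need to be set.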

Notes on deleting documents

By default, documents deleted in Documentum are automatically removed from the index using the Documentum Audit Trail. If the Audit Trail is not available (e.g. due to missing access rights), the following options are available for deleting documents:

Trash Bin: A certain folder can be used as a trash bin in Documentum. Documents that are moved to the trash bin will be deleted from the index during the next crawl run. See option "Trash Bin Path Pattern".
Delete Not Existing Documents: Periodically the index is compared with the Documentum database to detect deleted documents. See "Delete Not Existing Documents Schedule" option.

Document Types

It is possible to limit the data that is crawled, for instance to particular document types.

Object Type: defines the root object type.
DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document')
If this field is empty, dm_sysobject is used as r_object_type.
DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_sysobject')

Additional Object Type: enables crawling of further object types, for example custom_document.
DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document' OR r_object_type='custom_document')
Index Constraint (DQL): restricts crawling to documents with certain properties, for example documents modified after 2012-10-01.

DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document' OR r_object_type='custom_document') AND (r_modify_date > date('2012-10-01 08:00:00','yyyy-mm-dd hh:mi:ss'))
Trash Bin Path Pattern: Defines a path to a directory using a regular expression (Java). Documents in this path are not indexed. Existing documents are removed from the index if they are moved to this path.
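
The DQL examples for Object Type, Additional Object Type, and Index Constraint follow a common pattern that can be reproduced to preview the effect of a setting. The sketch below only mirrors the example queries shown in this section; the function name build_crawl_dql is hypothetical, and the connector's actual query construction may differ.

```python
def build_crawl_dql(object_type="", additional_types=(), index_constraint=""):
    """Compose a DQL string in the shape of the documented examples.

    An empty object_type falls back to dm_sysobject, as described for the
    Object Type option. This is an illustration, not the connector's code.
    """
    root = object_type or "dm_sysobject"
    types = [root, *additional_types]
    type_clause = " OR ".join(f"r_object_type='{t}'" for t in types)
    dql = f"SELECT * FROM dm_sysobject WHERE ({type_clause})"
    if index_constraint:
        dql += f" AND ({index_constraint})"
    return dql


print(build_crawl_dql("dm_document", ["custom_document"]))
```

Passing the date constraint from the example above as index_constraint yields the same query as the Index Constraint example.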

Additional Connector Settings (Advanced)

Connector State Directory Path: The path to a directory in which the crawler persists the status of the documents already indexed, which is used after a crawl run or restart of the crawler. If this field is empty, a directory is created in /data/servicetempdata/.

Crawler Performance Settings

Batch Size: the number of documents sent to the index in one batch, after which the connector state (checkpoint) is persisted. For example, if Batch Size is 500, the following DQL query is used:
DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document') ORDER BY r_modify_date, r_object_id ENABLE (return_top 500)
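
The relationship between Batch Size and the checkpoint can be sketched as follows. This is an illustration under assumptions: an in-memory list of dictionaries stands in for the repository, the function names are hypothetical, and using the (r_modify_date, r_object_id) of the last document in a batch as the checkpoint is an assumption suggested by the ORDER BY clause, not documented connector behavior.

```python
def batch_dql(object_type, batch_size):
    """Build the batch query in the shape of the documented example."""
    return (
        f"SELECT * FROM dm_sysobject WHERE (r_object_type='{object_type}') "
        f"ORDER BY r_modify_date, r_object_id ENABLE (return_top {batch_size})"
    )


def crawl_in_batches(documents, batch_size):
    """Yield (batch, checkpoint) pairs in (r_modify_date, r_object_id) order.

    The checkpoint is the sort key of the last document in each batch
    (assumed here, for illustration only).
    """
    ordered = sorted(documents, key=lambda d: (d["r_modify_date"], d["r_object_id"]))
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        last = batch[-1]
        yield batch, (last["r_modify_date"], last["r_object_id"])


print(batch_dql("dm_document", 500))
```

A smaller Batch Size persists the checkpoint more often at the cost of more queries; a larger one does the opposite.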

Number of Threads: the number of threads that crawl documents in parallel. All documents are partitioned according to their IDs. For example, one thread crawls all documents whose IDs end with '1'.
DQL: SELECT * FROM dm_sysobject WHERE (r_object_type='dm_document') AND (r_object_id LIKE '%1')
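
One way to picture the partitioning is to distribute the possible last characters of an ID across the threads. The sketch below assumes that object IDs end in a hexadecimal digit and that suffixes are assigned round-robin; both are assumptions made for illustration, and all function names are hypothetical. The source only states that documents are partitioned by ID.

```python
HEX_DIGITS = "0123456789abcdef"  # assumed set of possible final ID characters


def suffixes_for_thread(thread_index, thread_count):
    """Return the ID suffixes handled by one thread (round-robin, assumed)."""
    if not 1 <= thread_count <= len(HEX_DIGITS):
        raise ValueError("thread_count must be between 1 and 16 in this sketch")
    return [d for i, d in enumerate(HEX_DIGITS) if i % thread_count == thread_index]


def thread_dql(object_type, thread_index, thread_count):
    """Build a per-thread query in the shape of the documented example."""
    suffixes = suffixes_for_thread(thread_index, thread_count)
    predicate = " OR ".join(f"r_object_id LIKE '%{s}'" for s in suffixes)
    return (
        f"SELECT * FROM dm_sysobject WHERE (r_object_type='{object_type}') "
        f"AND ({predicate})"
    )


print(thread_dql("dm_document", 1, 16))
```

With 16 threads, the thread with index 1 gets exactly the query from the example above; with fewer threads, each thread covers several suffixes, so every document is still crawled by exactly one thread.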

Synchronize with Index on Startup: the crawler persists its state periodically and resumes document traversal from that state. If some documents were not indexed correctly because of transport or filter errors, this option can be used to synchronize the index on startup.
Disable Query for Deleted Documents: When selected, deleted documents are not removed from the index. For example, if the user does not have permission to access the audit trail, this setting should be selected to avoid errors during crawling.
Delete Not Existing Documents Schedule: If configured, the current index is compared with the Documentum database at the scheduled times, and documents deleted in Documentum are also removed from the index. The format is an extended cron expression. Example: 0 0 22 1/1 * ? * (Daily at 22:00) (Default value: not set).
Documentation and other examples of cron expressions can be found here.
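
To read an expression such as the example above, it helps to label its fields. The sketch below assumes the seven-field, Quartz-style order (seconds, minutes, hours, day of month, month, day of week, optional year); the source only says "extended cron expression", so treat the field names as an assumption. The function name split_cron is hypothetical, and the sketch does not validate field values.

```python
# Assumed Quartz-style field order; the last field (year) is optional.
FIELD_NAMES = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def split_cron(expression):
    """Map each whitespace-separated cron field to its assumed name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


for name, value in split_cron("0 0 22 1/1 * ? *").items():
    print(f"{name}: {value}")
```

Read this way, the example has 22 in the hours field and 0 in the minutes and seconds fields, which agrees with "Daily at 22:00".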

Updating ACLs

To keep the ACLs of indexed documents up to date, the dm_save, dm_destroy, and dm_saveasnew events for the dm_acl object type are audited (see the audit management reference for Documentum). The crawler searches the entries in the dm_audittrail_acl table for these events at each crawl run.

Disable Query For Modified ACLs: Allows you to disable ACL updates. This means that no queries are performed to find the changed ACLs. If this option is selected, the crawler must be restarted to perform ACL updates.
Disable Processing ACL Updates: Allows you to disable ACL updates. This means that no further queries are performed to locate the document concerned. If this option is selected, the crawler must be restarted to perform ACL updates. For example, if the user does not have permission to the audit trail, this setting should be selected to avoid errors during crawling.

Audit Trail Clean-up

The crawler detects deleted documents by tracking the events configured in “Audit Trail Event Type (DQL)” in the audit trail (dm_audittrail). If the user provided in the “Documentum Connection” section does not have access rights to the audit trail, another user can be configured here.

Static ACLs

If required, the ACLs of the documents can be overwritten statically. To do this, activate the "Advanced Settings". In the section "Authorization Settings" the functionality is activated with "Enable Static Access Rules". "Access Check Principal" determines the name of the authorized or unauthorized principal. The "Access Check Action" determines whether the principal is authorized or not.

Note: If the rules are changed and documents already exist in the index, the option "Synchronize with Index on Startup" must be activated in the "Crawler Performance Settings" so that the changes are applied when the crawler is started.

Principal Resolution Service

In the new or existing service, select the Documentum Principal Resolution option in the Service setting. For more information about additional configuration options and how to create a cache and how to do the basic configuration of a cache for a Principal Resolution Service, see Installation & Configuration - Caching Principal Resolution Service.

Configuration - Documentum Connection

Superuser Credential	Credential of the superuser that will be used for crawling. Must contain the username and password. These credentials should match the credentials set in the Documentum Connector settings. Unlike the connector, these credentials must be configured as Mindbreeze credentials in the Network tab.
Repository Name	The name of the Documentum repository to connect to. Must match the repository name set in the Documentum Connector settings.
DFC Properties File	The path to the “DFC Properties” file. Should match the path specified in the Documentum Connector settings. The file must be located in the config directory of dfc.data.dir. The dfc.data.dir directory must be writable by the user mes. The following properties must be configured (see dfc.properties file on the Documentum server): dfc.data.dir=/documentum, dfc.docbroker.host[0]=documentum.myorganization.com, dfc.docbroker.port[0]=1489
Exclude LDAP groups from Cache (Advanced Setting)	If this option is selected, groups assigned via LDAP will be excluded from this cache. Only use this option if you have configured an LDAP cache as a parent cache. It is necessary that the imported LDAP groups are always synchronised in Documentum.
Get User Distinguished Ldap Name from Documentum (Advanced Setting)	If this option is enabled, the Principal Resolution Cache will no longer make a separate LDAP request for each user. With this the performance can be improved. This only works for users imported from LDAP. Users who are not from LDAP will find fewer or no documents. Of course, it is necessary that the imported LDAP users are always synchronised in Documentum.

General information

The following username aliases are used by the Principal Resolution Service to authenticate users:

user_name from Documentum (from database column dm_user)
user_login_name from Documentum (from database column dm_user)
LDAP Distinguished Name (DN) via LDAP client. If the option Get User Distinguished Ldap Name from Documentum is enabled, the value user_ldap_dn from Documentum (from database column dm_user) is used instead.

This means that a Mindbreeze end user must authenticate to Mindbreeze with one of these three usernames to find documents from Documentum.
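
The three aliases can be pictured as follows. This is a simplified sketch, not the Principal Resolution Service itself: the user record is a plain dictionary, the function name username_aliases is hypothetical, and ldap_lookup is a hypothetical callable standing in for the LDAP client (looking up by login name is an assumption).

```python
def username_aliases(user, dn_from_documentum=False, ldap_lookup=None):
    """Collect the aliases under which a Documentum user can be resolved.

    user: dict with user_name, user_login_name and optionally user_ldap_dn.
    dn_from_documentum: mirrors "Get User Distinguished Ldap Name from Documentum".
    ldap_lookup: hypothetical callable returning a DN for a login name, or None.
    """
    aliases = [user["user_name"], user["user_login_name"]]
    if dn_from_documentum:
        dn = user.get("user_ldap_dn")
    elif ldap_lookup is not None:
        dn = ldap_lookup(user["user_login_name"])
    else:
        dn = None
    if dn:
        aliases.append(dn)
    return aliases


example_user = {
    "user_name": "Jane Doe",
    "user_login_name": "jdoe",
    "user_ldap_dn": "cn=jdoe,ou=people,dc=myorganization,dc=com",
}
print(username_aliases(example_user, dn_from_documentum=True))
```

The example values are made up. With the option enabled, no LDAP request is needed because the DN comes from the user record; a user without user_ldap_dn then has only the first two aliases.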
