Copyright © Mindbreeze GmbH, A-4020 Linz, 2024.
All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.
These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.
For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.
Using the Box Connector, files and folders from Box can be indexed with their metadata.
You can create a new app in the Box Dev Console. To do this, click Create New App under My Apps and select Custom App. For Authentication Method, select Server Authentication (Client Credentials Grant) and give the app a name. Then click "Create App".
In the Configuration area of the created app you can then view and retrieve the Client ID and the Client Secret. These are needed for the option "OAuth Credential" in the MMC.
In addition, the "App Access Level" and "Application Scopes" options must be set in the Configuration area. The Box Crawler requires App + Enterprise Access and the following “Application Scopes”:
In addition, the option "Make API calls using the as-user header" must be enabled under "Advanced Features".
After that, click "Review and Submit" in the Authorization section of the app to submit it for approval by the admin. The admin can authorize the app in the Admin Console under the tab Apps -> Custom Apps Manager.
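To illustrate how the values collected above fit together, the following Python sketch builds the form body of a Client Credentials Grant token request and the headers for a call made with the as-user header. The token endpoint and field names are assumptions based on the public Box API and do not come from this document. The helper names and placeholder values are hypothetical. The sketch does not describe how the Box Crawler works internally.

```python
from urllib.parse import urlencode

# Token endpoint of the public Box API (assumption, not taken from this document).
BOX_TOKEN_URL = "https://api.box.com/oauth2/token"


def build_token_request(client_id: str, client_secret: str, enterprise_id: str) -> bytes:
    """Return the form-encoded body for a Client Credentials Grant request."""
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,          # Client ID from the app's Configuration area
        "client_secret": client_secret,  # Client Secret from the app's Configuration area
        # Authenticate as the app's service account within the enterprise.
        "box_subject_type": "enterprise",
        "box_subject_id": enterprise_id,
    }
    return urlencode(fields).encode("ascii")


def build_api_headers(access_token: str, as_user_id: str) -> dict:
    """Return headers for an API call made on behalf of a specific Box user."""
    return {
        "Authorization": f"Bearer {access_token}",
        # Only accepted if "Make API calls using the as-user header" is enabled.
        "As-User": as_user_id,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    body = build_token_request("my-client-id", "my-client-secret", "123456")
    print("POST", BOX_TOKEN_URL)
    print(body.decode("ascii"))
```

The body would be sent as an `application/x-www-form-urlencoded` POST request. The sketch only constructs the request so that it can be inspected without network access.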
Open the Mindbreeze Management Center in the browser to start configuration.
In the Indices tab, add a new index using the +Add Index button. Select the desired Index Node and Client Service and specify the data source Box in the Data Source field. Then confirm your entries with the Apply button.
Now configure the data source.
Legend:
Enterprise Id* | The Enterprise ID of your Box instance. You can find it in the Box Admin Console under "Account & Billing". Alternatively, you can go to https://www.box.com/master/settings and log in as Enterprise Admin. |
Box Domain* | The URL of your Box instance, e.g. https://mycompany.app.box.com/ |
OAuth Credential* | The OAuth 2 credential created in the Network tab. |
Page Size | The maximum number of elements that are fetched per API request. If this is increased, fewer requests may need to be made to the API, but it may result in increased memory usage. The maximum number is 1000. |
Log All Requests | If enabled, all requests to the Box API are written to a "request-log.csv" file. |
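The trade-off described for Page Size can be sketched in Python as follows. This is a minimal illustration that assumes offset-based paging. `fetch_page` and `make_fake_fetch` are hypothetical stand-ins for the actual API request and do not reflect the crawler's implementation.

```python
from typing import Callable, Iterator, List, Tuple

MAX_PAGE_SIZE = 1000  # upper limit stated for the Page Size option


def iter_items(fetch_page: Callable[[int, int], List[dict]],
               page_size: int = 100) -> Iterator[dict]:
    """Yield all items by requesting pages of at most page_size elements."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += len(page)


def make_fake_fetch(total: int) -> Tuple[Callable[[int, int], List[dict]], List[int]]:
    """Hypothetical in-memory replacement for an API call; records each request."""
    items = [{"id": i} for i in range(total)]
    calls: List[int] = []

    def fetch_page(offset: int, limit: int) -> List[dict]:
        calls.append(offset)
        return items[offset:offset + limit]

    return fetch_page, calls


if __name__ == "__main__":
    for size in (100, 1000):
        fetch, calls = make_fake_fetch(250)
        count = sum(1 for _ in iter_items(fetch, page_size=size))
        print(f"page_size={size}: {count} items in {len(calls)} requests")
```

With 250 items, a page size of 100 needs three requests, while a page size of 1000 needs one request but holds all 250 items in memory at once.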
User Emails* | E-mail addresses of the users whose content is to be indexed. All content to which the specified users have access is indexed. If you want to have precise control over which content is indexed by the crawler, you can create a separate user who can see all the content to be indexed. More on this in the chapter Creating a crawling user. |
Excluded Files/Folders (regex) | If this option is configured, files and directories that match the specified pattern (regular expression) are ignored. The regex is applied to the full path, e.g. Parentfolder/Childfolder/MyFile.docx. Excludes have higher priority than includes (i.e. if a document is both included and excluded, it will not be indexed). |
Maximum File Size (MB) | The maximum size of files (in MB) whose content is to be indexed. If a file exceeds this size, it is indexed with its metadata only, without the file content. |
Index Only Files | If enabled, folders are not indexed as documents. |
Fetch Custom Metadata | If enabled, the custom metadata is additionally fetched for all files and folders. If you do not use custom metadata, you should disable this option to speed up the crawl run. |
Included Files/Folders (regex) | If this option is configured, only files and directories that match the specified pattern (regular expression) are indexed. The regex is applied to the full path, e.g. Parentfolder/Childfolder/MyFile.docx. If this option is left empty, everything is included. Excludes have higher priority than includes (i.e. if a document is both included and excluded, it will not be indexed). |
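The interaction of the include and exclude patterns can be illustrated with a short Python sketch. The function name `should_index` is hypothetical. The sketch also assumes that a pattern matches if it is found anywhere in the full path, and the crawler's actual regex dialect and matching mode may differ.

```python
import re
from typing import Optional


def should_index(path: str,
                 include: Optional[str] = None,
                 exclude: Optional[str] = None) -> bool:
    """Decide whether a full path passes the include/exclude patterns."""
    # Excludes take priority: an excluded path is never indexed.
    if exclude and re.search(exclude, path):
        return False
    # An empty include pattern means everything is included.
    if include:
        return re.search(include, path) is not None
    return True


if __name__ == "__main__":
    path = "Parentfolder/Childfolder/MyFile.docx"
    print(should_index(path))                                  # no patterns configured
    print(should_index(path, include=r"\.docx$"))              # included only
    print(should_index(path, include=r"\.docx$",
                       exclude=r"^Parentfolder/Childfolder/"))  # included and excluded
```

In the last call the path matches both patterns, so the exclude wins and the document is skipped.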
In the new or existing service, select the Box Principal Resolution Service option in the Service setting. For additional configuration options, and for details on creating a cache and its basic configuration for a Principal Resolution Service, see Installation & Configuration - Caching Principal Resolution Service.
These configuration options are described in the chapter Crawler Settings.
If you want to have precise control over which content is indexed by the crawler, you can create a separate user who can see all the content to be indexed.
A new user can be created in the Admin Console in the menu item Users & Groups.
To give this user access to all folders that are to be indexed, there are two options:
This user must then also be activated by logging in once so that it can be used by the Box crawler.