Mindbreeze GmbH, A-4020 Linz, 2019.
All rights reserved. All hardware and software names used are brand names and/or trademarks of their respective manufacturers.
These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services and service outcomes, or any other protected rights. The dissemination, publication, or reproduction hereof is prohibited.
For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.
Before installing the SharePoint Online connector, make sure that the Mindbreeze server is installed and the SharePoint Online connector is included in the license. Use the Mindbreeze Management Center to install or update the connector.
To install the plug-in, open the Mindbreeze Management Center. Select “Configuration” from the menu pane on the left-hand side, then navigate to the “Plugins” tab. Under “Plugin Management,” select the appropriate zip file and click “Upload.” This automatically installs or updates the connector. The Mindbreeze services are restarted in the process.
Select the “Advanced” installation method for configuration.
To create a new index, navigate to the “Indices” tab and click the “Add new index” icon in the upper right corner.
Enter the path to the index and change the display name as necessary.
Add a new data source by clicking the “Add new custom source” icon at the top right. Select the category “Microsoft SharePoint Online” and configure the data source according to your needs.
In the "Sharepoint Online" area, you define the Microsoft SharePoint Online installation to be indexed. The following options are available:
“Server URL”
The URL of the SharePoint Online instance, e.g.: https://mycompany.sharepoint.com
“Admin Server URL”
The admin URL of the SharePoint Online instance. This is often the server URL with the suffix "-admin" appended to the tenant name, e.g.: https://mycompany-admin.sharepoint.com
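You can check the admin URL before entering it by deriving it from the server URL. The following Python sketch does this. It assumes the common pattern described above (tenant name plus "-admin") and only rewrites the first host label. The function name guess_admin_url is hypothetical, and tenants in special environments may use other host names.

```python
from urllib.parse import urlsplit, urlunsplit


def guess_admin_url(server_url: str) -> str:
    """Derive the likely admin URL, e.g. mycompany -> mycompany-admin.

    Hypothetical helper: assumes https://<tenant>.<domain> and only rewrites
    the first host label. Verify the result for your tenant.
    """
    parts = urlsplit(server_url)
    host = parts.hostname or ""
    tenant, sep, domain = host.partition(".")
    if not sep or tenant.endswith("-admin"):
        # Not a <tenant>.<domain> host, or already an admin URL: keep as-is.
        return server_url
    return urlunsplit((parts.scheme, f"{tenant}-admin.{domain}", "", "", ""))


print(guess_admin_url("https://mycompany.sharepoint.com"))
# https://mycompany-admin.sharepoint.com
```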
"Site Relative URL.
The relative paths of the sites to be crawled, each starting with a slash, e.g.: /sites/mysite.
Enter one path per line.
If this field is left empty, all detected sites are crawled.
If sites are specified here, only these sites and their subsites are crawled.
"Included Sites URL (regex)"
Regular expression that specifies which subsites are crawled. If this option is left empty, all subsites are crawled. The regex is matched against relative URLs, e.g.: /sites/mysite
"Excluded Sites URL (regex)"
Regular expression that specifies which subsites are excluded. The regex is matched against relative URLs, e.g.: /sites/mysite
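The following Python sketch shows how the two site filters might combine for a relative URL. It is not the connector's implementation. It assumes search semantics, meaning a match anywhere in the relative URL counts. It also assumes that an exclude match always wins over an include match. The function name is_site_crawled is hypothetical. Anchoring patterns with ^ and $ makes their intent explicit regardless of how the connector applies them.

```python
import re
from typing import Optional


def is_site_crawled(relative_url: str,
                    include: Optional[str] = None,
                    exclude: Optional[str] = None) -> bool:
    """Assumed include/exclude decision for one subsite (hypothetical helper)."""
    if include and not re.search(include, relative_url):
        return False  # an include pattern is set and does not match
    if exclude and re.search(exclude, relative_url):
        return False  # the exclude pattern matches
    return True


sites = ["/sites/mysite", "/sites/mysite/archive", "/sites/hr"]
crawled = [s for s in sites
           if is_site_crawled(s, include=r"^/sites/mysite", exclude=r"/archive$")]
print(crawled)  # ['/sites/mysite']
```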
"Included Lists/Files/Folders URL (regex)"
Regular expression that specifies which lists, files, and folders are included. It is matched against the "url" metadata (absolute URL). If this option is left empty, everything is included.
Note: To include or exclude complete subsites, use the options "Included Sites URL (regex)" or "Excluded Sites URL (regex)".
"Excluded Lists/Files/Folders URL (regex)"
Regular expression that specifies which lists, files, and folders are excluded. It is matched against the "url" metadata (absolute URL).
For example, if the Mindbreeze search returns a document that you want to exclude, you can copy its URL from the "Open" action and use it in the "Excluded Lists/Files/Folders URL (regex)" option. Characters that have a special meaning in regular expressions, such as "." or "?", should be escaped.
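A URL copied from the "Open" action usually contains characters that act as regex operators. The following Python sketch escapes such a URL so that the pattern matches only that exact URL. The helper name exclude_pattern_for and the sample URL are hypothetical. The connector's regex dialect may differ in details from Python's re module.

```python
import re


def exclude_pattern_for(url: str) -> str:
    """Build a regex that matches exactly the given URL (hypothetical helper).

    re.escape() neutralizes characters such as '.', '?', '(' and '+'
    that would otherwise act as regex operators.
    """
    return "^" + re.escape(url) + "$"


copied = "https://mycompany.sharepoint.com/sites/mysite/Shared Documents/report (1).docx"
pattern = exclude_pattern_for(copied)
print(pattern)
print(re.search(pattern, copied) is not None)  # True
```

Without escaping, the "." in ".docx" would match any character, and "(1)" would be treated as a capture group rather than as literal parentheses.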
“Included Metadata Names (regex)”
Regular expression that selects generic metadata to include by name. If nothing is specified, all metadata is included. The regex is applied to the metadata name without the sp_ prefix.
“Excluded Metadata Names (regex)”
Regular expression that selects generic metadata to exclude by name. If nothing is specified, no metadata is excluded. The regex is applied to the metadata name without the sp_ prefix.
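To show how the sp_ prefix is handled, the following Python sketch filters a dictionary of generic metadata by name. The metadata names and the function filter_metadata are hypothetical. Search semantics are assumed for both patterns.

```python
import re
from typing import Dict, Optional


def filter_metadata(metadata: Dict[str, str],
                    include: Optional[str] = None,
                    exclude: Optional[str] = None) -> Dict[str, str]:
    """Keep generic metadata whose bare name passes both patterns.

    The patterns see the name without the "sp_" prefix; search semantics
    (a match anywhere in the name) are assumed.
    """
    kept = {}
    for name, value in metadata.items():
        bare = name[3:] if name.startswith("sp_") else name
        if include and not re.search(include, bare):
            continue  # include pattern set, but it does not match
        if exclude and re.search(exclude, bare):
            continue  # exclude pattern matches
        kept[name] = value
    return kept


meta = {"sp_Title": "Q3 report", "sp_Modified_x0020_By": "jdoe",
        "sp_OData__UIVersionString": "2.0"}
print(filter_metadata(meta, exclude=r"^OData_"))
# {'sp_Title': 'Q3 report', 'sp_Modified_x0020_By': 'jdoe'}
```

The prefix is stripped first. As a result, ^OData_ matches sp_OData__UIVersionString, while ^sp_OData_ would not.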
“Included Content Types (regex)”
Regular expression that selects content types (e.g. file, folder) to include by name. If nothing is specified, all content types are included. The content type of an object is stored in its "contenttype" metadata.
“Excluded Content Types (regex)”
Regular expression that selects content types (e.g. file, folder) to exclude by name. If nothing is specified, no content types are excluded. The content type of an object is stored in its "contenttype" metadata.
“Use delta key format”
If set, a different format is used for the keys. Certain functions of the delta crawl (e.g. renaming lists, deleting attachment files) do not work without this option. If this option is changed, the index should be cleaned and re-indexed.
“Enable Delta Crawl”
If set, only changes to files are fetched via the SharePoint Online API instead of crawling the entire SharePoint Online instance. For full functionality, "Use delta key format" should also be set.
This option is still under development and should not be used yet. Enabling it anyway can lead to inconsistencies with deleted files.
"Crawl hidden lists"
If set, lists that are defined as hidden are also indexed.
“Max Change Count Per Site”
Number of changes processed per site in a delta crawl before the next site is processed. Any remaining changes are processed in the next crawl run.
"Crawl lists with property 'NoCrawl'"
If this option is set, lists that have the "NoCrawl" property in Microsoft SharePoint Online are also indexed.
“Parallel Request Count”
Limits the number of parallel HTTP requests sent by the crawler.
"Max Content Length (MB)"
Limits the maximum document size. If a document is larger than this limit, the content of the document is not downloaded (the metadata is retained).
The default value is 50 MB.
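The size rule can be sketched in Python as follows. The sketch assumes that 1 MB corresponds to 1024 * 1024 bytes, which the setting does not specify. The function name should_download_content is hypothetical.

```python
def should_download_content(size_bytes: int, max_content_length_mb: float = 50) -> bool:
    """Return True if a document's content would be downloaded.

    Documents larger than the limit are still indexed, but metadata only.
    Assumption: 1 MB = 1024 * 1024 bytes.
    """
    limit_bytes = int(max_content_length_mb * 1024 * 1024)
    return size_bytes <= limit_bytes


print(should_download_content(10 * 1024 * 1024))  # True
print(should_download_content(60 * 1024 * 1024))  # False
```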
“Thumbnail Generation for Web Content” (Advanced Setting)
If set, thumbnails are generated for web documents. Enabling this feature is not recommended, because it only works for public sites with anonymous access, which have already been discontinued.
“Dump Change Responses”
If set, the Sharepoint API responses are written to a log file during delta crawling.
“Log All HTTP Requests”
If set, all HTTP requests sent by the crawler during the crawl run are written to a .csv file (sp-request-log.csv).
“Custom Delta CSV Path”
Specifies the path to a .csv file in which you can set your own delta points. Each line must contain two entries separated by a semicolon: first the Site Relative URL, then a timestamp in the format yyyy-MM-ddTHH:mm:ssZ.
Example: /sites/MySite; 2019-10-02T10:00:00Z
A delta point from this file is not applied if a delta state already exists. To overwrite an existing state, first delete it from the DeltaState file.
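You can check a custom delta file with a small Python validator before pointing the connector at it. The one below is a sketch, not the connector's parser. The function name parse_custom_delta_csv is hypothetical. It tolerates the space after the semicolon shown in the example and interprets the trailing Z as UTC.

```python
from datetime import datetime, timezone
from typing import Dict


def parse_custom_delta_csv(text: str) -> Dict[str, datetime]:
    """Parse lines of the form '<Site Relative URL>;<yyyy-MM-ddTHH:mm:ssZ>'."""
    points = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = [p.strip() for p in line.split(";")]
        if len(parts) != 2 or not parts[0].startswith("/"):
            raise ValueError(f"line {lineno}: expected '<relative URL>;<timestamp>'")
        site, stamp = parts
        # strptime raises ValueError if the timestamp has the wrong format.
        ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        points[site] = ts
    return points


print(parse_custom_delta_csv("/sites/MySite; 2019-10-02T10:00:00Z"))
```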
“Ignore Sharepoint ACLs” (Advanced Setting)
If set, no access permissions to lists or documents are fetched from Sharepoint. This option can only be set if at least one Site ACL is configured.
This option lets you define your own ACLs. "Site URL Pattern" is a regular expression that determines the sites to which the entry applies. "Access Check Action" selects whether the entry grants or denies access. "Principal" specifies the user or group to which the entry applies (e.g. everyone or firstname.lastname@example.org).
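The following Python sketch shows which Site ACL entries would apply to a given site URL. The data class and its field names are assumptions for illustration, as is the use of search semantics against the absolute site URL. The sketch does not model how Mindbreeze combines grant and deny entries at query time.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SiteAcl:
    """One configured entry (field names are hypothetical)."""
    site_url_pattern: str  # regular expression over the site URL
    action: str            # "GRANT" or "DENY"
    principal: str         # e.g. "everyone" or "firstname.lastname@example.org"


def applicable_entries(site_url: str, acls: List[SiteAcl]) -> List[Tuple[str, str]]:
    """Return (action, principal) for every entry whose pattern matches."""
    return [(a.action, a.principal) for a in acls
            if re.search(a.site_url_pattern, site_url)]


acls = [
    SiteAcl(r"/sites/", "GRANT", "everyone"),
    SiteAcl(r"/sites/hr(/|$)", "DENY", "everyone"),
]
print(applicable_entries("https://mycompany.sharepoint.com/sites/hr", acls))
```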
Only enter the URL for the Azure ACS endpoint in the “Azure ACS endpoint” field if your SharePoint environment is hosted in a special environment (such as Germany).
The following environments require special URLs:
Configure the options as follows:
“Use App-Only authentication”
When this option is selected, app-only authentication is used instead of username and password authentication. If this option is selected, “Client ID” and “Client secret” also need to be configured. In addition, you need to perform all the “App Registration in Sharepoint” steps below.
“Client ID”
The client ID that is generated as described below.
“Client secret”
The client secret that is generated as described below.
App Registration in Sharepoint: Step 1
Click the two "Generate" buttons (for "Client Id" and "Client Secret") and fill in the remaining fields as follows:
Then click “Create.”
Immediately enter the client ID and the client secret in the Mindbreeze InSpire configuration. You will not be able to view the client secret again later.
App Registration in Sharepoint: Step 2
Enter the client ID in the “App Id” field and click “Lookup.” “Title,” “App Domain,” and “Redirect URL” are filled in automatically. Then enter the following in the “Permission Request XML” field:
Note: "FullControl" is required so that Mindbreeze InSpire can read the access rights of the SharePoint documents to be indexed and map these permissions in Mindbreeze InSpire.
Then click “Create.”
App Registration in Sharepoint: Step 3
Additional rights are required so that the ACL information on the users and groups required by the Principal Resolution Service can also be downloaded from SharePoint Online.
Now enter the following URL in the browser:
ATTENTION: Make sure that you are on the admin page. For example, if the URL is https://mycompany.sharepoint.com, then the admin page is usually https://mycompany-admin.sharepoint.com.
Enter the client ID in the "App Id" field and click "Lookup". "Title", "App Domain" and "Redirect URL" are filled in automatically. Then enter the following in the "Permission Request XML" field:
Then click "Create".
“Do Not Request File Author Metadata”
If active, no authors are requested for document libraries. This can help resolve the error "HTTP 500: User cannot be found".
“List All Content Types”
If active, an all-content-types.csv file is created in the log directory at the beginning of each crawl run. It contains all content types of all lists of all configured sites.
“Include Unpublished Documents”
If active, all documents and items are always indexed in their most recent version, regardless of whether they have been published. If this option is disabled, only published sites and only major versions (1.0, 2.0, etc., usually created each time a document is published) are indexed.
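The effect of this option on a document's version history can be sketched in Python. The sketch makes two assumptions when the option is disabled: the most recent major version is indexed, and a document without any major version is skipped. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a SharePoint-style version label such as '2.1' into (2, 1)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def version_to_index(history: List[str], include_unpublished: bool) -> Optional[str]:
    """Pick the version that would be indexed (hypothetical helper)."""
    if include_unpublished:
        # Most recent version, published or not.
        return max(history, key=parse_version) if history else None
    # Only major versions (x.0 with x >= 1) count as published.
    majors = [v for v in history
              if parse_version(v)[1] == 0 and parse_version(v)[0] >= 1]
    return max(majors, key=parse_version) if majors else None


history = ["0.1", "0.2", "1.0", "1.1", "1.2"]
print(version_to_index(history, include_unpublished=True))   # 1.2
print(version_to_index(history, include_unpublished=False))  # 1.0
```

Versions are compared as integer pairs rather than strings, so "10.0" correctly ranks above "2.0".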
Select “Advanced Settings” to configure the following settings.
Enable the option “Enforce ACL Evaluation.”
Add a new service under “Services” by clicking on “Add new service.” Select “SharepointOnlinePrincipalCache” and assign a display name.
Enter the information about your Microsoft SharePoint Online installation under “Sharepoint Settings.” “Server URL” and “Site Relative URL” must match the settings in the “Data Source” area.
With the option "Enable Delta Update", you can specify that only changes to the groups are fetched from SharePoint Online after the cache has been created for the first time, instead of all groups each time. This is especially recommended for very large SharePoint instances, where a regular cache update can otherwise take a long time. With the advanced option "Dump Change Responses", the changes received from SharePoint Online during the delta update can also be written to a file, which is very helpful for troubleshooting.
Under “Regex for your organization” you can enter a regular expression that defines whether or not a user belongs to your organization. The regular expression can refer to the e-mail address, the ObjectSID, or the ObjectGUID from LDAP.
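For example, if all users of your organization have e-mail addresses in a single domain, the regular expression can match that domain. The domain mycompany.com below is hypothetical. The pattern is anchored with ^.*...$ so that it behaves the same whether the service requires a full match or only a partial match.

```python
import re

# Hypothetical pattern: every address in the mycompany.com domain belongs to
# the organization. (?i) makes the comparison case-insensitive.
ORG_REGEX = r"(?i)^.*@mycompany\.com$"


def belongs_to_organization(identifier: str, pattern: str = ORG_REGEX) -> bool:
    return re.search(pattern, identifier) is not None


print(belongs_to_organization("Jane.Doe@MyCompany.com"))  # True
print(belongs_to_organization("partner@example.org"))     # False
```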
The option "Parallel Request Count" defines how many HTTP requests the crawler sends simultaneously. A higher value generally shortens the crawl run, but too high a value can cause many "Too Many Requests" (HTTP 429) errors from SharePoint. A value above 30 is not recommended.
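The following Python sketch illustrates the effect of "Parallel Request Count". It caps concurrency with a thread pool and computes a back-off delay for "Too Many Requests" responses. The requests are simulated, and the constant and function names are hypothetical. Honoring a Retry-After header, with exponential back-off as a fallback, is a common general pattern and not a description of the connector's internals.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

PARALLEL_REQUEST_COUNT = 8  # mirrors the "Parallel Request Count" setting

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0


def simulated_request(i: int) -> int:
    """Stand-in for one HTTP request; tracks how many run at the same time."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.005)  # simulated network latency
    with _lock:
        _in_flight -= 1
    return i


def backoff_seconds(attempt: int, retry_after: Optional[str] = None) -> float:
    """Delay before retrying after HTTP 429: honor a numeric Retry-After
    header if present, else exponential back-off capped at 60 seconds."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    return float(min(2 ** attempt, 60))


# The pool never runs more than PARALLEL_REQUEST_COUNT requests at once.
with ThreadPoolExecutor(max_workers=PARALLEL_REQUEST_COUNT) as pool:
    results = list(pool.map(simulated_request, range(100)))

print(len(results), peak_in_flight <= PARALLEL_REQUEST_COUNT)  # 100 True
print(backoff_seconds(3), backoff_seconds(3, "5"))              # 8.0 5.0
```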
This is only necessary if you have also configured app-only authentication for the data source.
If you have not configured “AD Connect” in the Azure Active Directory, select “AD Connect is NOT configured” and fill in the fields “Tenant Context ID,” “Application ID,” “Generated Key,” and “Protected Resource Hostname.” You can find the corresponding values in the Azure Portal.
To do so, register a new app in the Azure Portal under "Azure Active Directory" > "App registrations".
The new app shows its application ID, and you can generate a key on the "Certificates & Secrets" tab.
If AD Connect is set up in your Azure Active Directory, do not enable the “AD Connect is NOT configured” option.
The following table lists the protected resource hostnames for different cloud environments:
The following values should be entered in the LDAP cache under “User Alias Name LDAP Attributes”:
Enter the information about the LDAP cache under “LDAP Settings.” Enable the option “Use LDAP Principal Cache Service” and enter the corresponding port of your LDAP principal cache.
Under “Cache Settings,” configure where you want the database for the cache to be located and set the desired interval for the updates.
Under “Service Settings,” enter a free port to be used for the principal cache and enable the “Lowercase Principals” option so that the SharePoint groups can be resolved correctly.
Only enter the URLs for Azure AD Endpoint and Azure ACS Endpoint in the “Azure AD Endpoint” and “Azure ACS endpoint” fields if your SharePoint environment is hosted in a special environment (such as Germany).
The following environments require special URLs for Azure AD Endpoint:
The following environments require special URLs for Azure ACS Endpoint:
If you are using app-only authentication, this section is NOT applicable to you. Otherwise, proceed as follows:
Navigate to the “Network” tab and add a new credential for Microsoft SharePoint Online under “Credentials” by clicking “Add Credential.”
Enter the credentials of the user you want to use for indexing and assign a name to the credential. Select a user with sufficient permissions to read all relevant sites and permissions.
Then add a new endpoint for the credential you just created by clicking on “Add Endpoint” under “Endpoints.” Enter the server URL of your Microsoft SharePoint Online installation as the location and select the credential you just created.
The SharePoint Online connector can also crawl OneDrive sites.
Note the following points for the configuration:
A login token is retrieved using the configured username/password credentials.
Login cookies are retrieved using this token. The cookies of the admin URL are required for SiteDiscovery.
Using these cookies, a digest hash is retrieved, which is also required for SiteDiscovery.
This endpoint discovers all sites of the SharePoint Online instance.
This endpoint fetches the direct subsites of a site.
This endpoint retrieves all lists of a site along with additional metadata, including the RoleAssignments field. Reading this field requires the "Enumerate Permissions" permission, which is only included in FullControl.
This endpoint retrieves all items of a list along with additional metadata, including the RoleAssignments field. Reading this field requires the "Enumerate Permissions" permission, which is only included in FullControl.
<Direct link to a file>/$value
This endpoint is used to download the contents of a file.
<Direct link to a list item>/ListItemAllFields
This endpoint retrieves all metadata of a list item.
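For orientation, the following Python sketch builds both endpoint URLs for a file in a document library. It assumes that the "direct link" is a SharePoint REST resource of the form <site>/_api/web/GetFileByServerRelativeUrl('<path>'). The helper names and sample paths are hypothetical.

```python
from urllib.parse import quote


def file_resource_url(site_url: str, server_relative_path: str) -> str:
    """REST resource for a file (assumed form of the 'direct link')."""
    # OData string literals escape a single quote by doubling it; other
    # characters such as spaces are percent-encoded.
    literal = quote(server_relative_path.replace("'", "''"), safe="/'")
    return f"{site_url.rstrip('/')}/_api/web/GetFileByServerRelativeUrl('{literal}')"


def content_url(resource_url: str) -> str:
    return resource_url + "/$value"             # file contents


def item_fields_url(resource_url: str) -> str:
    return resource_url + "/ListItemAllFields"  # metadata of the list item


res = file_resource_url("https://mycompany.sharepoint.com/sites/mysite/",
                        "/sites/mysite/Shared Documents/report.docx")
print(content_url(res))
print(item_fields_url(res))
```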
This endpoint fetches all groups of a site and all users in these groups. It requires the "Enumerate Permissions" permission, which is only included in FullControl.