Copyright © Mindbreeze GmbH, A-4020 Linz, 2024.
All rights reserved. All hardware and software names are brand names and/or trademarks of their respective manufacturers.
These documents are strictly confidential. The submission and presentation of these documents does not confer any rights to our software, our services and service outcomes or other protected rights. The dissemination, publication or reproduction hereof is prohibited.
For ease of readability, gender differentiation has been waived. Corresponding terms and definitions apply within the meaning and intent of the equal treatment principle for both sexes.
The Google Drive Connector generally uses server-to-server authentication when communicating with Google. This is the method recommended by Google, and no one needs to be present to operate the connector. This method requires a Google service account that has the “G Suite Domain-wide Delegation” option enabled. The account can then be used to impersonate any user in the G Suite domain and access that user's files. If you can use this method, follow the steps in the "Service Account" sections.
If you do not have a Google service account, the alternative is to run the Google Drive Connector using OAuth. No Google service account is required for this method. The OAuth method is not recommended and should only be used in exceptional cases. It is considerably more complicated: the person whose files are to be indexed must be present once to enter the password in the Google OAuth process and accept the read permissions. If you want to use this non-recommended method, follow the steps in the "OAuth" sections.
Navigate to console.developers.google.com and register there.
Click on “Credentials” in the left-hand menu bar and then click on “Manage Service Accounts”.
Click on "Create Service Account" and enter an account name. Select the "Service Account Admin" role, and click "Enable G Suite Domain-wide Delegation".
Open the account options on the right-hand side and click "Create Key".
Select "P12" and save the .p12 file.
Note: The .p12 file is required for both the Google Drive Crawler and the Google Drive Principal Resolution Service.
Now you should see the client ID of your Google service account. You’ll need to note this ID because it is necessary for the next step (1.2.2).
Navigate to G Suite (https://gsuite.google.com/intl/de/) and log in to your domain with an account that has access to the Admin console. If necessary, log in with the same user that you used to register at console.developers.google.com.
Click “Security > Show more > Advanced Settings > Manage API Client Access”.
Enter the client ID of the Google service account in the client name field.
Enter the required API scopes.
Necessary entries:
https://www.googleapis.com/auth/admin.directory.group.readonly,https://www.googleapis.com/auth/admin.directory.user.readonly,https://www.googleapis.com/auth/drive.readonly
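The scope entry above is a single comma-separated string. As a minimal sketch, assuming you keep the three scopes in a list, the following Python snippet assembles the string and rejects obviously malformed entries before you paste it into the Admin console. The names `REQUIRED_SCOPES` and `build_scope_entry` are illustrative and not part of the connector.

```python
# Scopes listed in this guide; the names below are illustrative only.
REQUIRED_SCOPES = [
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
]


def build_scope_entry(scopes):
    """Join scopes into one comma-separated string, rejecting malformed entries."""
    for scope in scopes:
        if not scope.startswith("https://www.googleapis.com/auth/"):
            raise ValueError(f"unexpected scope: {scope!r}")
        if "," in scope or any(ch.isspace() for ch in scope):
            raise ValueError(f"scope contains whitespace or a comma: {scope!r}")
    return ",".join(scopes)


if __name__ == "__main__":
    print(build_scope_entry(REQUIRED_SCOPES))
```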
A Google account within a GSuite domain is required. The account is used to index files that this account can access.
This step is required to obtain access to the Google Drive files from the account.
Pull up the “Library Page” in the API Console. (https://console.developers.google.com/apis/library). Make sure you are logged on with the correct account.
Click on “G Suite APIs” > “Drive API”.
If the Google Drive API page says "A project is needed to enable APIs", click "Create Project", specify a project name such as "Mindbreeze Crawler", and click "Create".
Return to the Google Drive API page and click the "Enable" button.
Open the Credentials page by clicking the "Credentials" button on the left side. (https://console.developers.google.com/apis/credentials)
Click "Create credentials" and choose "OAuth Client ID"
When the page prompts you to set up the Consent Screen, click "Configure consent screen". Set the "Product name shown to users" to: "Mindbreeze Crawler", for example. Then click on "Save".
Back on the Create Client ID page, you will be prompted by a wizard to select which “Application Type” you want to use. Select "Other" and enter a name, such as "Mindbreeze Crawler", and click "Create".
A pop-up dialog appears, "OAuth Client". Close the dialog by clicking on "OK". The displayed information will be downloaded separately later. The newly created credential appears in the list. Click on the "Download JSON" icon on the right-hand side to download the credential as a JSON file.
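As a quick sanity check of the downloaded file, the following Python sketch verifies that it parses as JSON and contains the fields an OAuth client needs. It assumes the layout Google typically uses for such files (a top-level "installed" or "web" object holding "client_id" and "client_secret"); that layout and the function name `check_client_secret` are assumptions, so adjust the check if your file looks different.

```python
import json


def check_client_secret(text):
    """Return the client ID if the JSON text looks like an OAuth client secret file."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # Assumption: desktop-style clients are stored under "installed",
    # web clients under "web".
    section = data.get("installed") or data.get("web")
    if not isinstance(section, dict):
        raise ValueError('expected a top-level "installed" or "web" object')
    missing = [key for key in ("client_id", "client_secret") if not section.get(key)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return section["client_id"]


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    sample = '{"installed": {"client_id": "example-id", "client_secret": "example-secret"}}'
    print(check_client_secret(sample))
```

To check a real file, read its contents and pass the text to the function; the file itself is not modified.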
This step is necessary to give the account access to the user and group names of the GSuite domain. This information is used to compute the access rights to a specific file for search.
Enable the Admin SDK
Pull up the “Library Page” in the API Console. (https://console.developers.google.com/apis/library). Make sure you are logged on with the correct account.
Click on “G Suite APIs” > “Admin SDK”.
Make sure that the correct project (for example, "Mindbreeze Crawler") is selected above. Then click on "Enable".
Only the “Users.Read” and “Groups.Read” permissions are required. This means that the Google account is only able to read its own files and the names of the groups and users in the GSuite domain. The Google account cannot impersonate other users. The Google account cannot read files from other users unless the files have been explicitly shared.
Open the Google Admin Console (https://admin.google.com) and log in with a GSuite account with administrator privileges.
Click "Admin roles" to navigate to the Admin roles page.
Click "Create a new Role". This role is configured with minimal access rights. Select a name (for example, "Mindbreeze Crawler") and click "Create".
The "Privileges" tab for the new role will appear. Under the "Admin API Privileges" section, expand the "Users" entry and check the "Read" box. Then, expand the "Groups" entry and check the "Read" box. Then click "Save" below. The Mindbreeze Crawler role now has the Admin API permissions “Users.Read” and “Groups.Read”.
Navigate to the "Admins" tab above. Click on "Assign Admins". Enter the e-mail address of the Google Account that you used to activate the Admin SDK. Click "Confirm Assignment". The Mindbreeze Crawler role now has an assigned administrator.
Before you install the Google Drive connector, make sure that the Mindbreeze server is installed and the Google Drive Connector is included in the license. To install or update the connector, use the Mindbreeze Management Center.
To install the plug-in, open the Mindbreeze Management Center. Select "Configuration" from the menu on the left-hand side. Then navigate to the "Plugins" tab. In the "Plugin Management" section, select the appropriate zip file and upload it by clicking the "Upload" button. This automatically installs or updates the connector, as the case may be. In this process, the Mindbreeze services are restarted.
Select the installation method "Advanced" for configuration.
In a new or existing service, select the option "Google Drive Principal Resolution Service" in the "Service" setting. For more information on creating and configuring a basic configuration of a Principal Resolution Service cache, see Installation & Configuration - Caching Principal Resolution Service.
In the "Connection Settings" section, set the connection settings. The connector supports two configuration variants: Service Account (recommended) and OAuth.
The following settings must be made:
“GSuite Domain” | The domain that is used in Google Drive. |
“Service Account Name” | The e-mail address of the Google service account. |
“GSuite Admin User Mail Address” | The e-mail address of the GSuite administrator. |
“Path to P12 Certificate” | Path to the P12 certificate generated when the Google service account was created. |
In addition, the following settings must be set in the "Cache Settings" section:
“Database Directory Path” | Directory in which the cache data is stored. |
“Cache Update Interval (Minutes)” | Interval between cache updates, in minutes. |
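Before saving the Service Account settings, it can help to check the values for obvious mistakes. The following Python sketch is one way to do that outside of the Management Center. The class and field names are hypothetical and only mirror the setting labels above, and the check that the database directory already exists is an assumption (the service may create the directory itself).

```python
import os
from dataclasses import dataclass


@dataclass
class ServiceAccountSettings:
    # Hypothetical field names that mirror the setting labels above.
    gsuite_domain: str
    service_account_name: str
    admin_user_mail: str
    p12_path: str
    database_directory: str
    cache_update_interval_minutes: int


def find_problems(settings):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    addresses = (
        ("Service Account Name", settings.service_account_name),
        ("GSuite Admin User Mail Address", settings.admin_user_mail),
    )
    for label, address in addresses:
        # Very rough plausibility check, not a full e-mail validation.
        if address.count("@") != 1 or address.startswith("@") or address.endswith("@"):
            problems.append(f"{label} does not look like an e-mail address")
    if not os.path.isfile(settings.p12_path):
        problems.append("Path to P12 Certificate does not point to a file")
    # Assumption: the directory is expected to exist before the service starts.
    if not os.path.isdir(settings.database_directory):
        problems.append("Database Directory Path is not an existing directory")
    if settings.cache_update_interval_minutes <= 0:
        problems.append("Cache Update Interval (Minutes) must be positive")
    return problems
```

An empty result only means that no obvious mistake was found; it does not prove that Google accepts the credentials.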
The following settings must be made:
“GSuite Domain” | The domain that is used in Google Drive. |
“Use OAuth instead” | Tick this box so that OAuth is used. |
“Client Secret JSON File Path” | Path to the JSON file that was downloaded when the OAuth credentials were created. |
“Credential Persistence Directory Path” | Path to a directory where the credentials can be stored. |
“OAuth response receive Port (HTTP)” | Port to receive the OAuth code. Select a free port. The port is opened only during initial setup. |
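The port setting asks for a free port. One simple way to check a candidate on the Mindbreeze host is to try binding a TCP socket to it, as in the Python sketch below. The function name `is_port_free` is illustrative, and the result is only a snapshot: another process could still claim the port before the connector opens it.

```python
import socket


def is_port_free(port, host=""):
    """Return True if a TCP socket can currently be bound to the given port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # An empty host binds on all IPv4 interfaces.
            probe.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    candidate = 8089  # arbitrary example value, not a connector default
    print(candidate, "is free" if is_port_free(candidate) else "is in use")
```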
In addition, the following settings must be set in the "Cache Settings" section:
“Database Directory Path” | Directory in which the cache data is stored. |
“Cache Update Interval (Minutes)” | Interval between cache updates, in minutes. |
Navigate to the "Indices" tab and click on the "Add new index" icon in the upper right corner to create a new index.
Enter the path to the index and, if necessary, change the display name.
Add a new data source by clicking the "Add new custom source" icon at the top right. Select the Google Drive category. For "Caching Principal Resolution Service," select the previously configured Google Drive Caching Principal Resolution Service.
In the "Connection Settings" section, set the connection settings. The connector supports two configuration variants: Service Account (recommended) and OAuth.
The following settings must be made:
“Service Account Name” | The e-mail address of the Google service account. |
“GSuite Admin User Mail Address” | The e-mail address of the GSuite administrator. |
“Path to P12 Certificate” | Path to the P12 certificate generated when the Google service account was created. |
Make sure you have successfully set up the Google Drive Caching Principal Resolution Service.
The following settings must be made:
“Use OAuth instead” | Tick this box so that OAuth is used. |
“Client Secret JSON File Path” | Path to the JSON file that was downloaded when the OAuth credentials were created. |
“Credential Persistence Directory Path” | Path to a directory where the credentials can be stored. Select the same directory as for the Google Drive Caching Principal Resolution Service. |
“OAuth response receive Port (HTTP)” | Port to receive the OAuth code. Select a free port. The port is opened only during initial setup. |
In order to make these settings visible, activate the "Advanced" option in the filter tab at the top right. The following settings are available:
In the section "Crawler Settings":
“Maximum File Size (MB)” | Maximum file size. Larger files are ignored. This setting has no effect on Google Docs objects. |
“Corpora” | Determines which body of documents is indexed. Available values are "User" (indexes documents of the crawling user) and "Domain" (indexes documents shared with the domain of the crawling user). Default value: "User". |
“Number of Crawler Threads” | Number of threads that download documents in parallel. Too high a number may cause Google Drive API errors. |
“Max Fetch Retry Count” | For some errors when downloading a document, the connector tries to download the document again. This determines the maximum number of attempts. |
“Exponential Backoff Wait Time(s)” | For some errors when downloading a document, the connector tries to download the document again. The connector waits for the set time before each attempt. The waiting time is doubled each time (exponential backoff). |
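To illustrate how the retry count and the backoff wait time interact, here is a minimal Python sketch of exponential backoff. It is not the connector's implementation: the function names are hypothetical, the retry count is interpreted as the number of retries after the first attempt (an assumption), and every exception is retried, whereas the connector retries only some errors.

```python
import time


def backoff_waits(initial_wait_seconds, max_retries):
    """Return the wait before each retry; the wait doubles every time."""
    return [initial_wait_seconds * 2 ** attempt for attempt in range(max_retries)]


def fetch_with_retries(fetch, initial_wait_seconds, max_retries, sleep=time.sleep):
    """Call fetch(); after a failure, wait and retry, at most max_retries times."""
    last_error = None
    for wait in [None] + backoff_waits(initial_wait_seconds, max_retries):
        if wait is not None:
            sleep(wait)  # wait before each retry, not before the first attempt
        try:
            return fetch()
        except Exception as error:  # a real crawler would retry only selected errors
            last_error = error
    raise last_error


if __name__ == "__main__":
    print(backoff_waits(2, 4))  # [2, 4, 8, 16]
```

With a wait time of 2 seconds and 4 retries, a document that keeps failing is given up on after 2 + 4 + 8 + 16 = 30 seconds of waiting, so large values for either setting slow down crawls that hit many errors.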
In the section “Content Settings“:
“Exclude MIME Types Pattern” | If the MIME type of a document matches this regular expression, the document is ignored. Example: application/vnd\.google\-apps.* ignores all Google Docs documents. |
“Exclude Filename Pattern” | If the filename of a document matches this regular expression, the document is ignored. Example: .*\.zip ignores all ZIP archives. |
“Enable GoogleDrive Delta Crawling” | If set, the connector fetches only changes to files from the Google Drive API instead of crawling the whole Google Drive instance. |
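The two example patterns above can be tried out before they are entered in the configuration. The Python sketch below applies them to a few sample values. It assumes that a pattern must match the whole MIME type or filename, and it uses Python's regular expression engine, which may differ in details from the one the connector uses; the function name `is_excluded` is illustrative.

```python
import re

# Example patterns taken from the settings described above.
EXCLUDE_MIME_TYPES = re.compile(r"application/vnd\.google\-apps.*")
EXCLUDE_FILENAMES = re.compile(r".*\.zip")


def is_excluded(filename, mime_type):
    """Return True if either exclude pattern matches the whole value."""
    # Assumption: whole-value matching (fullmatch), not a substring search.
    return bool(
        EXCLUDE_MIME_TYPES.fullmatch(mime_type)
        or EXCLUDE_FILENAMES.fullmatch(filename)
    )


if __name__ == "__main__":
    samples = [
        ("report.zip", "application/zip"),
        ("Notes", "application/vnd.google-apps.document"),
        ("report.pdf", "application/pdf"),
    ]
    for filename, mime_type in samples:
        print(filename, mime_type, is_excluded(filename, mime_type))
```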