Copyright ©
Mindbreeze GmbH, A-4020 Linz, 2024.
All rights reserved. All hardware and software names used are registered trade names and/or registered trademarks of the respective manufacturers.
These documents are highly confidential. No rights to our software or our professional services, or results of our professional services, or other protected rights can be based on the handing over and presentation of these documents.
Distribution, publication or duplication is not permitted.
The term “user” is used in a gender-neutral sense throughout the document.
The Mindbreeze Sitemap Generator add-on generates a sitemap of the Atlassian Confluence pages. The sitemap contains only the pages that the user generating the sitemap is allowed to access. Additionally, you can exclude pages using regular expressions.
The Remote API interface of Atlassian Confluence must be enabled for the Mindbreeze Sitemap Generator add-on to work. Activate it at “Further Configuration > Remote API (XML-RPC & SOAP)”.
Install the add-on using “Manage add-ons” and “Upload add-on”:
Please refer to the chapter Supported Data Sources in the Product Information for the latest supported version.
The plugin files are stored in the plugin folder of the Mindbreeze Confluence Connector. They can be uploaded via the “Choose File” button (labeled “Datei auswählen” in a German user interface).
Submit the file with the “Upload” button:
The plugin installation is finished:
Use the “Configure” button to change the settings of the Mindbreeze Sitemap Generator add-on:
Sitemap Generating User | Atlassian Confluence user, used to generate the sitemap. Recommended: admin. |
Sitemap Downloader Group | Only members of the given Atlassian Confluence group are allowed to download the sitemap. It is highly recommended to limit this to a user group which is allowed to view all data. |
ACL Encryption Password | A password used for encrypting the ACL elements. If this parameter is left empty, the ACL elements will not be encrypted. |
Confluence Base URL | The base URL that is used for generating the links in the sitemap. |
Sitemap Cache Directory | A directory where the generated sitemap.xml is stored on the Atlassian Confluence Server. |
Use Attachment Version | If active, the current version of attachments is included in the URL. This allows them to be updated if they are edited. |
Disable Parent Reference Metadata for Pages | If enabled, no reference metadata to the parent document is generated for Confluence pages. This reduces the number of database queries. |
Add Performance Metrics to Sitemap | If enabled, the times required for sitemap generation tasks are entered as comments in the sitemap. |
ACL Exempt Group Name (ex. confluence-administrators) | Group that has read-access to all Confluence Content regardless of the explicit rights. |
Custom Content Property Key Pattern | With this option, custom content properties can be included in the sitemap. Define a regular expression that matches the names of the custom content properties (without the prefix custprop_ ). Matching properties are included in the sitemap. Note: Custom content property values of type JSON Object are flattened into one or more metadata items. Furthermore, custom content properties are only supported for pages, not for attachments. Default value: not set. Example values: .* (includes all custom content properties) or myProp.* (includes all custom content properties that begin with myProp, e.g. myPropLikes). Note: This feature is only supported for Confluence version 5.6+. |
Generate Delta sitemap for the Latest Changes (Minutes) | The delta sitemap contains all documents that have been changed within the last N minutes, where N is the value of this option. If this option is not set, the delta sitemap will not contain any <url> elements. |
Generate REST URLs | Instead of the normal Confluence Sitemap URLs, REST API URLs are generated which are set as document key in the Confluence crawler. This has the advantage, for example, that no temporary duplicates are created during a delta crawl run if the title of pages has been changed. If you enable this option, please also make sure that the option "Use Rest API for Page Content" is active in the Atlassian Confluence Crawler. Attention: If you have already indexed Confluence and want to enable or disable this option afterwards, you need an empty index before changing this option. This would otherwise lead to document duplicates, since the mes:key scheme changes in the process. |
REST URL Base Path | If the REST API endpoint is not located directly on <your-confluence-url>/rest/api, the "REST URL Base Path" can be specified. For example, if it is located at <your-confluence-url>/mybasepath/rest/api, the "REST URL Base Path" value must be /mybasepath. |
Include Labels | If active, label metadata ("labels") for pages, spaces and attachments is included in the sitemap. |
Grant Everyone to Anonymous Spaces | If enabled, all users get access to spaces that allow access for anonymous users. If this option is disabled, no access is granted on the basis of anonymous access. Note: It is possible to configure Atlassian Confluence in such a way that logged-in users do not have access to documents, but anonymous users do. In this case, if this setting is enabled, users might find more documents in Mindbreeze than in Atlassian Confluence. |
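The following Python sketch illustrates two of the settings above: how a “Custom Content Property Key Pattern” selects property names, and how a “REST URL Base Path” changes the REST API location. It is only an approximation. The add-on may use a different regular expression dialect than Python, and full-match semantics are an assumption here. The property names, the host confluence.example.com and both function names are hypothetical.

```python
import re


def select_custom_properties(pattern: str, property_names: list[str]) -> list[str]:
    """Return the property names that the pattern matches in full.

    Assumption: the whole name (without the custprop_ prefix) has to match.
    """
    compiled = re.compile(pattern)
    return [name for name in property_names if compiled.fullmatch(name)]


def rest_api_root(confluence_url: str, base_path: str = "") -> str:
    """Join the Confluence URL, an optional base path and rest/api."""
    parts = [part for part in (base_path.strip("/"), "rest/api") if part]
    return confluence_url.rstrip("/") + "/" + "/".join(parts)


# Hypothetical property names, written without the custprop_ prefix.
names = ["myPropLikes", "myPropViews", "otherProp"]

print(select_custom_properties(".*", names))        # all three names
print(select_custom_properties("myProp.*", names))  # ['myPropLikes', 'myPropViews']

print(rest_api_root("https://confluence.example.com"))
print(rest_api_root("https://confluence.example.com", "/mybasepath"))
```

Testing a pattern against a handful of known property names in this way can help to avoid including more metadata in the sitemap than intended.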
After a successful installation of the Atlassian Confluence Sitemap Generator add-on, the sitemap can be generated with a scheduled job. To set up the sitemap generator job, navigate to the section “Scheduled Jobs” in the Confluence administration interface.
The sitemap generator job can be started automatically according to a given schedule. This schedule can be specified using standard cron expressions by clicking on the “Edit” action of the “scheduledjob.desc.mindbreezeGenerateSitemapJob”.
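As an illustration of what such a schedule looks like, the following Python sketch splits a cron expression into named fields. It assumes the Quartz-style layout with a leading seconds field (seconds, minutes, hours, day of month, month, day of week, optional year). Please verify against your Confluence version which layout the “Edit” dialog expects. The function name describe_cron and the example expression are illustrative only.

```python
# Field names in the assumed Quartz-style order; the year field is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def describe_cron(expression: str) -> dict[str, str]:
    """Map each part of a cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


# Under the assumed layout, this expression means: every day at 02:00:00.
for field, value in describe_cron("0 0 2 * * ?").items():
    print(f"{field:>12}: {value}")
```

A nightly schedule is shown here only as an example. Choose the interval according to how often the content changes and how long sitemap generation takes.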
The sitemap generator job can also be started manually by clicking on the “Run” action.
After the sitemap generator job has completed, the sitemap is available at the following URL: <confluence_url>/plugins/servlet/sitemapservlet?jobbased=true.
The Delta Sitemap is available at
<Atlassian Confluence URL>/plugins/servlet/sitemapservlet?jobbased=true&delta=true
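The two URLs can be assembled and a downloaded sitemap inspected with a short script. The following Python sketch builds both URLs from a base URL and counts the <url> elements in a sitemap document. The host confluence.example.com, the function names and the sample XML are hypothetical. The sample is not real add-on output, and the download itself (including authentication as a member of the Sitemap Downloader Group) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SERVLET_PATH = "/plugins/servlet/sitemapservlet"


def sitemap_url(confluence_url: str, delta: bool = False) -> str:
    """Build the URL of the job-based sitemap or of the delta sitemap."""
    params = {"jobbased": "true"}
    if delta:
        params["delta"] = "true"
    return confluence_url.rstrip("/") + SERVLET_PATH + "?" + urlencode(params)


def count_url_elements(sitemap_xml: str) -> int:
    """Count <url> elements, ignoring a possible XML namespace prefix."""
    root = ET.fromstring(sitemap_xml)
    return sum(
        1 for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "url"
    )


print(sitemap_url("https://confluence.example.com"))
print(sitemap_url("https://confluence.example.com", delta=True))

# Hand-written sample document, used only to demonstrate the counting.
sample = (
    "<urlset>"
    "<url><loc>https://confluence.example.com/display/DOC/Home</loc></url>"
    "</urlset>"
)
print(count_url_elements(sample))  # 1
```

A count of 0 for the delta sitemap can indicate that the option “Generate Delta sitemap for the Latest Changes (Minutes)” is not set, or simply that nothing has changed within the configured time window.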
By default, no logging is configured for the add-on, so the log file does not show any messages from the sitemap generator.
You can configure the log level for the Atlassian Confluence Sitemap Generator add-on at “Administration -> Logging and Profiling”.
Create a new entry for the class/package name “com.mindbreeze.enterprisesearch.connectors” and select the log level.
The log files are available at the following path: <Confluence Home>/logs/atlassian-confluence.log
If the connector is not indexing documents, check the following file in the connector's log directory: jobs/logs/crawl.log. If you notice the error codes 401 or 403, you may have login or permission issues. In that case, make sure that all the documents in the sitemap are reachable for the crawling user.
You can test this by opening a document inside the sitemap in an incognito tab.
The page that is opened MUST be the Confluence login page, with a username and a password field. If, for example, a different login page of an external identity provider or a 2FA (two-factor authentication) login appears, the connector will not be able to log in and crawl the document. In that case, contact your Confluence administrator to set up Confluence so that the connector can log in.
It is possible to configure the Confluence Base URL setting to change the URLs in the sitemap, if that is necessary to get to the correct login page.
If you reach the Confluence login, enter the username and password of the crawling user and ensure that the login is possible and that the document is accessible.
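For larger log files, the search for the error codes 401 and 403 in crawl.log can be scripted. The following Python sketch reports every line that contains one of the two codes as a standalone number. The exact line format of crawl.log is not described here, so the sample lines are made up, and the function name find_auth_errors is hypothetical. Unrelated numbers that happen to equal 401 or 403 would also be reported.

```python
import re

# Matches 401 or 403 only as a standalone number, not inside e.g. 14013.
AUTH_ERROR = re.compile(r"\b(401|403)\b")


def find_auth_errors(log_lines):
    """Return (line number, status code, line text) for each suspicious line."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = AUTH_ERROR.search(line)
        if match:
            hits.append((number, int(match.group(1)), line.rstrip()))
    return hits


# Made-up sample lines; a real crawl.log may look different.
sample_lines = [
    "https://confluence.example.com/display/DOC/Home 200\n",
    "https://confluence.example.com/display/HR/Salaries 403\n",
    "https://confluence.example.com/display/DOC/Big 200 14013 bytes\n",
]

for number, code, text in find_auth_errors(sample_lines):
    print(f"line {number}: HTTP {code}: {text}")

# To scan a real file (path relative to the connector's log directory):
# with open("jobs/logs/crawl.log", encoding="utf-8", errors="replace") as log:
#     print(find_auth_errors(log))
```

The URLs reported this way are good candidates for the manual check in an incognito tab described above.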