Salesforce Data Cloud Ingestion from Sitemaps - Implementation Template
Application details
Technical considerations
- An instance of the Mule application is deployed per domain
- Support discovery of Sitemaps from the organization’s robots.txt file
- Processing Sitemap index files is out of scope
- Content referenced by a Sitemap is generally expected to be served as "text/html"
- No authentication is required/supported
- Synchronous and Asynchronous scans will ingest the full load
- The Mule application is designed to be stateless
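The robots.txt discovery listed above can be illustrated with a minimal Python sketch. It is not the Mule implementation; the function name `discover_sitemaps` is hypothetical, and the sketch assumes the robots.txt body has already been fetched as text. It treats `#` as the start of a comment, so a Sitemap URL containing a fragment would be truncated.

```python
def discover_sitemaps(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in a robots.txt body.

    Hypothetical helper for illustration only. URLs are returned in
    file order with duplicates removed.
    """
    urls: list[str] = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, because the URL itself contains colons.
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in urls:
                urls.append(url)
    return urls
```

Because the function takes text and returns a list, it holds no state between calls, which fits the stateless design noted above.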
Activity diagrams
The following activity diagrams illustrate the processing sequence for ingesting unstructured content metadata, and the content itself, on demand.
Initial Load/Full Refresh Synchronous
Initial Load/Full Refresh Asynchronous
Get Content
Processing logic
The primary handling and orchestration of unstructured metadata ingestion is implemented in the Salesforce Data Cloud Ingestion from Sitemaps Process API. The process is described in more detail in the following sections.
Initial Load/Full Refresh Synchronous
- A user action from Data Cloud initiates the request for a full refresh of the content metadata
- Data Cloud invokes the Mule application without a continuation token to start the process
- Mule application receives the request and will:
  - Retrieve the content metadata from all the configured organizations' Sitemaps
  - Transform the results into the Data Cloud format and return the results
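The retrieve-and-transform steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Mule flow itself: the function name `sitemap_to_records` and the output field names (`url`, `lastModified`, `contentType`) are hypothetical, since the actual Data Cloud format is not specified here. The XML namespace is the one defined by the Sitemap protocol, and Sitemap index files are rejected because they are out of scope.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"
SITEMAP_NS = {"sm": SITEMAP_URI}


def sitemap_to_records(sitemap_xml: str) -> list[dict]:
    """Parse a <urlset> Sitemap and return one metadata record per URL.

    The record field names are assumptions made for illustration.
    """
    root = ET.fromstring(sitemap_xml)
    # Sitemap index files (<sitemapindex>) are out of scope.
    if root.tag != f"{{{SITEMAP_URI}}}urlset":
        raise ValueError(f"unsupported Sitemap root element: {root.tag}")

    records = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue  # skip entries without a location
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        records.append(
            {
                "url": loc,
                "lastModified": lastmod or None,
                # Assumed default: content is expected to be text/html.
                "contentType": "text/html",
            }
        )
    return records
```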
Initial Load/Full Refresh Asynchronous
- Mule application receives a request to perform an asynchronous refresh of all metadata and will:
  - Retrieve the content metadata from all the configured organizations' Sitemaps
  - Transform the results into the required format for the ingestion API
  - Send the transformed data to the ingestion endpoint
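The final send step of the asynchronous flow might look like the following Python sketch. Several details are assumptions that this document does not confirm: that records are sent in batches, the batch size of 100, the `{"data": [...]}` payload envelope, and the names `batched` and `push_records`. The actual HTTP call is left to an injected `send` callable so that the sketch does not guess at the ingestion endpoint or its authentication.

```python
from typing import Callable, Iterator


def batched(records: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def push_records(
    records: list[dict],
    send: Callable[[dict], None],
    batch_size: int = 100,  # assumed value, not taken from this document
) -> int:
    """Send records in batches via `send` and return how many were sent."""
    sent = 0
    for batch in batched(records, batch_size):
        # Hypothetical payload envelope; adjust to the real ingestion schema.
        send({"data": batch})
        sent += len(batch)
    return sent
```

Passing `send` as a parameter keeps the batching logic testable without network access, for example by passing `payloads.append` for a local list named `payloads`.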
Get Content
- Data Cloud initiates the request to retrieve the content
- Mule application receives the request, retrieves the content of the page listed in the Sitemap, and streams it back
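Streaming the page content, rather than buffering it whole, can be sketched in Python as below. The names `is_html` and `stream_content` and the 8192-byte chunk size are assumptions for illustration; the sketch works on any binary file-like object and does not reproduce the Mule application's actual streaming mechanism.

```python
from typing import BinaryIO, Iterator


def is_html(content_type: str) -> bool:
    """Check the media type, ignoring parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower() == "text/html"


def stream_content(source: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the content of `source` in chunks of at most `chunk_size` bytes."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of content
        yield chunk
```

In practice `source` could be an open HTTP response object, with `is_html` applied to its Content-Type header before streaming begins.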
Success conditions
Upon successful completion, the following conditions will be met:
- All metadata associated with unstructured content in the organization's Sitemaps is retrieved and processed.
- The full load of metadata is retrieved on demand.
- Retrieval of content is supported.