Salesforce Data Cloud Ingestion from Google - Implementation Template

Application details

Technical considerations

The implementation uses the OAuth 2.0 authorization code grant type
One instance of the Mule application is deployed per Google Drive
Content from Google Drive related to Google Workspace is converted into application/pdf format and sent to Data Cloud
Content export for Google Workspace files is restricted to a maximum size of 10 MB
Content from Google Drive for non-workspace files is retrieved, optionally encoded as Base64 text, and sent to Data Cloud
The /ping endpoint will make an authenticated request to Google Drive
Metadata for updated content is sent as a notification to Data Cloud
The API uses the GET /change endpoint to retrieve metadata updates instead of relying on the channel for change notifications. This approach keeps the number of notifications low; for example, no notification is sent when a user simply opens a file or folder in Google Drive
The Mule application is designed to be stateless
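
The content-handling rules in this list can be sketched as a small decision function. This is only an illustration under stated assumptions, not the template's actual code: the names plan_retrieval, RetrievalPlan, and within_export_limit are hypothetical, skipping folders is an assumption, and "10 MB" is read here as 10 * 1024 * 1024 bytes. The sketch relies on the fact that Google Workspace files carry MIME types beginning with application/vnd.google-apps.

```python
from dataclasses import dataclass
from typing import Optional

WORKSPACE_PREFIX = "application/vnd.google-apps."
FOLDER_MIME = "application/vnd.google-apps.folder"
# Assumption: the 10 MB export limit is interpreted as 10 MiB.
EXPORT_LIMIT_BYTES = 10 * 1024 * 1024


@dataclass(frozen=True)
class RetrievalPlan:
    action: str                 # "export", "download", or "skip"
    target_mime: Optional[str]  # MIME type sent to Data Cloud
    base64_encode: bool         # only meaningful for downloads


def plan_retrieval(mime_type: str, encode_binary: bool = False) -> RetrievalPlan:
    """Decide how a Drive file's content would be fetched (hypothetical helper)."""
    if mime_type == FOLDER_MIME:
        # Assumption: folders have no content to send.
        return RetrievalPlan("skip", None, False)
    if mime_type.startswith(WORKSPACE_PREFIX):
        # Workspace files are converted to PDF before being sent.
        return RetrievalPlan("export", "application/pdf", False)
    # Non-workspace files are retrieved as-is, optionally Base64 encoded.
    return RetrievalPlan("download", mime_type, encode_binary)


def within_export_limit(exported_size_bytes: int) -> bool:
    """Check an exported payload against the export size limit."""
    return exported_size_bytes <= EXPORT_LIMIT_BYTES
```

For example, plan_retrieval("application/vnd.google-apps.document") yields an export to application/pdf, while plan_retrieval("image/png", encode_binary=True) yields a Base64-encoded download.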

Activity diagrams

The following activity diagrams illustrate the processing sequence for ingesting unstructured metadata and retrieving its content on demand.

Initial Load/Full Refresh Synchronous

Initial Load/Full Refresh Asynchronous

Incremental Load

Get Content

Processing logic

The primary handling and orchestration of unstructured metadata ingestion is implemented in the Salesforce Data Cloud Ingestion from Google Process API. This process is described in more detail in the following sections.

Initial Load/Full Refresh Synchronous

This flow is triggered by the end user.

A user clicks the Refresh Now button on the UDLO page to initiate the request for a full refresh of resource metadata
Data Cloud invokes the Mule application without a continuation token to start the process
Mule application receives the request and will:
- Retrieve the content metadata from Google Drive
- Transform the results into the Data Cloud format with a continuation token
Data Cloud invokes the Mule application in a loop, passing the continuation token from the previous response each time, until all metadata has been retrieved
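
The continuation-token exchange described above can be sketched as follows. This is a minimal illustration, not the template's implementation: handle_metadata_request, full_refresh, the continuationToken and items field names, and the in-memory FAKE_DRIVE_PAGES table standing in for Google Drive are all assumptions. The point it shows is that each call carries its paging state in the token, so the handler itself keeps no state between calls.

```python
from typing import Optional

# Stand-in for Google Drive: maps a page token to (file ids, next page token).
FAKE_DRIVE_PAGES = {
    None: (["file-1", "file-2"], "p2"),
    "p2": (["file-3"], None),
}


def handle_metadata_request(continuation_token: Optional[str] = None) -> dict:
    """One stateless call: all paging state travels in the token."""
    file_ids, next_token = FAKE_DRIVE_PAGES[continuation_token]
    response = {"items": [{"id": file_id} for file_id in file_ids]}
    if next_token is not None:
        # A token is only returned while more pages remain.
        response["continuationToken"] = next_token
    return response


def full_refresh() -> list:
    """Caller-side loop: start without a token, stop when none is returned."""
    collected = []
    token = None
    while True:
        page = handle_metadata_request(token)
        collected.extend(page["items"])
        token = page.get("continuationToken")
        if token is None:
            return collected
```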

Initial Load/Full Refresh Asynchronous

This flow is triggered by an external application, such as Postman.

Mule application receives a request to perform an asynchronous refresh of all metadata and will:
- Retrieve the content metadata from Google Drive
- Transform the results into the required format for the ingestion API
- Send the transformed data to the ingestion endpoint
Mule application loops over the result pages, using the continuation token from Google Drive, until all metadata has been retrieved
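
In the asynchronous flow the loop runs inside the Mule application rather than in the caller. A rough sketch of that retrieve, transform, send cycle is shown below. The fetch_page and send_batch callables are hypothetical stand-ins for the Google Drive and ingestion calls, and the output record shape (resourceId, name, mimeType) is an assumption; only the input field names id, name, and mimeType come from Google Drive file metadata.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]  # (file metadata, next page token)


def iter_pages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[List[dict]]:
    """Yield pages of metadata until the source returns no further token."""
    token = None
    while True:
        files, token = fetch_page(token)
        yield files
        if token is None:
            break


def to_ingestion_record(file_meta: dict) -> dict:
    """Map Drive file metadata to a hypothetical ingestion record."""
    return {
        "resourceId": file_meta["id"],
        "name": file_meta["name"],
        "mimeType": file_meta["mimeType"],
    }


def async_refresh(
    fetch_page: Callable[[Optional[str]], Page],
    send_batch: Callable[[List[dict]], None],
) -> int:
    """Retrieve, transform, and send every page; return the record count."""
    sent = 0
    for files in iter_pages(fetch_page):
        batch = [to_ingestion_record(f) for f in files]
        if batch:
            send_batch(batch)
            sent += len(batch)
    return sent
```

Passing the two calls in as parameters keeps the loop testable without network access.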

Incremental Load

This flow is triggered by a scheduler in the Mule application.

Mule application runs a scheduler at a given frequency
Mule application invokes the Get Changes API of the Google Drive API to retrieve metadata changes
Mule application transforms the changes and pushes them to the Data Cloud Ingestion API
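
The transform step of this flow might look like the sketch below. The input follows the shape of a Google Drive changes response (a changes list whose entries carry fileId, removed, and file, plus nextPageToken or newStartPageToken). The output notification shape and the function names are assumptions made for illustration. How the token is kept between scheduler runs is not described here, so the sketch leaves that out.

```python
from typing import List, Optional


def to_notifications(changes_response: dict) -> List[dict]:
    """Turn a Drive changes response into hypothetical notifications."""
    notifications = []
    for change in changes_response.get("changes", []):
        if change.get("removed"):
            # The file was deleted or access to it was lost.
            notifications.append({"resourceId": change["fileId"], "event": "deleted"})
            continue
        file_meta = change.get("file", {})
        notifications.append({
            "resourceId": change["fileId"],
            "event": "updated",
            "name": file_meta.get("name"),
            "mimeType": file_meta.get("mimeType"),
        })
    return notifications


def token_for_next_call(changes_response: dict) -> Optional[str]:
    """More pages pending: nextPageToken. Otherwise: newStartPageToken."""
    return (
        changes_response.get("nextPageToken")
        or changes_response.get("newStartPageToken")
    )
```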

Get Content

This flow is triggered by Data Cloud.

Data Cloud initiates the request to retrieve the content
Mule application receives the request to retrieve and stream the content from Google Drive
Mule application attempts to transcode the file to the MIME type preferred by Data Cloud, provided the Google Drive API supports that conversion
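
The MIME type selection in the last step can be pictured as a simple lookup with a fallback. This is a sketch under assumptions: choose_export_mime_type is a hypothetical name, the set of supported target types is supplied by the caller rather than taken from any real API call, and falling back to application/pdf is an assumption based on the PDF conversion mentioned under Technical considerations.

```python
from typing import Iterable, Optional


def choose_export_mime_type(
    preferred: Optional[str],
    supported: Iterable[str],
    fallback: str = "application/pdf",
) -> Optional[str]:
    """Pick the requested type if supported, else the fallback, else None."""
    supported_set = set(supported)
    if preferred is not None and preferred in supported_set:
        return preferred
    if fallback in supported_set:
        return fallback
    return None  # no usable conversion; the caller decides how to respond
```

With supported set to {"application/pdf", "text/plain"}, a request for text/plain is honored, while a request for an unsupported type falls back to application/pdf.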

Important note: Requesting binary content with the encodeBinaryContent flag set to true will disable streaming due to the nature of the Base64 encoding operation. This may result in request timeouts when attempting to encode very large files.
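
The difference between the two modes can be illustrated with a short sketch. It is not the template's code: stream_chunks and encode_whole are hypothetical names, and an in-memory buffer stands in for the Google Drive response. Streaming passes the content along one chunk at a time, whereas the whole-payload encoding shown here must read the entire file before it can return anything, and its Base64 output is roughly one third larger than the input (4 characters for every 3 bytes).

```python
import base64
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # illustrative chunk size


def stream_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Streaming mode: only one chunk is held in memory at a time."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def encode_whole(source: BinaryIO) -> str:
    """Non-streaming mode: the full payload is read, then Base64 encoded."""
    return base64.b64encode(source.read()).decode("ascii")


if __name__ == "__main__":
    data = bytes(range(256)) * 1000  # 256,000 bytes of sample content
    streamed = b"".join(stream_chunks(io.BytesIO(data), 1024))
    encoded = encode_whole(io.BytesIO(data))
    print(len(streamed), len(encoded))
```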

Success conditions

Upon successful completion, the following conditions will be met:

All metadata associated with unstructured content in Google Drive is retrieved and processed
Changes to metadata related to unstructured content in a Google Drive are processed at scheduled intervals and sent to Data Cloud
On-demand content for files stored in Google Drive is retrieved and processed successfully

Type	Template
Organization	MuleSoft
Published by	MuleSoft Solutions
Published on	Jan 21, 2025

Version	Actions
1.0.11
1.0.10
1.0.9
