Salesforce Data Cloud Ingestion from Google - Implementation Template
Application details
Technical considerations
- The implementation uses the OAuth 2.0 authorization code grant type
- One instance of the Mule application is deployed per Google Drive
- Content from Google Drive related to Google Workspace is converted into application/pdf format and sent to Data Cloud
- Content from Google Drive for non-workspace files is retrieved, optionally encoded as Base64 text, and sent to Data Cloud
- The /ping endpoint will make an authenticated request to Google Drive
- Metadata for updated content is sent as a notification to Data Cloud
- The API uses the GET /change endpoint to retrieve metadata updates instead of relying on the channel for change notifications. This approach reduces the number of notifications; for example, no notification is sent when a user simply opens a file or folder in Google Drive.
- The Mule application is designed to be stateless
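The content-handling rules above can be sketched as a small decision function. This is only an illustration, not the template's actual code: the function name `plan_retrieval` and the returned dictionary keys are hypothetical, and skipping folders is an assumption. The `application/vnd.google-apps.` prefix is the MIME type prefix Google Drive uses for Workspace files.

```python
# Google Drive marks Workspace files (Docs, Sheets, Slides, ...) with this MIME prefix.
WORKSPACE_PREFIX = "application/vnd.google-apps."
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"


def plan_retrieval(mime_type: str, encode_binary_content: bool = False) -> dict:
    """Decide how a Drive file would be fetched (hypothetical helper)."""
    if mime_type == FOLDER_MIME_TYPE:
        # Assumption: folders carry metadata only, so there is no content to fetch.
        return {"action": "skip", "mime_type": None, "base64": False}
    if mime_type.startswith(WORKSPACE_PREFIX):
        # Workspace files are converted to PDF before being sent to Data Cloud.
        return {"action": "export", "mime_type": "application/pdf", "base64": False}
    # Non-workspace files are downloaded as-is and optionally Base64-encoded.
    return {"action": "download", "mime_type": mime_type, "base64": encode_binary_content}
```

For example, `plan_retrieval("image/png", encode_binary_content=True)` describes a download that is Base64-encoded, while a Google Docs file is always planned as a PDF export.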
Activity diagrams
The following activity diagrams illustrate the processing sequences for ingesting unstructured metadata and retrieving its content on demand.
Initial Load/Full Refresh Synchronous
Initial Load/Full Refresh Asynchronous
Incremental Load
Get Content
Processing logic
The primary handling and orchestration of unstructured metadata ingestion is implemented in the Salesforce Data Cloud Ingestion from Google Process API. This process is described in more detail in the following sections.
Initial Load/Full Refresh Synchronous
- A user action in Data Cloud initiates the request for a full refresh of the content metadata
- Data Cloud invokes the Mule application without a continuation token to start the process
- Mule application receives the request and will:
  - Retrieve the content metadata from Google Drive
  - Transform the results into the Data Cloud format with a continuation token
- Data Cloud invokes the Mule application in a loop to handle pagination, passing the continuation token from the previous response each time, until all the metadata has been retrieved
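The continuation-token loop in the steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the in-memory `FILES` list stands in for Google Drive, the token is simply a list offset, and the names `handle_metadata_request`, `full_refresh`, `records`, and `continuationToken` are hypothetical rather than taken from the template or the Data Cloud contract.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the metadata held in Google Drive.
FILES = [{"id": f"file-{i}", "name": f"Doc {i}"} for i in range(5)]
PAGE_SIZE = 2


def handle_metadata_request(continuation_token: Optional[str] = None) -> dict:
    """Plays the Mule application: return one page plus a token for the next page."""
    start = int(continuation_token) if continuation_token else 0
    page = FILES[start:start + PAGE_SIZE]
    next_start = start + PAGE_SIZE
    return {
        # Field names are illustrative, not the real Data Cloud format.
        "records": [{"externalId": f["id"], "title": f["name"]} for f in page],
        # No token on the last page tells the caller to stop.
        "continuationToken": str(next_start) if next_start < len(FILES) else None,
    }


def full_refresh() -> list:
    """Plays Data Cloud: the first call has no token, then loop until none is returned."""
    records, token = [], None
    while True:
        response = handle_metadata_request(token)
        records.extend(response["records"])
        token = response["continuationToken"]
        if token is None:
            return records
```

Because every request carries its own token, the handler needs no session state between calls, which fits the stateless design noted in the technical considerations.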
Initial Load/Full Refresh Asynchronous
- Mule application receives a request to perform an asynchronous refresh of all metadata and will:
  - Retrieve the content metadata from Google Drive
  - Transform the results into the required format for the ingestion API
  - Send the transformed data to the ingestion endpoint
- Mule application loops to handle pagination, using the continuation token from Google Drive, until all the metadata has been retrieved
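The asynchronous flow above differs from the synchronous one in that the Mule application itself drives the loop and pushes each page to the ingestion endpoint. The sketch below assumes the Drive listing is injected as a callable that returns a dictionary with `files` and `nextPageToken` keys (the shape of a Drive API v3 file listing). The function names and the ingestion record fields are hypothetical.

```python
from typing import Callable, Optional


def to_ingestion_record(drive_file: dict) -> dict:
    """Map one Drive file entry to an illustrative ingestion record."""
    return {"externalId": drive_file["id"], "title": drive_file.get("name", "")}


def run_async_refresh(
    list_page: Callable[[Optional[str]], dict],
    send_to_ingestion: Callable[[list], None],
) -> int:
    """Page through Drive metadata and push each page; return the record count."""
    total = 0
    page_token = None
    while True:
        page = list_page(page_token)
        records = [to_ingestion_record(f) for f in page.get("files", [])]
        if records:
            send_to_ingestion(records)
            total += len(records)
        # Drive omits nextPageToken on the final page.
        page_token = page.get("nextPageToken")
        if not page_token:
            return total
```

Passing the Drive and ingestion calls in as parameters keeps the loop itself testable without network access.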
Incremental Load
- Mule application runs a scheduler at a given frequency
- Mule application invokes the Get Changes operation of the Google Drive API to get changes in metadata from Google Drive
- Mule application transforms the changes and pushes them to the Data Cloud Ingestion API
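One scheduler run of the incremental load can be sketched as below. It assumes the change listing is injected as a callable returning the Drive API v3 change-list shape (`changes`, `nextPageToken`, and, on the last page, `newStartPageToken`). The function name, the output record fields, and the delete/upsert mapping are hypothetical, and where the start token is kept between runs is outside the scope of this sketch.

```python
from typing import Callable, Tuple


def collect_changes(
    list_changes: Callable[[str], dict],
    start_page_token: str,
) -> Tuple[list, str]:
    """Gather all changes since start_page_token; return records and the next start token."""
    records = []
    token = start_page_token
    while True:
        page = list_changes(token)
        for change in page.get("changes", []):
            records.append({
                "externalId": change["fileId"],
                # Assumption: removed files map to a delete, everything else to an upsert.
                "operation": "delete" if change.get("removed") else "upsert",
            })
        if "nextPageToken" in page:
            token = page["nextPageToken"]
        else:
            # The last page carries the token to use on the next scheduled run.
            return records, page.get("newStartPageToken", token)
```

The returned records would then be pushed to the Data Cloud Ingestion API, and the returned token would seed the next scheduled run.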
Get Content
- Data Cloud initiates the request to retrieve the content
- Mule application receives the request to retrieve and stream the content from Google Drive
- Mule application attempts to transcode the file to the preferred MIME type requested by Data Cloud, where the Google Drive API supports it
Important note: Requesting binary content with the encodeBinaryContent flag set to true will disable streaming due to the nature of the Base64 encoding operation. This may result in request timeouts when attempting to encode very large files.
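The trade-off described in the note can be illustrated with a small sketch. It assumes that encoding buffers the whole payload before producing the Base64 text; the template's internals are not described here, so this is only one plausible reading. The function names are hypothetical.

```python
import base64
from typing import Iterable, Iterator


def stream_content(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Streaming path: each chunk is forwarded as soon as it arrives."""
    for chunk in chunks:
        yield chunk


def encode_content(chunks: Iterable[bytes]) -> str:
    """Encoded path (assumed): the whole file is held in memory, then encoded."""
    payload = b"".join(chunks)  # memory use and latency grow with file size
    return base64.b64encode(payload).decode("ascii")
```

Base64 output is also about one third larger than the input, which adds to the transfer time for very large files.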
Success conditions
Upon successful completion, the following conditions will be met:
- All metadata associated with unstructured content in Google Drive is retrieved and processed
- Changes to metadata related to unstructured content for a Google Drive are processed at scheduled intervals and sent to Data Cloud
- On-demand content for files stored in Google Drive is retrieved and processed successfully