RCG AWS Lambda Unzip and Split Files - Source
This asset is used in the following MuleSoft Accelerators:
MuleSoft Accelerator for Consumer Goods — includes pre-built APIs, connectors, integration templates, and reference architecture to enable retail IT teams to jumpstart digital transformation initiatives.
AWS Lambda to unzip and split files
This asset provides an AWS Lambda script that un-archives a .csv.gz file when it is placed in an S3 bucket. It then splits the extracted .csv file into multiple files, each carrying the same header, using the configured LINE_COUNT value. The split files are then processed by the RCG Data Normalization Process API.
This asset has the following dependencies:
- Set up buckets and folders in S3
- Set up the AWS Lambda
Set up the buckets and folders in S3
- Log in to the AWS console.
- Under Services, select S3.
- Create a bucket that will be used for the Trade Promotion Effectiveness use case.
- Under this bucket, create the following folders:
  - inbox: folder for the .csv.gz files placed by the Data Normalization Process API
  - error: folder for error files, if any
  - archive: folder for successfully processed files
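The console steps above can also be expressed as AWS CLI commands. This is only a sketch: the bucket name below is a placeholder (bucket names are globally unique), and S3 "folders" are just key prefixes, created here as empty marker objects.

```shell
# Placeholder bucket name; replace with your own
BUCKET=tpe-trade-promotion-bucket
aws s3 mb s3://$BUCKET

# Create the inbox, error, and archive folders as empty key prefixes
aws s3api put-object --bucket $BUCKET --key inbox/
aws s3api put-object --bucket $BUCKET --key error/
aws s3api put-object --bucket $BUCKET --key archive/
```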
Set up the AWS Lambda
- In the AWS console, under Services, select Lambda.
- Create a new Lambda function.
- Add a bash layer: arn:aws:lambda:<region>:744348701589:layer:bash:8 (substitute your function's AWS Region for <region>).
- Create a trigger that listens for events only from the inbox folder, using the configured prefix and a .csv.gz suffix.
- Create a new file, add the code below, and save it as unzipSplit.sh.
- Under Configuration, select Environment Variables and add a key LINE_COUNT whose value is the desired split limit (for example, 1000000).
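The trigger step above corresponds to an S3 bucket notification. A sketch of the equivalent notification configuration is shown below; the function ARN is a placeholder, and the prefix/suffix values assume the folder layout described earlier.

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "unzip-split-on-inbox",
      "LambdaFunctionArn": "arn:aws:lambda:<region>:<account-id>:function:<function-name>",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "inbox/" },
            { "Name": "suffix", "Value": ".csv.gz" }
          ]
        }
      }
    }
  ]
}
```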
handler () {
  set -e
  # Event data is passed as the first parameter
  EVENT_DATA=$1
  S3_BUCKET=$(echo "$EVENT_DATA" | jq -r '.Records[0].s3.bucket.name')
  FILENAME=$(echo "$EVENT_DATA" | jq -r '.Records[0].s3.object.key')
  LINECOUNT=$LINE_COUNT

  # Build the S3 URIs: the original archive, the prefix for split files,
  # and the extracted .csv (the key with its trailing .gz removed)
  INFILE=s3://"${S3_BUCKET}"/"${FILENAME}"
  OUTFILE=s3://"${S3_BUCKET}"/"${FILENAME%%.*}"
  NEWFILE=s3://"${S3_BUCKET}"/"${FILENAME::-3}"
  echo "$S3_BUCKET, $FILENAME, $INFILE, $NEWFILE"

  # Un-archive the .csv.gz file
  aws s3 cp "${INFILE}" - | gunzip -c | aws s3 cp - "${NEWFILE}"

  # Split the extracted file every LINE_COUNT lines, prepending the header
  # to every chunk after the first (x00), which already contains it
  HEADER=$(aws s3 cp "${NEWFILE}" - | head -n 1)
  aws s3 cp "${NEWFILE}" - | split -d -l "${LINECOUNT}" --filter \
    "{ [ \"\$FILE\" != \"x00\" ] && echo \"${HEADER}\"; cat; } | aws s3 cp - \"${OUTFILE}\${FILE}.csv\""

  # Remove the original .csv.gz file from the inbox folder (it can also be moved to the archive folder)
  aws s3 rm "${INFILE}"
  # Remove the extracted file from the inbox folder
  aws s3 rm "${NEWFILE}"
}
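The header-preserving split can be exercised locally with coreutils alone, without AWS. The sketch below builds a sample .csv.gz (one header plus five data rows) and splits it every LINE_COUNT lines; the file names, the part_ prefix, and the sample data are illustrative, not part of the asset.

```shell
set -e
# Work in a scratch directory
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Build a sample CSV (1 header + 5 data rows) and gzip it, mirroring the .csv.gz input
{ echo "id,name"; for i in 1 2 3 4 5; do echo "$i,row$i"; done; } > data.csv
gzip data.csv                 # produces data.csv.gz

LINE_COUNT=2                  # the Lambda reads this from an environment variable

# Un-archive, capture the header, then split with the header re-applied
gunzip -c data.csv.gz > extracted.csv
HEADER=$(head -n 1 extracted.csv)
# split names chunks part_00, part_01, ...; the filter prepends the header
# to every chunk except the first, which already contains it
split -d -l "$LINE_COUNT" --filter \
  "{ [ \"\$FILE\" != \"part_00\" ] && echo \"$HEADER\"; cat; } > \$FILE.csv" \
  extracted.csv part_

head -n 1 part_00.csv   # id,name
head -n 1 part_02.csv   # id,name
```

With six input lines and LINE_COUNT=2 this yields part_00.csv, part_01.csv, and part_02.csv, each starting with the id,name header line.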