RCG AWS Lambda Unzip and Split Files - Source
This asset is used in the following MuleSoft Accelerators:
MuleSoft Accelerator for Consumer Goods — includes pre-built APIs, connectors, integration templates, and reference architecture to enable retail IT teams to jumpstart digital transformation initiatives.
AWS Lambda to unzip and split files
This asset provides an AWS Lambda script that un-archives a .csv.gz file when it is placed in an S3 bucket. It then splits the extracted .csv file into multiple files, each carrying the same header, using the configured LINE_COUNT value. The split files are then processed by the RCG Data Normalization Process API.
This asset has the following dependencies:
- Set up buckets and folders in S3
- Set up the AWS Lambda
Set up the buckets and folders in S3
- Log in to the AWS console.
- Under Services, select S3.
- Create a bucket that will be used for the Trade Promotion Effectiveness use case.
- Under this bucket, create the following folders:
  - inbox: folder for the .csv.gz files placed by the Data Normalization Process API
  - error: folder for error files, if any
  - archive: folder for successfully processed files
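The console steps above can also be expressed as AWS CLI commands. This is only a sketch: the bucket name below is a placeholder (bucket names are globally unique), and S3 "folders" are just key prefixes, created here as empty marker objects.

```shell
# Placeholder bucket name; replace with your own
BUCKET=tpe-trade-promotion-bucket
aws s3 mb s3://$BUCKET

# Create the inbox, error, and archive folders as empty key prefixes
aws s3api put-object --bucket $BUCKET --key inbox/
aws s3api put-object --bucket $BUCKET --key error/
aws s3api put-object --bucket $BUCKET --key archive/
```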
Set up the AWS Lambda
- In the AWS console, under Services, select Lambda.
- Create a new Lambda function.
- Add a bash layer: arn:aws:lambda:<region>:744348701589:layer:bash:8 (substitute your function's AWS Region for <region>).
- Create a trigger that listens for events only from the inbox folder, using the configured prefix and a .csv.gz suffix.
- Create a new file, add the code below, and save it as unzipSplit.sh.
- Under Configuration, select Environment Variables and add a key LINE_COUNT whose value is the desired split limit (for example, 1000000).
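The trigger step above corresponds to an S3 bucket notification. A sketch of the equivalent notification configuration is shown below; the function ARN is a placeholder, and the prefix/suffix values assume the folder layout described earlier.

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "unzip-split-on-inbox",
      "LambdaFunctionArn": "arn:aws:lambda:<region>:<account-id>:function:<function-name>",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "inbox/" },
            { "Name": "suffix", "Value": ".csv.gz" }
          ]
        }
      }
    }
  ]
}
```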
handler () {
  set -e
  # Event data is passed as the first parameter
  EVENT_DATA=$1
  S3_BUCKET=$(echo "$EVENT_DATA" | jq -r '.Records[0].s3.bucket.name')
  FILENAME=$(echo "$EVENT_DATA" | jq -r '.Records[0].s3.object.key')
  LINECOUNT=$LINE_COUNT

  # Build the S3 URIs: the original archive, the prefix for split files,
  # and the extracted .csv (the key with its trailing .gz removed)
  INFILE=s3://"${S3_BUCKET}"/"${FILENAME}"
  OUTFILE=s3://"${S3_BUCKET}"/"${FILENAME%%.*}"
  NEWFILE=s3://"${S3_BUCKET}"/"${FILENAME::-3}"
  echo "$S3_BUCKET, $FILENAME, $INFILE, $NEWFILE"

  # Un-archive the .csv.gz file
  aws s3 cp "${INFILE}" - | gunzip -c | aws s3 cp - "${NEWFILE}"

  # Split the extracted file every LINE_COUNT lines, prepending the header
  # to every chunk after the first (x00), which already contains it
  HEADER=$(aws s3 cp "${NEWFILE}" - | head -n 1)
  aws s3 cp "${NEWFILE}" - | split -d -l "${LINECOUNT}" --filter \
    "{ [ \"\$FILE\" != \"x00\" ] && echo \"${HEADER}\"; cat; } | aws s3 cp - \"${OUTFILE}\${FILE}.csv\""

  # Remove the original .csv.gz file from the inbox folder (it can also be moved to the archive folder)
  aws s3 rm "${INFILE}"
  # Remove the extracted file from the inbox folder
  aws s3 rm "${NEWFILE}"
}
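The header-preserving split can be exercised locally with coreutils alone, without AWS. The sketch below builds a sample .csv.gz (one header plus five data rows) and splits it every LINE_COUNT lines; the file names, the part_ prefix, and the sample data are illustrative, not part of the asset.

```shell
set -e
# Work in a scratch directory
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Build a sample CSV (1 header + 5 data rows) and gzip it, mirroring the .csv.gz input
{ echo "id,name"; for i in 1 2 3 4 5; do echo "$i,row$i"; done; } > data.csv
gzip data.csv                 # produces data.csv.gz

LINE_COUNT=2                  # the Lambda reads this from an environment variable

# Un-archive, capture the header, then split with the header re-applied
gunzip -c data.csv.gz > extracted.csv
HEADER=$(head -n 1 extracted.csv)
# split names chunks part_00, part_01, ...; the filter prepends the header
# to every chunk except the first, which already contains it
split -d -l "$LINE_COUNT" --filter \
  "{ [ \"\$FILE\" != \"part_00\" ] && echo \"$HEADER\"; cat; } > \$FILE.csv" \
  extracted.csv part_

head -n 1 part_00.csv   # id,name
head -n 1 part_02.csv   # id,name
```

With six input lines and LINE_COUNT=2 this yields part_00.csv, part_01.csv, and part_02.csv, each starting with the id,name header line.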