RCG AWS Lambda Unzip and Split Files - Source


This asset is used in the following MuleSoft Accelerators:

MuleSoft Accelerator for Consumer Goods — includes pre-built APIs, connectors, integration templates, and reference architecture to enable retail IT teams to jumpstart digital transformation initiatives.


AWS Lambda to unzip and split files

This asset provides an AWS Lambda script that un-archives a .csv.gz file when it is placed in an S3 bucket. It then splits the extracted .csv file into multiple files, each with the same header row, using the configured LINE_COUNT value. The split files are then processed by the RCG Data Normalization Process API.

This asset has the following dependencies:

  • Set up buckets and folders in S3
  • Set up the AWS Lambda

Set up the buckets and folders in S3

  1. Log in to the AWS console.
  2. Under services, select S3.
  3. Create a bucket that will be used for the Trade Promotion Effectiveness use case.
  4. Under this bucket, create the following folders:

    • inbox: folder for the .csv.gz files placed by the Data Normalization Process API.
    • error: folder for error files, if any.
    • archive: folder for successfully processed files.
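The bucket and folder layout above can also be created from the AWS CLI. A minimal sketch, assuming a hypothetical bucket name tpe-data-bucket (S3 has no real folders, so the "folders" are created as zero-byte prefix markers):

```shell
# Create the bucket for the Trade Promotion Effectiveness use case
# (bucket name is illustrative; choose your own globally unique name)
aws s3 mb s3://tpe-data-bucket

# Create the inbox, error, and archive folder prefixes
aws s3api put-object --bucket tpe-data-bucket --key inbox/
aws s3api put-object --bucket tpe-data-bucket --key error/
aws s3api put-object --bucket tpe-data-bucket --key archive/
```

These commands require AWS credentials with S3 permissions already configured for the CLI.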

Set up the AWS Lambda

  1. In the AWS console, under services, select Lambda.
  2. Create a new Lambda function.
  3. Add a bash layer - arn:aws:lambda::744348701589:layer:bash:8 (a layer ARN also carries an AWS region between the second and third colons; use the region your function runs in).
  4. Create a trigger that listens for events only from the inbox folder, with the configured prefix and a .csv.gz suffix.
  5. Create a new file, add the code below, and save it as unzipSplit.sh.
  6. Under Configuration, select Environment variables. Add a key LINE_COUNT whose value is the desired split size in lines. For example, 1000000.
handler () {
    set -e

    # Event data is sent as the first parameter
    EVENT_DATA=$1

    S3_BUCKET=$(echo "$EVENT_DATA" | jq -r '.Records[0].s3.bucket.name')
    FILENAME=$(echo "$EVENT_DATA" | jq -r '.Records[0].s3.object.key')
    LINECOUNT=$LINE_COUNT

    # Start processing here
    INFILE="s3://${S3_BUCKET}/${FILENAME}"
    OUTFILE="s3://${S3_BUCKET}/${FILENAME%%.*}"   # key without any extension; base name for the split files
    NEWFILE="s3://${S3_BUCKET}/${FILENAME::-3}"   # key without the trailing .gz
    echo "$S3_BUCKET, $FILENAME, $INFILE, $NEWFILE"

    # Un-archive the .csv.gz file
    aws s3 cp "${INFILE}" - | gunzip -c | aws s3 cp - "${NEWFILE}"

    # Capture the header row once so it can be prepended to every split file
    HEADER=$(aws s3 cp "${NEWFILE}" - | head -n 1)

    # Split the extracted file every LINE_COUNT lines; the first chunk (x00)
    # already contains the header, so it is prepended only to the later chunks
    aws s3 cp "${NEWFILE}" - | split -d -l "${LINECOUNT}" --filter \
        "{ [ \"\$FILE\" != \"x00\" ] && echo \"${HEADER}\"; cat; } | aws s3 cp - \"${OUTFILE}\${FILE}.csv\""

    # Remove the original .csv.gz file from the inbox folder (it can also be moved to the archive folder)
    aws s3 rm "${INFILE}"

    # Remove the extracted file from the inbox folder
    aws s3 rm "${NEWFILE}"
}

Type: Custom
Organization: MuleSoft
Published by: MuleSoft Solutions
Published on: Nov 2, 2023
Asset versions for 1.0.x: 1.0.0