Running Pandas Analytics using AWS Lambda
April 20, 2020

https://datavizz.in/wp-content/uploads/2021/06/pandas-on-serverless.png

AWS Lambda is something I’ve been using for quite some time now. With its low cost and, above all, no idle pricing, I keep coming back to the serverless stack for any new challenge I get to solve. It can be used with any of your favorite languages like Python, Node.js, C#, Java, Go, etc.

The challenge I got recently was this: how can we use serverless for running data analytics workloads?

I agree that most such workloads usually need far more time than a Lambda function allows, since its runtime is limited to 15 minutes. There are memory constraints as well.

But what if you need to analyze a simple XLS file that can be processed within these boundaries? Here’s an article that explains how you can run Pandas, a Python data analysis library, on AWS Lambda functions.

So, the question to solve: how can I deploy a Lambda function with the Pandas library?
To do this, we will first create a Lambda layer, which will then be attached to the AWS Lambda function so it can execute Pandas-related code.

Prerequisites

First, we have to ensure that the Pandas Python libraries for this layer are built on a Linux-based environment, since Lambda runs on Linux and the compiled binaries must match. In this case, I created an Ubuntu container locally on my machine to fetch the necessary Python libraries and then pushed them up to create the layer.

  1. Create an Ubuntu container and install Python (pick the runtime you are targeting; in my case I am using Python 3.6) along with pip:
% docker run -ti --entrypoint /bin/bash ubuntu
root@600e51f11253:/# apt-get update
root@600e51f11253:/# apt-get upgrade
root@600e51f11253:/# apt-get install python3.6
root@600e51f11253:/# apt-get install python3-pip
  2. Now create the files and directory structure needed for the Lambda layer:
root@600e51f11253:/# mkdir -p build/python/lib/python3.6/site-packages
root@600e51f11253:/# cd build
root@600e51f11253:/build# tree
.
|-- python
|   `-- lib
|       `-- python3.6
|           `-- site-packages
`-- requirements.txt

Note: For the rest of the article, you will need an AWS account with permissions to create Lambda functions and Lambda layers, and to upload to S3.

Setting up Lambda layer

  • First, we use pip to install the required modules inside the container.
root@600e51f11253:/build# echo pandas > requirements.txt 
root@600e51f11253:/build# pip3 install -r requirements.txt -t python/lib/python3.6/site-packages/
  • Let’s create a zip file of the modules. Keep in mind that the module path python/lib/python3.6/site-packages must be preserved inside the archive.
root@600e51f11253:/build# zip -X -r Pandas.zip python/*
exit
  • Copy the zip file from the container to your local machine.
docker cp <<dockerid>>:/build/Pandas.zip .
  • Let’s push this layer to AWS now.
aws s3 cp Pandas.zip s3://<<S3 Bucket Name>>
aws lambda publish-layer-version --layer-name <<Layer Name>> --description "<<Description>>" --content S3Bucket=<<S3 Bucket Name>>,S3Key=<<File Name>> --region <<Region>> --compatible-runtimes python3.6
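If you want to confirm that the layer was published and note down its version ARN for later (the layer name and region are the same placeholders as above), you can list its versions:
aws lambda list-layer-versions --layer-name <<Layer Name>> --region <<Region>>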

Creating & Deploying a Lambda Function

  • Go to the AWS Console, find “Lambda” in the services list, and hit Create Function.
  • Pick Author from scratch, provide a function name, and pick the Python runtime.
  • Click on Layers, hit Add a layer, and select the Pandas layer we published above. (If you prefer the CLI over the console, see the sketch after the sample code below.)
  • Put some code in the Lambda function to check whether Pandas is available:
import json
import pandas as pd

def lambda_handler(event, context):
    # Build a small DataFrame to confirm that Pandas works inside Lambda
    data = {'Name': ['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
            'Age': [24, 23, 22, 19, 10]}
    df = pd.DataFrame(data)
    print(df)
    return {
        'statusCode': 200,
        'body': json.dumps('Testing with pandas!')
    }
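Note: As an alternative to the console steps above, here is a rough CLI sketch of the same deployment. The function name, IAM role ARN, and layer version ARN below are placeholders, not values from this walkthrough; adjust them for your account.
zip function.zip lambda_function.py
aws lambda create-function \
    --function-name <<Function Name>> \
    --runtime python3.6 \
    --handler lambda_function.lambda_handler \
    --role arn:aws:iam::<<Account ID>>:role/<<Execution Role>> \
    --zip-file fileb://function.zip \
    --layers <<Layer Version ARN>> \
    --region <<Region>>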
  • You can configure a test event based on the default Hello World template and send a request. The logs should return a 200 response, which confirms that Pandas is working fine. (You can also invoke the function from the CLI, as shown after the logs below.)
Response:
{
  "statusCode": 200,
  "body": "\"Testing with pandas!\""
}

Request ID:
"47050e1e-8460-4076-a5f9-7285202c404c"

Function Logs:
START RequestId: 47050e1e-8460-4076-a5f9-7285202c404c Version: $LATEST
      Name  Age
0   Ashika   24
1     Tanu   23
2   Ashwin   22
3    Mohit   19
4  Sourabh   10
END RequestId: 47050e1e-8460-4076-a5f9-7285202c404c
REPORT RequestId: 47050e1e-8460-4076-a5f9-7285202c404c  Duration: 310.54 ms Billed Duration: 400 ms Memory Size: 128 MB Max Memory Used: 122 MB Init Duration: 1490.02 ms
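If you would rather trigger this test from the terminal instead of a console test event (the function name and output file below are placeholders), an invocation like this works too:
aws lambda invoke --function-name <<Function Name>> --region <<Region>> response.json
cat response.json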

We at DataVizz specialize in building products on serverless frameworks and help enterprises move their products from monoliths to cloud-native or serverless architectures.

Don’t forget to leave your review below, and do give me your comments on what you would like to see next!
