Part 2: Portable Function Pattern for the Scaling Business
A continued look into portable function patterns
Introduction
The landscape of platforms and runtimes is ever-increasing. There are more ways to package and run code than anyone can keep up with. Engineers are now required to learn highly specific workflows in order to remain relevant, and this divides the talent pool. When a business chooses specific technologies, it effectively selects its available talent pool as well.
To keep the scope of this article manageable, we will use AWS Lambda for serverless and Docker for containerization.
This is Part 2 of a series - Read Part 1
A workflow that meets high standards
The feedback loop for business logic development should be totally contained on the engineers' machine with a few commands. Engineers should also have total control over every element of the application lifecycle (every component shown on the product diagram within the demarcation point).
Terminal map: four terminals in a 2x2 grid, with terminals 1 and 2 on the top row and terminals 3 and 4 on the bottom row.
Terminal 3
(mock infrastructure S3/SQS/MySQL)
make shim
This command creates a Docker network containing a fake AWS environment configured with the relevant components. It also creates an actual MySQL database that our app will point to. These resources are isolated and will not overlap with other developers working at the same time (this may seem like a silly thing to point out, but many shops do not have this luxury). Even if the shop does not use AWS or MySQL, this command is responsible for knowing what needs to be shimmed, and for shimming it. Any stack choice ought to be weighed by how complex it is to mock: whoever selects the stack ought to be able to write a `make shim` command as part of a spike or other investigation.
Terminal 1
the worker (or whatever program needs testing)
```shell
make run FUNC=worker RUN_ARGS=' \
--read_queue=pending-worker \
--sqs_endpoint=http://localhost:4576 \
--db_host=127.0.0.1 \
--db_port=13306 \
--db_name=activity \
--db_user=root \
--db_pass=password \
'
```
The app code runs in terminal 1 with log output. It can be fully parameterized to redirect any I/O that occurs within the app.
Terminal 2
cli playground
```shell
aws --endpoint-url=http://localhost:4572 s3 cp tests/data/large.csv s3://raw-data
...
aws --endpoint-url=http://localhost:4576 sqs send-message \
--queue-url http://localhost:4576/queue/pending-worker \
--message-body '{"bucket": "raw-data", "key": "large.csv"}'
...
mysql -u root -h 127.0.0.1 -P 13306 -ppassword
```
The second terminal is the CLI playground, which provides a window into the system. It can be used both to view results and to simulate events. In the example above, we simulate an s3->sqs event, then connect to the database to validate the results of the local app code.
Terminal 4
test runner
make watch
A test runner watches changes made to `src/` files and reruns the unit tests as an engineer works. We are using a thin homebrew runner that simply runs on 15-second intervals. It is quite rudimentary, yet it captures the essence of a test runner for this article. If a language does not have a watching test runner, it isn't a deal-breaker: typing `make test` on demand is a feasible, low-cost step in the development workflow, or, even better, a pre-commit hook.
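The homebrew runner itself is not shown in the article. As a rough sketch of the idea, assuming a Python implementation with `make test` as the command to rerun, a polling watcher only needs to snapshot file modification times under `src/` and rerun the tests whenever the snapshot changes. The names `snapshot`, `changed`, and `watch` are hypothetical.

```python
import os
import subprocess
import sys
import time


def snapshot(root):
    """Map every file path under `root` to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[path] = os.path.getmtime(path)
    return mtimes


def changed(before, after):
    """Return the paths that were added, removed, or modified between two snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))


def watch(root="src", interval=15, command=("make", "test")):
    """Poll `root` every `interval` seconds and rerun `command` when files change."""
    last = None
    while True:
        current = snapshot(root)
        if last is None or changed(last, current):
            subprocess.run(list(command))
            last = current
        time.sleep(interval)


if __name__ == "__main__":
    watch(sys.argv[1] if len(sys.argv) > 1 else "src")
```

Polling modification times is cruder than filesystem-event libraries, but it needs nothing beyond the standard library, which fits the "thin homebrew" spirit.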
Refactored Legacy ETL
Here is an altered version of the legacy `ibm-mainframe-worker.py` that does exactly the same thing. The difference is how injectable the function is. One might say, "hey! `import json` seems like a dependency that cannot be easily mocked!" That would be correct. The difference between `json` and libraries such as `pymysql` or `boto3` is input/output activity. Libraries that perform I/O, whether on disk or over the network, must be injected into the function as arguments so that they can be easily controlled.
src/func/worker.py
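```python
import json


def run(event, context):
    # The event is a JSON string such as {"bucket": "raw-data", "key": "large.csv"}
    work = json.loads(event)
    bucket = work['bucket']
    key = work['key']

    # All I/O goes through the injected context: S3 reads...
    obj = context.s3.get_object(Bucket=bucket, Key=key)
    raw = obj['Body'].read().decode('utf-8')  # boto3 returns bytes

    records = []
    for line in raw.split(context.var.newline):
        if line:  # skip the empty string left behind by a trailing newline
            records.append(tuple(line.split(context.var.delimiter)))

    # ...and database writes
    insert_statement = context.var.insert_statement
    context.rds.executemany(insert_statement, records)
```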
Why is "context" everywhere?
The runtime context is your one-stop shop for injectable values: configuration values, database connections, disk reads/writes, the service broker, and other network operations. The context is the object that the various runtimes manipulate so that your function can run happily no matter where it is.
Context Configuration Priorities
To handle the various runtime configuration requirements, the context needs a way to prioritize conflicting configurations from multiple sources. The following priority has been safe for many shops: configuration file < environment variables < CLI arguments. Configuration files house the "sane defaults". Ideally no secrets live here, but things like "which delimiter" or "how long to wait for the database before timing out" are good sane-default values. Sure, you can hardcode them, but why not parameterize instead? You too could be a hero, if you parameterize configuration values. Environment variables override the file and are where per-environment values and secrets belong. Finally, CLI arguments supersede whatever else may have been set by any other configuration. When running locally, there should be ZERO magic: every adjustable parameter should be controllable at execution time from the entry point. Developers cursed with maintaining your repo should feel like Magneto on the Golden Gate Bridge: totally unfettered and powerful.
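The priority order is easy to get subtly wrong, because an unset CLI flag usually arrives as `None` and can silently erase a sane default. Here is a minimal sketch of the merge, assuming each source has already been loaded into a plain dict; `merge_config` is a hypothetical name, not part of the article's code.

```python
def merge_config(file_values, env_values, cli_values):
    """Merge three configuration sources: config file < environment < CLI.

    A value of None means "not provided", so an unset CLI flag cannot wipe
    out a sane default that came from the configuration file.
    """
    merged = {}
    for source in (file_values, env_values, cli_values):  # lowest priority first
        merged.update({k: v for k, v in source.items() if v is not None})
    return merged


defaults = {"delimiter": ",", "db_timeout": 30, "db_host": "localhost"}  # e.g. parsed from const.yml
env = {"db_host": "db.internal"}                                         # e.g. read from os.environ
cli = {"db_host": "127.0.0.1", "delimiter": None}                        # --delimiter not passed
print(merge_config(defaults, env, cli))
# {'delimiter': ',', 'db_timeout': 30, 'db_host': '127.0.0.1'}
```

Later sources win simply because `dict.update` overwrites earlier keys, so the loop order is the priority order.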
src/common/context.py
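```python
import boto3
from box import Box

from lib.rds import Aurora  # repo-local database wrapper (not shown in this article)
# load_const() and load_env() are repo helpers that read const.yml and the
# process environment; they are not shown in this article.


class CustomContext:
    def __init__(self, parent=None, args=None):
        const_path = args.get('const_path') if args else './const.yml'

        # Priority: configuration file < environment variables < CLI arguments
        var = dict()
        var.update(load_const(const_path))
        var.update(load_env())
        if args:
            # Skip unset CLI flags (None) so they don't erase sane defaults
            var.update({k: v for k, v in args.items() if v is not None})
        self.var = Box(var)

        self.aws_context = parent  # the runtime's own context (None when run locally)
        self.s3 = boto3.client(
            's3',
            endpoint_url=self.var.get('s3_endpoint'),  # None -> real AWS endpoint
            region_name=self.var.get('region')
        )
        self.rds = None

    def __enter__(self):
        # Open the database connection when the with-block is entered
        self.rds = Aurora(
            host=self.var.db_host,
            port=self.var.db_port,
            db=self.var.db_name,
            user=self.var.db_user,
            password=self.var.db_pass
        ).__enter__()
        return self

    def __exit__(self, _type, _value, _traceback):
        # Close the connection; returning None lets exceptions propagate
        self.rds.conn.close()
```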
Context is injectable too
`parent`, `args`, `s3_endpoint`, and `Aurora` are some of the injectable values that allow one to control a context. While the injection in the example is limited, it grants enough control to meet the needs of this article. In an even more flexible context, we would be able to pass in any library itself as a parameter, making it even easier to control things like S3, database access, and parsing. For example: `def __init__(self, boto, rds, parent=None, args=None):`.
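As a sketch of that more flexible signature, assuming `boto` is anything with a boto3-style `client(service, endpoint_url=...)` factory and `rds` is any callable that opens a connection from the config, a test can hand the context two tiny fakes instead of real libraries. `InjectableContext`, `FakeBoto`, and `FakeConnection` are hypothetical names, and `var` is a plain dict here rather than a `Box`.

```python
class InjectableContext:
    """Sketch of a context whose I/O libraries are passed in instead of imported."""

    def __init__(self, boto, rds, parent=None, args=None):
        self.var = dict(args or {})  # plain dict here; the article's version uses Box
        self.aws_context = parent
        self.s3 = boto.client("s3", endpoint_url=self.var.get("s3_endpoint"))
        self._connect = rds
        self.rds = None

    def __enter__(self):
        self.rds = self._connect(self.var)  # connection opened only when the scope is entered
        return self

    def __exit__(self, exc_type, exc, tb):
        self.rds.close()
        return False  # never swallow exceptions raised inside the with-block


class FakeBoto:
    """Test double: describes the requested client instead of creating one."""

    def client(self, service, endpoint_url=None):
        return {"service": service, "endpoint_url": endpoint_url}


class FakeConnection:
    """Test double for the database wrapper."""

    def __init__(self, var):
        self.var = var
        self.closed = False

    def close(self):
        self.closed = True


with InjectableContext(FakeBoto(), FakeConnection,
                       args={"s3_endpoint": "http://localhost:4572"}) as ctx:
    print(ctx.s3)  # {'service': 's3', 'endpoint_url': 'http://localhost:4572'}
    conn = ctx.rds
print(conn.closed)  # True
```

In production, the `boto3` module itself satisfies the `boto` parameter, since `boto3.client(...)` is available at module level.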
The importance of CustomContext's `parent`
`parent` is the context provided by the runtime. AWS Lambda's context has fairly limited use in Python (it is more useful in the Node.js runtime), but even in Python it offers some useful methods. `aws_context.get_remaining_time_in_millis()` reports how much time is left before Lambda's strict time limit cuts the function off, allowing one to stop a long-running download mid-flight and exit safely. If this sort of function is needed, a CustomContext can wrap the call to `parent` that requests the remaining time in a method of its own. In a unit-test environment, one could easily mock that call to return a time that triggers the desired code path.
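A minimal sketch of that abstraction, assuming a hypothetical `remaining_ms()` method on the context and a fixed fallback budget when no Lambda parent exists. `TimeAwareContext`, `process_until_deadline`, and `FakeLambdaContext` are illustrative names; only `get_remaining_time_in_millis()` comes from Lambda's context object.

```python
class TimeAwareContext:
    """Sketch: hide the runtime's `parent` context behind one small method."""

    # Assumption: with no Lambda parent (local runs, tests), pretend we have
    # Lambda's maximum 15-minute budget.
    DEFAULT_BUDGET_MS = 15 * 60 * 1000

    def __init__(self, parent=None):
        self.aws_context = parent

    def remaining_ms(self):
        if self.aws_context is None:
            return self.DEFAULT_BUDGET_MS
        # Lambda's Python context object provides get_remaining_time_in_millis()
        return self.aws_context.get_remaining_time_in_millis()


def process_until_deadline(items, context, handle, safety_margin_ms=10_000):
    """Handle items one by one, stopping early when the time budget runs low.

    Returns (processed, skipped) so the caller can requeue what was skipped.
    """
    processed = []
    for item in items:
        if context.remaining_ms() < safety_margin_ms:
            break  # exit cleanly instead of being killed mid-flight
        handle(item)
        processed.append(item)
    return processed, items[len(processed):]


class FakeLambdaContext:
    """Unit-test double that returns a scripted sequence of remaining times."""

    def __init__(self, remaining):
        self.remaining = list(remaining)

    def get_remaining_time_in_millis(self):
        return self.remaining.pop(0)


handled = []
ctx = TimeAwareContext(parent=FakeLambdaContext([60_000, 5_000]))
print(process_until_deadline(["part-1", "part-2", "part-3"], ctx, handled.append))
# (['part-1'], ['part-2', 'part-3'])
```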
What is an "event"
An event is defined by the architect of the application and represents a unit of work achievable by the function. It can take whatever shape the job requires, and the application treats that shape as a trustworthy fact. Note: a function is idempotent if it produces the same end result when it is passed the same event over and over.
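To make the idempotency note concrete, here is a toy sketch using in-memory stand-ins for a table (`append_rows` and `upsert_rows` are hypothetical). Appending rows is not idempotent, while writing rows under a key derived from the event is. In MySQL, the keyed version would typically map onto a unique key plus `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
def append_rows(table, event, rows):
    """NOT idempotent: each redelivery of the same event adds the rows again."""
    table.extend(rows)


def upsert_rows(table, event, rows):
    """Idempotent: rows are keyed by (bucket, key, line number), so a replay overwrites them."""
    for line_no, row in enumerate(rows):
        table[(event["bucket"], event["key"], line_no)] = row


event = {"bucket": "upload-test", "key": "dynamic-file-name-123.txt"}
rows = [("a", "1"), ("b", "2")]
log, table = [], {}
for _ in range(3):  # the same event delivered three times (e.g. retries)
    append_rows(log, event, rows)
    upsert_rows(table, event, rows)
print(len(log), len(table))  # 6 2
```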
event
```json
{
  "bucket": "upload-test",
  "key": "dynamic-file-name-123.txt"
}
```
Note that the above event is a simplification of the `s3->sns->sqs` pipeline.
The real-world S3 notification event looks like this:
```json
{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "us-west-2",
      "eventTime": "1970-01-01T00:00:00.000Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "EXAMPLE"
      },
      "requestParameters": {
        "sourceIPAddress": "127.0.0.1"
      },
      "responseElements": {
        "x-amz-request-id": "EXAMPLE123456789",
        "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "testConfigRule",
        "bucket": {
          "name": "example-bucket",
          "ownerIdentity": {
            "principalId": "EXAMPLE"
          },
          "arn": "arn:aws:s3:::example-bucket"
        },
        "object": {
          "key": "test/key",
          "size": 1024,
          "eTag": "0123456789abcdef0123456789abcdef",
          "sequencer": "0A1B2C3D4E5F678901"
        }
      }
    }
  ]
}
```
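Somewhere between the raw notification and the function, the verbose shape has to be reduced to the simplified event. Here is a sketch of that translation, assuming the S3 notification has already been unwrapped from its SNS and SQS envelopes (not shown). `simplify_s3_notification` is a hypothetical name. Object keys in S3 notifications are URL-encoded, so the key is decoded before use.

```python
from urllib.parse import unquote_plus


def simplify_s3_notification(notification):
    """Turn an S3 notification into simplified {"bucket", "key"} events."""
    events = []
    for record in notification.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue  # ignore anything that is not an S3 record
        s3 = record["s3"]
        events.append({
            "bucket": s3["bucket"]["name"],
            # keys arrive URL-encoded, e.g. a space is sent as "+"
            "key": unquote_plus(s3["object"]["key"]),
        })
    return events
```

Applied to the notification above, this yields `[{"bucket": "example-bucket", "key": "test/key"}]`, which is exactly the shape `run()` expects once serialized with `json.dumps`.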
The Wrappers
The wrapper surrounds the CustomContext and the function, marrying the two in perfect harmony. The wrapper knows where it is meant to be run: in Lambda, Docker, tests, or from the CLI. It is possible to create a single wrapper file that contains all this knowledge and swaps behavior based on the runtime. Even so, it tends to be more self-explanatory, straightforward, and maintainable to have one wrapper for each distinct environment. In the case of tests, one might actually create multiple wrappers in order to manipulate the conditions for one's nefarious purposes. Ironically, the wrapper itself is often not unit-testable because of its environmental awareness. It is meant to be extremely thin and "dumb". The most complex wrapper ought to be the local/container variant, because it is a no-magic zone.
Wrapper relies on the build step
The wrapper does NOT know what function it is calling. It generically loads the same path and calls the function with a known signature, `run(event, context)`. Not knowing which function it calls allows the wrapper to be reused for many functions with the same dependencies. This places the burden of knowing which function to run on the build step (`make build`). The build step treats both the wrapper and CustomContext as build dependencies, since both are shared by all functions.
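The article does the assembly with `make`; as a Python sketch of the same idea, assuming a hypothetical flat build layout, the build step copies one function to a fixed name (`func.py`) so that `from func import run` resolves, and copies the shared files next to it. `assemble_build` and the layout are illustrative; shared files such as `requirements.txt` or `lib/` would be copied the same way.

```python
import os
import shutil


def assemble_build(src_root, build_dir, func_name, wrapper):
    """Hypothetical build step: one function plus shared files in a flat build dir.

    src/func/<func_name>.py -> <build_dir>/func.py    (so `from func import run` works)
    src/common/context.py   -> <build_dir>/context.py
    src/common/<wrapper>.py -> <build_dir>/main.py    (the entry point the runtime calls)
    """
    os.makedirs(build_dir, exist_ok=True)
    shutil.copy(os.path.join(src_root, "func", func_name + ".py"),
                os.path.join(build_dir, "func.py"))
    shutil.copy(os.path.join(src_root, "common", "context.py"),
                os.path.join(build_dir, "context.py"))
    shutil.copy(os.path.join(src_root, "common", wrapper + ".py"),
                os.path.join(build_dir, "main.py"))
    return sorted(os.listdir(build_dir))
```

Because every function lands under the same file names, one generic wrapper (and one generic Dockerfile) can serve all of them.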
Lambda Wrapper
Lambda has some good default behavior, and then it has some bad default behavior. Automatic, unaccounted-for retries are a big fat no-no for pub-sub models. Automatic retries are undesirable in general, and especially so for non-idempotent designs. AWS has finally exposed a configurable "retry attempts" value, which defaults to 2. Before this value was exposed, one had to use a global cache, like the one in the wrapper below, to store every `aws_ctx.aws_request_id` and short-circuit the function manually.
Some other points to note: `parent` is assigned the runtime context, giving the CustomContext access to AWS Lambda's default context. Finally, `with CustomContext` invokes `__enter__` and `__exit__` when the scope is entered and exited, allowing for connection setup and cleanup respectively.
src/common/lambda_wrapper.py
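```python
from func import run
from context import CustomContext

# Module-level state survives between invocations only while the same
# execution environment stays warm, so this de-duplication is best-effort.
RETRY = set()


def handler(event, aws_ctx):
    print(event)
    if aws_ctx.aws_request_id in RETRY:
        return None  # Lambda is retrying a request we already saw: short-circuit
    RETRY.add(aws_ctx.aws_request_id)
    with CustomContext(parent=aws_ctx) as custom_context:
        # An SQS-triggered invocation delivers a batch under "Records"; each
        # record's "body" is the same string the local wrapper passes to run()
        for record in event['Records']:
            run(record['body'], custom_context)
```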
Local Wrapper
The local wrapper should be simpler, and it is, even though it has more lines of code. It is simpler because the script reproduces the behavior of the SQS-Lambda integration with zero magic: every feature of that integration is called out explicitly and can be manipulated directly.
src/common/local_wrapper.py
```python
import time
import boto3
import traceback
from box import Box
import configargparse
from func import run
from context import CustomContext


def main(args):
    sqs = boto3.client(
        'sqs',
        endpoint_url=args.sqs_endpoint,
        region_name=args.region
    )
    read_url = sqs.get_queue_url(QueueName=args.read_queue)['QueueUrl']
    while True:  # poll forever, like the SQS-Lambda integration
        res = sqs.receive_message(
            QueueUrl=read_url,
            MaxNumberOfMessages=args.poll_batch_size,
        )
        for message in res.get('Messages', []):
            event = message['Body']
            with CustomContext(None, args) as custom_context:
                try:
                    run(event, custom_context)
                    # Success: remove the message, as Lambda does automatically
                    sqs.delete_message(QueueUrl=read_url, ReceiptHandle=message['ReceiptHandle'])
                except Exception as e:
                    # Failure: log it and leave the message for redelivery / the dead-letter queue
                    print({
                        'exception_type': type(e).__name__,
                        'error_reason': e.args,
                        'traceback': traceback.format_exc()
                    })
        time.sleep(args.poll_interval)


##########################
if __name__ == '__main__':
    P = configargparse.ArgumentParser()
    P.add_argument('--read_queue', env_var='READ_QUEUE', type=str)
    P.add_argument('--poll_batch_size', type=int, default=1)
    P.add_argument('--poll_interval', type=int, default=15)
    P.add_argument('--region', type=str, env_var='AWS_DEFAULT_REGION', default='us-east-1')
    P.add_argument('--const_path', type=str, default='./const.yml')
    P.add_argument('--s3_endpoint', env_var='S3_ENDPOINT', type=str, default='http://localhost:4572')
    P.add_argument('--sqs_endpoint', env_var='SQS_ENDPOINT', type=str, default='http://localhost:4576')
    P.add_argument('--db_host', env_var='DB_HOST', type=str, default='localhost')
    P.add_argument('--db_port', env_var='DB_PORT', type=int, default=3306)
    P.add_argument('--db_name', env_var='DB_NAME', type=str, default='activity')
    P.add_argument('--db_user', env_var='DB_USER', type=str, default='root')
    P.add_argument('--db_pass', env_var='DB_PASS', type=str, default='password')
    P.add_argument('--newline', type=str)
    P.add_argument('--delimiter', type=str)
    P.add_argument('--insert_statement', type=str)
    main(P.parse_args(namespace=Box()))
```
Here we implement our own poller. It is the local equivalent of the SQS-to-Lambda integration that AWS performs under the hood, including its protection against reprocessing: `sqs.delete_message` must be called explicitly here, whereas AWS Lambda deletes the message for us automagically when a function reaches the end without failing. The same rule holds in this script: if `run()` completes, the message is cleared from the queue. If there is an unhandled error, on the other hand, we log it and move on to the next message. We rely on the dead-letter redrive policy to move failed messages out of the queue into another location for later review.
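The delete-on-success rule is easy to check without LocalStack. The sketch below extracts one polling pass into a hypothetical `poll_once` function and drives it with a hypothetical in-memory `FakeSqs` that implements only the two boto3 SQS calls the wrapper uses. `fake_run` stands in for the real worker.

```python
import json
import traceback


def poll_once(sqs, queue_url, run, make_context, batch_size=1):
    """One polling pass that mirrors the local wrapper's delete-on-success rule."""
    res = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=batch_size)
    outcomes = []
    for message in res.get("Messages", []):
        try:
            run(message["Body"], make_context())
            # Success: clear the message, as the SQS-Lambda integration would
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
            outcomes.append(("deleted", message["MessageId"]))
        except Exception as e:
            # Failure: log it and leave the message on the queue; SQS redelivers it
            # after the visibility timeout, and the redrive policy eventually moves
            # it to the dead-letter queue.
            print({"exception_type": type(e).__name__, "traceback": traceback.format_exc()})
            outcomes.append(("kept", message["MessageId"]))
    return outcomes


class FakeSqs:
    """In-memory stand-in for the two boto3 SQS client calls used above."""

    def __init__(self, bodies):
        self.messages = [
            {"MessageId": str(i), "ReceiptHandle": "rh-%d" % i, "Body": body}
            for i, body in enumerate(bodies)
        ]
        self.deleted = []

    def receive_message(self, QueueUrl, MaxNumberOfMessages):
        return {"Messages": self.messages[:MaxNumberOfMessages]}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.deleted.append(ReceiptHandle)


def fake_run(event, context):
    json.loads(event)  # raises on a malformed event, like the worker would


sqs = FakeSqs(['{"bucket": "raw-data", "key": "large.csv"}', "not json"])
print(poll_once(sqs, "http://localhost:4576/queue/pending-worker", fake_run, lambda: None, batch_size=10))
# [('deleted', '0'), ('kept', '1')]
print(sqs.deleted)  # ['rh-0']
```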
Finally, the arguments at the bottom of the file allow us to modify any connection value. We can change the queue being read, the database, the AWS endpoints, some of the values within the config file, or even the entire configuration file. The powerful `configargparse` has counterparts in other languages that allow for straightforward config defaults with environment variable support.
Engineers familiar with LocalStack may recognize the default SQS and S3 endpoints. With a prebuilt LocalStack Docker image, one can mock simple AWS services like SQS, S3, and more. Similarly, prebuilt MySQL Docker images allow for easy bootstrapping of a fully local environment. This does not replace unit testing; rather, it allows for swift debugging of the function code without disrupting a non-prod environment's resources or colliding with other developers' real-time debugging.
tests/docker-compose.yml
```yaml
version: '3'
services:
  db:
    build:
      context: ../
      dockerfile: ./tests/.docker/db/Dockerfile
    image: example-db
    ports:
      - "13306:3306"
    networks:
      - backend
    environment:
      MYSQL_DATABASE: activity
      MYSQL_ROOT_PASSWORD: password
  localstack:
    image: localstack/localstack
    hostname: localhost
    ports:
      # Legacy per-service LocalStack ports (S3 on 4572, SQS on 4576);
      # newer LocalStack releases serve all services on a single edge port, 4566
      - "4572:4572"
      - "4576:4576"
    networks:
      - backend
    environment:
      SERVICES: s3,sqs
      AWS_ACCESS_KEY_ID: fake
      AWS_SECRET_ACCESS_KEY: fake
      AWS_DEFAULT_REGION: us-east-1
    volumes:
      - ../tests/.docker/localstack:/docker-entrypoint-initaws.d
networks:
  backend:
    driver: bridge
```
./tests/.docker/db/Dockerfile
```dockerfile
FROM mysql:5.7
ENV MYSQL_ALLOW_EMPTY_PASSWORD true
COPY tests/.docker/db/schema/*.sql /docker-entrypoint-initdb.d/
```
tests/.docker/localstack/bootstrap.sh
```shell
#!/bin/bash
awslocal s3 mb s3://raw-data
awslocal sqs create-queue --queue-name pending-worker
```
Docker Wrapper?
Incidentally, the local wrapper is the Docker wrapper. The key to gluing it together is a sufficiently generic Dockerfile that works for any function/container one would ever need to build. Again, the build step will have already placed the contents into `$FUNC_BUILD_DIR` before the Dockerfile is ever run.
src/common/docker_wrapper
```dockerfile
FROM python:3.7
ARG FUNC_BUILD_DIR
RUN mkdir /app
WORKDIR /app
COPY $FUNC_BUILD_DIR ./
RUN pip install -r requirements.txt
CMD ["python3", "-u", "./main.py" ]
```
Unit Wrapper
This is arguably the most valuable wrapper level: it tells developers instantly if something is broken. It reports, in under a second and dozens of times a day, whether our function is behaving as expected. A continuous integration environment will also run these checks every time the codebase changes. The trouble with the unit wrapper is how subtle it is. It doesn't take the same obvious form as the other wrappers; it is defined within the unit test files, and there are usually many of them, each defined in a specific way to scaffold a static scenario.
tests/context/void_context.py
```python
from box import Box

from mock.rds import Rds
from mock.s3 import VoidS3


class VoidContext:
    """Scenario context that plugs in the VoidS3 mock."""

    def __init__(self):
        self.var = Box({
            'insert_statement': '...',
            'delimiter': ',',
            'newline': '\n'
        })
        self.aws_context = None
        self.s3 = VoidS3()
        self.rds = Rds()
```
tests/context/diff_delimiter_context.py
```python
from box import Box

from mock.rds import Rds
from mock.s3 import DiffDelimiterS3


class DiffDelimiterContext:
    """Scenario context: '!' as the delimiter, served by the DiffDelimiterS3 mock."""

    def __init__(self):
        self.var = Box({
            'insert_statement': '...',
            'delimiter': '!',
            'newline': '\n'
        })
        self.aws_context = None
        self.s3 = DiffDelimiterS3()
        self.rds = Rds()
```
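The `mock.rds` and `mock.s3` modules are not shown, so here is a self-contained sketch of how scenario contexts like these plug into a unit test. `FakeS3`, `FakeRds`, and `make_context` are hypothetical stand-ins, `SimpleNamespace` replaces `Box`, and the worker's logic is inlined so the sketch runs on its own.

```python
import io
import json
import unittest
from types import SimpleNamespace


def run(event, context):
    # Same logic as src/func/worker.py, inlined so the sketch runs on its own.
    work = json.loads(event)
    obj = context.s3.get_object(Bucket=work["bucket"], Key=work["key"])
    raw = obj["Body"].read().decode("utf-8")
    records = [tuple(line.split(context.var.delimiter))
               for line in raw.split(context.var.newline) if line]
    context.rds.executemany(context.var.insert_statement, records)


class FakeS3:
    """Stand-in for the boto3 S3 client: serves one fixed body for any key."""

    def __init__(self, body):
        self.body = body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.body)}


class FakeRds:
    """Records executemany() calls instead of touching a database."""

    def __init__(self):
        self.calls = []

    def executemany(self, statement, records):
        self.calls.append((statement, records))


def make_context(body, delimiter):
    var = SimpleNamespace(insert_statement="...", delimiter=delimiter, newline="\n")
    return SimpleNamespace(var=var, aws_context=None, s3=FakeS3(body), rds=FakeRds())


EVENT = json.dumps({"bucket": "raw-data", "key": "large.csv"})


class WorkerTest(unittest.TestCase):
    def test_void_file_inserts_nothing(self):
        ctx = make_context(b"", ",")
        run(EVENT, ctx)
        self.assertEqual(ctx.rds.calls, [("...", [])])

    def test_different_delimiter(self):
        ctx = make_context(b"a!1\nb!2\n", "!")
        run(EVENT, ctx)
        self.assertEqual(ctx.rds.calls[0][1], [("a", "1"), ("b", "2")])


if __name__ == "__main__":
    unittest.main()
```

Each scenario is just a different context, which is why the unit "wrapper" tends to multiply into many small context classes.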
The build/deploy flow
In this example, Terraform is used as the IaC (infrastructure as code) flavor. The same can be implemented in CloudFormation or other IaC solutions. All IaC flavors vary in integration level, bugs, syntax, quirks, and maintainability. Ultimately, the choice will likely be whatever is most familiar to the shop's DevOps team. Some key non-negotiable features are:
- Idempotent deploys
  - Rerunning deploys should produce the same end result
- Awareness of new and preexisting resources
  - Queues, storage locations, service endpoints, etc.
- Supports the command line interface as first class
  - We want errors to be exposed in both CI automation and local CLI commands
Aside from the IaC, the deployment scripts are what glue the code together with real-life resources. The deployment scripts are captured in two clearly named steps: `build` and `deploy`.
Makefile
```makefile
build:
ifeq ($(TARGET),lambda)
	@$(MAKE) zip_lambda
endif
ifeq ($(TARGET),layer)
	@$(MAKE) zip_layer
endif
	@cd ./infra/${TARGET} \
	&& terraform init \
	&& terraform get

deploy:
	@cd ./infra/${TARGET} \
	&& . ../../config/secrets-${ENV}.env \
	&& terraform apply ${AUTO_APPROVE} \
	-var="environment=${ENV}"
```
Terraform in particular has some nasty race conditions, so we ensure that files to be uploaded are prepared before the deploy step in order to totally sidestep the issue. Terraform is also highly directory specific, so one must be in a specific directory in order to scan the appropriate IaC code.
The `build` step has some conditions:
- If we are building the Lambda layer, then prepare the Lambda layer files.
- If we are building the N Lambda(s), then prepare the N Lambda(s) files.

Terraform additionally has to acquire any provider plugins (such as AWS) it needs in order to manage resources, which is what `terraform init` takes care of.
Finally, the `deploy` step can be run after everything is "built" and ready to roll out.
example deploy flow for lambda
```shell
make build TARGET=layer ENV=lab
make build TARGET=queue ENV=lab
make build TARGET=lambda ENV=lab
make deploy TARGET=layer ENV=lab
make deploy TARGET=queue ENV=lab
make deploy TARGET=lambda ENV=lab
```
The Ugly
Every decision has consequences. The poison chosen in this universal directory structure is that the module names in imports (for example, `from func import run`) do not actually exist, since there is no file named "func". Similarly, `from lib.rds import Aurora` can fail when any of the `src/func/*.py` functions is called directly. By now, this repository has completely bought into unit tests and isolated local development environments, so the way to execute code is no longer the language-specific method. Instead, we test our code via `make test` and run local development setups with `make run FUNC=...`. These abstractions handle the directory mishaps that would otherwise be encountered by calling the scripts directly. The glue code (`make build`) also knows where files live, so it can move them from their maintenance locations to the appropriate location for each runtime. The theory is that most of the work in the repository will be spent reading code and searching for where something is, so the repository is catered to the HUMAN experience, not the language-of-choice's quirks.
How it becomes powerful
Fiscal cost and time cost are large factors in architectural decisions (or at least they should be). There are costs in maintainability, costs in talent (or personnel), and the more direct infrastructure costs of resource consumption. Infrastructure costs are the easiest to remedy if code can run on the cheapest resources possible. Lambda happens to be one of the cheapest, yet still maintainable, ways of running business logic, and one that more affordable talent can maintain. As a matter of fact, if a shop is small enough, it can run nearly for free within AWS's generous Lambda free tier of 400,000 GB-seconds of compute per month. Still, there are times when Lambda is no longer sufficient. I've seen shops whose data sizes have grown so large that their Lambda fleets start timing out. Maybe the code used to work, but now the infrastructure needs to scale beyond what Lambda can handle. If a problem doesn't scale well horizontally, one might be in trouble using Lambda as a solution to begin with. Even with these constraints, ideally any shop could still benefit from a pay-as-you-grow model.
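To put the free-tier boundary in concrete terms, Lambda compute is metered as memory (in GB) multiplied by duration (in seconds), summed over invocations. Here is a back-of-the-envelope sketch with hypothetical function sizes; the helper names are illustrative.

```python
FREE_TIER_GB_SECONDS = 400_000  # monthly Lambda free-tier compute allowance


def gb_seconds(memory_mb, duration_ms, invocations):
    """Lambda compute usage in GB-seconds: memory (GB) x duration (s) x invocations."""
    return (memory_mb / 1024) * (duration_ms / 1000) * invocations


def free_invocations(memory_mb, duration_ms):
    """How many invocations of this size fit inside the monthly free tier."""
    return int(FREE_TIER_GB_SECONDS // gb_seconds(memory_mb, duration_ms, 1))


print(free_invocations(512, 2000))  # 400000  (0.5 GB x 2 s = 1 GB-s each)
print(free_invocations(128, 500))   # 6400000 (0.125 GB x 0.5 s = 0.0625 GB-s each)
```

The free tier also caps requests at 1 million per month, so in the second case the request limit, not compute, is what a shop would hit first.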
Since shop requirements can change over time because of scaling needs, a common solution is to use a stack that can handle the desired "volume of tomorrow" instead of the actual volume of today (at tomorrow's cost, too). This pattern becomes powerful when shops can migrate to and from Lambda and containers with minimal effort. Finally, the portable function pattern allows for simple maintenance, averting the costs incurred by confusing ad-hoc project layouts or complex deploy pipelines (destroy all AWS resources daily, anyone?).
In closing
By keeping a thin glue layer written in a common, well-known language, we ensure that just about anyone can come in and manage it if changes are required. When all resources, assets, and infrastructure as code live within a repo as plain text, clear accountability is created across the development lifecycle and the billing period. Saving costs (both time and money) in all three of the aforementioned areas (maintenance, talent, infrastructure) is a feasible and tangible goal. An example repository will be provided that lays out the pattern with live ECS and Lambda code. The technology choices are minimal, which means less training and great flexibility. It becomes possible to swap out technologies such as the IaC (Terraform), the business logic language (Python), or even the DevOps glue (Makefile). Plug in your choice of CI/CD solution and monitoring technologies to take this pattern to enterprise grade. The key is to focus on the engineers' feedback loops (feeding the morale), and to work from there toward a world with more productive engineers and lower expenses.