How to Set Up an AWS Batch Job Using a Docker Container
There are always tasks that need to run periodically, either to analyze and process information (like fraud detection) or to do things like send email reports. For that, we need a tool to schedule compute resources and, of course, the script.
But what if we only need to worry about coding the script?
Enter AWS Batch. It's a free service that takes care of the batch jobs you might need to run periodically or on demand, and you only pay for the resources you use.
In this tutorial, you'll learn how to kick off your first AWS Batch job by using a Docker container.
What Is AWS Batch? A Quick Overview
Before we dive in, let's do a quick overview of the subject at hand. AWS Batch is a service that lets you run batch jobs in AWS. You don't have to worry about installing a tool to manage your jobs; AWS Batch will do that for you.
There are a lot of features you might not need when you're first starting out, but let's explore a few of them anyway:
- Instances run only for the time that's needed, taking advantage of per-second billing. You can also lower your costs by using spot instances.
- You can configure how many retries you'd like for any job.
- It offers queues where you send the jobs. Each queue can be configured with a certain priority, so you can control which jobs run first. You can also have queues that use more powerful resources to speed up the process.
- It supports Docker containers so that you can focus only on your code.
- And more…
So, enough theory. Let's get our hands dirty.
Kick Off Your First Job
Before we start, there are some prerequisites that will make this tutorial easy to follow, and it will include some good practices regarding security. If you think you need more details, you can check the setup page in AWS's official docs.
Prerequisites
- Have an AWS account.
- Create an IAM user with administrator permissions. To do this, you can simply follow this tutorial. I recommend you give granular permissions to the user that will do the provisioning.
- Install and configure the AWS CLI.
If something from the above doesn't work, it might be because a permission is missing or the CLI is not configured properly. I'll let you know exactly what's needed in the following steps.
Go to AWS Batch
Log in to your AWS account and look for AWS Batch on the initial screen, or you can go straight there by using this link.
You'll see a screen like the following:
Click the "Get started" button. Then, this next screen will appear:
Click the "Skip wizard" button. We're not going to follow this wizard because I want to explain each step to you. Besides, after this, you'll probably use AWS CloudFormation or something else to provision, not the wizard.
Create a Compute Surroundings
The jobs will run on a compute environment. Here, you'll configure the instance type, family, and some other things that we'll see in a bit.
It's important to know that we're not going to create any instances now. AWS Batch will create one when it's needed. You can also configure things so that instances are created right away, speeding up job scheduling, but we won't tackle that in this post.
Click the "Compute environments" link in the left menu. You'll see the following screen:
Instance Type and Permissions
Now click the "Create environment" blue button so you can start defining the compute environment. You'll begin configuring the environment in the following screen:
For simplicity, we're going to choose all default values. You just need to name the environment. I called it "kickoff-compute-environment."
You don't have to worry about creating a service or instance role right now. Just choose the option "Create new role" for both, and AWS will create them for you with the proper permissions. This will help you see which permissions are needed, and you can adjust them if you want to.
Leave the EC2 key pair blank because we don't need to access the servers for now.
Compute Resources
Scroll down a little bit, and let's talk about the compute resources section. You'll see the following screen:
This is where you choose whether you want to use on-demand or spot instances. For simplicity, let's choose "On-demand."
The "Allowed instance types" field is where you define which instance families this environment can create. This is where things get fun, because for CPU-intensive jobs you can restrict the environment to C family instance types, and for memory-intensive jobs you can choose M family instance types. You're limiting which instance types can be created. I chose "optimal," so AWS decides for me which instance is better based on the configuration of the job queues.
Now, vCPUs are one of the most important settings here in order for your first job to run.
If you're familiar with running workloads using ECS, you might get confused here. You might configure so many vCPUs that AWS won't be able to create the environment. And even if there are a few instances running, jobs won't run until the environment is ready. So keep in mind that these are virtual CPUs, not the CPU units that you configure for a container when running in ECS.
I configured a maximum of four vCPUs. That means if at some point the cluster has four vCPUs among all its instances, it won't create more. Jobs will run slowly, but your costs will remain controlled. I also put one vCPU as desired, just so it starts creating an instance right away. AWS will adjust this later if needed, and you can modify it when submitting a job if you're in a hurry.
Networking
Scroll down a little bit, and you'll now configure the networking section and tags. You'll see a screen like this:
Leave the VPC and subnets as default for now. Click the "Create" blue button and wait a bit while the environment is created.
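If you'd rather script this step than click through the console, the same compute environment can be created with the AWS CLI. This is a sketch, not the console's exact output: the role ARN, subnet, and security group IDs are placeholders you'd replace with your account's values.

```shell
# Create a managed compute environment equivalent to the console steps above.
# Placeholders: account ID, subnet ID, and security group ID are examples.
aws batch create-compute-environment \
  --compute-environment-name kickoff-compute-environment \
  --type MANAGED \
  --state ENABLED \
  --service-role arn:aws:iam::123456789012:role/AWSBatchServiceRole \
  --compute-resources type=EC2,minvCpus=0,maxvCpus=4,desiredvCpus=1,instanceTypes=optimal,subnets=subnet-aaaa1111,securityGroupIds=sg-bbbb2222,instanceRole=ecsInstanceRole
```

The `maxvCpus=4` and `desiredvCpus=1` values mirror the configuration described above.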
Create a Job Queue
Now you need a queue where you'll send the jobs to get executed. This queue will be attached to a compute environment, and the AWS Batch service will create the resources needed based on the load of the queue. It uses the min, max, and desired vCPUs configuration to know how many instances to create.
Click the "Job queues" link in the left menu and you'll see the following screen:
Then, you can click the "Create queue" blue button. You'll see this:
Let's give the queue a name so it's easy to identify. I called it "first-job-queue."
For the priority, make sure you type a value that lets you play with lower-priority queues later. I put "100" in case I need to create a lower-priority queue later, say, one with a priority of 50.
Enable the job queue. By default, this checkbox will be checked. You should leave it that way.
You now need to connect this queue to one or more compute environments. I chose the one I just created, "kickoff-compute-environment." If there were any other environments, this is where you'd choose them.
Why would you want more than one compute environment? Well, it's useful if you want to speed up a job's processing time by creating more instances using the spot market. You can have an on-demand compute environment where you always have resources available, and if the load increases, you can create spot instances, if any are available, based on the bid you configured.
Click the "Create queue" blue button.
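The CLI equivalent of this queue setup is short. This is a sketch assuming the compute environment name from the previous section:

```shell
# Create the job queue and attach it to the compute environment created earlier.
# Priority 100 leaves room for lower-priority queues (e.g., 50) later.
aws batch create-job-queue \
  --job-queue-name first-job-queue \
  --state ENABLED \
  --priority 100 \
  --compute-environment-order order=1,computeEnvironment=kickoff-compute-environment
```

If you attach more than one compute environment, the `order` value decides which one AWS Batch tries first.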
Create a Job Using Docker
We're going to use a "hello world" job that AWS evangelists have used for demo purposes. I couldn't find a repository with all the files they've used, so I created one with all the files we're going to need. You can find it on GitHub here.
Let's explore what's in there, as well as why and how to use those files to create our first job in AWS Batch.
Docker Image
We're going to create a simple job that will pull a Bash script from S3 and execute it. The Dockerfile and the script that does what I just described are located in the "job" folder of the repository.
I won't explain the script or the Dockerfile just yet; we'll just use them. So let's build the Docker image and push it to Docker Hub. You need to have Docker installed on your machine, a Docker Hub account, and to be logged in on your computer.
Let's build the Docker image. You can skip this step and use my image located here, or you can run the following command, tagging the image with your username instead of mine:
docker build -t christianhxc/aws-batch-101:latest .
Now, let's push the image. You need to be logged in with your user ID. And make sure you push the image that has your username in the tag. Run the following command:
docker button christianhxc/aws-batch-101:latest
That's it! You now have a Docker image that will download a Bash script from S3 and run it.
A Bash Script
Let's create the Bash script. You can use the one I have in the repo. That script simply puts a Fibonacci sequence in a DynamoDB table. It uses an environment variable called FOO to create the series of numbers, and it uses an argument just to print it in the console.
This script is in the root of the GitHub repository I linked earlier, and it's called mapjob.sh.
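To make the idea concrete, here's a minimal sketch of what such a script could look like. This is a hypothetical reconstruction, not the exact script from the repo: the FOO variable and the "fetch_and_run" table name follow this article, and the DynamoDB call is left commented out because it needs AWS credentials to run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of mapjob.sh: compute $FOO Fibonacci terms and
# (optionally) store the result in a DynamoDB table keyed by the job ID.
set -euo pipefail

# Print the first $1 Fibonacci numbers, space-separated.
fib_sequence() {
  local n=$1 a=0 b=1 out="" i
  for ((i = 0; i < n; i++)); do
    out+="$a "
    ((b = a + b, a = b - a))
  done
  echo "${out% }"
}

SEQUENCE=$(fib_sequence "${FOO:-10}")
echo "Job argument: ${1:-none}"
echo "Fibonacci sequence: $SEQUENCE"

# Store the result, keyed by the AWS Batch job ID (requires AWS credentials):
# aws dynamodb put-item --region us-east-1 --table-name fetch_and_run \
#   --item "{\"jobID\": {\"S\": \"$AWS_BATCH_JOB_ID\"}, \"sequence\": {\"S\": \"$SEQUENCE\"}}"
```

Run locally with `FOO=10 bash mapjob.sh test` to see the sequence printed without touching AWS.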
Now, because this is outside the scope of AWS Batch, I'm just going to list the actions you'll need for this guide to work. We'll need to do the following:
- Create a DynamoDB table in the Virginia region with a primary key of "jobID". Mine is called "fetch_and_run." If you decide to use a different name, make sure you change it at the end of the mapjob.sh script.
- Create an S3 bucket in the Virginia region. Mine is called "cm-aws-batch-101." Don't make it public.
- Upload the mapjob.sh script to the bucket you just created.
- Create an IAM role for an ECS service task with permissions to the S3 bucket and the DynamoDB table. If you don't know how to do that, follow these instructions. I called my IAM role "aws-batch-101." We'll use this one next.
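The first three steps above can be scripted with the AWS CLI. The names below follow this article; the IAM role step is omitted because it's covered by the instructions linked above:

```shell
# DynamoDB table with "jobID" as the primary (hash) key, in us-east-1 (Virginia).
aws dynamodb create-table --region us-east-1 \
  --table-name fetch_and_run \
  --attribute-definitions AttributeName=jobID,AttributeType=S \
  --key-schema AttributeName=jobID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

# Private S3 bucket in the same region, then upload the script.
aws s3 mb s3://cm-aws-batch-101 --region us-east-1
aws s3 cp mapjob.sh s3://cm-aws-batch-101/mapjob.sh
```

Bucket names are globally unique, so you'll need your own name rather than "cm-aws-batch-101."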
You're almost ready to kick off your first job. You already have a script and a Docker image to use.
Let's create the job definition in AWS and then submit a job.
Create a Job Definition
At this point, you've defined the environment where your jobs will run and the queue, which means AWS takes care of creating resources only when they're needed. Now you need to create the job definition. And this is where things get more interesting.
Click the "Job definitions" link in the left menu and you'll see the following screen:
Click the "Create" blue button and let's start defining the job.
Enter any name you'd like. I put "first-job." We set job attempts to 1. Job attempts is the maximum number of times to retry your job if it fails. And execution timeout is the maximum number of seconds a job attempt can run. For this example, we set it to 60 seconds.
Scroll down a bit and let me explain what's there:
Job role provides a drop-down menu where you select the job role. Choose the IAM role you created previously; mine is "aws-batch-101."
Note that only roles with the Amazon Elastic Container Service Task Role trust relationship will be shown. You can learn more about creating roles with the AWS ECS trust relationship here.
Now put in a name for the container image. Like I said before, for simplicity, you can use mine. I called it "christianhxc/aws-batch-101:latest." These values can't be changed when submitting a job, but the ones we're about to explore can be.
The command field describes the command passed to the container. It maps to the COMMAND parameter of docker run. Here, we'd type the name of the script that the container will run and its parameters. Because we can override this value, we'll leave it as it is right now.
Now, here's another trick to be able to run a job. Unfortunately, you can't assign CPU units to a container, only vCPUs. That means that, at minimum, the container will have 1,024 CPU units, because that's the equivalent of one vCPU. You can configure the CPU, then, in blocks of 1,024. This is important because I entered 256, thinking it was CPU units, and the job never started. It gets stuck in the RUNNABLE state if there's nowhere to run it.
Configure how much memory this container will need. I put 256. Leave the rest as it is.
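For reference, the same job definition can be registered through the CLI. This is a sketch: the account ID in the role ARN is a placeholder, and the memory/vCPU values mirror the console steps above.

```shell
# Register a container job definition matching the console configuration:
# 1 vCPU (the minimum, i.e., 1,024 CPU units), 256 MiB of memory,
# 1 attempt, and a 60-second timeout.
aws batch register-job-definition \
  --job-definition-name first-job \
  --type container \
  --retry-strategy attempts=1 \
  --timeout attemptDurationSeconds=60 \
  --container-properties '{
    "image": "christianhxc/aws-batch-101:latest",
    "vcpus": 1,
    "memory": 256,
    "jobRoleArn": "arn:aws:iam::123456789012:role/aws-batch-101"
  }'
```

Registering the same name again creates a new revision rather than overwriting the old one.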
Submit a Job
You're now, finally, able to submit a job.
Click the "Jobs" link in the left menu, and you'll see the following screen:
Click the "Submit job" blue button. Let's submit one!
Next, name your job submission. I called it "my-first-job." Choose the job definition and the queue we just created, and choose "Single" as the job type.
Scroll down a little and let's override some values here:
Here, you'll need to put the name of the script in the S3 bucket and the Fibonacci number as a parameter. But these are only for reference. I used "mapjob.sh 60." Type in "1" for vCPU and "256" for memory.
Scroll down some more, because our script needs environment variables in order to work. Let's add the corresponding values:
Let's add the environment variables. For FOO, enter the Fibonacci number. I used 60. For BATCH_FILE_TYPE, put "script", and for BATCH_FILE_S3_URL, put the S3 URL of the script to fetch and run.
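The submission can also be done from the CLI. This sketch uses the queue, job definition, bucket, and environment variables described above (swap in your own bucket URL):

```shell
# Submit a job, overriding the command and passing the environment variables
# the container needs: FOO (Fibonacci number), BATCH_FILE_TYPE, and
# BATCH_FILE_S3_URL (where the script lives).
aws batch submit-job \
  --job-name my-first-job \
  --job-queue first-job-queue \
  --job-definition first-job \
  --container-overrides '{
    "command": ["mapjob.sh", "60"],
    "environment": [
      {"name": "FOO", "value": "60"},
      {"name": "BATCH_FILE_TYPE", "value": "script"},
      {"name": "BATCH_FILE_S3_URL", "value": "s3://cm-aws-batch-101/mapjob.sh"}
    ]
  }'
```

The command prints a `jobId` you can use to track the job's status.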
Click the "Submit job" blue button and wait a while. You can go to the compute environment and change the desired vCPUs to 1 to speed up the process.
It will start creating an instance. When the instance is ready to process a job, the job will transition from RUNNABLE through STARTING and RUNNING to SUCCEEDED.
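You can watch that transition from the CLI as well. The job ID placeholder below is the one returned by the submission:

```shell
# List jobs waiting in the queue.
aws batch list-jobs --job-queue first-job-queue --job-status RUNNABLE

# Check the current status of a specific job (SUBMITTED, PENDING, RUNNABLE,
# STARTING, RUNNING, SUCCEEDED, or FAILED).
aws batch describe-jobs --jobs <your-job-id> \
  --query 'jobs[0].status' --output text
```

A job stuck in RUNNABLE usually means the compute environment can't place it, for example because the job asks for more vCPUs or memory than any allowed instance provides.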
And you'll see a new entry in the DynamoDB table.
You can keep submitting jobs and change the FOO variable to generate different sequences of numbers. When you don't submit any more jobs, AWS Batch will terminate the instance it created.
It's Your Turn Now
You now have the basics to kick off a job in AWS Batch. Once you've finished this guide, it's up to you which scripts or code you'll put in a container. AWS Batch will manage all the infrastructure, scheduling, and retries for you.
Now the challenge is how to code your application so that you can submit several instances of a job. AWS Batch will run them as you submit them and will scale out and in when needed, saving you some money. You can start by migrating any existing cron job, but don't stop there. Many things could go wrong with your job execution.
The container may be buggy. Performance may be slow. Sometimes, you may need to provision more memory for the job.
To investigate and debug issues like these, you need monitoring tools.
This post was written by Christian Meléndez. Christian is a technologist who started as a software developer and has more recently become a cloud architect focused on implementing continuous delivery pipelines with applications in several flavors, including .NET, Node.js, and Java, often using Docker containers.
Source: https://stackify.com/aws-batch-guide/