Cyber Security Institute


Monday, June 07, 2010

Review: Cloud automation tools

One of the promises of the public cloud is the opportunity to quickly dial up server resources in order to do a job that calls for heavy-duty batch processing.  But first you need a way to manage the life cycle of that job.  Fortunately, there are tools that can automate setting up and tearing down jobs in the public cloud.

In this groundbreaking test, we looked at products from RightScale, Appistry and Tap In Systems that automate and manage launching the job, scaling cloud resources, making sure you get the results you want, storing those results, then shutting down the pay-as-you-go process.  We found that each product gets to the finish line, but each requires some level of custom code and each takes a markedly different route.

We liked RightScale’s ability to both monitor and control application use, its wide base of template controls, and its thoughtful approach to overall management.  RightScale’s RightGrid methodology manages the full life cycle of apps and instances, and gave us the feeling that hardware could truly be disposable in the cloud.  Yet with a bit of work, we found that both Appistry and Tap In Systems offered task automation components that could also be successful for cloud-based jobs.

In our first public cloud management test, we focused on the ability of products from RightScale, Tap In Systems and Cloudkick to simply monitor public clouds like Amazon’s EC2, GoGrid and Rackspace.

This time around, our test bed was narrowed to Amazon’s public cloud, and we used a variety of Amazon cloud services, including Elastic Compute Cloud (EC2) server resources, Simple Queue Service (SQS) queuing system, and Simple Storage Service (S3).

The good news for enterprises is that Amazon’s pay-per-usage model can be a major cost saver.  In this real-world test, we were able to complete our tasks using an extraordinarily low amount of our Amazon processing budget.  Similar batch job cost savings can be realized using Amazon competitors like GoGrid and Rackspace, but only if the tasks are automated using the type of cloud management tools that we tested here.
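As a rough illustration of the pay-per-usage math, the cost of a batch run is essentially instances times hours times hourly rate.  The sketch below uses a hypothetical rate, not Amazon’s actual pricing, and ignores storage, bandwidth and request charges:

```python
def batch_cost(instances: int, hours: float, rate_per_instance_hour: float) -> float:
    """Estimate the compute cost of a pay-per-usage batch run.

    All inputs are illustrative; a real cloud bill also includes
    storage, bandwidth, and per-request charges.
    """
    return instances * hours * rate_per_instance_hour

# e.g. 10 workers for 2 hours at a hypothetical $0.10 per instance-hour
print(batch_cost(10, 2, 0.10))  # → 2.0
```

The point of the tools tested here is to keep the `hours` term honest: instances that are torn down the moment the job finishes stop billing.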

The basic procedure was similar for all three products.  A job that needs to be performed requires application code, data files, a place to process the data (the cloud), and a place to put the results.

There are two options: make the job into a bundle where we could define code, data, outputs and options, or do that plus have a controller get messages from the job in progress.  The latter let us either take pre-defined actions based on the messages or change what happens in the middle of a job.  First, we needed to create an Amazon execution image with the applications we would be using for automation.  We chose ffmpeg, an application well suited to video rendering jobs processed by server arrays.  Once created, we bundled the image and uploaded it to Amazon so we would have a copy to start with.
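The two options can be sketched as follows.  The names here are ours, not any vendor’s actual API: a bundle that defines code, data, outputs and options, plus an optional controller that reacts to progress messages from the running job:

```python
from dataclasses import dataclass, field

@dataclass
class JobBundle:
    # Option 1: everything the cloud instance needs, defined up front
    image: str                                   # machine image with ffmpeg pre-installed
    command: str                                 # application code to execute
    inputs: list = field(default_factory=list)   # data files
    output_bucket: str = "results"               # where results land
    options: dict = field(default_factory=dict)

def controller(messages, on_error):
    """Option 2: also watch messages from the job in progress.

    `messages` is any iterable of (status, detail) pairs; `on_error`
    is the hook that lets us change course in the middle of a job.
    """
    completed = []
    for status, detail in messages:
        if status == "error":
            on_error(detail)          # pre-defined or ad-hoc action
        elif status == "done":
            completed.append(detail)
    return completed

job = JobBundle(image="ami-1234", command="ffmpeg -i in.avi out.mp4")
print(controller([("done", "clip1"), ("error", "clip2"), ("done", "clip3")],
                 on_error=print))     # → ['clip1', 'clip3']
```

With either option the outcome is the same; the controller variant simply keeps a decision point open while the job runs.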

Each product then varied in terms of controlling the life cycle of the bundle.

Typically, the life cycle is the sequence of events that coordinates the process of doing jobs, gathering the results, storing them, and reporting success/failure (there will inevitably be both).  We gauged success by the degree of built-in controls, the amount of application customization required, how the management application programmatically or automatically scaled resources to execute the job (by reading CPU or other resource metrics, then adding servers or resources), and how it communicated messages among job executors and coordinated processes.
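The scaling criterion described above, reading CPU or another resource metric and then adjusting the server count, reduces to a threshold rule.  The thresholds below are illustrative, not any product’s defaults:

```python
def scale_decision(cpu_percents, current, lo=30.0, hi=75.0,
                   min_servers=1, max_servers=20):
    """Decide how many servers an array should run next.

    cpu_percents: recent CPU readings from the running workers.
    Scale up when average CPU exceeds `hi`, down when it falls
    below `lo`, otherwise hold steady.
    """
    avg = sum(cpu_percents) / len(cpu_percents)
    if avg > hi:
        return min(current + 1, max_servers)
    if avg < lo:
        return max(current - 1, min_servers)
    return current

print(scale_decision([90, 80, 85], current=3))  # → 4 (busy: add a server)
print(scale_decision([10, 5], current=3))       # → 2 (idle: shed a server)
```

Each product wraps some variant of this loop; the difference is how much of it you must script yourself.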

RightScale’s flexibility became readily apparent early in our testing.  RightScale’s ServerTemplates can be modified, and the orchestration needed to perform jobs from beginning to end doesn’t require bundling all components prior to job execution, as the other vendors did.  By modifying the ServerTemplates, we didn’t need to create our own bundled image on Amazon using their EC2 tools, in effect, making the process that much simpler.  But, like the other cloud management providers we tested, RightScale requires a bit of scripting work to make it useful. 

RightScale offers two types of server arrays.  The first is queue-controlled: we found it easiest to use RightScale’s pre-made configuration message encoding system, which is written in Ruby.  Its workers are process controllers that come in two varieties, one-shot and persistent.  The second type, alert-based arrays, can scale up or down based on certain conditions (such as CPU or memory usage).
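The queue-controlled pattern, and the one-shot versus persistent distinction, can be sketched with a local queue standing in for Amazon SQS.  The names are ours; RightScale’s actual Ruby message encoding system differs:

```python
import queue

def worker(jobs: "queue.Queue", persistent: bool = False):
    """Process jobs pulled from a queue.

    A one-shot worker handles a single job and exits; a persistent
    worker keeps pulling until the queue is drained.
    """
    results = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break
        results.append(f"rendered:{job}")   # stand-in for running ffmpeg
        if not persistent:                  # one-shot: stop after one job
            break
    return results

q = queue.Queue()
for clip in ["a.avi", "b.avi", "c.avi"]:
    q.put(clip)
print(worker(q, persistent=True))  # → ['rendered:a.avi', 'rendered:b.avi', 'rendered:c.avi']
```

Alert-based arrays replace the queue check with the kind of CPU/memory threshold test shown earlier.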

The Tap In Control Plan Editor is an automation tool built on the Petri net model, a mathematical formalism for describing distributed systems, which makes it a natural fit for the cloud.  At each branch of a Plan, different conditions can trigger scripts, which can be written in Ruby, Java or Groovy.  The idea for our Control Plan was to perform a job that would scale up by launching more instances whenever there were video files in an Amazon S3 bucket.

Appistry, by contrast, organizes instances into fabrics.  The console can be accessed on any of the instances within the fabric, and fabrics can be woven together through instances of the Appistry Network Bridge.  Console access requires a browser, an instance of Java, and Adobe AIR.
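The Control Plan logic described above, launch more instances while video files remain in the S3 bucket, reduces to a branch condition like the following sketch, with a plain list standing in for the bucket (Tap In’s actual plans are Petri-net graphs edited graphically):

```python
def plan_step(bucket_keys, running_instances, max_instances=10):
    """One branch of the plan: scale up while video files remain.

    bucket_keys: object names listed from the (simulated) S3 bucket.
    Returns the action the plan would take at this branch.
    """
    videos = [k for k in bucket_keys if k.endswith((".avi", ".mp4"))]
    if videos and running_instances < max_instances:
        return "launch_instance"
    if not videos:
        return "shut_down"       # work exhausted: stop paying for servers
    return "wait"                # work remains but we are at capacity

print(plan_step(["a.avi", "notes.txt"], running_instances=2))  # → launch_instance
print(plan_step(["notes.txt"], running_instances=2))           # → shut_down
```

Each transition of the Petri net fires a script like this; the net’s structure decides which branches can run concurrently.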

The CloudIQ engine can launch tasks, which are then taken care of by the fabric workers.  The CloudIQ Platform user interface divides a fabric into applications, services, packages and workers: applications are monitored fabric processes that use services, which exist in packages and are, in turn, attended to by workers.  The fabric’s work output is homogeneous, as the workers run identical processes.  Fabrics can be linked together to create dependencies among the workers’ discrete fabric processes.

CloudIQ Storage is similar in concept to Amazon S3, and in a way competes with it.  Each instance of CloudIQ Storage can be in a different location, but they all work together as one group and appear as one virtual drive.  Generally, CloudIQ files are synced with each other (for example, the same files are located on each storage location).  In the Amazon Appistry images, CloudIQ storage is built into the image, which means that by default the storage disappears along with the instances, unless you change the default directories to point at Amazon Elastic Block Store (EBS) volumes.  It also means that storage is pre-allocated and finite within the instance by default.

In our testing, we created a wrapper program to launch the ffmpeg video rendering application.  We used the CloudIQ engine coded in such a way that if we launched the client multiple times it would distribute a task to another fabric worker.  When the work was done, we copied the results over to a single EBS volume attached to the first instance.  To access the files in the storage and control the storage process, we could use the ‘curl’ command to send HTTP requests for operations such as delete, deploy, get, put and stop.  There are three different types of programs installed onto a fabric: a fabric application, meaning a batch processing or computing application; a service such as Tomcat, WebLogic or Apache; or a package such as the Java Development Kit (JDK), Ruby, an RPM, or a command-line installation like “yum install”.
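Our wrapper’s job, turning a task handed out by the fabric into an ffmpeg invocation, can be sketched like so.  The argument list is a generic transcode; our actual rendering options were more involved:

```python
import shlex

def ffmpeg_command(source: str, dest: str) -> list:
    """Build the command a fabric worker runs for one task.

    ffmpeg reads `source` and writes a transcoded `dest`; each
    launch of the client hands one such task to a worker, which
    would pass this list to something like subprocess.run().
    """
    return shlex.split(f"ffmpeg -y -i {source} {dest}")

print(ffmpeg_command("clip1.avi", "clip1.mp4"))
# → ['ffmpeg', '-y', '-i', 'clip1.avi', 'clip1.mp4']
```

Launching the client repeatedly with different source files is what lets the engine fan the tasks out across workers.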

Appistry is a sophisticated construction set for distributed cloud computing, but generally for more persistent applications.  Its monitoring and reporting infrastructure relies mostly on external tools, compared with the instance monitoring capabilities of RightScale and the Tap In Systems Control Plan.  Appistry can use a variety of code linked in with the Appistry APIs to produce a distributed system (or set of systems) if you’re adept at coding the project, and Appistry’s success is fully dependent on lots of custom coding.  The results, however, could be very useful.  But first, you need to get through the 1,400 pages of documentation.  Fortunately, paid customers get dedicated systems engineering help, and architectural support is available as well.

Posted on 06/07