Evolving application architectures
Just as we have shown that cloud computing is a natural extension of current
trends and best practices, the same is true when viewing cloud computing from
an architectural perspective. Again, cloud computing is nothing new, yet in its
implementation, it changes everything that we do.
Changing approaches to architecture
In the 1990s, the conversation was about how to decompose an application into its
various components and then how to deploy those components on separate servers
in order to optimize non-functional requirements including scalability, availability,
manageability, and security. Today, we are maintaining a decomposed application
architecture while actually deploying onto a consolidated architecture that uses
virtualization.
Cloud computing continues this trend by providing a way to programmatically deploy
application architectures, finally delivering on the promise of a dynamic datacenter.
With cloud computing, efficiency is highly valued; if it can’t be done quickly and
programmatically, it probably isn’t an application that is suited to the model.
Changing application designs
In the past, applications were built to handle larger workloads through vertical
scaling. Put more processors and memory on a mail server to handle a larger volume
of traffic. Scale up a database server to increase throughput. Run high-performance
computing jobs on a supercomputer.
The movement away from highly scalable symmetric multiprocessors and
toward less expensive, but less scalable x86-architecture servers has influenced
application design. Rather than expecting applications to run on highly scalable
servers, developers have been refactoring their applications so that they can scale
horizontally across a number of servers. This application refactoring is not always
easy, as both applications and their data must be designed so that both processing
and data can be factored into smaller chunks. This existing architectural trend has
been a key factor propelling the adoption of cloud computing. Examples of this trend
include:
High-performance computing
HPC workloads have been running on bare-metal compute grids for some time
now, enabled by application refactoring. For example, scientists have found ways
to chunk down data for applications such as 3D climate modeling so that it can be
spread across a large number of servers. Grid computing is a predecessor to cloud
computing in that it uses tools to provision and manage multiple racks of physical
servers so that they all can work together to solve a problem. With their high compute,
interprocess communication, and I/O demands, HPC workloads are good candidates
for clouds that provide infrastructure as a service, specifically bare-metal servers or
Type I virtual machines that provide more direct access to I/O devices.
Database management systems
Database management systems have adapted to run in cloud environments by
horizontally scaling database servers and partitioning tables across them. This
technique, known as sharding, allows multiple instances of database software —
often MySQL software — to scale performance in a cloud environment. Rather than
accessing a single, central database, applications now access one of many database
instances depending on which shard contains the desired data.
CPU-intensive processing
Applications that perform activities such as frame rendering have been designed so
that, rather than creating a new thread for each frame, they create a separate virtual
machine to render each frame, increasing performance through horizontal scaling.
Data-intensive processing
Generalized tools are being developed by the open source community that assist
in the processing of large amounts of data and then coalesce the results up to a
coordinating process. Hadoop, for example, is an open source implementation of the
MapReduce programming model that integrates the deployment of ‘worker’ virtual
machines with the data they need.
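The map, group, and reduce steps behind this style of processing can be sketched in a few lines. This is an illustration in plain Python, not Hadoop's API; the function names are assumptions made for the example, and the chunks are processed in a loop where a real system would hand each one to a separate worker.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks):
    # Each chunk could be handed to a separate worker; here they run in a loop.
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))

counts = word_count(["the cloud scales", "the grid scales too"])
```

Because each call to map_phase depends only on its own chunk, the work can be spread across as many workers as there are chunks, with the coordinating process performing the final reduction.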
The goals remain the same
Numerous advances in application architecture have helped to promote the adoption
of cloud computing. These advances help to support the goal of efficient application
development while helping applications to be elastic and scale gracefully and
automatically. The overriding objective of good application architectures, however,
has not changed at all: it is to support the same characteristics that have always
been important:
• Scalability. This characteristic is just as important as it has ever been. Applications
designed for cloud computing need to scale with workload demands so that
performance and compliance with service levels remain on target. In order to
achieve this, applications and their data must be loosely coupled to maximize
scalability. The term elastic often applies to scaling cloud applications because
they must not only be ready to scale up, but also scale down as workloads
diminish in order to not run up the cost of deploying in the cloud.
• Availability. Whether the application serves the users of social networking sites,
or it manages the supply chain for a large manufacturing company, users of
Internet applications expect them to be up and running every minute of every day.
Sun has been an industry leader in this area, establishing early on its SunTone(SM)
certification program that helped customers to certify that their applications and
services would stand up to required availability levels.
• Reliability. The emphasis on reliability has shifted over time. When large
applications meant large symmetric multiprocessing systems, reliability meant
that system components rarely fail and can be replaced without disruption
when they do. Today, reliability means that applications do not fail and most
importantly they do not lose data. The way that architecture addresses this
characteristic today is to design applications so that they continue to operate
and their data remains intact despite the failure of one or more of the servers or
virtual machines onto which they are decomposed. Where we once worried about
the failure of individual server components, now we build applications so that
entire servers can fail and not cause disruption.
• Security. Applications need to provide access only to authorized, authenticated
users, and those users need to be able to trust that their data is secure. This
is true whether the application helps individual users on the Internet prepare
their tax returns, or whether the application exchanges confidential information
between a company and its suppliers. Security in today’s environments is
established using strong authentication, authorization, and accounting
procedures, establishing security of data at rest and in transit, locking down
networks, and hardening operating systems, middleware, and application
software. It is such a systemic property that we no longer call it out as its own
principle — security must be integrated into every aspect of an application and its
deployment and operational architecture and processes.
• Flexibility and agility. These characteristics are increasingly important, as business
organizations find themselves having to adapt even more rapidly to changing
business conditions by increasing the velocity at which applications are delivered
into customer hands. Cloud computing stresses getting applications to market
very quickly by using the most appropriate building blocks to get the job done
rapidly.
• Serviceability. Once an application is deployed, it needs to be maintained.
In the past this meant using servers that could be repaired without, or
with minimal, downtime. Today it means that an application’s underlying
infrastructure components can be updated or even replaced without disrupting its
characteristics including availability and security.
• Efficiency. This is the new characteristic on the list, and it is perhaps one that
most differentiates the cloud computing style from others. Efficiency is the point
of cloud computing, and if an application can’t be deployed in the cloud quickly
and easily, while benefitting from the pay-by-the-sip model, it may not be a good
candidate. Enterprise resource planning applications, for example, may be best
suited to vertically scaled systems and provided through SaaS in the near term.
Applications that extract, manipulate, and present data derived from these
systems, however, may be well suited to deployment in the cloud.
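The elastic behavior described under Scalability above, scaling up with demand and back down as it diminishes, can be sketched as a sizing rule. The function name, the per-instance capacity figure, and the minimum and maximum bounds are all assumptions chosen for illustration.

```python
import math

def desired_instances(requests_per_sec, capacity_per_instance, minimum=1, maximum=20):
    """Return how many instances the current load calls for, within fixed bounds."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Clamp so the pool never drops below the floor or grows past the budget cap.
    return max(minimum, min(maximum, needed))
```

Calling the same function as load falls returns a smaller number, which is what keeps costs from running up once a workload spike has passed.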
Consistent and stable abstraction layer
Cloud computing raises the level of abstraction so that all components are
abstracted or virtualized, and can be used to quickly compose higher-level
applications or platforms. If a component does not provide a consistent and stable
abstraction layer to its clients or peers, it’s not appropriate for cloud computing.
The standard deployment unit is a virtual machine, which by its very nature is
designed to run on an abstract hardware platform. It’s easy to focus too heavily on building
virtual machine images and forget about the model that was used to create them.
In cloud computing, it’s important to maintain the model, not the image itself. The
model is maintained; the image is produced from the model.
Virtual machine images will always change because the layers of software
within them will always need to be patched, upgraded, or reconfigured. What
doesn’t change is the process of creating the virtual machine image, and this
is what developers should focus on. A developer might build a virtual machine
image by layering a Web server, application server, and MySQL database server
onto an operating system image, applying patches, configuration changes, and
interconnecting components at each layer. Focusing on the model, rather than the
virtual machine image, allows the images themselves to be updated as needed by
re-applying the model to a new set of components.
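One way to picture the distinction between model and image is to treat the model as a small, versionable recipe and the image as its output. The sketch below is a simplified illustration; the class name, field names, and image identifiers are hypothetical and do not correspond to any particular tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageModel:
    """The maintained artifact: an ordered recipe, not a built image."""
    layers: tuple    # software layers, applied in order
    settings: tuple  # (key, value) configuration pairs

def build_image(model, base_os):
    """Produce an image description by applying the model to a base OS image."""
    return {
        "base": base_os,
        "layers": list(model.layers),
        "settings": dict(model.settings),
    }

model = ImageModel(
    layers=("web-server", "app-server", "mysql"),
    settings=(("http_port", 80),),
)
old_image = build_image(model, "os-image-v1")
new_image = build_image(model, "os-image-v2")  # patched base, same model
```

When the base operating system image is patched, nothing in the model changes; the model is simply applied again to produce a fresh image.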
With this standard deployment unit, cloud architects can use appliances that help
to speed deployment with lower costs. A developer might use an appliance that
is preconfigured to run Hadoop on the OpenSolaris OS by interacting with the
appliance’s API. Architects can use content switches that are deployed not as physical
devices, but as virtual appliances. All that is needed to deploy one is to interact
with its API or GUI. Even companies producing licensed, commercial software are
adapting to cloud computing with more flexible, use-based licensing models.
Whether invoking a model that creates a virtual machine image, or customizing
an appliance, the resulting virtual machine images need to be stored in a library of
images that the enterprise versions and supports.
Standards help to address complexity
Cloud computing emphasizes efficiency above all, so adopting a small number of
standards and standard configurations helps to reduce maintenance and deployment
costs. Having standards that make deployment easy is more important than having
the perfect environment for the job. The 80/20 rule comes into play here: cloud
computing focuses on the few standards that can support 80% of the use cases.
This shifts the economics from costly, one-off implementations to choosing the
building blocks that can be used in the largest volume. There will continue to be
specialization; however, the starting point should be a standard.
For an enterprise shifting to cloud computing, standards may include the type of
virtual machine, the operating system in standard virtual machine images, tools,
and programming languages supported:
• Virtual machine types. Consider the impact of virtual machine choice on the
application to be supported. For a social networking application, isolation for
security, and a high level of abstraction for portability, would suggest using Type II
virtual machines. For high-performance computing or visualization applications,
the need to access hardware directly to achieve the utmost performance would
suggest using Type I virtual machines.
• Preinstalled, preconfigured systems. The software on virtual machines must be
maintained just as it is on a physical server. Operating systems still need to
be hardened, patched, and upgraded. Having a small, standard set of supported
configurations allows developers to use the current supported virtual machine.
When the supported configuration is updated, the model dictating customizations
should be designed so that it’s easy to re-apply changes to a new virtual machine
image. The same is true for appliances, where the current version can be
configured through their standard APIs.
• Tools and languages. Enterprises might standardize on using the Java
programming language and Ruby on Rails; small businesses might standardize on
PHP as their preferred tools for building applications. As these standards mature
in the context of cloud computing, they start to form the next layer, platform as a
service.
Virtualization and encapsulation support refactoring
When applications are refactored and created by combining and configuring a set of
virtual machine images and appliances, the emphasis is on what a particular virtual
machine does, not how it’s implemented. Virtualization and encapsulation hide
implementation details and refocus developers on the interfaces and interactions
between components. These components should provide standard interfaces so
that developers can build applications quickly and easily — as well as use alternate
components with similar functionality as performance or cost dictates.
Application deployment is done programmatically, and even the programs that
deploy applications can be encapsulated so that they can be used and re-used.
A program that deploys a three-tier Web infrastructure could be encapsulated so
that its parameters would include pointers to virtual machine images for the Web
server, business logic, and database tiers. This design pattern could then
be executed to deploy standard applications without having to re-invent or even reconsider,
for example, the network architecture required to support each tier.
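A deployment program of this kind might look like the following sketch, which takes pointers to the three tier images as parameters and returns a plan. The function name, network labels, and image identifiers are hypothetical; a real version would call a cloud provider's provisioning API rather than build a list.

```python
def deploy_three_tier(web_image, app_image, db_image, web_count=2):
    """Return a deployment plan; a real version would call a provider API."""
    plan = []
    for i in range(web_count):
        # Web servers are stateless, so several can sit on the front network.
        plan.append({"tier": "web", "name": f"web-{i}", "image": web_image, "network": "front"})
    plan.append({"tier": "app", "name": "app-0", "image": app_image, "network": "middle"})
    plan.append({"tier": "db", "name": "db-0", "image": db_image, "network": "back"})
    return plan

plan = deploy_three_tier("web-img:3", "logic-img:7", "db-img:2")
```

The network layout is decided once, inside the function, so each new application only supplies its images.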
The cloud computing philosophy for application maintenance is not to patch, but
redeploy. Managing the model that created a virtual machine image, not the image
itself, simplifies this redeployment. It’s relatively easy to solve problems discovered
after deployment, or release new versions of the application by updating the
component virtual machines and invoking the design pattern to redeploy. When a
developer patches a virtual machine, only one virtual machine image needs to be
created — the rest should be replicated and deployed programmatically. Virtual
machines should be versioned to facilitate rollback when necessary.
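Versioning for rollback can be sketched as a small image library that remembers every published version. The class and method names here are assumptions for illustration only, and the image identifiers are made up.

```python
class ImageLibrary:
    """Keeps every published version of an image so deployments can roll back."""

    def __init__(self):
        self._versions = {}  # image name -> list of image ids, oldest first
        self._current = {}   # image name -> index of the deployed version

    def publish(self, name, image_id):
        versions = self._versions.setdefault(name, [])
        versions.append(image_id)
        self._current[name] = len(versions) - 1

    def current(self, name):
        return self._versions[name][self._current[name]]

    def rollback(self, name):
        if self._current[name] == 0:
            raise ValueError("no earlier version to roll back to")
        self._current[name] -= 1
        return self.current(name)

library = ImageLibrary()
library.publish("web", "web-v1")
library.publish("web", "web-v2")
```

Rolling back moves a pointer rather than modifying an image, which fits the redeploy-instead-of-patch philosophy.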
Loose-coupled, stateless, fail-in-place computing
For years, Web-based applications have been moving toward being loose-coupled
and stateless. In cloud computing, these characteristics are even more important
because of cloud computing’s even more dynamic nature. Application images are
not patched; they are throwaway objects and thus need to be stateless. If a virtual
machine fails, the application must continue to run uninterrupted. Coupling between
application components needs to be loose so that a failure of any component does
not affect overall application availability. A component should be able to “fail in
place” with little or no impact on the application.
As application components become increasingly transient, they cannot contain data
that must persist beyond any application instance. Applications should be made as
stateless as possible by pushing the state out of the software, separating processing
and data as much as possible. Techniques for doing this include:
• Push state out to the user in the form of cookies or state coded into URLs
• Push state down to a back-end database
• Maintain additional copies of data, a strategy used by Hadoop
• Use network-based persistence, for example Terracotta or Shoal in a GlassFish
application server
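As an illustration of the second technique, pushing state down to a back-end database, the sketch below keeps a per-session visit counter in a store rather than in the handler. An in-memory SQLite database stands in for the shared back end, and the table and function names are assumptions made for the example.

```python
import sqlite3

def make_store():
    """Create the stand-in back-end store (in memory, for illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, visits INTEGER NOT NULL)")
    return db

def handle_request(db, session_id):
    """Stateless handler: all session state is read from and written to the store."""
    db.execute("INSERT OR IGNORE INTO sessions (id, visits) VALUES (?, 0)", (session_id,))
    db.execute("UPDATE sessions SET visits = visits + 1 WHERE id = ?", (session_id,))
    db.commit()
    row = db.execute("SELECT visits FROM sessions WHERE id = ?", (session_id,)).fetchone()
    return row[0]
```

Because handle_request holds nothing between calls, any instance of the application can serve any request, and an instance that fails takes no session data with it.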
The impact that fail-in-place computing has on operations is that even the hardware
should be stateless for the cloud to function properly. Hardware configurations
should be stored in metadata so that configurations can be restored in the event of a
failure.
Horizontal scaling
Cloud computing makes a massive amount of horizontal scalability available
to applications that can take advantage of it. The trend toward designing and
refactoring applications to work well in horizontally scaled environments means that
an increasing number of applications are well suited to cloud computing.
Applications taking advantage of horizontal scaling should focus on overall
application availability with the assumption that individual components will fail.
Most cloud platforms are built on a virtual pool of server resources where, if any one
physical server fails, the virtual machines that it was hosting are simply restarted
on a different physical server. The combination of stateless and loose-coupled
application components with horizontal scaling promotes a fail-in-place strategy that
does not depend on the reliability of any one component.
Horizontal scaling does not have to be limited to a single cloud. Depending on the
size and location of application data, “surge computing” can be used to extend
a cloud’s capability to accommodate temporary increases in workload. In surge
computing, an application running in a private cloud might recruit additional
resources from a public cloud as the need arises.
Surge computing depends heavily on the amount and locality of data. In the case
of a private cloud located in an enterprise datacenter expanding to use a public
cloud located somewhere else on the Internet, the amount of data that needs to
be moved onto the public cloud needs to be factored into the equation (see “data
physics” below). In the case of a private cloud hosted at the same colocation facility
as a public cloud provider, the data locality issue is significantly diminished because
virtually unlimited, free bandwidth can connect the two clouds.
Parallelization
Horizontal scaling and parallelization go hand in hand; today, however, the scale
and implementation have changed. On a microscopic scale, software can use
vertical scaling on symmetric multiprocessors to spawn multiple threads where
parallelization can speed operations or improve response time. But with today’s
compute environments shifting toward x86-architecture servers with two and four
sockets, vertical scaling only has as much parallel processing capability as the server
has cores (or as many cores as have been purchased and allocated to a particular
virtual machine). On a macroscopic scale, software that can use parallelization
across many servers can scale to thousands of servers, offering more scalability than
was possible with symmetric multiprocessing.
OpenSolaris Dynamic Service Containers being produced by Project Kenai provide a
lightweight provisioning system that can be used to horizontally scale Solaris Zones.
Please see http://kenai.com/projects/dsc/.
In a physical world, parallelization is often implemented with load balancers or
content switches that distribute incoming requests across a number of servers. In a
cloud computing world, parallelization can be implemented with a load balancing
appliance or a content switch that distributes incoming requests across a number of
virtual machines. In both cases, applications can be designed to recruit additional
resources to accommodate workload spikes.
The classic example of parallelization with load balancing is a number of
stateless Web servers all accessing the same data, where the incoming workload is
distributed across the pool of servers.
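A minimal sketch of this distribution is a round-robin rotation over a pool of interchangeable servers. The class name and server names are hypothetical, and production load balancers typically also weigh server health and load.

```python
import itertools

class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def route(self, request):
        """Return the (server, request) pairing for the next server in rotation."""
        return next(self._cycle), request

balancer = RoundRobinBalancer(["vm-a", "vm-b", "vm-c"])
assignments = [balancer.route(f"req-{i}")[0] for i in range(5)]
```

The rotation works only because the servers are stateless: any of them can handle any request.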
There are many other ways to use parallelization in cloud computing environments.
An application that uses a significant amount of CPU time to process user data might
use the scheduler/worker model. A scheduler receives jobs from users, places
the data into a repository, then starts a new virtual machine for each job, handing
the virtual machine a token that allows it to retrieve the data from the repository.
When the virtual machine has completed its task, it passes a token back to the
scheduler that allows it to pass the completed project back to the user, and the
virtual machine terminates.
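The token exchange in this flow can be sketched as follows. The class and function names are assumptions for illustration, a dictionary stands in for the repository, and an ordinary function call stands in for starting and terminating a virtual machine.

```python
import secrets

class Repository:
    """Central store; data is reachable only through an unguessable token."""

    def __init__(self):
        self._items = {}

    def put(self, data):
        token = secrets.token_urlsafe(16)
        self._items[token] = data
        return token

    def get(self, token):
        return self._items[token]

def worker(repo, input_token):
    """Stands in for a short-lived virtual machine handling one job."""
    data = repo.get(input_token)
    result = data.upper()  # placeholder for CPU-intensive work
    return repo.put(result)

def schedule(repo, job_data):
    """Store the job, run one worker for it, and return the finished result."""
    input_token = repo.put(job_data)
    result_token = worker(repo, input_token)  # a real scheduler would start a VM here
    return repo.get(result_token)
```

Only tokens pass between the scheduler and the worker; the data itself moves just between each party and the repository.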
Divide and conquer
Applications can be parallelized only to the extent that their data can be partitioned
so that independent systems can operate on it in parallel. A good application
architecture includes a plan for dividing and conquering data, and a variety of real-world
examples illustrate the wide range of approaches:
• Hadoop is an implementation of the MapReduce pattern, which is itself an
implementation of the master/worker parallelization pattern.
• Database sharding can be accomplished through a range of partitioning
techniques, including vertical partitioning, range-based partitioning, or directory-based
partitioning. The approach used depends entirely on how the data is to be
used.
• Major financial institutions have refactored their fraud detection algorithms so
that what was once more of a batch data-mining operation now runs on a large
number of systems in parallel, providing real-time analysis of incoming data.
• Some high-performance computing applications that deal with three-dimensional
data have been designed so that the state of one cubic volume (of gas, liquid, or solid)
can be calculated for time t by one process. Then the state of that cube is
passed on to the processes representing the adjoining cubes, and the state is
calculated for time t+1.
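Two of the sharding techniques named above, range-based and directory-based partitioning, can be sketched as small routing classes. The class names, key ranges, and shard names are hypothetical and chosen only to show how a key is mapped to a database instance.

```python
import bisect

class RangeSharder:
    """Range-based partitioning: each shard owns keys below an upper bound."""

    def __init__(self, upper_bounds, shards):
        # upper_bounds must be sorted; shards has one more entry than bounds.
        self._bounds = upper_bounds
        self._shards = shards

    def shard_for(self, key):
        return self._shards[bisect.bisect_right(self._bounds, key)]

class DirectorySharder:
    """Directory-based partitioning: a lookup table maps each key to a shard."""

    def __init__(self, directory, default):
        self._directory = dict(directory)
        self._default = default

    def shard_for(self, key):
        return self._directory.get(key, self._default)

by_range = RangeSharder(["h", "p"], ["db-0", "db-1", "db-2"])
by_directory = DirectorySharder({"acme": "db-2"}, default="db-0")
```

Range-based routing needs no lookup table but can produce uneven shards if keys cluster; a directory costs a lookup but allows individual keys to be placed or moved deliberately.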
The partitioning of data has a significant impact on the volume of data transferred
over networks, making data physics the next in the list of considerations.
Data physics
Data physics considers the relationship between processing elements and the data
on which they operate. Since most compute clouds store data in the cloud, not on a
physical server’s local disks, it takes time to bring data to a server to be processed.
Data physics is governed by the simple equation that describes how long it takes
to move an amount of data between where it is generated, stored, processed,
and archived. Clouds are good at storing data, not necessarily at archiving it and
destroying it on a predefined schedule. Large amounts of data, or low-bandwidth
pipes, lengthen the time it takes to move data:
time = (bytes * 8) / bandwidth
This equation is relevant for both the moment-by-moment processing of data and
for long-term planning. It can help determine whether it makes sense, for example,
to implement a surge computing strategy where it might take longer to move the
data to a public cloud than it does to process it. It can also help determine the
cost of moving operations from one cloud provider to another: whatever data has
accumulated in one cloud provider’s datacenter must be moved to another, and this
process may take time.
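As a worked instance of the equation, the sketch below computes a transfer time and applies it to the surge computing question. The function names are assumptions, bandwidth is taken to be in bits per second, and the comparison is deliberately simplified: it ignores the time to move results back and any bandwidth charges.

```python
def transfer_seconds(num_bytes, bandwidth_bits_per_sec):
    """time = bytes * 8 / bandwidth, with bandwidth in bits per second."""
    return num_bytes * 8 / bandwidth_bits_per_sec

def surge_is_worthwhile(num_bytes, bandwidth_bits_per_sec, local_secs, remote_secs):
    """True when moving the data plus remote processing beats processing locally."""
    return transfer_seconds(num_bytes, bandwidth_bits_per_sec) + remote_secs < local_secs

one_terabyte = 10**12
seconds = transfer_seconds(one_terabyte, 100 * 10**6)  # 100 Mbit/s link
```

Moving one terabyte over a 100 Mbit/s link takes 80,000 seconds, a little over 22 hours, so a job that finishes locally in an hour gains nothing from surging under these assumptions.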
The cost of moving data can be expressed both in time and bandwidth charges. The
hybrid model, in which a company’s private cloud is colocated
with its cloud provider’s public cloud, can help to reduce costs significantly.
Bandwidth within a colocation facility generally is both plentiful and free, making
this strategy a win-win proposition for both the time and dollar cost of moving data
around.
The relationship between data and processing
Data physics is a reminder to consider the relationship between data and processing,
and that moving data from storage to processing can take both time and money.
Some aspects of this relationship to consider include:
• Data stored without compute power nearby has limited value, and cloud providers
should be transparent regarding the network relationship between these two
components. What is the size of their pipes? What is the latency? What is the
reliability of the connection? Cloud providers should be forthcoming with answers
to all of these questions.
• Cloud architects should be able to specify the locality of virtual components and
services so that there is a well-defined relationship between virtual machines and
the storage they access.
• Cloud providers may optimize this relationship automatically for customers, but
consider whether their optimization is right for the application at hand.
• In a networked environment, it is sometimes more efficient (faster, less latency)
to calculate a value than it is to retrieve it from networked storage. Consider the
trade-off between using compute cycles and moving data around.
Programming strategies
Cloud computing calls for using programming strategies that consider the
movement of data:
• Moving pointers around is usually better than moving the actual data. Note how
the scheduler/worker model described earlier uses a central storage service
and passes tokens between application components rather than the actual data.
• Pointers should be treated as a capability, with care taken to ensure that they are
difficult to forge.
• Tools such as representational state transfer (REST) or Simple Object Access
Protocol (SOAP) help to reduce application state while managing the transfer of
state data.
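One common way to make a pointer hard to forge is to attach a keyed signature to it. The sketch below uses an HMAC for this; the key, function names, and pointer format are assumptions for illustration, and a real system would load the key from secure configuration and would probably add an expiry time.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from secure config

def make_pointer(object_id):
    """Return 'object_id.signature' so the pointer cannot be forged without the key."""
    sig = hmac.new(SECRET_KEY, object_id.encode(), hashlib.sha256).hexdigest()
    return f"{object_id}.{sig}"

def resolve_pointer(pointer):
    """Return the object id if the signature checks out, else None."""
    object_id, _, sig = pointer.rpartition(".")
    expected = hmac.new(SECRET_KEY, object_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how much of the signature matched.
    return object_id if hmac.compare_digest(sig, expected) else None
```

A component that receives such a pointer can pass it along freely, but cannot alter it to reach a different object.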
Compliance and data physics
Maintaining compliance with governmental regulations and industry requirements
adds another layer of considerations to the management of data. A cloud architect
needs to be able to specify both topological and geographical constraints on data
storage. A cloud provider should make it easy to specify the relationship between
data and the virtual machines that process it, and also where the data is stored
physically:
• Companies handling personal data may be required to adhere to governmental
regulations regarding the handling of the data. For example, those doing business
in the European Union may violate local laws if they store their data in the United
States because of the difference in how the law protects their data. In cases like
this, cloud providers should provide the capability to specify constraints on how
and where data can be moved.
• Companies subject to industry standards, such as those imposed by credit card
processing authorities, may face restrictions on where data is stored and how
and when it is destroyed. In cases like this, freed disk blocks cannot be allowed to
be incorporated into another customer’s block of storage. They must be securely
erased before reuse.
When choosing a cloud provider for data storage, consider not just whether the
provider is trustworthy. Consider whether the cloud provider is certified according to
standards appropriate for the application.
Security and data physics
Data is often the most valuable of a company’s assets, and it must be protected
with as much vigilance as any other asset. It is easy to argue that more vigilance
is needed to protect data because of how an intruder can potentially reach a
company’s data from anywhere on the Internet. Some steps to take include:
• Encrypt data at rest so that if any intruder is able to penetrate a cloud provider’s
security, or if a configuration error makes that data accessible to unauthorized
parties, the data cannot be interpreted.
• Encrypt data in transit. Assume that the data will pass over public infrastructure
and could be observed by any party in between.
• Require strong authentication between application components so that data is
transmitted only to known parties.
• Pay attention to cryptography and how algorithms are cracked and are replaced
by new ones over time. For example, now that MD5 has been proven vulnerable to
attack, use a stronger technique such as SHA-256.
• Manage who has access to the application and how:
  • Consider using strong, token-based authentication for administrator roles.
  • For customer login/password access, consider who manages the authentication
    server and whether it is under the company’s or the cloud provider’s control.
  • For anonymous access to storage, for example anonymous FTP, consider
    whether a customer would have to register with the cloud provider for access or
    whether the cloud provider could federate with the company’s authentication
    server.
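For the advice on choosing SHA-256 over MD5, the sketch below computes a SHA-256 digest with Python's standard library and uses it to check that data has not been altered. The function names are assumptions; note that a plain digest detects accidental or unauthenticated changes only, and does not replace encryption or a keyed signature.

```python
import hashlib
import hmac

def sha256_digest(data):
    """Return the SHA-256 digest of a bytes value as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data, expected_digest):
    """Compare digests in constant time to check that data was not altered."""
    return hmac.compare_digest(sha256_digest(data), expected_digest)
```

Keeping the algorithm behind one small function makes it easier to replace when a stronger technique is needed later.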
Network security practices
Good security practices permeate every aspect of system design, implementation,
and deployment. Applications must be secure by design, with interfaces that
present only the appropriate data to authorized users. During implementation,
developers must take care to avoid coding practices that could result in vulnerability
to techniques such as buffer overflow or SQL injection. When deployed, operating
systems should be hardened and every layer of software kept up to date with the
most recent security patches.
In cloud computing, applications are deployed in a shared network environment, and
very straightforward security techniques such as VLANs and port filtering are used
to segment and protect various layers of an application deployment architecture as
well as to isolate customers from each other. Some approaches to network security
include:
• Use security domains to group virtual machines together, and then control
access to the domain through the cloud provider’s port filtering capabilities. For
example, create a security domain for front-end Web servers, open only the HTTP
or HTTPS ports to the outside world, and filter traffic from the Web server security
domain to the one containing back-end databases.
• Control traffic using the cloud provider’s port-based filtering, or utilize more
stateful packet filtering by interposing content switches or firewall appliances
where appropriate. For even more fine-grained control over traffic, the concept
of Immutable Service Containers (ISCs) allows multiple layers of software to be
deployed in a single virtual machine, with pre-plumbed networking that is kept
internal to the virtual machine. This technology uses Solaris™ Zones to support
multiple secure virtual environments on a shared OS platform, and is available
with both the Solaris and OpenSolaris Operating Systems.
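The security domain example above, where only HTTP and HTTPS are open to the outside and only the Web tier may reach the databases, can be modeled as a default-deny rule table. The domain names and rule format are assumptions and do not reflect any provider's actual filtering API; port 3306 is the default MySQL port.

```python
RULES = [
    # (source domain, destination domain, port)
    ("internet", "web", 80),
    ("internet", "web", 443),
    ("web", "db", 3306),
]

def is_allowed(source, destination, port, rules=RULES):
    """Default deny: traffic passes only if an explicit rule matches."""
    return (source, destination, port) in rules
```

Because nothing is permitted unless listed, adding a new tier exposes no ports until a rule is written for it.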
Credits: Introduction to Cloud Computing Architecture By Sun Microsystems