Tuesday, September 7, 2010

Edge Computing for DoD Clouds

Most existing cloud computing services are accessed over the Internet and thus rely on this unpredictable and insecure medium. For DoD the Internet will remain an unreliable platform because it adversely impacts the performance of applications and services that run on top of it while exposing transactions to breaches in security. To realize the full potential of cloud computing the Department of Defense will have to overcome the security, performance, reliability, and scalability flaws of the Internet by adopting network practices that avoid the use of the Internet altogether.

The operating value of a cloud service is a function of the speed of applications, the latency of responses, uptime reliability and security assurance. All cloud offerings will be at the mercy of any Internet bottlenecks. An Internet-based Software-as-a-Service (SaaS) offering may not meet a customer’s requirements because of the rapid swapping of software routines. For these reasons it will be necessary to speed up communications to and from the cloud so that cloud computing can meet the demanding information handling requirements of DoD.

The original applications of cloud computing, such as those available over Forge.mil from DISA, concentrate on public cloud services, which are accessible over the public Internet. Such offerings provide quick economies of scale in computing and data storage. They provide flexible, pay-as-you-go benefits for testing and for application prototyping. However, the current DISA probes into cloud computing are mostly small-scale tests.

Driven by security and control needs the DoD must now start planning for the adoption of internally managed private clouds as a way of achieving the efficiencies of cloud computing. The scale of DoD’s $34 billion/year information technology costs makes the shifting to its own internal cloud infrastructure not only feasible but also affordable. By implementing cloud computing behind DoD’s firewalls it will be possible to pool and share computing resources across different applications, departments, or functions without dependence on the Internet for carrying most of its business.

The implementation of private clouds within DoD, on account of ponderous acquisition rules, will require capital investments for added computing hardware. It will also require additions to DoD’s internal expertise, the shortage of which is likely to be the greatest holdup.

While implementing cloud computing DoD will still remain partially dependent on the Internet because it will have to support workers at widely dispersed geographic locations where the Global Information Grid (GIG) does not reach.  The DoD private cloud will also have to be interoperable with a variety of vendor services, such as for telecommuting or for mobile communications, which are Internet based.

Consequently, almost all DoD cloud infrastructures will be in the form of a hybrid cloud. This means that DoD will have to provide for a wide range of secure applications that run across a combination of public, private as well as non-cloud environments. DoD will also have to continue running most of its high security applications using totally isolated networks, which are separated from the DoD hybrid cloud with security barriers.

Ultimately, DoD will end up operating all of its business applications and most of its war-fighting applications within its own private cloud. In this way it will realize a reduction in costs while enhancing security. The DoD hybrid approach will also have the ability to continue taking advantage of a wide range of public cloud services whenever that can be economically justified and securely extracted.

The Middle Mile
The infrastructure that supports cloud computing can be split into three links:
1. The first mile (i.e. the originating infrastructure);
2. The last mile (i.e. the end user’s connectivity to the Internet, at the destination);
3. The middle mile (i.e. the paths over which data travels back and forth across the Internet between the origin server and the end user).
Each of these links contributes in different ways to the performance and reliability problems of cloud networks.

First mile bottlenecks are well understood and remain entirely under the management of DoD components. Perhaps the biggest first mile challenge lies in the ability to scale the locally administered and contractor managed infrastructures, such as access gateways, network switches or redundant connections, to meet variable levels of demand. Achieving such improvements is difficult but manageable. Arranging for communications over satellites from ships at sea or delivering computing support to mobile infantry units will require special attention.

Configuring the first mile will always be costly because of the problem of how to provide capacity for occasional transaction peaks. The current approach is to provide first mile infrastructures that are underutilized as well as insufficiently redundant for fail-over in cases of component failures. Cloud computing can correct most of these deficiencies by pooling first mile resources across multiple DoD components. There will be governance problems in finding workable solutions to these challenges.
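
The pooling argument can be made concrete with a little arithmetic. The sketch below compares capacity sized separately for each component's own peak against capacity sized for the peak of the combined load. The component names and demand figures are hypothetical and serve only as an illustration; they are not DoD data.

```python
def sum_of_peaks(demand_by_component):
    """Capacity needed when each component is sized for its own peak."""
    return sum(max(series) for series in demand_by_component.values())


def peak_of_sum(demand_by_component):
    """Capacity needed when all components share one pooled infrastructure."""
    combined = [sum(values) for values in zip(*demand_by_component.values())]
    return max(combined)


# Hypothetical transaction load per time period (arbitrary units).
# The peaks fall in different periods, which is what makes pooling pay off.
demand = {
    "component_a": [10, 80, 20, 15],
    "component_b": [15, 20, 90, 10],
    "component_c": [70, 10, 15, 20],
}

separate = sum_of_peaks(demand)
pooled = peak_of_sum(demand)
print(f"separately sized capacity: {separate}")
print(f"pooled capacity:           {pooled}")
print(f"capacity avoided:          {separate - pooled}")
```

The saving depends entirely on how much the peaks of the individual components overlap. If every component peaks at the same moment, pooling saves nothing.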

The last mile of Internet traffic is conducted over Local Area Networks (LANs) and Wide Area Networks (WANs) at multi-megabyte/sec broadband speeds. In DoD the LANs are locally managed, which is usually done by contractors. Compliance with DoD-wide standards for installing local circuits in a redundant mode will be mandatory. The last mile is unlikely to become a DoD cloud bottleneck provided that this compliance is centrally directed and centrally funded.

This leaves the middle mile, which constitutes the infrastructure that comprises the public Internet as well as GIG circuits that have been subcontracted for conveyance over the Internet. The middle mile will always be a heterogeneous network that is owned by many competing firms, awarded in the form of multiple contracts from different acquisition organizations. The DoD networking contracts now cover hundreds or thousands of circuit miles and are estimated to include at least 15,000 separate networks that are often not interoperable.

The entire Internet is actually composed of 13,000 different networks each providing access to a small subset of end users. The largest of these networks accounts for only about 8% of end user access traffic. This means that the DoD Internet dedicated circuits will be ultimately tied to the performance of the Internet as a whole because transactions will have to be routed through several vendor networks. The vulnerability of this arrangement is large.  It includes tens of thousands of connection points between the first mile and the last mile as transactions hop through several links on their way to their ultimate destination. This arrangement is too fragile for the DoD enterprise to depend on in the age of cyber warfare.
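
To illustrate why a path that crosses several independently operated networks is fragile, here is a minimal sketch. It assumes that each network on the path fails independently and has the same availability, so the availability of the whole path is the product of the individual availabilities. The 99.9% figure and the network counts are hypothetical, not measurements.

```python
HOURS_PER_YEAR = 365 * 24


def end_to_end_availability(per_network_availability, networks_traversed):
    """Availability of a path whose networks must all be up at the same time.

    Assumes independent failures and identical availability per network.
    """
    return per_network_availability ** networks_traversed


def expected_downtime_hours(availability):
    """Expected hours per year during which the path is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical: each vendor network is available 99.9% of the time.
for networks in (1, 3, 6):
    path = end_to_end_availability(0.999, networks)
    print(f"{networks} network(s): availability {path:.4%}, "
          f"about {expected_downtime_hours(path):.1f} hours down per year")
```

Under these assumptions the expected downtime grows almost in proportion to the number of networks a transaction has to cross, which is the case for keeping paths short.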

Historically, DoD has invested heavily in the first mile and in the last mile. These investments were made from a wide range of funding sources, at thousands of locations. Separate contracts were used to buy different technology solutions because everything was paid for out of over 2,000 individual projects plus an untold number of local installation maintenance budgets.

The DoD’s middle mile has remained a no man’s land between the GIG, a variety of dedicated private networks (including expensive satellite links) and traffic routed over the Internet that is purchased by Services. The funding for the largest share of the middle mile was always budgeted centrally, mostly through DISA. How much to spend for the DoD backbone circuits by an Agency rather than by a Service is a debatable matter. Spending for expansion as well as for new technologies to meet rising demands for capacity must be resolved in favor of completely central funding if cloud computing is accepted as the way DoD operates its networks.

The consequence of an inadequate middle mile capacity for DoD is unknown. At present there are no policies that define the standards for DoD-wide network metrics. Packet losses, service degradation, uneven performance and down times are unknown. If operations of a DoD enterprise cloud are to materialize, there must be complete end-to-end situational awareness of the metrics of every single component that makes up the DoD cloud.

DoD personnel are now accustomed to low-latency connectivity directly from their homes. They have a perception that the response-time and availability bottlenecks, which are largely middle mile problems, are a reflection of the deteriorating performance of DoD systems. While the leadership of DoD is increasingly vocal about the decisive importance of cyber operations, the military and civilian people who experience a degradation of the quality of their information technologies do not have the confidence that DoD systems can do the job that is needed. This is why cloud computing must be chosen as the best means for overcoming the current lack of credibility about DoD information technologies.

TCP as One of the Middle Mile Problems
Architected for reliability rather than efficiency, the Transmission Control Protocol (TCP), which is the Internet’s primary communications protocol, is a principal hindrance to middle mile transmission performance. TCP requires multiple round-trips between any two communicating parties to set up and tear down connections. This is especially detrimental to the performance of SaaS and PaaS applications, as these require small as well as rapid back and forth communications.
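
A rough sketch shows how connection setup adds up for chatty applications. It assumes that each small request and response costs one round trip and that opening a new TCP connection costs about one additional round trip for the handshake before data can flow. The round-trip times and the request count are hypothetical, and the model ignores teardown, server processing time and packet loss.

```python
def chatty_exchange_seconds(requests, rtt_seconds,
                            setup_round_trips=1, reuse_connection=False):
    """Estimate the elapsed time for a series of small request/response pairs.

    requests          -- number of small exchanges the application performs
    rtt_seconds       -- round-trip time between user and server
    setup_round_trips -- assumed round trips spent opening each connection
    reuse_connection  -- True if one connection carries all the exchanges
    """
    setups = 1 if reuse_connection else requests
    round_trips = setups * setup_round_trips + requests
    return round_trips * rtt_seconds


# Hypothetical round-trip times: a nearby server versus a distant data center.
for label, rtt in [("nearby (2 ms)", 0.002), ("distant (150 ms)", 0.150)]:
    fresh = chatty_exchange_seconds(50, rtt)
    reused = chatty_exchange_seconds(50, rtt, reuse_connection=True)
    print(f"{label}: new connection per exchange {fresh:.2f} s, "
          f"one reused connection {reused:.2f} s")
```

Reusing a connection removes the repeated setup cost, but every exchange still pays the full round-trip time, so distance remains the dominant factor.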

Long distances between communicating parties can lead to low throughput, a problem that grows as file sizes become larger and the number of “hops” between Internet links increases. This is because TCP allows only a limited amount of data to be sent at any time before pausing and waiting for acknowledgment from the receiving end. Network latency, the time it takes a single data packet to travel across multiple links on the network, will rapidly translate into a huge delay in the case of a failure at any router on the path from origin to destination. Communication latency, which is tied to the distance between the source and the end user, will rise whenever TCP has to retransmit data as a transaction progresses towards its destination. This is a critical issue for those considering IaaS and SaaS solutions, especially in cases where rapid and reliable responses are essential.
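
The effect of distance on throughput can be approximated with the familiar bound that a single TCP connection cannot move more than one window of data per round trip. The sketch below applies that bound. The 64 KiB window, the round-trip times and the 1 GB file are illustrative assumptions, and the model ignores link bandwidth, slow start, window scaling and retransmissions, so real transfers may be slower or faster.

```python
def max_tcp_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput, in bits per second: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def min_transfer_seconds(file_bytes, window_bytes, rtt_seconds):
    """Shortest possible transfer time for a file under the window/RTT bound."""
    return file_bytes * 8 / max_tcp_throughput_bps(window_bytes, rtt_seconds)


WINDOW = 64 * 1024      # assumed 64 KiB window, no window scaling
FILE_SIZE = 10 ** 9     # assumed 1 GB file

# Hypothetical round-trip times for an edge server and two remote data centers.
for label, rtt in [("edge, 2 ms", 0.002),
                   ("same continent, 40 ms", 0.040),
                   ("overseas, 200 ms", 0.200)]:
    mbps = max_tcp_throughput_bps(WINDOW, rtt) / 1e6
    minutes = min_transfer_seconds(FILE_SIZE, WINDOW, rtt) / 60
    print(f"{label}: at most {mbps:.1f} Mbit/s, at least {minutes:.1f} minutes")
```

Because the bound is inversely proportional to the round-trip time, cutting the distance to the server raises the ceiling on throughput by the same factor.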

For example, Amazon hosts its EC2 services in just three US datacenters and in a single European datacenter. Applications that are not mission-critical perform well provided that the user is close to a data center and the size of files is not large. However, for mission-critical applications with very large files the approach offered by Amazon is often too slow for supporting users. It suffers from delays in the Internet’s middle mile. Performance and reliability are also likely to fall short of what is needed in support of war fighter applications. Therefore running cloud services for DoD’s global reach from a few datacenters located in the USA is not suitable.

Distributed Clouds On the Edge
Locating the cloud computing infrastructures in a distributed manner overcomes the problems of the middle mile. A distributed architecture — where servers are located at the edge of the DoD network, close to end users — avoids the middle mile bottlenecks. It enables the delivery of LAN-like responsiveness for cloud applications that would be running over DISA direct communication links. These would connect data centers to the edge servers, which would be located in immediate proximity to users.

Spreading the DoD workload over tens of thousands of edge servers that house frequently accessed replicas of code from central applications is technologically and economically feasible because powerful servers, with twelve terabytes of capacity, are now available for $5,000.
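
A back-of-envelope calculation puts that price in perspective. The sketch below uses the $5,000 server price and the $34 billion/year information technology cost cited in this post. The server counts are hypothetical choices within "tens of thousands", and the figures cover hardware purchase only, leaving out facilities, networking, software and operations.

```python
SERVER_PRICE_USD = 5_000        # price per edge server cited in this post
ANNUAL_IT_COST_USD = 34e9       # DoD annual IT cost cited in this post


def edge_hardware_cost(server_count, price_per_server=SERVER_PRICE_USD):
    """One-time hardware purchase cost for a fleet of edge servers."""
    return server_count * price_per_server


def share_of_annual_it_cost(cost_usd, annual_cost_usd=ANNUAL_IT_COST_USD):
    """Cost expressed as a fraction of one year's IT spending."""
    return cost_usd / annual_cost_usd


# Hypothetical fleet sizes; the post does not give an exact server count.
for count in (20_000, 50_000):
    cost = edge_hardware_cost(count)
    share = share_of_annual_it_cost(cost)
    print(f"{count:,} servers: ${cost / 1e6:,.0f} million, "
          f"{share:.2%} of one year's IT cost")
```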

At present the largest share of cloud computing services is in the form of clouds based on centralized architectures. The drawbacks of such solutions are outages and delays in the support of mission critical businesses.

The migration of DoD applications to the cloud, which may take a decade, will require many changes in DoD’s network architecture, such as modifying IaaS, PaaS and SaaS software for increased security, application acceleration, fast synchronization with master files and for rapid recovery in case of local failures. Such changes will require software modifications that can take advantage of already proven solutions. To accomplish such a change DoD will need to place servers near points of use for most of its transactions.

As cloud computing takes over, the DoD network environment will appear as an enterprise-wide hybrid solution that supports millions of DoD users securely, economically and reliably.



For comments please e-mail paul@strassmann.com