What is a/the cloud?

Most people aren’t yet sure what “The Cloud” is. Wikipedia’s disambiguation page suggests that it is

a metaphor for the Internet in cloud computing, based on how it is depicted in computer network diagrams [i.e. as a woolly cloud] and as an abstraction for the complex infrastructure it conceals.

Don’t worry if that makes you scratch your head; IMHO, that description simply says “uh, we’re not sure”.

It’s actually fairly easy to understand what a computing “cloud” is. “Cloud” is, to “my computer”, what “city” is to “my friend”.

If your friend lives in a city, then they are part of that city: one of countless entities within the conglomeration, thousands or millions of individuals with a multitude of roles and needs.

A cloud is just a lot of computers, storage, networking and applications. When these resources are bound together, what you have is a thing that is much more than a computer. It’s more akin to an archaic supercomputer.

In the very early days of computing, a single computer was generally capable of executing one specialized piece of work. Gradually computers became slightly less domain-specific and more general-purpose, but they were best utilized by creating one task to be executed exhaustively – like running a payroll routine thousands of times for different employees, or applying tax codes to millions of citizens.

Modern analogy: imagine a web browser that, once you input a URL, requires you to view the entire site one page at a time before it can go on to another site, and does so at the speed of a cell phone with one bar of 1G connectivity.


Supercomputers were built ever so slightly differently: yes, they could bring enormous power to bear on large, singular computations, but they could also very efficiently handle fragmented versions of those repetitive tasks like payroll/tax runs.

For a normal computer to process the tax-record run, it would have a single program which looked up each tax record, applied the tax rules to it, and then stored the record before proceeding to the next. One record at a time, until all records were processed.

Supercomputers initiated the next logical step: breaking the process up into stages to create a conveyor belt of execution steps and data items.

For example, the first job would be to retrieve some or all of the tax records required. Then apply various pieces of tax logic to the resulting data and store the intermediate results. Apply final processing to do things like flag delinquent accounts, etc. Run through a routine to create the necessary “save to permanent disk” instructions, and then run those to save it.

Each of these steps would be implemented as a separate module. And in theory, each of them could be executed by a different part of the computer – perhaps even simultaneously.
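In modern terms, that conveyor belt looks a lot like a generator pipeline. Here’s a minimal sketch – the stage names, tax rate and records are all invented for illustration:

```python
# Each stage is a separate module that consumes a stream of records
# and yields results onward -- a software conveyor belt.
def fetch_records(tape):
    """Stage 1: pull raw records off slow storage."""
    for record in tape:
        yield record

def apply_tax_rules(records, rate=0.25):
    """Stage 2: apply the tax logic, producing intermediate results."""
    for name, income in records:
        yield name, income * rate

def flag_delinquent(records, threshold=20_000):
    """Stage 3: final processing, e.g. flagging delinquent accounts."""
    for name, tax in records:
        yield name, tax, tax > threshold

tape = [("alice", 50_000), ("bob", 120_000)]
pipeline = flag_delinquent(apply_tax_rules(fetch_records(tape)))
print(list(pipeline))
# → [('alice', 12500.0, False), ('bob', 30000.0, True)]
```

Because each stage only touches one record at a time, the stages could in principle run on different parts of the machine simultaneously – which is exactly the property those supercomputer designs exploited.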

You see – long-term storage was really, really slow back then. You think your hard drive is slow? When I did an internship at a council computer center in 1987, the hard-disk platters were effectively “cache” memory. The long-term-storage equivalent of a modern HD was a magnetic tape.

When you go to access your family photos from 2004, your hard drive might have to move the disk head to an uncached sector of the disk, taking a few milliseconds.

To access an uncached “sector of tape”, a human has to physically go and retrieve the tape, unspool the tape in the machine, spool the new tape in, and then the computer has to tension the tape, rewind it, and seek to the data…

Kinda slow.

So – fetching the records from the tapes to some intermediate form on the disks ahead of the attempt to process them, would allow you to do things like cross-referencing, etc, very efficiently, significantly increasing the capabilities of the software itself.

And splitting the overall task into small steps/modules would save the computer from being locked into doing payroll and nothing but payroll for a day or two at a time.

Different steps of different tasks could be interleaved, and if each step could handle ranges of the total workload, the physical resources could be shared efficiently.

Chances are you’d have to access several magnetic tapes to load all of the tax records. So if your program grabbed what it needed off one tape and formatted the data onto the disk ready for the next step to process, it could free up the tape drive for another application.

This human-pace process of interleaving is actually a very close parallel to how multi-tasking works on a modern computer, especially now that our personal computers are also finally multi-core.
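That round-robin interleaving can be sketched with cooperative tasks that hand control back after each small step – the job names and step counts here are made up:

```python
from collections import deque

def job(name, steps):
    """A job broken into small steps; control is handed back after each."""
    for i in range(steps):
        yield f"{name} step {i}"

def interleave(jobs):
    """Round-robin scheduler: run one step of each job in turn, the way
    a batch operator interleaved payroll and tax runs on shared hardware."""
    queue = deque(jobs)
    while queue:
        current = queue.popleft()
        try:
            yield next(current)
            queue.append(current)   # not finished: back of the queue
        except StopIteration:
            pass                    # finished: drop it

print(list(interleave([job("payroll", 2), job("tax", 3)])))
# → ['payroll step 0', 'tax step 0', 'payroll step 1', 'tax step 1', 'tax step 2']
```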

A computing cloud is just an explosion of that concept to a network scale.

A slew of computers and devices (printers, routers, disks, etc, etc) all offering their resources for batches of work.

The “cloud” analogy only goes so far, though. It doesn’t happen by magic. It’s more like a small organism with a simple central nervous system. The work requests go to a central controller which knows what resources it has available and sends the relevant bits of work to the best destinations. Requests for data to the devices with disk space; requests for 3D or really heavy 3d-like computing to the devices with GPUs, requests for network access to the devices that can talk to the internet… etc, etc.
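A toy version of that central controller: route each request to a node advertising the capabilities it needs. The node and capability names are invented:

```python
# The controller's job reduces to a lookup: which node covers the
# capabilities this piece of work needs?
nodes = {
    "storage-1": {"disk"},
    "render-1":  {"gpu"},
    "gateway-1": {"net", "disk"},   # some nodes offer several capabilities
}

def dispatch(needs):
    """Return the first node whose capabilities cover all the needs."""
    for name, caps in nodes.items():
        if needs <= caps:           # subset test: node covers every need
            return name
    return None                     # nothing suitable is available

print(dispatch({"gpu"}))            # → render-1
print(dispatch({"net", "disk"}))    # → gateway-1
```

A real controller also weighs load, locality and failure, but the routing core is this simple.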

Of course, some devices will have multiple of those capabilities, but the analogy-failure is dwarfed by the truly abstract component of the term “cloud”.

Companies like Amazon, Google and Microsoft have mind-bogglingly vast networks of computers, disks, switches, etc, to meet their maximum potential capacity needs. Which means that, the majority of the time, those devices are idle.

These networks became clouds – vast pools of resource and processing power into which computing work went and out of which processed data returned, without the end user needing to know which computer – or possibly computers – did the work.

The big search companies – Google and Microsoft – have vast farms of web-crawling bots: simple computers with huge hard disks which spend their time browsing the web and helping to index the content they find.

It’s probably close to impossible to buy a computer with as little power as that task actually requires. And so these corporations find themselves with immense amounts of processing power (and disk space) idling.

Hence Hotmail and Gmail – you’re using spare disk sectors across hundreds of thousands of web-cache disks.

Amazon also figured out a way to use their spare web-server and order-processing cycles: the Elastic Compute Cloud, which has pretty much become the standard reference for what “a cloud” is.

So when you see that Microsoft advert saying “to the cloud”, what they’re saying is “let us compute that for you”.

Another way to see the cloud is as the big tech companies starting to doubt the abilities of the CPU, disk and RAM manufacturers and technologies of today to meet the computing needs of tomorrow…

Hard drives and RAM aren’t scaling with computing needs. They are still evolving at pre-internet rates (which I’ve always had a conspiracy-theory hunch have been deliberately fixed). Today’s standard machine wants to have 16GB of DDR4-2500 RAM, and today’s de facto CPU should be a dual- or quad-core 4.5GHz CPU.

But it’s nowhere near that. Intel tried to redo the x86 with the Itanium. They screwed up, and worse, they failed to appreciate the most important lesson the Itanic has to teach: the CPU is the blood of the system, not its heart or its mind. Users need applications, and applications determine what the CPU needs to provide.

Intel thought differently. With the Itanium they expected that if they built a new CPU architecture, compilers would come whizzing out of thin air (their failure to do so is perhaps part of why Intel decided to make its own compiler). It doesn’t help that Intel have a reputation for being particularly secretive about how their CPUs work, and put a lot of work into trying to keep people from leveraging it.

The secretive part is just a failure on Intel’s part to understand their role in things: they provide copious documentation on how their CPUs work, they just don’t do it in a developer-friendly way, and if you look through the history of the SSE instruction sets, what you’ll find is a rather prosaic evolution that has fuck-all to do with what programmers were actually looking for. Yes, SSE instructions have been helpful, but not as beneficial as they could have been.

Anyway, in an attempt to maintain market share, Intel quickly slapped together the multi-core concept and tossed us that bone, knowing that consumers understand more = better, and, from their years of watching Microsoft, that consumers can easily have the wool pulled over their eyes when stuff goes wrong. (I hate to defend the crashiness of old versions of Windows, but by Windows XP it was becoming obvious that it was shitty driver code from 3rd parties that was causing most of the crashes and not Windows itself; obviously, Windows should still cop blame for allowing that to crash it.)

The same way that Microsoft initially missed the Internet revolution because it had trained itself to reject the notion of innovation occurring outside of MS, Intel is ailing because it sees itself as the center of our computers…

Think “American Pie” (the song), but for processing power: millions and millions of people sitting watching an hour-glass or spinning thingy every second of every minute of every day, because the PC hasn’t kept up with the demand placed upon it.

So Microsoft is rushing in to be present at the birth of this new technology. I can see why – cloud computing is going to let your cell phone become a front end to globe-spanning supercomputer power, power that corporations like MS and Google were otherwise going to have to write off, because it would just be sitting there idling.

The only thing it’s going to do for the desktop, really, is help with its demise. The cons of a console are greatly diminished when it has access to The Cloud to deliver the capabilities that we traditionally provide ourselves by upgrading our PC. The cloud, though, is on-demand. So instead of having to go out and buy a $500 GPU every 3 months, only to use it for an hour every few nights, we can just use spare cycles out there on “the cloud” of our providers…

And with cloud-empowered cell phones, iPads, etc, you’re going to need to power up that desktop PC less and less. Add wireless phone-to-TV connections – so that you can walk into your house and use your 54in LCD as your display while your phone is on the charger – and you’ll soon be playing the next Red Dead off your phone…



But the concept of the cloud as PC replacement assumes a sufficiently fast, reliable and affordable net connection.

Much of Canada is moving toward considerably more expensive metered net connections. In parts of the US with monopoly or duopoly net availability, a similar shift is being considered. Powerful content creators and owners regard cheap wideband connections as a way for their IP to be taken without paying them the huge amounts of money they desire. Big money is always powerful.

And, for a variety of reasons, the US government is doing serious planning about net kill and fragmentation capabilities.

So rather than having one’s information stored on and processed by resources that others control, why not remain self-reliant to the extent possible?

My ideal is to move away from The Internet to a future system that allows me to join unmanaged crowdsourced WANs, with all of the needed connectivity resources decentralized evenly among the WAN participants, conceptually akin to a torrent swarm. Maybe eventually WiMax could work that way, in a future computer-to-computer mode.

My guess is that, the strategic plans of big net resource owners notwithstanding, many other persons will come to the same conclusion.

jwilly, if you haven’t used some of these clouds that are out there, i can promise you a fast connection is NOT the main bottleneck. amazon’s cloud is awful most of the time.

the PC or console is not going to get to leverage any cloud services for real time data anytime soon. not without an extreme engineering effort on the backend and an additional fee on the front end.

Ahh, you’re touching on my next post, Jwilly, “Why the cloud *isn’t* the future” :)

What you talk about is mostly what I would say grid computing refers to. AFAIK, the cloud is not so much about dividing the work into small pieces and processing it in parallel in different “nodes”, but more about having virtual services that can be assigned transparently to different nodes without the user or even the sysadmin noticing.
But of course, YMMV.

Hrm – I’d say you’re correct in terms of what transpires in today’s cloud. But I suspect it’s a product of the applications they have to deal with.

Well, that’s partially true. But I think they have come to the cloud from a different angle. I see the cloud as one step beyond virtualization, rather than one step short of a grid. The idea is that once you have your servers virtualised, you can start/stop/assign more CPUs/RAM… automatically and easily, without having to mess with hardware upgrades. And you can scale up and down dynamically on demand, which is very important cost-wise.
That’s the “beauty” of the cloud (then reality comes and you see it also has an ugly one, of course).

But I agree with you that it can also be seen as “oh well, we cannot actually partition 99% of our application into pieces that make it suitable for a grid, so we’ll have to think of something else”. I think it is not only a failure of current software development, but also that many tasks in “average business applications” are really not that easy to cut into pieces that can be performed in parallel, so the incentive to create new tools to do that automagically is still not there. And until we get these automagical tools, there is no hope. With current “simple” non-partitioned programming, most of the software produced today is really crappy… I don’t want to see what today’s programmers would produce with those tools… ouch.

Agree with you again, except that to really glean the benefits from such an architecture, you need much better granularity within the applications.

The development-tools part: well, we have something of a stalemate there. The C family of languages, including Java, really don’t lend themselves well to seriously distributed computing. But why should they – their “home” architecture these days is x86-based, an architecture which has been honed and finessed to run hefty single-threaded or threaded applications, and threading is not distributed computing.

Threading is not even well suited to parallel computing, to be honest, because you’re still developing big chunks of serial code, but also because threads are heavy due to the lack of facilitation by the underlying hardware :(

Well, yeah, the granularity right now in the cloud is usually separating the application servers from the DB servers, so we haven’t improved that much in that regard :).
As you say, threading is better suited to doing different tasks concurrently where one has to wait for some resource, so you use that “waiting” time to do something else, or to keep the apps responsive while they are stuck waiting for something (usually I/O). But yes, parallel computing requires new languages, new paradigms and new tools, and I still fail to see how we will be able to use them in many situations.
But that might be because our brains are so used to current architectures that it is difficult to think out-of-the-box. In a couple of generations they could well be laughing at us… “look at them, those poor souls were still doing that by hand!!” :)
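That “use the waiting time” point in miniature – the sleeps here are hypothetical stand-ins for disk or network waits:

```python
import threading
import time

def fetch(name, delay, results):
    """Simulated I/O: the thread spends almost all its life just waiting."""
    time.sleep(delay)
    results[name] = "done"

results = {}
threads = [threading.Thread(target=fetch, args=(n, 0.2, results))
           for n in ("disk", "network")]

start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The two 0.2 s waits overlap, so the total is ~0.2 s, not ~0.4 s:
# concurrency during waiting, not parallel computation.
print(sorted(results), elapsed < 0.4)
```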

I don’t think there is any need for new languages or new out-of-the-box thinking. We already have everything that is needed. Have a look at functional programming languages, which are often great to parallelise.
I’m sure you could create CPU architectures to support functional programming languages, but they would probably break the applications written in C/C++/Java/etc. And this is, in my opinion, the main problem.
It would cost billions and more to convert all the applications. Nobody will ever buy such a CPU, because they would need to exchange all their programs. As an example, have a look at the Singularity OS from Microsoft. It is a great idea, but they will never be able to introduce it, because you can’t run legacy programs on that OS.
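A sketch of why pure functions parallelise so cleanly – Python standing in for a functional language here; a process pool or a whole cluster would follow the same shape:

```python
from concurrent.futures import ThreadPoolExecutor

def tax_due(record):
    """Pure function: the result depends only on the argument,
    so any worker can evaluate any record, in any order."""
    income, rate = record
    return income * rate

records = [(50_000, 0.20), (80_000, 0.30), (120_000, 0.40)]

serial = list(map(tax_due, records))             # ordinary functional map

with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(tax_due, records))  # the same map, farmed out

print(parallel == serial)                        # → True
```

Nothing in the program changed except the executor; that is the property an imperative loop over shared mutable state does not give you for free.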

Perhaps we will see in the future some specialised CPUs for functional programming languages, but I don’t think they will ever make a big breakthrough in the main markets.

My main beef with functional languages is just that they rarely provide a high degree of readability (caveat: they have a fair degree of readability to the mathematician, to whom a C-based language probably leans towards the gibberish side).

But they also don’t solve the problem, which is still, fundamentally, at the interface between programming language and processor.

The simple fact is that neither functional nor procedural languages really help the programmer casually draw efficient usage out of multicore CPUs; after 20-odd years of per-CPU emphasis, multi-core has taken its sweet, sweet time to sink in, and languages are still scrambling to accommodate it rather than truly adapt to it.

I still say (gets out the dead horse) that the largest obstacle is the fact that unlike pipelines, multi-cores involve the operating system.

2002: http://www.hardwareanalysis.com/content/article/1511.2/
