Skip to main content

Parallelism On The Cloud And Polygot Programmers


I am very passionate about the idea of giving developers the control over parallelism without them having to deal with the underlying execution semantics of their code.

The programming languages and the constructs, today, are designed to provide abstraction, but they are not designed to estimate the computational complexity and dependencies. The frameworks such as MapReduce is designed not to have any dependencies between the computing units, but that's not true for the majority of the code. It is also not trivial to rewrite existing code to leverage parallelism. As, with the cloud, when the parallel computing continues to be a norm rather than an exception, the current programs are not going to run any faster. In fact, they will be relatively slower compared to other programs that would leverage parallel computation. Robert Harper, a Professor of Computer Science at Carnegie Mellon University recently wrote an excellent blog post - parallelism is not concurrency. I would encourage you to spend a few minutes to read that. I have quoted a couple of excerpts from that post.

"what is needed is a language-based model of computation in which we assign costs to the steps of the program we actually write, not the one it (allegedly) compiles into. Moreover, in the parallel setting we wish to think in terms of dependencies among computations, rather than the exact order in which they are to be executed. This allows us to factor out the properties of the target platform, such as the number, p, of processing units available, and instead write the program in an intrinsically parallel manner, and let the compiler and run-time system (that is, the semantics of the language) sort out how to schedule it onto a parallel fabric."

The post argues that language-based optimization is far better than machine-based optimization. There's an argument that the machine knows better than a developer what runs faster and what the code depends upon. This is why, for relational databases, the SQL optimizers have moved from rule-based to cost-based. The developers used to write rules inside a SQL statement to instruct the optimizer, but now the developers focus on writing a good SQL query and an optimizer picks a plan to execute the query based on the cost of various alternatives. This machine-based optimization argument quickly falls apart when you want to introduce language-based parallelism that can be specified by a developer in a scale-out situations where it's not a good idea to depend on a machine-based optimization. The cloud is designed based on this very principle. It doesn't optimize things for you, but it has native support for you to introduce deterministic parallelism through functional programming.

"Just as abstract languages allow us to think in terms of data structures such as trees or lists as forms of value and not bother about how to “schedule” the data structure into a sequence of words in memory, so a parallel language should allow us to think in terms of the dependencies among the phases of a large computation and not bother about how to schedule the workload onto processors. Storage management is to abstract values as scheduling is to deterministic parallelism."

As far as the cloud computing goes, we're barely scratching the surface of what's possible. It's absolutely absurd to assume that the polygot programmers will stick to one programming model and learn to spot difference between parallelism and concurrency. The language constructs, annotations, and runtime need to evolve to help the programmers automate most of these tasks to write cloud-native code. These will also be the core tenants of any new programming languages and frameworks. There's also a significant opportunity to move the existing legacy code in the cloud if people can figure out a way to break it down for computational purposes without changing it i.e. using annotations, aspects etc. The next step would be to simplify the design-deploy-maintain life cycle on the cloud. If you're reading this, it's a multi-billion dollars opportunity. Imagine, if you could turn your implementation-specific concurrent access to resources into abstract deterministic parallelism, you can indeed leverage the scale-out properties of cloud fairly easily since that's the guiding principle behind the cloud.

There are other examples you would see where people are moving away from an implementation-centric approach to an abstraction that is closer to the developers and end users. The most important shift that I have seen is from files to documents. People want to work on documents; files are just an instantiation of how things are done. Google Docs and iPad are great examples that are document-centric and not file-centric.

Photo courtesy: bass_nroll on Flickr

Comments

Popular posts from this blog

Emergent Cloud Computing Business Models

The last year I wrote quite a few posts on the business models around SaaS and cloud computing including SaaS 2.0 , disruptive early stage cloud computing start-ups , and branding on the cloud . This year people have started asking me – well, we have seen PaaS, IaaS, and SaaS but what do you think are some of the emergent cloud computing business models that are likely to go mainstream in coming years. I spent some time thinking about it and here they are: Computing arbitrage: I have seen quite a few impressive business models around broadband bandwidth arbitrage where companies such as broadband.com buys bandwidth at Costco-style wholesale rate and resells it to the companies to meet their specific needs. PeekFon solved the problem of expensive roaming for the consumers in Eurpoe by buying data bandwidth in bulk and slice-it-and-dice-it to sell it to the customers. They could negotiate with the operators to buy data bandwidth in bulk because they made a conscious decision not to st...

Focus On Your Customers And Not Competitors

A lorry is a symbol of Indian logistics and the person who is posing against it is about to rethink infrastructure and logistics in India. Jeff Bezos is enjoying his trip to India charting Amazon’s growth plan where competitors like Flipkart have been aggressively growing and have satisfied customer base. This is not the first time Bezos has been to India and he seems to understand Indian market far better than many CEOs of American companies. His interview with a leading Indian publication didn’t get much attention in the US where he discusses Amazon’s growth strategy in India. When asked whether he is in panic mode: For 19 years we have succeeded by staying heads down, focused on our customers. For better or for worse, we spend very little time looking at our competitors. It is better to stay focused on customers as they are the ones paying for your services. Competitors are never going to give you any money. I always believe in focusing on customers, especially on their latent unme...

Purple Squirrels

It is fashionable to talk about talent shortage in the silicon valley. People whine about how hard it is to find and hire the "right" candidates. What no one wants to talk about is how the hiring process is completely broken. I need to fill headcount: This is a line that you hear a lot at large companies. Managers want to hire just because they are entitled to hire with a "hire or lose headcount" clause. Managers spend more time worrying about losing headcount and less time finding the right people the right way. Chasing a mythical candidate: Managers like to chase purple squirrels . They have outrageous expectations and are far removed from reality of talent market. Managers are also unclear on exactly what kind of people they are looking to hire. Bizarre interview practices: "How many golf balls can fit in a school bus?" or "can you write code with right hand while drawing a tree with left hand?" We all have our favorite bizarre interview st...