Distributed Systems: Defer it if you can
- Prashant Kumar
- May 10, 2022
- 2 min read
Updated: Aug 9, 2022
Though it may sound fancy, a distributed system brings a lot of complexity into your implementation. If possible, stick to a single computer. Distributing a task over multiple systems should be the last design call.

Having said that, here is the core idea behind distributed systems:
“A distributed system is a set of computers working together, cooperating over a network to accomplish a coherent set of tasks.”
Given all these challenges in implementing and maintaining such a system, why do we invest so much energy and effort in it? In this new era, the problems being solved by engineering demand far more computation and processing power than before, and data sizes have grown drastically. There are two ways to handle this:
1. Build supercomputers with enormous processing power, memory, etc. that can solve these problems on their own.
2. Make multiple average computers work together to process huge tasks that require a lot of computation and memory.
It turns out that option 2 is far simpler to implement and has a lower running cost. It also gives developers of such systems a lot of flexibility: nodes can be added to scale out or to replace failed ones. Imagine every low-funded startup tackling a compute- or memory-intensive problem having to buy a supercomputer.
Implementation
Like any other system, a Distributed System requires a few basic building blocks to be in place.
Network connectivity - In a Distributed System, the individual computers need to talk to each other. In general, this is done using an RPC (Remote Procedure Call) connection. As the name suggests, RPC lets us execute a procedure on a remote machine as if we were calling a procedure on the local system.
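As a rough illustration, here is a minimal RPC sketch using Python's built-in xmlrpc module; the host, port and word_count procedure are placeholders made up for this example, not part of any particular system. The server and client parts would run in separate processes (or on separate machines).

```python
# Minimal RPC sketch using Python's built-in xmlrpc.
# The address and the word_count procedure are illustrative placeholders.
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def run_server():
    def word_count(text):
        # The remote procedure: this code runs on the server machine.
        return len(text.split())

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(word_count, "word_count")
    server.serve_forever()

def run_client():
    proxy = ServerProxy("http://localhost:8000")
    # Looks like a local function call, but it executes on the remote node.
    print(proxy.word_count("a distributed system in action"))
```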
Resource Locking - Multiple computers often try to access the same resource in parallel, such as a file or a shared variable. Often, the implementation requires different processes to read the latest updated value of the resource, for example multiple processes incrementing a counter. In those cases, it becomes important that other processes wait while one process updates the resource.
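The snippet below is a local-only sketch of the same idea using Python threads; in a real distributed system the lock would come from a coordination service rather than an in-process threading.Lock, but the read-modify-write pattern it protects is the same.

```python
# Local sketch of resource locking: several workers increment one counter.
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:          # other workers wait here until the lock is free
            counter += 1    # read the latest value, update it, write it back

workers = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter)  # always 40000; without the lock, updates could be lost
```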
Fault tolerance - When working with thousands of computers, it is highly probable that some of them will malfunction due to network or hardware failures. A distributed system generally handles such failures by keeping replicas of processes and resources, or by restoring the system to a known stable state and retrying the failed processes from there.
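Here is a hedged sketch of the retry-with-replicas idea; the fetch_from callable and the replica addresses are hypothetical stand-ins for real service calls.

```python
# Sketch of a simple failover-and-retry policy over replicated nodes.
import time

REPLICAS = ["node-a:9000", "node-b:9000", "node-c:9000"]  # placeholder addresses

def fetch_with_failover(key, fetch_from, retries=3, backoff=0.5):
    """Try each replica in turn; if all fail, back off and retry."""
    for attempt in range(retries):
        for replica in REPLICAS:
            try:
                return fetch_from(replica, key)  # may raise on network/hardware failure
            except ConnectionError:
                continue                         # this replica is down, try the next one
        time.sleep(backoff * (attempt + 1))      # all replicas failed, wait and retry
    raise RuntimeError(f"could not fetch {key!r} from any replica")
```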
Map Reduce
To begin with, Map Reduce is a good example for understanding the implementation of a distributed system. The Map Reduce model generally solves problems where we need to derive some dataset from a huge set of raw data, for example the most frequent word occurrences, inverted indices, or summaries of a large crawl. The model abstracts away the complexity of scaling, parallelisation and fault tolerance, so a developer only needs to focus on the logic for deriving the dataset. We'll discuss Map Reduce in more detail in the next blog.
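As a taste of the model, here is a toy, single-process word-count sketch; the function names are illustrative, and a real Map Reduce framework would run the map and reduce tasks in parallel across many nodes with fault tolerance built in.

```python
# Toy, single-process sketch of the Map Reduce word-count example.
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in document.split()]

def shuffle(mapped_pairs):
    # Group intermediate values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Combine all counts for a single word.
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```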



