I wanted to write up a brief discussion of Dave Thomas's 19th Code Kata. The gist of the problem: given a wordlist, can you determine a path from a source word to an end word, changing only one letter at a time, such that every intermediate step is also a valid word in the dictionary? For instance, the supplied example from 'cat' to 'dog' is: cat, cot, cog, dog. The general idea behind these code katas is that they are well-formed coding exercises designed to give programmers a chance to stretch and develop new skills. I've written more graph algorithms that run in a single-process space than I'd care to admit (here's a simple graph implementation I wrote to learn Google's Go, for instance: Go-lang datastructures), so I wanted to solve this one using a different problem-solving paradigm: I decided to write it as a MapReduce job. I'm familiar with CouchDB and its built-in MapReduce implementation, but I wanted something that works in a fully-distributed mode and doubles as practice with Hadoop. I love all my NoSQL options. The code for this discussion, along with instructions to set it up and run it, is available at codekata19-mapreduce. For the sake of discussion, let's decompose the problem into two subproblems: graph construction and graph traversal.
Problem 1 - Graph construction
Given a list of words, we need to construct a graph in which every vertex is a valid word and every edge represents a valid single-letter transform to the next vertex. To solve this with MapReduce, the wordlist itself becomes the input data set. The only "global" information we need to pass around with the processing job is the dictionary, so that each JVM/Mapper in the system knows what constitutes a valid word (this is done using Hadoop's DistributedCache mechanism). The Map function emits a key/value pair for every valid single-letter transform it finds, so the output is essentially the edge set of our graph. The Reduce function's only job is to eliminate duplicates and to format the data so that this job's output can serve as the input to the next phase. The final results will look something like:
cat cog,|-1|WHITE|
with an intentional trailing pipe. The code for this MapReduce job is available here: CodeKata19.java.
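To make the Map step concrete, here is a minimal single-process sketch of its core logic: generating every dictionary word exactly one letter away from an input word. The class and method names here are my own for illustration; they don't correspond to the actual code in CodeKata19.java.

```java
import java.util.*;

// Sketch of the Map step's core logic: for a given word, find every
// dictionary word reachable by changing exactly one letter. In the real
// job the Mapper would emit each (word, neighbor) pair as an edge.
public class OneLetterNeighbors {

    // Returns all valid one-letter transforms of `word` found in `dictionary`.
    public static Set<String> neighbors(String word, Set<String> dictionary) {
        Set<String> result = new TreeSet<>();
        char[] chars = word.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char original = chars[i];
            for (char c = 'a'; c <= 'z'; c++) {
                if (c == original) continue;      // skip the word itself
                chars[i] = c;
                String candidate = new String(chars);
                if (dictionary.contains(candidate)) {
                    result.add(candidate);        // a valid edge: word -> candidate
                }
            }
            chars[i] = original;                  // restore before the next position
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("cat", "cot", "cog", "dog", "dot"));
        System.out.println(neighbors("cat", dict)); // prints [cot]
    }
}
```

The alphabet loop runs 25 checks per letter position, so for a word of length n the work per word is O(25n) dictionary lookups, which is why shipping the dictionary to every Mapper via the DistributedCache is enough global state for this phase.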
Problem 2 - Graph traversal
Graph traversal in this case is a brute-force breadth-first search of the entire graph. Cailin has done an excellent writeup of parallel, distributed breadth-first search in Breadth-first graph search using iterative map reduce algorithm. The only modification I made to the proposed setup was appending the complete path after the last pipe. The algorithm runs until no GRAY nodes remain in the network, meaning every node reachable from the selected start point has been visited. It's worth noting that this is not the "fastest" implementation — the problem easily fits inside a single-process space — but rather an exercise in distributed algorithms. One key advantage is that when the job completes we have the single-source shortest path to every reachable destination, not just the chosen end word: the final result contains a shortest path for every feasible solution. The code for this MapReduce job is available here: CodeKata19Search.java.
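As a rough illustration of what each search iteration does, here is a single-process sketch of the coloring scheme from Cailin's writeup: every GRAY node marks its WHITE neighbors GRAY with distance + 1, then turns BLACK, and the job repeats until no GRAY nodes remain. The Node class and its fields are illustrative stand-ins for the pipe-delimited record format shown above, not the actual code in CodeKata19Search.java (path tracking is omitted for brevity).

```java
import java.util.*;

// Single-process sketch of one BFS iteration from the MapReduce job:
// each GRAY node expands its neighbors (the "map" step) and the darkest
// color / shortest distance wins per word (the "reduce" step).
public class BfsIteration {
    enum Color { WHITE, GRAY, BLACK }

    static class Node {
        List<String> neighbors;
        int distance;   // -1 means "not yet reached"
        Color color;
        Node(List<String> neighbors, int distance, Color color) {
            this.neighbors = neighbors; this.distance = distance; this.color = color;
        }
    }

    // One map+reduce pass over the whole graph, in memory.
    static void iterate(Map<String, Node> graph) {
        List<String> frontier = new ArrayList<>();
        for (Map.Entry<String, Node> e : graph.entrySet()) {
            if (e.getValue().color == Color.GRAY) frontier.add(e.getKey());
        }
        for (String word : frontier) {
            Node n = graph.get(word);
            for (String next : n.neighbors) {
                Node m = graph.get(next);
                if (m.color == Color.WHITE) {   // first time reached: shortest distance
                    m.color = Color.GRAY;
                    m.distance = n.distance + 1;
                }
            }
            n.color = Color.BLACK;              // fully expanded
        }
    }

    public static void main(String[] args) {
        // The cat/cot/cog/dot/dog graph from the construction phase.
        Map<String, Node> g = new HashMap<>();
        g.put("cat", new Node(List.of("cot"), 0, Color.GRAY)); // source node
        g.put("cot", new Node(List.of("cat", "cog", "dot"), -1, Color.WHITE));
        g.put("cog", new Node(List.of("cot", "dog"), -1, Color.WHITE));
        g.put("dot", new Node(List.of("cot", "dog"), -1, Color.WHITE));
        g.put("dog", new Node(List.of("cog", "dot"), -1, Color.WHITE));
        while (g.values().stream().anyMatch(n -> n.color == Color.GRAY)) {
            iterate(g);
        }
        System.out.println(g.get("dog").distance); // prints 3
    }
}
```

In the real job each `iterate` call is a full MapReduce pass over the record file, and the driver re-launches the job while any GRAY record remains, so every reachable word ends up BLACK with its shortest distance (and, with the extra trailing field, its path) filled in.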