If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You’ll quickly understand how Hadoop’s projects, subprojects, and related technologies work together.
Each chapter introduces a different topic, such as core technologies or data transfer, and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you’ll have a good grasp of the playing field.
Topics include core technologies (the Hadoop Distributed File System [HDFS], MapReduce, YARN, and Spark); database and data management (Cassandra, HBase, MongoDB, and Hive); serialization (Avro, JSON, and Parquet); management and monitoring (Puppet, Chef, ZooKeeper, and Oozie); analytic helpers (Pig, Mahout, and MLlib); data transfer (Sqoop, Flume, distcp, and Storm); security, access control, and auditing (Sentry, Kerberos, and Knox); and cloud computing and virtualization (Serengeti, Docker, and Whirr).
- File Size: 4501 KB
- Print Length: 132 pages
- Publisher: O’Reilly Media; 1st edition (March 2, 2015)
- Publication Date: March 2, 2015
- Sold by: Amazon Digital Services LLC
- Language: English
- ASIN: B00U6P2Q9M