Apache Hadoop is an open-source, scalable storage platform and Java-based programming framework designed to process very large data sets across hundreds or thousands of computing nodes operating in parallel. It provides cost-effective storage for large data volumes with no format requirements, and because data and computation are distributed and replicated across many commodity machines, it lowers the risk of catastrophic system failure and unexpected data loss even when a significant number of nodes become inoperative. The base framework consists of the following modules:
Hadoop Common—Contains libraries and utilities needed by other Hadoop modules
Hadoop Distributed File System (HDFS)—A distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
Hadoop YARN—A resource-management platform responsible for managing computing resources in clusters and using them for scheduling of users’ applications
Hadoop MapReduce—An implementation of the MapReduce programming model for large-scale data processing
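To make the MapReduce programming model concrete, the following is a minimal, dependency-free sketch of a word count in plain Java. It only illustrates the three conceptual phases—map, shuffle/group, and reduce—and does not use the real Hadoop API (an actual job would extend `org.apache.hadoop.mapreduce.Mapper` and `Reducer` and run on a cluster); the class and method names here are illustrative only.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the MapReduce model (word count), not Hadoop itself.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line of input.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Reduce phase: sum all counts emitted under one key (word).
    static int reduce(List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "the fox");

        // Shuffle/group phase: gather every value emitted under the same key,
        // the step the framework performs between mappers and reducers.
        Map<String, List<Integer>> grouped = input.stream()
                .flatMap(WordCountSketch::map)
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                         Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Apply the reducer per key and print the sorted word counts.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(counts)));
        System.out.println(result);
    }
}
```

On a real cluster the same mapper and reducer logic runs in parallel on many nodes, with HDFS supplying the input splits and YARN scheduling the tasks.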
The term “Hadoop” is also used to refer to the broader ecosystem of complementary software packages, such as Apache Flume, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Cloudera Impala, Apache Sqoop, Apache Oozie, and Apache Storm.
Typical Hadoop-related services include:
Hadoop consulting
Microsoft SQL Server/Oracle/SAP to Hadoop migration
Hadoop management and support
Hadoop version certification
Business intelligence consulting and integration
Hadoop integration into SAP