Developer information
hadoop-bin | data-intensive clustering framework - tools
Hadoop is a software platform for writing and running applications that process vast amounts of data on a distributed file system.

Here's what makes Hadoop especially useful:

* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters of commonly available computers, which can number into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely fast.
* Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks when nodes fail.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.

This package provides the hadoop command-line interface. See the hadoop-.*d packages for the Hadoop daemons.
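To make the MapReduce model concrete, here is a minimal sketch of a word-count job written against the classic org.apache.hadoop.mapreduce Java API shipped in libhadoop-java: the map step emits (word, 1) pairs, and the reduce step sums the counts per word. The class names and the command-line input/output paths are placeholders for illustration, not part of the packages described here.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: tokenize each input line and emit (word, 1) for every token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate on map nodes
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // placeholder input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // placeholder output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged as a jar, a job like this is typically launched with the hadoop command provided by this package, e.g. hadoop jar wordcount.jar WordCount <input-dir> <output-dir>.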
hadoop-daemons-common | data-intensive clustering framework - common files
This package provides infrastructure for the Hadoop daemon packages, creating the hadoop user (with data and log directories) and maintaining the update-alternatives mechanism for Hadoop configuration.
hadoop-datanoded | data-intensive clustering framework - Data Node
The Data Nodes in the Hadoop cluster are responsible for serving blocks of data over the network to Hadoop Distributed File System (HDFS) clients.
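As a hedged illustration of the Data Node's role, the sketch below reads a file through the standard org.apache.hadoop.fs.FileSystem client API: the Name Node answers the metadata lookup, and the file's blocks are then streamed from the Data Nodes that hold them. The path used here is a placeholder.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCat {
  public static void main(String[] args) throws Exception {
    // Picks up fs.default.name from the Hadoop configuration files on
    // the classpath, so the client knows which Name Node to contact.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Placeholder path: any file stored in HDFS.
    FSDataInputStream in = fs.open(new Path("/user/hadoop/input.txt"));
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = reader.readLine()) != null) {
      System.out.println(line); // block data arrives from the Data Nodes
    }
    reader.close();
    fs.close();
  }
}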
hadoop-jobtrackerd | data-intensive clustering framework - Job Tracker
The Job Tracker is a central service responsible for managing the Task Tracker services running on all nodes in a Hadoop cluster. The Job Tracker allocates work to the Task Tracker that is nearest to the data and has an available work slot.
hadoop-namenoded | data-intensive clustering framework - Name Node
The Hadoop Distributed File System (HDFS) requires one unique server, the Name Node, which manages the block locations of files on the file system.
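A small sketch of the metadata the Name Node serves, assuming the standard FileSystem API and a placeholder path: the client asks which hosts hold each block of a file, which is exactly the information MapReduce uses to schedule work near the data.

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Placeholder path; the Name Node answers both metadata queries.
    FileStatus status = fs.getFileStatus(new Path("/user/hadoop/input.txt"));
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset " + block.getOffset()
          + ", length " + block.getLength()
          + ", hosts " + Arrays.toString(block.getHosts()));
    }
    fs.close();
  }
}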
hadoop-secondarynamenoded | data-intensive clustering framework - secondary Name Node
The secondary Name Node is responsible for checkpointing file system images. It is _not_ a failover partner for the Name Node, and may safely be run on the same machine.
hadoop-tasktrackerd | data-intensive clustering framework - Task Tracker
The Task Tracker is the Hadoop service that accepts MapReduce tasks and computes results. Each node in a Hadoop cluster that is meant to do computation should run a Task Tracker.
libhadoop-index-java | data-intensive clustering framework - Lucene index support
The org.apache.hadoop.contrib.index.main.UpdateIndex library provides support for managing an index using MapReduce. A distributed "index" is partitioned into "shards", each corresponding to a Lucene instance. This library's main() method uses a MapReduce job to analyze documents and update Lucene instances in parallel.
libhadoop-java | data-intensive clustering framework - Java libraries
This package contains the core Java libraries.
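As a minimal sketch of using the core libraries directly (assuming Hadoop of this vintage, where fs.default.name names the default file system), the snippet below loads the site configuration from the classpath and prints the configured file system URI:

import org.apache.hadoop.conf.Configuration;

public class ShowDefaultFs {
  public static void main(String[] args) {
    // Loads the default and site configuration files from the classpath.
    Configuration conf = new Configuration();
    // On a configured cluster fs.default.name is e.g. hdfs://namenode:8020/;
    // file:/// is the built-in fallback used here.
    System.out.println(conf.get("fs.default.name", "file:///"));
  }
}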
libhadoop-java-doc | data-intensive clustering framework - Java documentation
This package provides the API documentation of Hadoop.