A site devoted mostly to everything related to Information Technology under the sun - among other things.

Wednesday, March 28, 2012

Big Data Glossary

  1. Apache Hadoop (http://hadoop.apache.org) - an open source distributed computing platform which includes the Hadoop Distributed File System (HDFS) and an implementation of MapReduce.
  2. Apache HBase (http://hbase.apache.org) - open source distributed storage for big data (e.g. over a billion rows and a million columns) on clusters of commodity hardware.
  3. Apache Hive (http://hive.apache.org) - a data warehouse system built on Hadoop that provides tools for ETL, access to files stored in HDFS, and query execution through HiveQL, a SQL-like language.
  4. Cassandra (http://cassandra.apache.org) - an open source distributed database originally developed at Facebook (www.facebook.com) and now an Apache project, with commercial support from DataStax (www.datastax.com), which has integrated it with Hadoop to provide an analytic platform for big data.
  5. Cloudera (www.cloudera.com) - a contributor to Hadoop that offers a commercial distribution bundling the open source code with additional features in one package.
  6. Greenplum (www.greenplum.com) - a division of EMC (www.emc.com) that provides an analytic platform, a data computing appliance, a database and other products for big data analysis.
  7. Hadoop Distributed File System (HDFS, http://hadoop.apache.org/hdfs) - the file system used by Hadoop for distributed data storage.
  8. Hortonworks (www.hortonworks.com) - a company that offers a commercially supported open source platform based on Hadoop for storing, processing, and analyzing big data.
  9. MapReduce (www.mapreduce.org) - a programming model developed by Google (www.google.com) and used by Hadoop for its parallel processing. It is the core technology behind many big data engines. In the "map" step, the input data is split across multiple nodes, and each node processes its share and emits intermediate key-value pairs. In the "reduce" step, the values emitted for each key are collected and combined to produce the answer to the initial question.
  10. MongoDB (www.mongodb.org) - a high-performance, open source, NoSQL database written in C++.
  11. NoSQL (www.nosql.org) - a group of non-relational, distributed, scalable databases, many of them open source, designed for Web-scale use. Over 100 such products are listed on www.nosql-database.org, including Apache HBase, Cassandra, Amazon SimpleDB, and MongoDB.
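
To make the "map" and "reduce" steps from the MapReduce entry more concrete, here is a minimal single-machine sketch in Python that counts words. It is only an illustration of the idea, not how Hadoop is actually implemented: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are my own assumptions, and a real cluster would run the map calls on different nodes in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key (done by the framework between the steps)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle, and reduce in sequence over a list of text chunks."""
    emitted = []
    for chunk in chunks:
        # On a real cluster, each chunk would be handled by a different node.
        emitted.extend(map_phase(chunk))
    return dict(reduce_phase(key, values) for key, values in shuffle(emitted).items())


# Hypothetical input: two chunks standing in for two blocks of a large file.
counts = word_count(["big data big", "data engines"])
print(counts)
```

The point of the split is that map_phase only ever looks at its own chunk and reduce_phase only ever looks at one key, so both steps can be spread over many machines without the nodes needing to talk to each other.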


About Me

I had been a senior software developer working for HP and GM. I am interested in intelligent and scientific computing. I am passionate about computers as enablers for human imagination. The contents of this site are not in any way, shape, or form endorsed, approved, or otherwise authorized by HP, its subsidiaries, or its officers and shareholders.
