What is Bigtable?
A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
Map: Bigtable is a collection of (key, value) pairs, where the key identifies a row and the value is the set of columns.
Persistent: the data is stored persistently on disk.
Distributed: Bigtable's data is distributed among many independent machines. The table is split by row, with each group of adjacent rows managed by one server. A single row is never split across servers.
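The row-range split described above can be sketched as a sorted list of boundary keys plus a binary search. This is only a minimal illustration, not Bigtable's actual code: the boundary keys, the server names, and the function `server_for_row` are all made up for the example.

```python
import bisect

# Hypothetical boundaries: each group of adjacent rows ends at (and includes)
# its end key. The last group has no upper bound.
RANGE_END_KEYS = ["g", "p"]  # groups: (.., "g"], ("g", "p"], ("p", ..)
SERVERS = ["server-a", "server-b", "server-c"]  # made-up server names


def server_for_row(row_key: str) -> str:
    """Return the server that manages the row range containing row_key."""
    # bisect_left finds the first end key >= row_key, i.e. the covering range.
    index = bisect.bisect_left(RANGE_END_KEYS, row_key)
    return SERVERS[index]


print(server_for_row("apple"))
print(server_for_row("house"))
print(server_for_row("zebra"))
```

Because a row key always falls into exactly one range, a row is owned by exactly one server, which matches the statement that a row itself is never distributed.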
Sparse: the table is sparse, meaning that different rows in a table may use different columns, with many of the columns empty for a particular row.
Multidimensional: A table is indexed by rows. Each row contains one or more named column families, which are defined when the table is first created. Within a column family there may be one or more named columns, and these columns can be created on the fly. All data within a column family is usually of the same type, and the implementation of Bigtable usually compresses all the columns within a column family together. Rows, column families, and columns provide a three-level naming hierarchy for identifying data.
To get data from Bigtable, you provide a fully qualified column name in the form column-family:column, for example users:pxk or sysinfo:. The latter has a null (empty) column name.
Time-based: Time is another dimension of Bigtable data. Each cell (one column of one row) may keep multiple versions of its data, indexed by timestamp. If an application does not specify a timestamp, it retrieves the latest version. Alternatively, it can specify a timestamp and get the latest version that is earlier than or equal to that timestamp.
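The properties above (map, sparse, multidimensional, time-based) can be put together in a toy in-memory model. This is a sketch under stated assumptions, not the real Bigtable API: the class `TinyTable`, its `put` and `get` methods, and the sample row key are hypothetical, and persistence and distribution are left out entirely.

```python
from typing import Dict, List, Optional, Tuple


class TinyTable:
    """Toy model: (row key, column key) -> versions as (timestamp, value)."""

    def __init__(self) -> None:
        # Sparse: a row stores only the columns that were actually written.
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Column keys use the form "family:column"; the column part may be empty.
        family, sep, _column = column.partition(":")
        if not sep or not family:
            raise ValueError("column key must look like 'family:column'")
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest value, or the newest one at or before timestamp."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = TinyTable()
table.put("row1", "users:pxk", 10, b"v1")
table.put("row1", "users:pxk", 20, b"v2")
table.put("row1", "sysinfo:", 15, b"linux")  # empty column name
```

With this data, a read of users:pxk without a timestamp returns the version written at time 20, a read at timestamp 15 returns the version written at time 10, and a read at timestamp 5 returns nothing because no version is that old.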
Data Model
The map is indexed by a row key, a column key, and a timestamp.
row key:
column keys:
time stamp:
Implementation
Bigtable locates the tablet that holds a given row through a three-level hierarchy:
- A file stored in Chubby contains the location of the root tablet (the first tablet in the metadata table).
- The root tablet contains the locations of all tablets in a special metadata table.
- Each metadata tablet contains the locations of a set of user tablets.
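The three-level lookup can be sketched as two range searches that start from a single well-known pointer. This is a simplified illustration only: the tablet names, the `TABLETS` dictionary, and the helper functions are invented for the example, and the sentinel "\uffff" is just a stand-in for "no upper bound" that works for ordinary ASCII row keys.

```python
import bisect
from typing import Dict, List, Tuple

# Each tablet is modeled as a sorted list of (inclusive end row key, location).
RangeIndex = List[Tuple[str, str]]

MAX_KEY = "\uffff"  # assumption: sorts after every row key used here

# Level 1: the file in Chubby only names the root tablet.
CHUBBY_FILE = "root-tablet"

TABLETS: Dict[str, RangeIndex] = {
    # Level 2: the root tablet points at the metadata tablets.
    "root-tablet": [("m", "meta-1"), (MAX_KEY, "meta-2")],
    # Level 3: each metadata tablet points at a set of user tablets.
    "meta-1": [("f", "user-tablet-1"), ("m", "user-tablet-2")],
    "meta-2": [("t", "user-tablet-3"), (MAX_KEY, "user-tablet-4")],
}


def lookup(index: RangeIndex, row_key: str) -> str:
    """Return the location whose row range covers row_key."""
    end_keys = [end for end, _ in index]
    return index[bisect.bisect_left(end_keys, row_key)][1]


def locate_user_tablet(row_key: str) -> str:
    """Walk Chubby file -> root tablet -> metadata tablet -> user tablet."""
    root = CHUBBY_FILE
    meta = lookup(TABLETS[root], row_key)
    return lookup(TABLETS[meta], row_key)


print(locate_user_tablet("apple"))
print(locate_user_tablet("pear"))
```

The appeal of this layout is that a client needs only one fixed starting point (the Chubby file) and can find any user tablet in a bounded number of steps, however many tablets exist.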
Advantages
- Can grow to immense size, with storage distributed across a large number of servers.
- Because data is denormalized, related values are stored together in one row, so many queries avoid costly join operations.