Monday, October 19, 2015

Big table | miafish




What is Bigtable?

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

Map: Bigtable is a collection of (key, value) pairs, where the key identifies a row and the value is the set of columns.

Persistent: the data is stored persistently on disk.

Distributed: Bigtable's data is distributed among many independent machines. The table is split by row range, with each group of adjacent rows managed by a single server. A row itself is never split across servers.
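To make the row-range idea concrete, here is a minimal sketch of how a row key could be mapped to the server that manages its group of adjacent rows. The end keys and server names are invented for illustration, and the binary search is an assumption about one reasonable way to do the lookup, not a description of Bigtable's actual code.

```python
import bisect

# Hypothetical layout: each group of adjacent rows is identified by its last
# (end) row key. Rows sorting after the final end key fall into the last group.
GROUP_END_KEYS = ["com.cnn.www", "edu.rutgers.cs", "org.apache"]
GROUP_SERVERS = ["server-a", "server-b", "server-c", "server-d"]


def server_for_row(row_key: str) -> str:
    """Return the (hypothetical) server managing the row range that holds row_key."""
    # bisect_left returns the index of the first end key that is >= row_key,
    # which is the group whose range covers the row.
    index = bisect.bisect_left(GROUP_END_KEYS, row_key)
    return GROUP_SERVERS[index]


if __name__ == "__main__":
    for key in ["com.aaa", "edu.mit", "edu.rutgers.cs", "org.zzz"]:
        print(key, "->", server_for_row(key))
```

Because rows are kept in sorted order, a whole row always lands in exactly one group, which matches the statement that a row is never split across servers.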

Sparse: the table is sparse, meaning that different rows in a table may use different columns, with many of the columns empty for a particular row.

Multidimensional: A table is indexed by rows. Each row contains one or more named column families. Column families are defined when the table is first created. Within a column family, there may be one or more named columns. All data within a column family is usually of the same type. The implementation of Bigtable usually compresses all the columns within a column family together. Columns within a column family can be created on the fly. Rows, column families and columns provide a three-level naming hierarchy for identifying data. For example:

"edu.rutgers.cs" : {               // row
    "users" : {                    // column family
        "watrous": "Donald",       // column
        "hedrick": "Charles",      // column
        "pxk" : "Paul"             // column
    }
    "sysinfo" : {                  // another column family
        "" : "SunOS 5.8"           // column (null name)
    }
}

To get data from Bigtable, you need to provide a fully-qualified name in the form column-family:column. For example, users:pxk or sysinfo:. The latter shows a null column name.
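The naming hierarchy can be mimicked with nested dictionaries. The sketch below is only a toy model of the example row above: the `get_cell` helper is a hypothetical name, and a real client library would look different.

```python
# Toy model of the example row: row -> column family -> column -> value.
table = {
    "edu.rutgers.cs": {
        "users": {"watrous": "Donald", "hedrick": "Charles", "pxk": "Paul"},
        "sysinfo": {"": "SunOS 5.8"},  # column with a null (empty) name
    }
}


def get_cell(table, row_key, qualified_name):
    """Look up a value by row key and a 'column-family:column' name.

    Returns None when the row, family, or column is absent, which is how
    this toy model represents the empty cells of a sparse table.
    """
    # partition splits on the first ':' only; 'sysinfo:' yields an empty column name.
    family, _, column = qualified_name.partition(":")
    return table.get(row_key, {}).get(family, {}).get(column)


if __name__ == "__main__":
    print(get_cell(table, "edu.rutgers.cs", "users:pxk"))
    print(get_cell(table, "edu.rutgers.cs", "sysinfo:"))
    print(get_cell(table, "edu.rutgers.cs", "users:nobody"))
```

Missing columns simply have no entry, so a row only pays for the columns it actually uses.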

Time-based: Time is another dimension of Bigtable data. Every column may keep multiple versions of its data, each identified by a timestamp. If an application does not specify a timestamp, it retrieves the latest version. Alternatively, it can specify a timestamp and get the latest version that is earlier than or equal to that timestamp.
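The two read modes described here (latest version, or latest version at or before a given timestamp) can be sketched as follows. `VersionedCell` is a hypothetical class written for this note; it keeps versions in a sorted list purely for illustration and says nothing about how Bigtable stores them internally.

```python
import bisect


class VersionedCell:
    """Toy cell that keeps several timestamped versions of one value."""

    def __init__(self):
        self._timestamps = []  # kept sorted in ascending order
        self._values = []      # _values[i] belongs to _timestamps[i]

    def put(self, timestamp, value):
        index = bisect.bisect_left(self._timestamps, timestamp)
        if index < len(self._timestamps) and self._timestamps[index] == timestamp:
            self._values[index] = value  # overwrite an existing version
        else:
            self._timestamps.insert(index, timestamp)
            self._values.insert(index, value)

    def get(self, timestamp=None):
        """Return the latest version, or the latest one with time <= timestamp."""
        if not self._timestamps:
            return None
        if timestamp is None:
            return self._values[-1]
        # bisect_right counts how many versions have a timestamp <= the request.
        index = bisect.bisect_right(self._timestamps, timestamp)
        return self._values[index - 1] if index > 0 else None


if __name__ == "__main__":
    cell = VersionedCell()
    cell.put(10, b"v1")
    cell.put(20, b"v2")
    print(cell.get(), cell.get(15), cell.get(5))
```

Values are stored as bytes here to echo the point that each value in the map is an uninterpreted array of bytes.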

Data Model

The map is indexed by a row key, column key and a time stamp.

row key:

column keys:

time stamp:

Implementation:

Locating rows within a Bigtable is managed through a three-level hierarchy.

  1. A file stored in Chubby contains the location of the root tablet (the first tablet in the metadata table).
  2. The root tablet contains the locations of all tablets in a special metadata table.
  3. Each metadata tablet contains the locations of a set of user tablets.
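The three steps above can be sketched as a chain of lookups. Everything in this block is a simplification made up for illustration: the dictionary standing in for the Chubby file, the tablet names, and the representation of each tablet as a sorted list of (end row key, location) pairs are all assumptions, not Bigtable's real formats.

```python
import bisect

# Level 1: a stand-in for the Chubby file that records where the root tablet is.
CHUBBY_FILE = {"root_tablet_location": "root"}

# Levels 2 and 3: each tablet is modeled as a sorted list of
# (end_row_key, location) pairs. "\uffff" marks "all remaining rows".
TABLETS = {
    "root": [("m", "meta-1"), ("\uffff", "meta-2")],
    "meta-1": [("f", "user-tablet-1"), ("m", "user-tablet-2")],
    "meta-2": [("t", "user-tablet-3"), ("\uffff", "user-tablet-4")],
}


def find_in_tablet(tablet, row_key):
    """Return the location stored for the first end key that is >= row_key."""
    end_keys = [end_key for end_key, _ in tablet]
    index = bisect.bisect_left(end_keys, row_key)
    if index == len(tablet):
        raise KeyError(f"no tablet covers row {row_key!r}")
    return tablet[index][1]


def locate_user_tablet(row_key):
    """Walk Chubby file -> root tablet -> metadata tablet -> user tablet."""
    root_location = CHUBBY_FILE["root_tablet_location"]              # step 1
    metadata_location = find_in_tablet(TABLETS[root_location], row_key)  # step 2
    return find_in_tablet(TABLETS[metadata_location], row_key)       # step 3


if __name__ == "__main__":
    for key in ["apple", "google", "pear", "zebra"]:
        print(key, "->", locate_user_tablet(key))
```

Each level only narrows the search to the next tablet, so no single machine needs to know the location of every user tablet.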

Advantages

  • A table can grow to immense size, with storage distributed across a large number of servers.
  • Join operations are largely avoided because the data is denormalized: related data can be kept together in a single row.

Read full article from Big table | miafish
