Tuesday, October 6, 2015

linux file system



2) The structure of the Linux file system; at the end I was asked to design a structure of my own that can store a very large file
solution
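Linux divides the disk roughly into 3 areas: directory entries (dentry, containing the file name and inode number), the inode table (read/write attributes and data block pointers), and data blocks.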
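
To make the three areas concrete, and to reason about the "very large file" part of the question, here is a minimal sketch. It assumes the classic ext2-style inode with 12 direct pointers plus single, double, and triple indirect blocks, a 4 KiB block size, and 4-byte block pointers. These numbers and all the names (`Inode`, `Directory`, `max_file_size`, `pointer_level`) are assumptions for illustration, not something stated in the notes above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BLOCK_SIZE = 4096      # assumed block size in bytes
POINTER_SIZE = 4       # assumed size of one block pointer in bytes
DIRECT_POINTERS = 12   # assumed ext2-style count of direct pointers


@dataclass
class Inode:
    mode: int                                        # permission bits (r/w/x)
    size: int = 0                                    # file length in bytes
    direct: List[int] = field(default_factory=list)  # data block numbers
    single_indirect: Optional[int] = None            # block full of pointers
    double_indirect: Optional[int] = None            # block of pointer blocks
    triple_indirect: Optional[int] = None            # one more level of indirection


# A directory is a table of dentries: file name -> inode number.
Directory = Dict[str, int]


def max_file_size(block_size=BLOCK_SIZE, pointer_size=POINTER_SIZE,
                  direct=DIRECT_POINTERS):
    """Largest file (in bytes) addressable by one inode of this layout."""
    per_block = block_size // pointer_size  # pointers that fit in one block
    blocks = direct + per_block + per_block ** 2 + per_block ** 3
    return blocks * block_size


def pointer_level(offset, block_size=BLOCK_SIZE, pointer_size=POINTER_SIZE,
                  direct=DIRECT_POINTERS):
    """Which pointer level covers the byte at `offset`."""
    per_block = block_size // pointer_size
    index = offset // block_size  # logical block number inside the file
    if index < direct:
        return "direct"
    index -= direct
    if index < per_block:
        return "single indirect"
    index -= per_block
    if index < per_block ** 2:
        return "double indirect"
    index -= per_block ** 2
    if index < per_block ** 3:
        return "triple indirect"
    raise ValueError("offset is beyond the maximum file size")
```

Under these assumptions the limit is a little over 4 TiB (4 TiB + 4 GiB + 4 MiB + 48 KiB). A design for even larger files could add another level of indirection, use larger blocks, or store extents (start block plus length) instead of individual block pointers, which is the approach ext4 takes.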


Updating files and directories is not an inherently atomic operation and is susceptible to failure at many steps in the process.
If an update fails before completion (for example, because of a power loss), data can be corrupted.
For example, writing a huge file involves the following steps:
  • Creating a directory entry.
  • Estimating the number of data blocks needed.
  • Allocating the inode and writing the actual data to the data blocks it points to.
If the process fails before completion, it can leave wrong inode information or incomplete files behind.
Recovering from such a corrupted system can take a very long time if all the directories and inodes need to be checked.
A journaled file system solves this problem by keeping a journal of the changes it plans to make.
After a successful operation, the entry is removed from the journal or marked as complete.
If the operation was unsuccessful, the entry remains in the journal and the system can just read the journal to quickly jump to the affected file.
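
The journal idea can be sketched with a toy store that keeps its "disk" in one JSON file and its journal in another. This is only an illustration under stated assumptions: the class `JournaledStore`, its methods, and the `crash_before_apply` flag (which simulates a power loss) are hypothetical, and recovery here simply replays the pending entry, which is one possible strategy rather than how any particular Linux file system does it.

```python
import json
import os


class JournaledStore:
    """Toy file store with a one-entry write-ahead journal (illustrative only)."""

    def __init__(self, data_path, journal_path):
        self.data_path = data_path
        self.journal_path = journal_path

    def _load(self):
        if not os.path.exists(self.data_path):
            return {}
        with open(self.data_path) as f:
            return json.load(f)

    def _apply(self, name, content):
        data = self._load()
        data[name] = content
        tmp = self.data_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.data_path)  # atomic swap of the data file

    def write(self, name, content, crash_before_apply=False):
        # 1. Record the planned change in the journal and force it to disk.
        with open(self.journal_path, "w") as j:
            json.dump({"name": name, "content": content}, j)
            j.flush()
            os.fsync(j.fileno())
        if crash_before_apply:
            return  # simulated power loss: journal entry stays behind
        # 2. Perform the real update.
        self._apply(name, content)
        # 3. The operation succeeded, so remove the journal entry.
        os.remove(self.journal_path)

    def recover(self):
        """Replay a leftover journal entry; return True if one was found."""
        if not os.path.exists(self.journal_path):
            return False
        with open(self.journal_path) as j:
            entry = json.load(j)
        self._apply(entry["name"], entry["content"])
        os.remove(self.journal_path)
        return True

    def read(self, name):
        return self._load().get(name)
```

After a simulated crash, `recover()` only has to look at the journal file to find the affected entry; it never scans the whole store, which is the point the text above makes about recovery time.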
https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
http://www.tldp.org/LDP/intro-linux/html/sect_03_01.html
Why partition?
One of the goals of having different partitions is to achieve higher data security in case of disaster. By dividing the hard disk into partitions, data can be grouped and separated. When an accident occurs, only the data in the affected partition is damaged, while the data on the other partitions will most likely survive.

The use of partitions remains for security and robustness reasons, so a breach on one part of the system doesn't automatically mean that the whole computer is in danger. This is currently the most important reason for partitioning. 

There are two major kinds of partitions on a Linux system:
  • data partition: normal Linux system data, including the root partition containing all the data to start up and run the system; and
  • swap partition: expansion of the computer's physical memory, extra memory on hard disk.
https://help.ubuntu.com/community/LinuxFilesystemsExplained
Instead of writing directly to the part of the disk where the file is stored, a journaling file system first writes the data to another part of the disk and notes the necessary changes in a log. In the background, it then goes through each journal entry and completes the task; when a task is complete, it checks the entry off the list.
File system | Max file size | Max partition size | Journaling | Notes
NTFS | 2 TB | 256 TB | Yes | (For Windows compatibility) NTFS-3g is installed by default in Ubuntu, allowing read/write support.
ext3 | 2 TB | 32 TB | Yes | Standard Linux filesystem for many years. Best choice for super-standard installation.
ext4 | 16 TB | 1 EB | Yes | Modern iteration of ext3. Best choice for new installations where super-standard isn't necessary.

Lots of Small files
http://stackoverflow.com/questions/115882/how-do-you-deal-with-lots-of-small-files
NTFS performance severely degrades after 10,000 files in a directory. What you do is create an additional level in the directory hierarchy, with each subdirectory having 10,000 files.
http://serverfault.com/questions/6711/filesystem-for-millions-of-small-files
Most FS will choke with more than 65K files in a dir, I think that is still true of ext4.
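
The "extra directory level" advice can be sketched in two ways. The first function follows the quoted suggestion directly and puts a fixed number of files in each subdirectory, based on a running file index. The second is an alternative that is not from the quoted answers: it derives the subdirectory from a hash of the file name, so no counter is needed. The function names, the `/data` root, and the default parameters are all assumptions for illustration.

```python
import hashlib
from pathlib import Path


def bucket_by_count(root, file_index, filename, per_dir=10_000):
    """Place the N-th file in subdirectory N // per_dir (e.g. 00000, 00001, ...)."""
    bucket = f"{file_index // per_dir:05d}"
    return Path(root) / bucket / filename


def bucket_by_hash(root, filename, levels=2, width=2):
    """Derive nested subdirectories from the SHA-1 hex digest of the file name.

    With width=2 each level has at most 256 subdirectories, so two levels
    spread files over up to 65,536 leaf directories.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, filename)
```

For example, `bucket_by_count("/data", 10_000, "img.png")` yields `/data/00001/img.png`. The hash variant gives a stable location for a given name without tracking how many files exist, at the cost of not guaranteeing an exact per-directory count.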
