Massive Technical Interviews Tips: linux file system

Tuesday, October 6, 2015

linux file system

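2) The structure of the Linux file system; at the end I was asked to design a structure of my own that can store very large files.
Solution
Linux divides the disk into three areas: directory entries (dentries, each holding a file name and an inode number), the inode table (read/write attributes and pointers to data blocks), and the data blocks.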
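
The three areas above can be pictured with a small in-memory model. This is only a sketch for illustration, not how the kernel lays data out on disk: the names `ToyFS`, `Inode`, and `BLOCK_SIZE` are made up, and the block size is tiny so the example stays readable.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes per data block; hypothetical and tiny on purpose


@dataclass
class Inode:
    mode: str                                   # read/write attributes, e.g. "rw"
    size: int = 0                               # file length in bytes
    blocks: list = field(default_factory=list)  # pointers (indexes) into the data block area


class ToyFS:
    """Toy model of the three areas: dentries, inode table, data blocks."""

    def __init__(self):
        self.dentries = {}     # directory entries: file name -> inode number
        self.inodes = []       # inode table
        self.data_blocks = []  # data block area

    def write_file(self, name, payload, mode="rw"):
        inode = Inode(mode=mode, size=len(payload))
        # Split the payload into fixed-size blocks and remember where each one went.
        for start in range(0, len(payload), BLOCK_SIZE):
            self.data_blocks.append(payload[start:start + BLOCK_SIZE])
            inode.blocks.append(len(self.data_blocks) - 1)
        self.inodes.append(inode)
        self.dentries[name] = len(self.inodes) - 1

    def read_file(self, name):
        # Name -> inode number -> block pointers -> data.
        inode = self.inodes[self.dentries[name]]
        return b"".join(self.data_blocks[i] for i in inode.blocks)
```

Reading a file follows the same chain as in the description: the directory entry gives the inode number, the inode gives the block pointers, and the blocks give the data.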

Journaling File System

Updating files and directories is not an inherently atomic operation and can fail at many steps in the process.

If an update fails before completion (for example, because of a power loss), data can be corrupted.

For example, writing a huge file involves the following steps:

Creating a directory entry.

Allocating an inode and estimating the number of data blocks needed.

Writing the actual data to the data blocks and recording their addresses in the inode.

If the process fails before completion, it could lead to wrong inode information or incomplete files.

Recovering from such a corrupted system can take a very long time if all the directories and inodes need to be checked.

A journaled file system solves this problem by keeping a journal of the changes it plans to make.

After a successful operation, the entry is removed from the journal or marked as complete.

If the operation was unsuccessful, the entry remains in the journal and the system can just read the journal to quickly jump to the affected file.
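
The idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how ext3/ext4 or NTFS implement their journals: `JournaledStore` and the `crash_before_apply` flag are hypothetical names, a dict stands in for the disk, and recovery here simply replays unfinished entries (a redo-style journal).

```python
class JournaledStore:
    """Toy journal: record the intent, apply the change, then mark it complete."""

    def __init__(self):
        self.files = {}    # stands in for on-disk file contents
        self.journal = []  # one dict per planned change: name, data, done

    def write(self, name, data, crash_before_apply=False):
        entry = {"name": name, "data": data, "done": False}
        self.journal.append(entry)   # 1. log the change we plan to make
        if crash_before_apply:
            return                   # simulated power loss: entry stays unfinished
        self.files[name] = data      # 2. apply the change
        entry["done"] = True         # 3. mark the journal entry complete

    def recover(self):
        """Look only at unfinished journal entries instead of scanning every file."""
        repaired = []
        for entry in self.journal:
            if not entry["done"]:
                self.files[entry["name"]] = entry["data"]
                entry["done"] = True
                repaired.append(entry["name"])
        return repaired
```

After a simulated crash, `recover()` touches only the files named in unfinished entries, which is why recovery is fast compared with checking every directory and inode.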

https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
http://www.tldp.org/LDP/intro-linux/html/sect_03_01.html
Why partition?

One of the goals of having different partitions is to achieve higher data security in case of disaster. By dividing the hard disk into partitions, data can be grouped and separated. When an accident occurs, only the data in the partition that was hit will be damaged, while the data on the other partitions will most likely survive.

The use of partitions remains for security and robustness reasons, so a breach on one part of the system doesn't automatically mean that the whole computer is in danger. This is currently the most important reason for partitioning.

Two kinds of partitions matter most on a Linux system:
data partition: normal Linux system data, including the root partition, which contains all the data needed to start up and run the system;
swap partition: an extension of the computer's physical memory, held on the hard disk.

https://help.ubuntu.com/community/LinuxFilesystemsExplained
Instead of writing directly to the part of the disk where the file is stored, a journaling file system first writes the data to another part of the disk and records the necessary changes in a log. In the background it then works through each journal entry, completes the task, and checks the entry off the list.

File System	Max File Size	Max Partition Size	Journaling	Notes
NTFS	2 TB	256 TB	Yes	(For Windows compatibility) NTFS-3g is installed by default in Ubuntu, allowing read/write support.
ext3	2 TB	32 TB	Yes	Standard Linux filesystem for many years. Best choice for super-standard installation.
ext4	16 TB	1 EB	Yes	Modern iteration of ext3. Best choice for new installations where super-standard isn't necessary.

Lots of Small files
http://stackoverflow.com/questions/115882/how-do-you-deal-with-lots-of-small-files
According to one answer, NTFS performance degrades severely once a directory holds more than about 10,000 files. The workaround is to add an extra level to the directory hierarchy so that each subdirectory holds at most 10,000 files.
http://serverfault.com/questions/6711/filesystem-for-millions-of-small-files
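One answer claims that most file systems choke on more than about 65K files in a directory, and suspects this is still true of ext4. Treat this as a rule of thumb from the thread rather than a hard limit: ext4 with directory indexing can hold far more entries per directory.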
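
The extra directory level suggested above can be sketched as a function that maps each file name to a subdirectory. This is a minimal illustration, not taken from either thread: `sharded_path` and the `buckets` parameter are hypothetical, and hashing the name is just one way to spread files evenly.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, filename, buckets=256):
    """Pick a subdirectory for filename so that no single directory grows too large."""
    # Hash the name so files spread roughly evenly across the buckets.
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    # Two hex digits are enough to name up to 256 buckets.
    return PurePosixPath(root) / f"{bucket:02x}" / filename
```

With 256 buckets, a million files average roughly 4,000 per subdirectory, which stays under the 10,000-file figure quoted above. The mapping is deterministic, so the same name always resolves to the same subdirectory.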
