Friday, August 14, 2015

Dropbox Interview Question: Find all duplicate files by c... | Glassdoor



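Given a file path, group all identical files under it together and output their paths as a List<List<String>>.
Two files are identical if they match byte for byte.
Identical files do not necessarily share a file name, and the directory may contain subfolders.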

Your solution needs to tackle a couple of problems: obtaining a list of 
all the files in the file system (e.g. via DFS), binning the files into 
groups of possible matches, and repeating with swappable heuristics until 
you are 100% certain (e.g. size first, MD5 second, byte stream third).
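
The size, MD5, byte-stream pipeline described above could be sketched in Java roughly as follows. This is only a minimal sketch, not the original poster's solution: the class name DuplicateFinder and its helper methods are hypothetical, and it assumes every file under the root is readable.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
public class DuplicateFinder {

    /** Returns groups of paths whose contents are byte-for-byte identical. */
    public static List<List<String>> findDuplicates(Path root) throws IOException {
        // Step 1: collect every regular file under root, including subfolders.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        // Step 2: bin by size (cheap; files of different sizes cannot match).
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path p : files) {
            bySize.computeIfAbsent(Files.size(p), k -> new ArrayList<>()).add(p);
        }
        List<List<String>> result = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) continue;
            // Step 3: bin the survivors by MD5 digest.
            Map<String, List<Path>> byHash = new HashMap<>();
            for (Path p : sameSize) {
                byHash.computeIfAbsent(md5(p), k -> new ArrayList<>()).add(p);
            }
            for (List<Path> sameHash : byHash.values()) {
                if (sameHash.size() < 2) continue;
                // Step 4: confirm with a byte-by-byte comparison.
                for (List<Path> group : splitByContent(sameHash)) {
                    if (group.size() > 1) {
                        result.add(group.stream().map(Path::toString)
                                .collect(Collectors.toList()));
                    }
                }
            }
        }
        return result;
    }

    private static String md5(Path p) throws IOException {
        try (InputStream in = Files.newInputStream(p)) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
            return new BigInteger(1, md.digest()).toString(16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Partitions candidates by comparing each file with one representative per group.
    private static List<List<Path>> splitByContent(List<Path> candidates) throws IOException {
        List<List<Path>> groups = new ArrayList<>();
        for (Path p : candidates) {
            boolean placed = false;
            for (List<Path> g : groups) {
                if (sameBytes(g.get(0), p)) { g.add(p); placed = true; break; }
            }
            if (!placed) {
                List<Path> g = new ArrayList<>();
                g.add(p);
                groups.add(g);
            }
        }
        return groups;
    }

    private static boolean sameBytes(Path a, Path b) throws IOException {
        try (InputStream x = new BufferedInputStream(Files.newInputStream(a));
             InputStream y = new BufferedInputStream(Files.newInputStream(b))) {
            int c;
            while ((c = x.read()) != -1) {
                if (c != y.read()) return false;
            }
            return y.read() == -1; // both streams must end together
        }
    }
}
```

Each stage only runs on bins that still hold at least two candidates, so the expensive checks touch as few files as possible. The stages are independent of one another, which is what makes the heuristics swappable.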

Follow-up question: what if the files are very big and MD5 is too slow?
Randomly sample parts of the file.
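
One way to sample could look like the sketch below. It is an assumption-laden illustration rather than the answer given in the interview: the class SampledFingerprint and its parameters are hypothetical, and the random offsets are seeded by the file size so that all files of the same size are sampled at the same positions. A sampled fingerprint can only rule candidates out; files whose samples match still need a full comparison to be certain.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.math.BigInteger;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

// Hypothetical helper class, for illustration only.
public class SampledFingerprint {

    /**
     * Hashes `samples` chunks of up to `chunkSize` bytes read at pseudo-random
     * offsets. Seeding the generator with the file size means two files of the
     * same size are always read at the same offsets.
     */
    public static String fingerprint(Path file, int samples, int chunkSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            MessageDigest md = MessageDigest.getInstance("MD5");
            Random rng = new Random(size);
            byte[] buf = new byte[chunkSize];
            long maxStart = Math.max(0, size - chunkSize);
            for (int i = 0; i < samples; i++) {
                long offset = (long) (rng.nextDouble() * maxStart);
                int len = (int) Math.min(chunkSize, size - offset);
                raf.seek(offset);
                raf.readFully(buf, 0, len); // reads exactly len bytes
                md.update(buf, 0, len);
            }
            return new BigInteger(1, md.digest()).toString(16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The fingerprint could replace the full MD5 stage for large files, with the byte-stream comparison kept as the final check.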

Read full article from Dropbox Interview Question: Find all duplicate files by c... | Glassdoor
