Monday, October 19, 2015

Leetcode-192 Word Frequency



Word Frequency | LeetCode OJ
Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.
For example, assume that words.txt has the following content:
the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
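A minimal sketch for checking any candidate solution against the sample above. The helper name `check_solution` and the scratch-directory layout are assumptions made for illustration; they are not part of the problem statement.

```shell
#!/usr/bin/env bash
# check_solution is a hypothetical helper: it runs a command string in a
# scratch directory that holds the sample words.txt, then compares the
# command's output with the expected result from the problem statement.
check_solution() {
  local dir status
  dir="$(mktemp -d)" || return 1
  printf 'the day is sunny the the\nthe sunny is is\n' > "$dir/words.txt"
  printf 'the 4\nis 3\nsunny 2\nday 1\n' > "$dir/expected.txt"
  # diff exits 0 only when the candidate's output matches exactly
  (cd "$dir" && sh -c "$1" | diff - expected.txt > /dev/null)
  status=$?
  rm -rf "$dir"
  return "$status"
}

# Candidate under test: one of the pipelines collected in this post.
if check_solution "tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -nr | awk '{print \$2, \$1}'"; then
  echo PASS
else
  echo FAIL
fi
```

Running the candidate in a scratch directory keeps the check from overwriting a real words.txt in the current directory.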
http://www.cnblogs.com/lookbackinside/p/4418203.html
awk '{for(i=1;i<=NF;i++) a[$i]+=1} END{for(i in a) print i,a[i] | "sort -r -n -k2"}' words.txt
http://blog.csdn.net/wangxiaobupt/article/details/45201817
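1. By default awk treats each line as one record and splits each record into fields on whitespace; NF is the number of fields. Use this to print every word on its own line first.
2. Sort, count, and remove duplicates.
3. Print the result.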
awk '{i=1;while(i<=NF){print $i;i++}}' words.txt \
  | sort | uniq -c \
  | sort -k1nr \
  |awk '{print $2 " " $1}'
http://blog.csdn.net/sole_cc/article/details/44998763
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{print $2 " " $1}'
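Explanation:
tr -s: translates each space into a newline and squeezes repeated newlines into one, so one or more consecutive spaces become a single line break.
sort: sorts the words in ascending order, so identical words end up on adjacent lines.
uniq -c: uniq collapses adjacent duplicate lines; the -c option prefixes each line with its count.
sort -rn: -r sorts in reverse order, -n compares by numeric value.
awk '{ print $2, $1 }': formats the output. awk splits each line into whitespace-separated fields, and $i is the i-th field.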
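The pipeline explained above can be wrapped in a function with one comment per stage. This is only a sketch: the name `word_freq` and the temporary sample file are assumptions for illustration, not part of the cited solution.

```shell
#!/usr/bin/env bash
# word_freq is a hypothetical helper name; it takes the input file as $1.
word_freq() {
  tr -s ' ' '\n' < "$1" |    # one word per line; -s squeezes repeated newlines
    sort |                    # put identical words on adjacent lines
    uniq -c |                 # collapse duplicates, prefixing each with a count
    sort -nr |                # numeric sort on the count, highest first
    awk '{print $2 " " $1}'   # reorder "count word" into "word count"
}

# Try it on the sample input from the problem statement.
sample="$(mktemp)"
printf 'the day is sunny the the\nthe sunny is is\n' > "$sample"
word_freq "$sample"
rm -f "$sample"
```

On the sample input this prints the four lines from the problem statement: the 4, is 3, sunny 2, day 1.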
Method 2:
awk '
{for(i=1;i<=NF;i++)
{s[$i]++;}
}
END{
for(i in s)
{print i " " s[i]}
}' words.txt | sort -nr -k 2
https://github.com/illuz/leetcode/blob/master/solutions/192.Word_Frequency/AC_awk.sh
awk '
{ for (i=1; i<=NF; i++) { ++S[$i]; } }
END { for (i in S) { print i, S[i] } }
' words.txt | sort -nr -k 2
The -k option of sort selects which column to sort by; here -k 2 sorts on the second column, the count.
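One detail worth knowing: with `-k 2` the sort key runs from field 2 to the end of the line, while `-k2,2` limits it to field 2 alone. The sketch below uses the limited form and adds a second key to break ties alphabetically. The helper name `sort_by_count` and the tie-breaking rule are assumptions for illustration; the solutions above do not specify how ties are ordered.

```shell
#!/usr/bin/env bash
# sort_by_count is a hypothetical helper name used only for this sketch.
# It reads "word count" lines on stdin and orders them by count, highest first.
sort_by_count() {
  # -k2,2nr : key is field 2 only, compared numerically, in reverse order
  # -k1,1   : assumption: break ties alphabetically by the word in field 1
  # LC_ALL=C keeps the alphabetical comparison independent of the locale
  LC_ALL=C sort -k2,2nr -k1,1
}

printf 'day 1\nthe 4\nsunny 2\nis 3\ncat 2\n' | sort_by_count
```

With this input, "cat 2" is listed before "sunny 2" because the second key decides between equal counts.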
https://github.com/SaulLawliet/leetcode/blob/master/shell/192-word-frequency.sh
awk 'BEGIN{RS=" |\n"} $0!=""{a[$0]++} END{for(i in a) print i" "a[i]}' words.txt |sort -nrk 2
http://accepted.com.cn/leetcode192/
grep -oE '[a-z]+' words.txt | sort | uniq -c | sort -nr | awk '{print $2" "$1}'

https://ruixublog.wordpress.com/2015/05/14/leetcode192-word-frequency/