Word Frequency | LeetCode OJ
1. awk treats each line as one record by default and splits each record on whitespace; NF is the number of fields. First, print every word on its own line.
2. Sort + count + de-duplicate.
3. Print the result.
awk '{i=1; while(i<=NF){print $i; i++}}' words.txt \
| sort | uniq -c \
| sort -k1nr \
| awk '{print $2 " " $1}'
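As a quick sanity check, the pipeline can be run against the sample file from the problem statement (a sketch; the words.txt content is the problem's own example):

```shell
# Build the sample words.txt from the problem's example input.
printf 'the day is sunny the the\nthe sunny is is\n' > words.txt

# Step 1: one word per line; step 2: sort, count, de-duplicate;
# step 3: numeric descending sort on the count, then swap the columns.
awk '{i=1; while(i<=NF){print $i; i++}}' words.txt \
| sort | uniq -c \
| sort -k1nr \
| awk '{print $2 " " $1}'
# the 4
# is 3
# sunny 2
# day 1
```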
http://blog.csdn.net/sole_cc/article/details/44998763
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{print $2 " " $1}'
Explanation:
tr -s: translate the target characters to the replacement and squeeze runs into a single occurrence (here, one or more consecutive spaces become a single newline).
sort: sort the words in ascending order.
uniq -c: uniq de-duplicates consecutive identical lines; -c prefixes each line with its count.
sort -rn: -r reverses the order, -n compares numerically.
awk '{ print $2, $1 }': formatted output; awk splits each line on whitespace into fields, and $i is the i-th field.
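The tr step can be watched in isolation (a minimal sketch with made-up input):

```shell
# -s (squeeze) collapses each run of spaces before translating, so any
# number of consecutive spaces becomes exactly one newline.
printf 'the  day   is sunny\n' | tr -s ' ' '\n'
# the
# day
# is
# sunny
```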
Method 2:
awk '
{
  for (i = 1; i <= NF; i++)
    s[$i]++
}
END {
  for (i in s)
    print i " " s[i]
}' words.txt | sort -nr -k 2
https://github.com/illuz/leetcode/blob/master/solutions/192.Word_Frequency/AC_awk.sh
sort's -k option selects which column (field) to sort by.
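A minimal illustration of -k:

```shell
# -k 2 makes the sort key start at field 2; -n compares numerically
# and -r reverses, giving descending counts.
printf 'is 3\nthe 4\nday 1\n' | sort -nr -k 2
# the 4
# is 3
# day 1
```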
https://github.com/SaulLawliet/leetcode/blob/master/shell/192-word-frequency.sh
awk 'BEGIN{RS=" |\n"} $0!=""{a[$0]++} END{for(i in a) print i" "a[i]}' words.txt |sort -nrk 2
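Setting RS to "space or newline" makes every word its own record. Note that a multi-character RS treated as a regular expression is a gawk/mawk behavior, not guaranteed by POSIX. A small sketch:

```shell
# With RS = " |\n" (a regex in gawk/mawk), each word is a separate
# record, so $0 holds one word; empty records produced by consecutive
# separators are filtered out by the $0 != "" guard.
printf 'the day\nis\n' | awk 'BEGIN{RS=" |\n"} $0!=""{n++} END{print n}'
# 3
```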
http://accepted.com.cn/leetcode192/
grep -oE '[a-z]+' words.txt | sort | uniq -c | sort -rn | awk '{print $2" "$1}'
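Here -o prints every match on its own line and -E enables extended regexes, so tokenization and filtering happen in one step (a sketch with made-up input):

```shell
# [a-z]+ matches maximal runs of lowercase letters, so anything else
# (digits, punctuation, whitespace) acts as a separator.
printf 'the day, the 3rd day\n' | grep -oE '[a-z]+'
# the
# day
# the
# rd
# day
```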
https://ruixublog.wordpress.com/2015/05/14/leetcode192-word-frequency/
Read full article from Word Frequency | LeetCode OJ
Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:
- words.txt contains only lowercase characters and space ' ' characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.

For example, assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

http://www.cnblogs.com/lookbackinside/p/4418203.html
awk '{for(i=1;i<=NF;i++) a[$i]+=1} END{for(i in a) print i,a[i] | "sort -r -n -k2"}' words.txt
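Note the print ... | "sort ..." idiom in the solution above: awk opens the command once, streams every print through that single pipe, and closes it when awk exits. A minimal sketch with made-up input:

```shell
# Each `print ... | "cmd"` writes into one pipe to cmd; the sorted
# output appears when awk closes the pipe at exit.
printf 'b\na\nb\n' | awk '{c[$0]++} END{for (w in c) print w, c[w] | "sort -nr -k2"}'
# b 2
# a 1
```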
http://blog.csdn.net/wangxiaobupt/article/details/45201817