Word Frequency | LeetCode OJ
1. awk treats each line as one record by default and splits each record on whitespace; NF is the number of fields. First, print every word on its own line.
2. Sort + count + de-duplicate.
3. Print the result.
awk '{i=1; while(i<=NF){print $i; i++}}' words.txt \
| sort | uniq -c \
| sort -k1nr \
| awk '{print $2 " " $1}'
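As a quick sanity check, the pipeline can be run against the sample file from the problem statement (a sketch; the words.txt content is the problem's own example):

```shell
# Build the sample words.txt from the problem's example input.
printf 'the day is sunny the the\nthe sunny is is\n' > words.txt

# Step 1: one word per line; step 2: sort, count, de-duplicate;
# step 3: numeric descending sort on the count, then swap the columns.
awk '{i=1; while(i<=NF){print $i; i++}}' words.txt \
| sort | uniq -c \
| sort -k1nr \
| awk '{print $2 " " $1}'
# the 4
# is 3
# sunny 2
# day 1
```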
http://blog.csdn.net/sole_cc/article/details/44998763
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{print $2 " " $1}'
Explanation:
tr -s: translate the target characters to the replacement and squeeze runs into a single occurrence (here, one or more consecutive spaces become a single newline).
sort: sort the words in ascending order.
uniq -c: uniq de-duplicates consecutive identical lines; -c prefixes each line with its count.
sort -rn: -r reverses the order, -n compares numerically.
awk '{ print $2, $1 }': formatted output; awk splits each line on whitespace into fields, and $i is the i-th field.
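The tr step can be watched in isolation (a minimal sketch with made-up input):

```shell
# -s (squeeze) collapses each run of spaces before translating, so any
# number of consecutive spaces becomes exactly one newline.
printf 'the  day   is sunny\n' | tr -s ' ' '\n'
# the
# day
# is
# sunny
```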
Method 2:
awk '
{
  for (i = 1; i <= NF; i++)
    s[$i]++
}
END {
  for (i in s)
    print i " " s[i]
}' words.txt | sort -nr -k 2
https://github.com/illuz/leetcode/blob/master/solutions/192.Word_Frequency/AC_awk.sh
sort's -k option selects which column (field) to sort by.
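A minimal illustration of -k:

```shell
# -k 2 makes the sort key start at field 2; -n compares numerically
# and -r reverses, giving descending counts.
printf 'is 3\nthe 4\nday 1\n' | sort -nr -k 2
# the 4
# is 3
# day 1
```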
https://github.com/SaulLawliet/leetcode/blob/master/shell/192-word-frequency.sh
awk 'BEGIN{RS=" |\n"} $0!=""{a[$0]++} END{for(i in a) print i" "a[i]}' words.txt |sort -nrk 2
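Setting RS to "space or newline" makes every word its own record. Note that a multi-character RS treated as a regular expression is a gawk/mawk behavior, not guaranteed by POSIX. A small sketch:

```shell
# With RS = " |\n" (a regex in gawk/mawk), each word is a separate
# record, so $0 holds one word; empty records produced by consecutive
# separators are filtered out by the $0 != "" guard.
printf 'the day\nis\n' | awk 'BEGIN{RS=" |\n"} $0!=""{n++} END{print n}'
# 3
```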
http://accepted.com.cn/leetcode192/
grep -oE '[a-z]+' words.txt | sort | uniq -c | sort -rn | awk '{print $2" "$1}'
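Here -o prints every match on its own line and -E enables extended regexes, so tokenization and filtering happen in one step (a sketch with made-up input):

```shell
# [a-z]+ matches maximal runs of lowercase letters, so anything else
# (digits, punctuation, whitespace) acts as a separator.
printf 'the day, the 3rd day\n' | grep -oE '[a-z]+'
# the
# day
# the
# rd
# day
```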
https://ruixublog.wordpress.com/2015/05/14/leetcode192-word-frequency/
Read full article from Word Frequency | LeetCode OJ
Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:
- words.txt contains only lowercase characters and space ' ' characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.

For example, assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

http://www.cnblogs.com/lookbackinside/p/4418203.html
awk '{for(i=1;i<=NF;i++) a[$i]+=1} END{for(i in a) print i,a[i] | "sort -r -n -k2"}' words.txt
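Note the print ... | "sort ..." idiom in the solution above: awk opens the command once, streams every print through that single pipe, and closes it when awk exits. A minimal sketch with made-up input:

```shell
# Each `print ... | "cmd"` writes into one pipe to cmd; the sorted
# output appears when awk closes the pipe at exit.
printf 'b\na\nb\n' | awk '{c[$0]++} END{for (w in c) print w, c[w] | "sort -nr -k2"}'
# b 2
# a 1
```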
http://blog.csdn.net/wangxiaobupt/article/details/45201817