Massive Technical Interviews Tips: May 2016

Monday, May 30, 2016

Gurafu - ETA PHONE HOME: HOW UBER ENGINEERS AN EFFICIENT ROUTE

https://eng.uber.com/engineering-an-efficient-route/
https://segmentfault.com/a/1190000005162383

In Uber’s early days, we used a combination of routing engines (including OSRM) to produce an ETA. (We didn’t have in-app navigation at this point, so we only used it for the ETA and map matching to display vehicle locations.)

We called this service “Goldeta”, which was essentially a model that sat on top of the routing engines and adjusted their original estimates using our own historical Uber data for similar routes in time and space. This solution, which ultimately took into account hundreds of thousands of Uber trips, compared them to the initial routing engine ETA. Goldeta worked better than using any single ETA alone. However, one issue with this approach was the cold start problem: when we launched in new cities, we didn’t have enough data to inform an ETA offset (for new cities, our ETA used to be less accurate than in older cities for precisely this reason). Also, as we grew, we periodically needed to add new features, whose development was slowed by having to implement them in an open source solution (OSRM) that wasn’t built from the ground up with a dispatching system in mind.

The whole road network is modeled as a graph. Nodes represent intersections, and edges represent road segments. The edge weights represent a metric of interest: often either the road segment distance or the time it takes to travel through it. Concepts such as one-way streets, turn restrictions, turn costs, and speed limits are modeled in the graph as well.

Of course that isn’t the only way to model the real world. Some people want to model road segments as nodes, and edges as the transition between one road segment to another. This is called edge-based representation, in contrast to the node-based representation mentioned before. Each representation has its own tradeoffs, so it’s important to know what you need before committing to one or the other.

Once you have decided on the data structure, you can use different routing algorithms to find a route. One simple example you can try at home is Dijkstra’s search algorithm, which has become the foundation for most modern routing algorithms. However, in a production environment, Dijkstra, or any other algorithm that works on top of an unprocessed graph, is usually too slow.
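To make the node-based model concrete, here is a minimal sketch of Dijkstra's algorithm in Bash over a made-up four-intersection graph. The node names, edge weights, and variable names are assumptions for illustration, not Uber's data or code. It needs Bash 4+ for associative arrays and, as the text says, would be far too slow for a real road network.

```bash
#!/usr/bin/env bash
# Toy road network: keys are "from,to", values are travel times in seconds
# (made-up numbers). A one-way street is an edge listed in one direction only.
declare -A weight=(
  ["A,B"]=40 ["B,A"]=40
  ["A,C"]=10
  ["C,B"]=20
  ["B,D"]=50
  ["C,D"]=80
)
nodes=(A B C D)

declare -A dist=() visited=()
dist[A]=0   # A is the source intersection

while :; do
  # Pick the unvisited node with the smallest known travel time.
  current="" best=""
  for n in "${nodes[@]}"; do
    [[ -n ${visited[$n]:-} || -z ${dist[$n]:-} ]] && continue
    if [[ -z $best ]] || (( ${dist[$n]} < best )); then
      best=${dist[$n]} current=$n
    fi
  done
  [[ -z $current ]] && break   # nothing reachable is left
  visited[$current]=1

  # Relax every outgoing edge of the chosen node.
  for n in "${nodes[@]}"; do
    key="$current,$n"
    w=${weight[$key]:-}
    [[ -z $w ]] && continue
    candidate=$(( best + w ))
    if [[ -z ${dist[$n]:-} ]] || (( candidate < ${dist[$n]} )); then
      dist[$n]=$candidate
    fi
  done
done

for n in "${nodes[@]}"; do
  printf '%s: %s\n' "$n" "${dist[$n]:-unreachable}"
done
```

The linear scan for the closest node keeps the sketch short; a real implementation would use a priority queue.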

OSRM is based on contraction hierarchies. Systems based on contraction hierarchies achieve fast performance — taking just a few milliseconds to compute a route — by preprocessing the routing graph. (Below 100 milliseconds in the 99th percentile response time. We need this because this calculation is done every time before a vehicle is dispatched to a ride request.) But because the preprocessing step is very slow, it’s very difficult to make real-time traffic work. With our data, it takes roughly 12 hours to build the contracted graph using all the roads of the world, meaning we can never take into account up-to-date traffic information.

This is the reason some preprocessing and tweaking are often needed to speed up querying. (Recent examples of this class of algorithms in the literature include highway hierarchies, the ALT-algorithm, and customizable route planning.)

1. Making Contraction Hierarchies Dynamic

In our routing graph, as new traffic information comes in, we need to be able to dynamically update those graph edge weights. As we mentioned, we used OSRM which uses contraction hierarchies as its routing algorithm. One drawback of contraction hierarchies is when edge weights are updated, the pre-processing step needs to be re-run for the whole graph, which could take a few hours for a big graph, for example one that covers the whole planet. This makes contraction hierarchies unsuitable for real-time traffic updates.

Contraction hierarchies’ preprocessing step can be made significantly faster by doing dynamic updates, where the ordering of the nodes remains the same and only the edges that change due to traffic are updated. This decreases the precomputation time significantly. With this approach, it takes just 10 minutes to do a dynamic update of the world graph even with 10% of the road segments changing traffic speed. However, 10 minutes is still too much of a delay for the traffic update to finally be considered for ETAs. So, this path was ultimately a dead end.

2. Sharding

We can also break up the graphs into geographic regions, aka sharding. Sharding speeds up the time it takes to build the contracted graph. However, this requires quite a bit of engineering on our infrastructure for it to work, and would introduce bottlenecks in the cluster size of servers for each region. If one region received too many requests during peak hours, the other servers couldn’t share the load. We wanted to make the most out of the limited numbers of servers we had, so we did not implement this solution either.

3. A* Algorithm

For real-time updates and in a small scale, we tried the A* search algorithm. At a high level, A* is Dijkstra’s search algorithm with heuristics, so A* prioritizes whichever nodes are most likely to find a route from A to B. This means we can update the edge weights of the graph in real-time to account for traffic conditions without needing to do any precomputation. And since most routes we need to calculate are for short trips (the en route time from drivers to riders), A* works well in those situations.

But we knew A* was a temporary solution because it’s really slow for long routes: the A* response grows geometrically in relation to the depth of nodes traversed. (For example, the response time for a route from the Presidio to the Mission District in San Francisco is ~120 milliseconds, several times longer than contraction hierarchies.)

Even A* with landmarks, which makes use of the triangle inequality and several pre-computation tricks, does not decrease the time of A* traversal enough to make it a viable solution.

For long distance trips, A* simply doesn’t have a quick enough response time, so we end up falling back to using a contracted graph that doesn’t have the dynamic edge weights to begin with.

We needed the best of both worlds: we needed precomputation to make it fast, and the ability to update edge weights quickly enough to support real-time traffic. Our solution effectively handles real-time traffic updates by re-running the preprocessing step only for a small part of the whole graph. Since our solution divides the graph into layers of small cells which are relatively independent of each other, the preprocessing step is run in parallel when needed to make it even faster.

The new system is based on Gurafu, our new routing engine, and Flux, Uber’s first historical traffic system based on GPS data we collect from partner phones. We used our in-house ETA system primarily for pickups, but we’ve also been tracking ETA accuracy for full trip length ETAs.

Friday, May 27, 2016

Cracking On-Site Code Interview

https://discuss.leetcode.com/topic/131/things-to-keep-in-mind-for-white-board-interviews

It's very, VERY important to write code on paper. You can only get better if you practice hard at this.
When you're coding online, every time you hit backspace, it's gonna hurt in your real interview. This means you'll have to erase and rewrite. Forget reordering of the code, that'll mess up the whole white board.
When practicing on leetcode or anywhere else, try to make sure that the first time you type is your best line of code.
Use short variables, smaller text, and make sure you have complete picture before you start writing the code. Leave extra space to declare additional variables in case needed later.

You should keep all of their useful APIs in mind.

Someone advised splitting the whiteboard into two parts, one for the code and the other for thinking and drawing. If the whiteboard is not big enough, I think it is impossible to split it into parts.

During onsite interviews, try to start writing code as high up as possible so you leave plenty of space below. Leave some vertical spaces between lines so you can easily insert new lines of code if you have to.

Before writing code, it is helpful to break the problem down into several small pieces; ideally, each small piece should be a separate function. This is especially helpful when a function is trivial to implement but contains small details that would distract from the main algorithm. You can always implement these functions later.

Make sure you have thought of all edge cases and have a clear picture of the entire algorithm before you begin coding. Otherwise you will spend lots of time fixing mistakes caused by missing a specific case or not planning ahead carefully.

Thursday, May 26, 2016

Linux Shell Scripting Misc 2

https://stackoverflow.com/questions/7573368/in-place-edits-with-sed-on-os-x

You can use the -i flag correctly by providing it with a suffix to add to the backed-up file. Extending your example:

sed -i.bu 's/oldword/newword/' file1.txt

Will give you two files: one with the name file1.txt that contains the substitution, and one with the name file1.txt.bu that has the original content.

Mildly dangerous

If you want to destructively overwrite the original file, use something like:

sed -i '' 's/oldword/newword/' file1.txt
      ^ note the space

Because of the way the line gets parsed, a space is required between the option flag and its argument because the argument is zero-length.
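As a small illustration of the backup-suffix form, the sketch below runs the substitution on a throwaway file created with mktemp. The file contents and variable names are assumptions. The attached-suffix spelling (-i.bu) is the one accepted by both GNU sed and the BSD sed on OS X.

```bash
#!/usr/bin/env bash
# Work on a throwaway file so nothing real is modified.
file=$(mktemp)
printf 'the oldword here\n' > "$file"

# In-place edit that keeps a copy of the original as "$file.bu".
sed -i.bu 's/oldword/newword/' "$file"

edited=$(<"$file")       # content after the substitution
backup=$(<"$file.bu")    # untouched original content
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -f "$file" "$file.bu"
```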

https://unix.stackexchange.com/questions/57924/how-to-delete-commands-in-history-matching-a-given-string

The history command just operates on your history file, $HISTFILE (typically ~/.history or ~/.bash_history). It'll be much easier if you just remove the lines from that file, which can be done many ways. grep is one way, but you have to be careful not to overwrite the file while still reading it:

$ grep -v searchstring "$HISTFILE" > /tmp/history
$ mv /tmp/history "$HISTFILE"

Another way is with sed:

$ sed -i '/searchstring/d' "$HISTFILE"
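The grep approach can be tried safely on a stand-in file first. In this sketch the history lines, the API_TOKEN search string, and the variable names are made up; the temporary copy is created next to the original so the final mv is a simple rename.

```bash
#!/usr/bin/env bash
# Stand-in for "$HISTFILE" so the real history is left alone.
histfile=$(mktemp)
printf '%s\n' 'ls -l' 'export API_TOKEN=secret' 'cd /tmp' > "$histfile"

# Write the filtered copy to a second file, then replace the original.
# Redirecting straight back into "$histfile" would truncate it before
# grep had read it.
tmp=$(mktemp "$histfile.XXXXXX")
grep -v 'API_TOKEN' "$histfile" > "$tmp" && mv "$tmp" "$histfile"

remaining=$(<"$histfile")
printf '%s\n' "$remaining"
rm -f "$histfile"
```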

https://www.tecmint.com/clear-command-line-history-in-linux/

$ ls -l /home/aaronkilik/.bash_history

$ history -d 2038

To delete or clear all the entries from bash history, use the history command below with the -c option.

$ history -c

Alternatively, you can use the command below to permanently delete the history of previously executed commands stored in the file.

$ cat /dev/null > ~/.bash_history

IFS
https://stackoverflow.com/questions/2789319/file-content-into-unix-variable-with-newlines

This is due to IFS (Internal Field Separator) variable which contains newline.

A workaround is to reset IFS to not contain the newline, temporarily:

When given a command line, bash splits it into words according to the documentation for the IFS variable:

IFS: The Internal Field Separator that is used for word splitting after expansion ... the default value is <space><tab><newline>.

That specifies that, by default, any of those three characters can be used to split your command into individual words. After that, the word separators are gone, all you have left is a list of words.
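The effect of IFS on a variable that contains a newline can be sketched as follows. The sample text and variable names are assumptions; the IFS change is confined to a command substitution, which runs in a subshell, so it is temporary.

```bash
#!/usr/bin/env bash
content=$'first line\nsecond line'

# Unquoted: the value is split on space, tab, and newline, and echo
# rejoins the words with single spaces, so the newline is lost.
unquoted=$(echo $content)

# Quoted: no word splitting happens, so the newline survives.
quoted=$(echo "$content")

# IFS reduced to a space: newlines are no longer word separators.
ifs_reset=$(IFS=' '; echo $content)

printf '[%s]\n' "$unquoted" "$quoted" "$ifs_reset"
```

Quoting the expansion is usually the simpler fix; changing IFS is only needed when splitting is wanted on some separators but not others.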

https://stackoverflow.com/questions/229551/how-to-check-if-a-string-contains-a-substring-in-bash

If you prefer the regex approach:

string='My string';

if [[ $string =~ "My" ]]
then
   echo "It's there!"
fi

Test if it does NOT contain a string: if [[ ! "abc" =~ "d" ]] is true
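For comparison, here is a short sketch of the glob, regex, and POSIX case variants side by side. The variable names are assumptions. It also shows that a quoted right-hand side of =~ is matched literally rather than as a regular expression.

```bash
#!/usr/bin/env bash
string='My string'

# Glob match: the quoted part is literal, the surrounding * match anything.
if [[ $string == *"My"* ]]; then glob_result=found; else glob_result=missing; fi

# Regex match: quoting the pattern makes the dot a literal dot.
if [[ $string =~ "M." ]]; then quoted_regex=found; else quoted_regex=missing; fi
if [[ $string =~ M. ]]; then bare_regex=found; else bare_regex=missing; fi

# POSIX sh alternative that needs no [[ ]].
case $string in
  *string*) case_result=found ;;
  *)        case_result=missing ;;
esac

echo "$glob_result $quoted_regex $bare_regex $case_result"
```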
https://stackoverflow.com/questions/46914505/substituting-keyword-in-string-with-multi-line-variable-via-sed
http://mywiki.wooledge.org/BashFAQ/100

$ var="She favors the bold.  That's cold."
$ echo "${var/old/new}"
She favors the bnew.  That's cold.

That replaces just the first occurrence of the word old. If we want to replace all occurrences of the word, we double up the first slash:

$ var="She favors the bold.  That's cold."
$ echo "${var//old/new}"
She favors the bnew.  That's cnew.

http://wiki.bash-hackers.org/syntax/pe

parameters referenced by a name are called variables (this also applies to arrays)
parameters referenced by a number are called positional parameters and reflect the arguments given to a shell
parameters referenced by a special symbol are auto-set parameters that have different special meanings and uses

Parameter expansion is the procedure to get the value from the referenced entity, like expanding a variable to print its value. At expansion time you can do very nasty things with the parameter or its value. These things are described here.

The second form with the curly braces is also needed to access positional parameters (arguments to a script) beyond $9:

echo "Argument 10 is: ${10}"

- Use a default value
```bash
${PARAMETER:-WORD}   # WORD is used when PARAMETER is unset or empty
${PARAMETER-WORD}    # WORD is used only when PARAMETER is unset, not when it is empty
```
- Assign a default value
```bash
${PARAMETER:=WORD}   # assigns WORD when PARAMETER is unset or empty
${PARAMETER=WORD}    # assigns WORD only when PARAMETER is unset
```

https://www.alexecollins.com/shell-scripting-recipes/

You can also give variables defaults, in case they are not passed:

  name=${1:-"Anonymous"}

  [ "$name" = "" ] && echo "invalid name" >&2 && exit 1
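The difference between the colon and non-colon forms is easiest to see by running both against an unset and an empty variable. This is a minimal sketch with made-up variable names:

```bash
#!/usr/bin/env bash
unset name
unset_colon=${name:-Anonymous}    # unset -> default is used
unset_plain=${name-Anonymous}     # unset -> default is used

name=""
empty_colon=${name:-Anonymous}    # empty -> default is used
empty_plain=${name-Anonymous}     # empty -> stays empty

# := additionally stores the default in the variable itself.
: "${name:=Anonymous}"

printf '[%s] [%s] [%s] [%s] [%s]\n' \
  "$unset_colon" "$unset_plain" "$empty_colon" "$empty_plain" "$name"
```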

Scripts and functions can also read from standard input, allowing you to use them as part of a pipeline:

  while read name ; do
    echo "Hello $name!"
  done
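Wrapped in a function, the same loop can sit at the end of a pipeline. The function name greet and the sample names are hypothetical; IFS= and -r are added so that leading whitespace and backslashes in the input are kept as they are.

```bash
#!/usr/bin/env bash
# greet is a hypothetical name; it reads one name per line from standard input.
greet() {
  local name
  while IFS= read -r name; do
    echo "Hello $name!"
  done
}

greeting=$(printf '%s\n' 'Alice' 'Big Bad John' | greet)
printf '%s\n' "$greeting"
```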

A marker file is a file used to indicate a long running task has already run, and does not need to be run again, or is already running, and should not start.

For example, only run the find if the output file has not already been created:

[ ! -e text_files ] && find . -name '*.txt' > text_files

find_text_files() {
  set -Eeuo pipefail

  find . -name '*.txt' > text_files
}

[ ! -e /tmp/find.marker ] && find_text_files && touch /tmp/find.marker

I recommend that variable names be lower-case, to make them clearly different from environment variables, which are usually upper-case.

https://unix.stackexchange.com/questions/121418/assigning-a-command-to-variable

The line you wrote defines a variable whose value is the string ls -f | grep -v /. When you use it unquoted, it expands to a list of words (the string is split at whitespace): ls, -f, |, grep, -v, /. The | character isn't special since the shell is splitting the string, not parsing it, so it becomes the second argument to the ls command.

You can't stuff a command into a string like this. To define a short name for a command, use an alias (if the command is complete) or a function (if the command has parameters and is anything more than passing some default arguments to a single command). In your case, an alias will do.

alias my_File='ls -f | grep -v /'
my_File

$ my_File='ls -f | grep -v '\/''
$ $my_File 
ls: cannot access |: No such file or directory
ls: cannot access grep: No such file or directory
[snip]

When interpreting $my_File, bash treats the characters in it as just characters. Thus, for one, the command line has a literal | character in it, not a pipe.

If you are trying to execute $my_File on the command line and have the pipes work, you need eval $my_File.
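When the command needs parameters, a function is the alternative to an alias or eval. The sketch below is an assumption-laden variant of the example above: the name list_non_dirs is made up, and it uses ls -p (which marks directories with a trailing slash) so that grep -v / has something to filter.

```bash
#!/usr/bin/env bash
# list_non_dirs is a hypothetical name. The pipe is real shell syntax here,
# because the function body is parsed as code, not stored as a string.
list_non_dirs() {
  ls -p -- "${1:-.}" | grep -v /
}

# Try it on a throwaway directory with one file and one subdirectory.
dir=$(mktemp -d)
touch "$dir/notes.txt"
mkdir "$dir/subdir"
listing=$(list_non_dirs "$dir")
echo "$listing"
rm -rf "$dir"
```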

https://www.chromium.org/chromium-os/shell-style-guidelines

When using variables in arithmetic expressions, avoid using the ${var} form when possible. The shell knows to look up var as ${var} for you, and omitting the ${...} leads to cleaner code. Note: this does not mean you should avoid using ${var} in non-arithmetic code.

# To increment the variable "i" by one.  We avoid the ++ operator
# as it is not as portable, and this version isn't much longer.
# Note that:
#  - We do not write ${i} or $i.
#  - We put a space after the (( and before the )).
: $(( i += 1 ))

# To decrement the variable "i" by one:
: $(( i -= 1 ))

# Do some complicated math.
min=5
sec=30
echo $(( (min * 60) + sec ))

People often want to print out a string but omit the trailing new line. Or print a string with escape sequences (like colors or tabs). You should never use echo for this. Instead, use printf. In other words, when you use echo, avoid all options like -e or -n. The printf command is both powerful & portable, and has well defined behavior in all shell environments.

# Print out a string without a trailing newline.
printf '%s' "${some_var}"

# Print out a string and interpret escape sequences it contains.
printf '%b\n' "${some_var}"

# Print escape sequences in place.
printf '\tblah: run it and believe\n'

Default Assignments

Sometimes you want to set a variable to something if it isn't already set. People will try to test for this case using the -z operator ([[ -z ${foo} ]]). This leads to duplicated/multiline code when it can all be accomplished in one line. It might also not be correct if you want to accept an empty string as a valid input.

# Assign "bar" to the variable "foo" if it is not set, or if it is set to "".
# Note: the quotes are not required here, but people seem to prefer it.
: ${foo:="bar"}

# Assign "bar" to the variable "foo" only if it is not set.
# If foo is already set to "", do nothing.
: ${foo="bar"}

Argument/Option Parsing

Oftentimes you want your script to accept flags like --foo or -q. The guide lists three options, depending on how much flag parsing you need to do; the first two are:

- Parse the arguments yourself and scan for options
  - Should be avoided for anything beyond one or two simple flags
- Use the getopts built-in helper
  - Preferred when you only have short options (e.g. -q and -v and -h vs --quiet and --version and --help)
  - You have to implement the help (usage) flag yourself
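A minimal getopts loop for short options might look like the sketch below. The flag letters, the usage text, and the function and variable names are assumptions chosen for illustration; the leading colon in the option string switches getopts to silent error reporting so the script can print its own messages.

```bash
#!/usr/bin/env bash
# parse_flags is a hypothetical helper that fills quiet, verbose, output, rest.
parse_flags() {
  local OPTIND=1 opt
  quiet=0 verbose=0 output=""
  while getopts ':qvo:h' opt; do
    case "$opt" in
      q) quiet=1 ;;
      v) verbose=1 ;;
      o) output=$OPTARG ;;                 # -o takes an argument
      h) echo "usage: script [-q] [-v] [-o FILE] [ARG...]"; return 0 ;;
      :) echo "option -$OPTARG requires an argument" >&2; return 2 ;;
      \?) echo "unknown option: -$OPTARG" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))                    # drop the parsed options
  rest=("$@")                              # remaining positional arguments
}

parse_flags -q -o out.txt file1 file2
echo "quiet=$quiet verbose=$verbose output=$output rest=${rest[*]}"
```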

https://mywiki.wooledge.org/glob
Even if a file contains internal whitespace, the expansion of a glob that matches that file will still preserve each filename as a single word.

# This is safe even if a filename contains whitespace:
for f in *.tar; do
    tar tvf "$f"
done

# But this one is not:
for f in $(ls | grep '\.tar$'); do
    tar tvf "$f"
done

Globs are also used to match patterns in a few places in Bash. The most traditional is in the case command:

case "$input" in
    [Yy]|'') confirm=1;;
    [Nn]*) confirm=0;;
    *) echo "I don't understand.  Please try again.";;
esac

Patterns (which are separated by | characters) are matched against the first word after the case itself. The first pattern which matches, "wins", causing the corresponding commands to be executed.

Bash also allows globs to appear on the right-hand side of a comparison inside a [[ command:

if [[ $output = *[Ee]rror* ]]; then ...

Finally, globs are used during parameter expansion to indicate patterns which may be stripped out, or replaced, during a substitution. Simple examples (there are many more on the previously referenced page):

filename=${path##*/}    # strip leading pattern that matches */ (be greedy)
dirname=${path%/*}      # strip trailing pattern matching /* (non-greedy)

printf '%s\n' "${arr[@]}"          # dump an array, one element per line
printf '%s\n' "${arr[@]/error*/}"  # dump array, removing error* if matched
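Applied to a concrete value, the stripping patterns behave as sketched below. The sample path and array contents are made up for illustration.

```bash
#!/usr/bin/env bash
path=/var/log/app/server.tar.gz

filename=${path##*/}       # longest match of */ removed from the front
dirname=${path%/*}         # shortest match of /* removed from the end
extension=${filename##*.}  # everything up to the last dot removed
stem=${filename%%.*}       # everything from the first dot removed

# The same patterns work element by element on an array.
arr=(ok "error: disk full" done)
cleaned=("${arr[@]/error*/}")   # the matching element becomes empty

printf '%s\n' "$filename" "$dirname" "$extension" "$stem"
```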

https://mywiki.wooledge.org/BashGuide/Arrays
BASH provides three types of parameters: Strings, Integers and Arrays.

The easiest way to create a simple array with data is by using the =() syntax:

$ names=("Bob" "Peter" "$USER" "Big Bad John")

This syntax is great for creating arrays with static data or a known set of string parameters, but it gives us very little flexibility for adding lots of array elements. If you need more flexibility, you can also specify explicit indexes:

$ names=([0]="Bob" [1]="Peter" [20]="$USER" [21]="Big Bad John")
# or...
$ names[0]="Bob"

Notice that there is a gap between indices 1 and 20 in this example. An array with holes in it is called a sparse array. Bash allows this, and it can often be quite useful.

If you want to fill an array with filenames, then you'll probably want to use Globs in there:

$ photos=(~/"My Photos"/*.jpg)

Unfortunately, it's really easy to mistakenly create arrays with a bunch of filenames in the following way:

$ files=$(ls)    # BAD, BAD, BAD!
$ files=($(ls))  # STILL BAD!

Remember to always avoid using ls. The first would create a string with the output of ls. That string cannot possibly be used safely for reasons mentioned in the Arrays introduction. The second is closer, but it still splits up filenames with whitespace.

This is the right way to do it:

$ files=(*)      # Good!

Breaking up a string is what IFS is used for:

$ IFS=. read -a ip_elements <<< "127.0.0.1"
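The array points above can be checked with a short sketch. The file names and variable names are assumptions; the glob runs in a throwaway directory so the result is predictable.

```bash
#!/usr/bin/env bash
# Sparse array: three elements, with a gap between indices 1 and 20.
names=([0]="Bob" [1]="Peter" [20]="Big Bad John")
count=${#names[@]}          # number of elements, not highest index + 1
indices="${!names[*]}"      # the indices that are actually in use

# Filling an array from a glob keeps names with spaces in one element each.
dir=$(mktemp -d)
touch "$dir/my photo.jpg" "$dir/other.jpg"
photos=("$dir"/*.jpg)
photo_count=${#photos[@]}
rm -rf "$dir"

# Splitting a string into an array with a one-off IFS.
IFS=. read -r -a ip_elements <<< "127.0.0.1"

echo "$count elements at indices $indices; $photo_count photos; ${#ip_elements[@]} octets"
```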

https://mywiki.wooledge.org/BashGuide/Practices
It's not just spaces you need to protect. Word Splitting occurs on all whitespace, including tabs, newlines, and any other characters in the IFS variable.

echo "$(ls -al)"

You should use arrays for nearly all of these cases. Arrays have the benefit that they separate strings without the need for an explicit delimiter. That means your strings in the array can contain any valid (non-NUL) character, without the worry that it might be your string delimiter (like the space is in our example above). Using arrays in our example above gives us the ability to add middle or last names of our friends:

$ friends=( "Marcus The Rich" "JJ The Short" "Timid Thomas" "Michelangelo The Mobster" )
$ for friend in "${friends[@]}"; do echo "$friend is my friend"; done

Using [[ we can avoid the mess altogether. [[ sees the < operator before Bash gets to use it for Redirection -- problem fixed. Once again, [[ is safer.

cat file | grep pattern
- Don't use cat to feed a single file's content to a filter. cat is a utility used to concatenate the contents of several files together.
- To feed the contents of a file to a process you will probably be able to pass the filename as an argument to the program (grep 'pattern' /my/file, sed 'expression' /my/file, ...).
- If the manual of the program does not specify any way to do this, you should use redirection (read column1 column2 < /my/file, tr ' ' '\n' < /my/file, ...).
for line in $(<file); do
- Don't use a for loop to read the lines of a file. Use a while read loop instead.
for number in $(seq 1 10); do
- For the love of god and all that is holy, do not use seq to count.
- Bash is able enough to do the counting for you. You do not need to spawn an external application (especially a single-platform one) to do some counting and then pass that application's output to Bash for word splitting. Learn the syntax of for already!
- In general, C-style for loops are the best method for implementing a counter: for ((i=1; i<=10; i++)).
- If you actually wanted a stream of numbers separated by newlines as test input, consider printf '%d\n' {1..10}. You can also loop over the result of a sequence expansion if the range is constant, but the shell needs to expand the full list into memory before processing the loop body. This method can be wasteful and is less versatile than other arithmetic loops.
i=`expr $i + 1`
- expr is a relic of ancient Rome. Do not wield it.
- It was used in scripts written for shells with very limited capabilities. You're basically spawning a new process, calling another C program to do some math for you and return the result as a string to bash. Bash can do all this itself and so much faster, more reliably (no numbers->string->number conversions) and in all, better.
- You should use this in Bash: let i++ or ((i++))
- Even POSIX sh can do arithmetic: i=$(($i+1)). It only lacks the ++ operator and the ((...)) command (it has only the $((...)) substitution).
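The counting alternatives recommended in the list can be sketched as follows, with no seq or expr involved. The variable names and ranges are assumptions.

```bash
#!/usr/bin/env bash
# C-style loop: Bash does the counting itself.
sum=0
for ((i = 1; i <= 10; i++)); do
  sum=$((sum + i))
done

# POSIX-style increment, usable in plain sh as well.
n=0
while [ "$n" -lt 3 ]; do
  n=$((n + 1))
done

# A constant range as a newline-separated stream, via brace expansion.
numbers=$(printf '%d\n' {1..3})

echo "sum=$sum n=$n"
```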

https://mywiki.wooledge.org/DontReadLinesWithFor

The final issue with reading lines with for is inefficiency. A while read loop reads one line at a time from an input stream (though one byte at a time in some circumstances); $(<afile) slurps the entire file into memory all at once. For small files, this is not a problem, but if you're reading large files, the memory requirement will be enormous. (Bash will have to allocate one string to hold the file, and another set of strings to hold the word-split results... essentially, the memory allocated will be twice the size of the input file.)

https://mywiki.wooledge.org/BashFAQ/001

The -r option to read prevents backslash interpretation (usually used as a backslash newline pair, to continue over multiple lines or to escape the delimiters). Without this option, any unescaped backslashes in the input will be discarded. You should almost always use the -r option with read.

while IFS= read -r line; do
  printf '%s\n' "$line"
done < "$file"
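What IFS= and -r each protect against can be seen on a single line of input. The sample string and variable names are assumptions.

```bash
#!/usr/bin/env bash
input='  C:\temp\new  '

# IFS= keeps leading/trailing whitespace; -r keeps backslashes.
IFS= read -r raw <<< "$input"

# -r alone: backslashes kept, surrounding whitespace trimmed.
read -r trimmed <<< "$input"

# Neither: whitespace trimmed and each backslash dropped.
read cooked <<< "$input"

printf '[%s]\n' "$raw" "$trimmed" "$cooked"
```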

https://stackoverflow.com/questions/2172352/in-bash-how-can-i-check-if-a-string-begins-with-some-value

http://tldp.org/LDP/abs/html/comparison-ops.html

[[ $a == z* ]]   # True if $a starts with a "z" (pattern matching).
[[ $a == "z*" ]] # True if $a is equal to z* (literal matching).

[ $a == z* ]     # File globbing and word splitting take place.
[ "$a" == "z*" ] # True if $a is equal to z* (literal matching).

https://www.tldp.org/LDP/abs/html/string-manipulation.html

String Length

${#string}
expr length $string

These are the equivalent of strlen() in C.

Length of Matching Substring at Beginning of String

expr match "$string" '$substring'

$substring is a regular expression.

expr "$string" : '$substring'

$substring is a regular expression.

stringZ=abcABC123ABCabc
#       |------|
#       12345678

echo `expr match "$stringZ" 'abc[A-Z]*.2'`   # 8
echo `expr "$stringZ" : 'abc[A-Z]*.2'`       # 8

Index

expr index $string $substring

Substring Extraction

${string:position}: Extracts substring from $string at $position.
If the $string parameter is "*" or "@", then this extracts the positional parameters, starting at $position.
${string:position:length}

expr substr $string $position $length

Substring Removal

${string#substring}: Deletes shortest match of $substring from front of $string.
${string##substring}: Deletes longest match of $substring from front of $string.

Substring Replacement

${string/substring/replacement}: Replace first match of $substring with $replacement.
${string//substring/replacement}: Replace all matches of $substring with $replacement.
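Using the same sample string as above, the built-in expansions can be exercised without expr. This is a sketch; the result variable names are assumptions.

```bash
#!/usr/bin/env bash
stringZ=abcABC123ABCabc

length=${#stringZ}                 # string length
tail_part=${stringZ:7}             # from zero-based position 7 to the end
middle=${stringZ:7:3}              # three characters starting at position 7

shortest=${stringZ#a*C}            # shortest front match of a*C removed
longest=${stringZ##a*C}            # longest front match of a*C removed

first_only=${stringZ/abc/xyz}      # first match replaced
all_matches=${stringZ//abc/xyz}    # every match replaced

printf '%s\n' "$length" "$tail_part" "$middle" "$shortest" "$longest" \
  "$first_only" "$all_matches"
```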

https://stackoverflow.com/questions/18488651/how-to-break-out-of-a-loop-in-bash

while true ; do
    ...
    if [ something ]; then
        break
    fi
done

https://stackoverflow.com/questions/5274294/how-can-you-run-a-command-in-bash-over-until-success

passwd
while [ $? -ne 0 ]; do
    passwd
done

$ passwd
$ while [ $? -ne 0 ]; do !!; done

while [ -n "$(passwd)" ]; do
        echo "Try again";
done;

until passwd
do
  echo "Try again"
done

until passwd; do echo "Try again"; done
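An until loop retries forever if the command never succeeds. One possible variation, sketched here, caps the number of attempts. The helper name retry and the stand-in command flaky are hypothetical and replace passwd so the example runs unattended.

```bash
#!/usr/bin/env bash
# retry MAX COMMAND [ARG...]: run COMMAND until it succeeds,
# giving up after MAX attempts.
retry() {
  local max=$1 n=1
  shift
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      return 1
    fi
    n=$((n + 1))
  done
  return 0
}

calls=0
flaky() {                 # stand-in command: fails twice, then succeeds
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

if retry 5 flaky; then flaky_result=ok; else flaky_result=gave-up; fi
if retry 2 false; then false_result=ok; else false_result=gave-up; fi
echo "flaky: $flaky_result after $calls calls; false: $false_result"
```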
https://stackoverflow.com/questions/1289026/syntax-for-a-single-line-bash-infinite-while-loop

until ((0)); do foo; sleep 2; done

Note that in contrast to while, until would execute the commands inside the loop as long as the test condition has an exit status which is not zero.

Using a for loop:

for ((;;)); do foo; sleep 2; done

Another way using until:

until [ ]; do foo; sleep 2; done

while true ; do continue ; done

For your question it would be:

while true; do foo ; sleep 2 ; done

http://www.ruanyifeng.com/blog/2017/11/bash-set.html

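When running a script, Bash by default ignores variables that do not exist.

set -u changes this behavior. With it at the top of a script, referencing a nonexistent variable raises an error and stops execution.

-u can also be written as -o nounset; the two are equivalent.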


set -o nounset

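By default, after a script runs, the screen shows only the output and nothing else. If several commands run in succession, their output is printed back to back, and sometimes it is hard to tell which command produced which part.

set -x prints each command line before its output.

-x can also be written as -o xtrace.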

command || exit 1

# Option 1
command || { echo "command failed"; exit 1; }

# Option 2
if ! command; then echo "command failed"; exit 1; fi

# Option 3
command
if [ "$?" -ne 0 ]; then echo "command failed"; exit 1; fi

Besides stopping execution, there is another case. If two commands depend on each other, so that the second may run only if the first succeeds, use the following form.


command1 && command2

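set -e decides whether a command failed based on its return value. However, a non-zero return value from some commands may not indicate failure, or the developer may want the script to continue even if a command fails. In that case you can temporarily turn off set -e and turn it back on after the command finishes.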


set +e
command1
command2
set -e

In the code above, set +e turns the -e option off and set -e turns it back on.

Another approach is command || true, so that the script does not terminate even if the command fails.


#!/bin/bash
set -e

foo || true
echo bar

In the code above, true makes the statement on that line always succeed, so the following echo bar is executed.

-e can also be written as -o errexit.


set -o errexit

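set -e has one exception: it does not apply to pipelines.

A pipeline is several subcommands combined into one larger command with the pipe operator (|). Bash uses the return value of the last subcommand as the return value of the whole pipeline. In other words, as long as the last subcommand does not fail, the pipeline always succeeds, so the commands after it still run and set -e has no effect.

set -o pipefail addresses this: if any subcommand fails, the whole pipeline fails and the script terminates.

These four set options are usually used together.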


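# Option 1
set -euxo pipefail

# Option 2
set -eux
set -o pipefail

It is recommended to put one of these two forms at the top of every Bash script.

Another approach is to pass these options on the command line when running the Bash script.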


$ bash -euxo pipefail script.sh
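The effect of each option can be observed in isolation by running tiny scripts through bash -c, so the options never touch the calling shell. This is a sketch; the variable names, including undefined_var, are assumptions.

```bash
#!/usr/bin/env bash
# Without pipefail the pipeline's status is that of the last command (true).
plain_status=$(bash -c 'false | true; echo $?')

# With pipefail a failure anywhere in the pipeline makes the pipeline fail.
pipefail_status=$(bash -c 'set -o pipefail; false | true; echo $?')

# With -e the script stops at the first failing command,
# so "after" is never printed.
errexit_output=$(bash -c 'set -e; echo before; false; echo after' || true)

# With -u an unset variable is an error instead of an empty string.
if bash -c 'set -u; echo "$undefined_var"' 2>/dev/null; then
  nounset_result=ignored
else
  nounset_result=aborted
fi

echo "$plain_status $pipefail_status $errexit_output $nounset_result"
```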

https://stackoverflow.com/questions/17336915/return-value-in-bash-script

Although bash has a return statement, the only thing you can specify with it is the function's own exit status (a value between 0 and 255, 0 meaning "success"). So return is not what you want.

You might want to convert your return statement to an echo statement - that way your function output could be captured using $() braces, which seems to be exactly what you want.

Here is an example:

function fun1(){
  echo 34
}

function fun2(){
  local res=$(fun1)
  echo $res
}
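The two channels can be used together: standard output for data and the exit status for a yes/no answer. The function names double and is_even below are hypothetical.

```bash
#!/usr/bin/env bash
double() {
  echo $(( $1 * 2 ))          # the value goes to standard output
}

is_even() {
  [ $(( $1 % 2 )) -eq 0 ]     # the answer goes to the exit status
}

result=$(double 21)           # capture the printed value
if is_even "$result"; then parity=even; else parity=odd; fi
echo "$result is $parity"
```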

https://stackoverflow.com/questions/59838/check-if-a-directory-exists-in-a-shell-script

-L "FILE" : FILE exists and is a symbolic link (same as -h)
-h "FILE" : FILE exists and is a symbolic link (same as -L)
-d "FILE" : FILE exists and is a directory
-w "FILE" : FILE exists and write permission is granted

[ -d "/path/to/dir" ] && echo "Directory /path/to/dir exists." || echo "Error: Directory /path/to/dir does not exist."

https://askubuntu.com/questions/80371/bash-history-handling-with-multiple-terminals

The bash session that is saved is the one for the terminal that is closed last. If you want to save the commands for every session, you could use the following trick.

export PROMPT_COMMAND='history -a'
To quote the manpage: “If set, the value is executed as a command prior to issuing each primary prompt.”

So every time my command has finished, it appends the unwritten history item to ~/.bash_history before displaying the prompt (only $PS1) again.

So after putting that line in /etc/bash.bashrc I don’t have to find myself reinventing wheels or lose valuable seconds re-typing stuff just because I was lazy with my terminals.

Anyway, you'll need to take into account that commands from different sessions will be mixed in your history file so it won't be so straightforward to read it later.

http://northernmost.org/blog/flush-bash_history-after-each-command/
https://unix.stackexchange.com/questions/1288/preserve-bash-history-in-multiple-terminal-windows

# Avoid duplicates
export HISTCONTROL=ignoredups:erasedups  
# When the shell exits, append to the history file instead of overwriting it
shopt -s histappend

# After each command, append to the history file and reread it
export PROMPT_COMMAND="${PROMPT_COMMAND:+$PROMPT_COMMAND$'\n'}history -a; history -c; history -r"

http://wiki.bash-hackers.org/syntax/ccmd/conditional_expression
-n string is not null.
-z string is null, that is, has zero length

Testing for File Characteristics
-d File is a directory
-e File exists
-f File is a regular file
-s File has a size greater than zero
-r, -w, -x File is readable, writable, executable
-S File is a socket

Testing with Pattern Matches
<STRING> == <PATTERN>
<STRING> =~ <ERE>
if [[ "${MYFILENAME}" == *.jpg ]]

-a, && logical AND
-o, || logical OR
if [ ! -d "$param" ]
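Several of these tests can be combined into a small classifier. In the sketch below the function name describe, the file names, and the output labels are assumptions; it runs against a throwaway directory.

```bash
#!/usr/bin/env bash
# describe is a hypothetical helper that classifies a path using file tests.
describe() {
  local path=$1
  if [[ ! -e $path ]]; then
    echo missing
  elif [[ -d $path ]]; then
    echo directory
  elif [[ -f $path && -s $path ]]; then
    echo "non-empty file"
  elif [[ -f $path ]]; then
    echo "empty file"
  else
    echo other
  fi
}

dir=$(mktemp -d)
: > "$dir/empty.txt"
printf 'data\n' > "$dir/photo.jpg"

kind_dir=$(describe "$dir")
kind_empty=$(describe "$dir/empty.txt")
kind_full=$(describe "$dir/photo.jpg")
kind_none=$(describe "$dir/no-such-file")

# Pattern match on the name: the right-hand side is an unquoted glob.
if [[ "$dir/photo.jpg" == *.jpg ]]; then is_jpg=yes; else is_jpg=no; fi

rm -rf "$dir"
echo "$kind_dir / $kind_empty / $kind_full / $kind_none / $is_jpg"
```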
https://www.linux.com/learn/essentials-bash-scripting-using-loops

for i in $( command ); do command $i; done

That's on the command line. If you're doing a script, you'd format it as:

for i in $( command ); do 
command $i 
done

So, if I wanted to make a backup copy of all the HTML files in a directory, I'd use:

for i in *html; do cp "$i" "$i.bak"; done

To do a test run, replace the command you wish to run with echo.

i=0
while [ "$i" -lt 22 ]
do
touch "$i"
i=$((i+1))
done

https://stackoverflow.com/questions/1075083/execute-a-shell-command-from-a-shell-script-without-stopping-if-error-occurs

Disable the "exit immediately" option with set +e, run your command, then optionally re-enable it with set -e:

set +e
invoke-rc.d tomcat stop
set -e  # optional

See section 4.3.1 of the Bash manual for an explanation of the set builtin and all of its various options (of which there are many).

http://www.gnu.org/software/bash/manual/bashref.html#The-Set-Builtin

https://unix.stackexchange.com/questions/132511/how-to-capture-error-message-from-executed-command

errormessage=$( /sbin/modprobe -n -v hfsplus 2>&1)
echo "$errormessage"

/sbin/modprobe -n -v hfsplus 2> fileName
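To capture only the error text and drop normal output, the order of the redirections matters. A sketch with a made-up function noisy in place of a real command:

```shell
# noisy is a hypothetical command that writes to both streams and fails.
noisy() {
  echo "normal output"
  echo "something went wrong" >&2
  return 1
}

# 2>&1 first points stderr at the capture pipe, then >/dev/null discards
# stdout, so only the error text ends up in the variable.
status=0
errormessage=$(noisy 2>&1 >/dev/null) || status=$?

echo "status=$status message=$errormessage"
# prints: status=1 message=something went wrong
```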


https://github.com/niieani/bash-oo-framework
https://stackoverflow.com/questions/22009364/is-there-a-try-catch-command-in-bash

Brace Expansion
http://www.gnu.org/software/bash/manual/bashref.html#Brace-Expansion
Patterns to be brace expanded take the form of an optional preamble, followed by either a series of comma-separated strings or a sequence expression between a pair of braces, followed by an optional postscript. The preamble is prefixed to each string contained within the braces, and the postscript is then appended to each resulting string, expanding left to right.
Brace expansions may be nested. The results of each expanded string are not sorted; left to right order is preserved. For example,

bash$ echo a{d,c,b}e
ade ace abe

Brace expansion is performed before any other expansions, and any characters special to other expansions are preserved in the result.

mkdir /usr/local/src/bash/{old,new,dist,bugs}

chown root /usr/{ucb/{ex,edit},lib/{ex?.?*,how_ex}}
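A few more expansions to try at the prompt, as a quick sketch. The step form of a sequence expression needs Bash 4 or later.

```shell
# Comma list combined with a preamble and a postscript.
echo file_{a,b,c}.txt      # file_a.txt file_b.txt file_c.txt

# Sequence expressions: {start..end}, optionally with a step (Bash 4+).
echo {1..5}                # 1 2 3 4 5
echo {0..10..5}            # 0 5 10
echo {a..e}                # a b c d e

# Adjacent brace expressions expand to every combination, left to right.
echo {x,y}{1,2}            # x1 x2 y1 y2

# Brace expansion runs before variable expansion, so a variable cannot
# supply the end of a sequence; the braces stay as literal text.
n=3
echo {1..$n}               # {1..3}
```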

rename
http://tips.webdesign10.com/how-to-bulk-rename-files-in-linux-in-the-terminal

rename [ -v ] [ -n ] [ -f ] perlexpr [ files ]

-v means "verbose": rename prints the name of each file as it renames it, which makes it easy to keep track of what changed. It is also a good idea to do a test run first with -n, which renames nothing but lists the files that would be renamed.

rename -n 's/\.htm$/\.html/' *.htm

The -n means that it's a test run and will not actually change any files.

rename -v 's/\.JPG$/\.jpg/' *.JPG

http://www.cyberciti.biz/faq/unix-linux-rename-file-extension-command/

rename .txt .doc *.txt

http://stackoverflow.com/questions/1224766/how-do-i-rename-the-extension-for-a-batch-of-files

for file in *.html; do
    mv "$file" "`basename "$file" .html`.txt"
done

EDIT: As pointed out in the comments, this does not work for filenames containing spaces without proper quoting (now added above). It is fine when you know your own files have no spaces in their names, but whenever you write something that may be reused later, do not skip proper quoting.
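The same rename can be done without basename by using shell parameter expansion. A sketch, assuming the old extension itself contains no dot; change_ext is a hypothetical helper name.

```shell
# change_ext is a hypothetical helper: change_ext DIR OLD NEW renames
# DIR/*.OLD to DIR/*.NEW using only parameter expansion and mv.
change_ext() {
  local dir=$1 old=$2 new=$3 file
  for file in "$dir"/*."$old"; do
    [[ -e "$file" ]] || continue       # the glob matched nothing
    # ${file%.*} strips the last extension; quoting keeps spaces intact.
    mv -- "$file" "${file%.*}.$new"
  done
}
```

For example, change_ext . html txt turns "my page.html" into "my page.txt".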

find . -name '*.txt' -exec sh -c 'mv "$0" "${0%.txt}.txt_bak"' {} \;

Install rename if you haven't: brew install rename
rename -S .html .txt *.html

-n/--just-print/--dry-run

iptables
https://www.netfilter.org/documentation/HOWTO/NAT-HOWTO-10.html
http://serverfault.com/questions/586486/how-to-do-the-port-forwarding-from-one-ip-to-another-ip-in-same-network

echo 1 > /proc/sys/net/ipv4/ip_forward

iptables -F
iptables -t nat -F
iptables -X

iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 192.168.12.77:80
iptables -t nat -A POSTROUTING -p tcp -d 192.168.12.77 --dport 80 -j SNAT --to-source 192.168.12.87

The reason a seemingly obvious iptables -t nat -A PREROUTING -d 192.168.12.87 -p tcp --dport 80 -j DNAT --to-destination 192.168.12.77 will not work is how the return packets will be routed.

You can set up rules that will cause the packets sent to 192.168.12.87 to simply be NATted to 192.168.12.77, but 192.168.12.77 will then send replies directly back to the client. Those replies will not go through the host where your iptables rule is doing NAT, so packets in one direction are translated but packets in the other direction are not.

There are three approaches to solving this problem.

On the first host, don't just do DNAT, but also do SNAT so that return traffic is sent back through the first host. The rule could look something like iptables -t nat -A POSTROUTING -d 192.168.12.77 -p tcp --dport 80 -j SNAT --to-source 192.168.12.87
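The DNAT and SNAT rules belong together, so it can help to generate them as a pair. The sketch below only prints the commands (applying them needs root on a Linux host); print_forward_rules is a hypothetical helper name.

```shell
# print_forward_rules is a hypothetical helper. It prints the two
# iptables commands as a dry run instead of executing them.
print_forward_rules() {
  local front_ip=$1 back_ip=$2 port=$3
  # DNAT: rewrite the destination of incoming packets to the back-end host.
  echo "iptables -t nat -A PREROUTING -d $front_ip -p tcp --dport $port -j DNAT --to-destination $back_ip:$port"
  # SNAT: rewrite the source so the back end replies through this host.
  echo "iptables -t nat -A POSTROUTING -d $back_ip -p tcp --dport $port -j SNAT --to-source $front_ip"
}

print_forward_rules 192.168.12.87 192.168.12.77 80
```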

http://stackoverflow.com/questions/4471364/how-do-i-list-the-functions-defined-in-my-shell

typeset is obsolete, please use:

declare -f

declare -f function_name

type function_name

declare -F

will give you the names of all functions

The following will list all exported functions by name:

declare -x -F

If you also want to see the function code, use:

declare -x -f

The export -f feature is specific to Bash

-f restrict action or display to function names and definitions
-F restrict display to function names only (plus line number and
source file when debugging)
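A quick sketch that exercises these options on a throwaway function (greet is just an example name):

```shell
greet() { echo "hello $1"; }   # hypothetical example function

declare -F greet    # name only: prints "greet" if the function exists
declare -f greet    # full definition, reformatted by Bash

# export -f makes the function visible to child Bash processes.
export -f greet
bash -c 'greet world'   # hello world
```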

https://bash.cyberciti.biz/guide/Pass_arguments_into_a_function

Shell functions have their own command-line arguments.
Use the variables $1, $2 ... $n to access the arguments passed to the function.
The syntax is as follows:

name(){
  arg1=$1
  arg2=$2
  command on $arg1
}

To invoke the function, use the following syntax:

 name foo bar
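A slightly fuller sketch of the positional parameters inside a function. show_args is a made-up name; $# is the argument count and "$@" expands to all arguments, each kept as one word.

```shell
# show_args is a hypothetical function used only to illustrate the
# positional parameters visible inside a function body.
show_args() {
  local first=$1 second=${2:-none}   # ${2:-none} supplies a default
  echo "count=$# first=$first second=$second"
  local arg
  for arg in "$@"; do                # "$@" keeps each argument intact
    echo "arg: $arg"
  done
}

show_args foo "bar baz"
# count=2 first=foo second=bar baz
# arg: foo
# arg: bar baz
```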

http://www.tutorialspoint.com/unix/unix-shell-functions.htm

Returning Values from Functions

If you execute an exit command from inside a function, it terminates not only the function but also the shell program that called the function.

If you instead want to terminate only the function, use the return command.

return takes an exit status, an integer from 0 to 255, and its syntax is as follows:

return code

Hello () {
   echo "Hello World $1 $2"
   return 10
}

# Invoke your function
Hello Zara Ali

# Capture value returned by last command
ret=$?
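Since return only carries a small integer status, a common pattern is to use the status for yes/no answers and standard output for data. A sketch with two invented functions:

```shell
# Both functions are hypothetical examples.
is_even() {
  (( $1 % 2 == 0 ))   # the status of the last command becomes the return status
}

to_upper() {
  # Hand data back on stdout; the caller captures it with $( ).
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

if is_even 10; then echo "10 is even"; fi
shout=$(to_upper "hello")
echo "$shout"         # HELLO
```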

Function Call from Prompt

To remove the definition of a function from the shell, use the unset command with the -f option. This is the same command you use to remove the definition of a variable from the shell.

$ unset -f function_name
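A short check that a function is really gone after unset, using the -f option of unset and declare -F as the existence test. tmp_helper is a placeholder name.

```shell
tmp_helper() { echo "still defined"; }   # hypothetical function

unset -f tmp_helper

# declare -F NAME returns non-zero once the function no longer exists.
if declare -F tmp_helper > /dev/null; then
  echo "tmp_helper is defined"
else
  echo "tmp_helper is not defined"
fi
```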

http://www.onkarjoshi.com/blog/77/how-to-split-a-file-process-the-pieces-in-multiple-threads-and-combine-results-using-a-shell-script/
split -l 15000 originalFile.txt

for f in x*
do
runDataProcessor "$f" > "$f.out" &
done

wait
for k in *.out
do
cat "$k" >> combinedResult.txt
done
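A self-contained version of the same split, process, wait, combine flow that can be run as-is. Everything here is a stand-in: process_chunk replaces runDataProcessor, the input is ten generated lines, and the work happens in a temporary directory.

```shell
workdir=$(mktemp -d)

# Stand-in input: 10 lines. A real input would be much larger.
seq 1 10 > "$workdir/originalFile.txt"

# process_chunk is a hypothetical stand-in for runDataProcessor:
# it doubles every number in the chunk.
process_chunk() {
  awk '{ print $1 * 2 }' "$1"
}

# 4 lines per piece gives chunk_aa, chunk_ab and chunk_ac.
split -l 4 "$workdir/originalFile.txt" "$workdir/chunk_"

for f in "$workdir"/chunk_*; do
  process_chunk "$f" > "$f.out" &   # one background job per chunk
done
wait                                # block until every job has finished

# Globs expand in sorted order, so the pieces are combined in sequence.
cat "$workdir"/chunk_*.out > "$workdir/combinedResult.txt"
wc -l < "$workdir/combinedResult.txt"
# rm -rf "$workdir"   # clean up when done
```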
http://www.onkarjoshi.com/blog/123/how-to-create-a-file-of-arbitrary-size-with-shell-script-commands/

You can do it with the dd command. Use /dev/random as a source of random bytes or /dev/zero as a source of null bytes. On older Linux kernels /dev/random can block while waiting for entropy; /dev/urandom does not.

bs = block size

count = number of blocks

if = input file

of = output file

The total size of the file created will be bs x count bytes.

dd if=/dev/random of=myFile.dat bs=1024 count=102400

dd if=/dev/random of=myFile.dat bs=1024 count=1024

dd if=/dev/random of=myFile.dat bs=100 count=1

dd if=/dev/random of=myFile.dat bs=1 count=1
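To confirm the bs x count arithmetic, the sketch below writes to a temporary file and checks the size with wc -c. It reads from /dev/zero so it never waits for entropy.

```shell
outfile=$(mktemp)

# 1024-byte blocks x 10 blocks = 10240 bytes of zeros.
dd if=/dev/zero of="$outfile" bs=1024 count=10 2>/dev/null

wc -c < "$outfile"    # 10240 (some systems pad the number with spaces)
```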

find where function is defined
type functionName

type -a function_name
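type prints a function's body but not where it came from. With the extdebug shell option enabled, declare -F also prints the line number and source file. A sketch with a placeholder function:

```shell
where_defined() { :; }   # hypothetical function to look up

shopt -s extdebug        # makes declare -F also print line number and file
declare -F where_defined
shopt -u extdebug        # switch the option back off afterwards
```

The output has the form "name line file".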

http://mp.weixin.qq.com/s?__biz=MzIxNDMyODgyMA==&mid=2247483679&idx=1&sn=f98b8ef107264b9258f8ab76986b8f57&scene=0#wechat_redirect

Config file:

sqs:aws-sqs-queue-name
email:myemail@gmail.com

shell script

while IFS='' read -r line || [[ -n "$line" ]]; do
  IFS=':' read -r protocol endpoint <<< "$line"
  echo "Protocol: $protocol - Endpoint: $endpoint"
done < "$file"

Output:

Protocol: sqs - Endpoint: aws-sqs-queue-name
Protocol: email - Endpoint: myemail@gmail.com

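read and IFS

read and IFS are usually used together:

read is normally used to read data and user input; this article uses it to read variables out of a string.
IFS (Internal Field Separator) is the delimiter used by the read command. We can use it to split a string and read the pieces into separate variables.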
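A minimal sketch of read and IFS working together on a single string. The variable names are chosen for this example only.

```shell
record="sqs:aws-sqs-queue-name"

# IFS is set only for this one read command, so the shell's global IFS
# is left untouched.
IFS=':' read -r protocol endpoint <<< "$record"
echo "protocol=$protocol endpoint=$endpoint"
# protocol=sqs endpoint=aws-sqs-queue-name

# With -a, read splits the line into an array instead of named variables.
IFS=',' read -r -a fields <<< "a,b,c"
echo "${#fields[@]} fields, last is ${fields[2]}"
# 3 fields, last is c
```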

Reading one line of data into a variable with read:

File:

sqs:aws-sqs-queue-name

shell script

file=$1
read -r line < "$file"
echo "$line" # => sqs:aws-sqs-queue-name

Here -r stands for raw: backslashes are not treated as escape characters. For example, \n is kept as the literal characters \ and n.

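Reading a user name and hostname:

# a here-string keeps read in the current shell; piping into read would set the variables in a subshell
IFS='@' read -r username hostname <<< "ubuntu@192.168.1.1"
echo "User: $username, Host: $hostname" # => User: ubuntu, Host: 192.168.1.1

Reading a program's version number:

git describe --abbrev=0 --tags  #=> my-app-1.0.1
IFS='-' read -r _ _ version <<< "$(git describe --abbrev=0 --tags)"
echo "$version" # => 1.0.1

Config file: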

sqs:aws-sqs-queue-name
email:myemail@gmail.com

The shell script parses the file above and runs two aws-cli commands

file=$1

while IFS='' read -r line || [[ -n "$line" ]]; do
  IFS=':' read -r protocol endpoint <<< "$line"
  # create subscription for the topic
  aws sns subscribe --topic-arn "$topic_arn" --protocol "$protocol" --notification-endpoint "$endpoint"
done < "$file"

https://codingstyle.cn/topics/175

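cat prints the contents of apartment_prices.csv, and grep filters out the lines we need with a regular expression.

grep ^Shaanxi: extracts the lines that start with Shaanxi; in a regex, ^ anchors the match to the start of the line.
grep -v ^Shaanxi: -v is short for --invert-match and extracts the lines that do not match the regex.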
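A sketch of both grep forms on made-up data. The real apartment_prices.csv is not shown in these notes, so the column names and values below are invented.

```shell
# Hypothetical sample data standing in for apartment_prices.csv.
csv=$(mktemp)
cat > "$csv" <<'EOF'
province,city,avg_price,price_2014,price_2015
Shaanxi,Xi'an,6000,5800,6000
Shaanxi,Baoji,3500,3400,3500
Sichuan,Chengdu,7500,7200,7500
Hunan,Changsha,6500,6300,6500
EOF

grep -c '^Shaanxi' "$csv"     # 2: rows that start with Shaanxi
grep -vc '^Shaanxi' "$csv"    # 3: everything else, header included
```

The -c option makes grep print the number of matching lines instead of the lines themselves.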



cat apartment_prices.csv | tail -n +2 | wc -l
# 4

tail -n +2: tail reads from the end of a file by default; -n +2 makes it print from the second line to the end of the file. tail -f logfile.log is commonly used to watch log output.

cat apartment_prices.csv | awk -F, '{ print $3; }'

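awk -F, '{ print $3; }': awk is a very powerful tool for processing column-oriented text on Linux; almost any command output on a Linux system can be processed with awk.
- -F,: sets the field separator to ',' (CSV format); the default is whitespace (spaces and tabs).
- print $3: prints the third column.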
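A small sketch of -F, on two invented rows, including a condition in front of the action to filter rows by a column value:

```shell
# Hypothetical rows in comma-separated form.
rows=$(printf '%s\n' 'Shaanxi,Baoji,3500' 'Sichuan,Chengdu,7500')

# Print the third column of every row.
awk -F, '{ print $3 }' <<< "$rows"
# 3500
# 7500

# A pattern before the action limits it to matching rows.
awk -F, '$3 > 5000 { print $2 ": " $3 }' <<< "$rows"
# Chengdu: 7500
```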

sort -g: sort orders the data; by default it compares lines as text (lexicographically), and -g makes it compare by numeric value.

cat apartment_prices.csv | tail -n +2 | awk -F, '{ print $1 "," $3 }' | sort -t ',' -k 2 -g

sort -t ',' -k 2 -g: -k 2 sorts by the second column, and -t , sets the column separator.
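The difference between text and numeric comparison shows up as soon as the numbers have different lengths. A sketch on invented rows:

```shell
# Hypothetical city,price rows.
rows=$(printf '%s\n' 'Chengdu,7500' "Xi'an,6000" 'Baoji,900')

# Without -g the key is compared as text, so 900 sorts after 7500.
sort -t ',' -k 2 <<< "$rows"
# Xi'an,6000
# Chengdu,7500
# Baoji,900

# With -g the key is compared as a number.
sort -t ',' -k 2 -g <<< "$rows"
# Baoji,900
# Xi'an,6000
# Chengdu,7500
```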

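To compare data, write it to files and then compare them with git diff or vim -d file1 file2.

For example, to compare the 2014 and 2015 data:

# 1. Extract the 2014 data
cat apartment_prices.csv | tail -n +2 | awk -F, '{ print $4; }' > 2014
# 2. Extract the 2015 data
cat apartment_prices.csv | tail -n +2 | awk -F, '{ print $5; }' > 2015
# 3. Compare the two files (--no-index compares plain files, even outside a repository)
git diff --no-index 2014 2015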
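As a variation that skips the temporary files, the standard diff utility can compare two command outputs directly through Bash process substitution. This swaps git diff for plain diff; the values are invented.

```shell
old=$(printf '%s\n' 5800 3400 7200)   # hypothetical 2014 column
new=$(printf '%s\n' 5800 3500 7200)   # hypothetical 2015 column

# diff exits with status 1 when the inputs differ, so guard it with ||
# to keep a script running under set -e.
changes=$(diff <(echo "$old") <(echo "$new")) || true
echo "$changes"
# 2c2
# < 3400
# ---
# > 3500
```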