Thursday, July 3, 2014

Linux Interview Questions: Sed



sed '' BSD
sed -n 'p' BSD

sed -n '1p' BSD
sed -n '1,5p' BSD
sed -n '1,+4p' BSD

If we want to print every other line, we can specify the interval after the "~" character. The following line will print every other line starting with line 1:

sed -n '1~2p' BSD
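
A quick way to check the first~step address is to feed sed a few short lines instead of a real file. This is a minimal sketch, assuming GNU sed (the ~ form is a GNU extension); the helper name odd_lines is made up for illustration.

```shell
# Assumes GNU sed; the first~step address is not available in every sed.
# odd_lines is a hypothetical helper: print lines 1, 3, 5, ... of stdin.
odd_lines() {
    sed -n '1~2p'
}

printf '%s\n' one two three four five | odd_lines
# prints one, three, five (each on its own line)
```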

Deleting Text
delete every other line starting with the first:
sed '1~2d' BSD
sed -i '1~2d' everyother.txt
To create a backup file prior to editing, add the backup extension directly after the "-i" option:
sed -i.bak '1~2d' everyother.txt
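
The in-place edit with a backup can be tried safely in a scratch directory. A small sketch, assuming GNU sed; the directory and the variable names (workdir, edited, backup) are illustrative only.

```shell
# Assumes GNU sed (-i with an attached suffix, and the first~step address).
workdir=$(mktemp -d)                          # scratch directory for the demo
printf '%s\n' 1 2 3 4 > "$workdir/everyother.txt"

sed -i.bak '1~2d' "$workdir/everyother.txt"

edited=$(cat "$workdir/everyother.txt")       # lines 2 and 4 remain
backup=$(cat "$workdir/everyother.txt.bak")   # untouched original, lines 1-4
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$workdir"
```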

Substituting Text
's/old_word/new_word/'
sed 's/on/forward/' annoying.txt
We will provide the "g" flag to the substitute command by placing it after the substitution set.
sed 's/on/forward/g' annoying.txt

If we only wanted to change the second instance of "on" that sed finds on each line, then we could use the number "2" instead of the "g".
sed 's/on/forward/2' annoying.txt

see which lines were substituted
sed -n 's/on/forward/2p' annoying.txt
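
The three forms of the substitute command can be compared side by side on one sample line. A minimal sketch; the sample text and variable names are invented for the demonstration.

```shell
line='on and on and on'

first=$(printf '%s\n' "$line" | sed 's/on/forward/')     # first match only
all=$(printf '%s\n' "$line" | sed 's/on/forward/g')      # every match
second=$(printf '%s\n' "$line" | sed 's/on/forward/2')   # second match only
printf '%s\n' "$first" "$all" "$second"

# With -n and the p flag, only lines where the substitution happened are shown.
# 'on top' has a single "on", so the /2 substitution fails and the line is dropped.
changed=$(printf '%s\n' 'on and on' 'on top' | sed -n 's/on/forward/2p')
printf '%s\n' "$changed"
```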


Ignore case:
sed 's/SINGING/saying/i' annoying.txt

Referencing Matched Text
put parentheses around the matched text:
sed 's/^.*at/(&)/' annoying.txt
sed 's/\([a-zA-Z0-9][a-zA-Z0-9]*\) \([a-zA-Z0-9][a-zA-Z0-9]*\)/\2 \1/' annoying.txt
sed 's/\([^ ][^ ]*\) \([^ ][^ ]*\)/\2 \1/' annoying.txt
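
To see what & and the numbered references expand to, run the same expressions on short sample lines. A sketch with made-up input text:

```shell
# & stands for the whole match; ^.*at is greedy, so it runs to the last "at".
wrapped=$(printf '%s\n' 'the cat sat down' | sed 's/^.*at/(&)/')

# \1 and \2 are the two groups; the first two words trade places.
swapped=$(printf '%s\n' 'hello world again' |
    sed 's/\([^ ][^ ]*\) \([^ ][^ ]*\)/\2 \1/')

printf '%s\n' "$wrapped" "$swapped"
```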

Addresses - sed, a stream editor
Selecting lines with sed
number

first~step
This GNU extension matches every stepth line starting with line first. In particular, lines will be selected when there exists a non-negative n such that the current line-number equals first + (n * step). Thus, to select the odd-numbered lines, one would use ‘1~2’; to pick every third line starting with the second, ‘2~3’ would be used; to pick every fifth line starting with the tenth, use ‘10~5’.
$ - last line

/regexp/

0,/regexp/
addr1,+N
Matches addr1 and the N lines following addr1
addr1,~N
Matches addr1 and the lines following addr1 until the next line whose input line number is a multiple of N.
Appending the ! character to the end of an address specification negates the sense of the match. That is, if the ! character follows an address range, then only lines which do not match the address range will be selected. This also works for singleton addresses, and, perhaps perversely, for the null address.
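
The address forms above can be compared on numbered input from seq. This is a sketch, assuming GNU sed, since addr1,+N, addr1,~N and 0,/regexp/ are GNU extensions; the variable names are illustrative.

```shell
# Assumes GNU sed for addr1,+N, addr1,~N and 0,/regexp/.
plus=$(seq 10 | sed -n '2,+2p')        # line 2 and the 2 lines after it
multiple=$(seq 10 | sed -n '2,~4p')    # line 2 through the next multiple of 4
negated=$(seq 6 | sed -n '2,4!p')      # everything outside lines 2-4
last=$(seq 10 | sed -n '$p')           # the last line

# 0,/x/ may end on line 1; 1,/x/ only starts looking for the end on line 2.
zero=$(printf '%s\n' x y x | sed '0,/x/d')   # deletes line 1 only
one=$(printf '%s\n' x y x | sed '1,/x/d')    # deletes lines 1-3

printf '%s\n' "$plus" "$multiple" "$negated" "$last" "$zero"
```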

How to keep only every nth line of a file
bash - How to keep only every nth line of a file - Super User
awk 'NR == 1 || NR % 3 == 0' yourfile
sed -n '1p;0~3p' input.txt
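
Both commands should select the same lines, which is easy to confirm on numbered input. A small sketch, assuming GNU sed for the 0~3 address:

```shell
# Assumes GNU sed: 0~3 selects lines 3, 6, 9, ...
with_awk=$(seq 10 | awk 'NR == 1 || NR % 3 == 0')
with_sed=$(seq 10 | sed -n '1p;0~3p')

printf '%s\n' "$with_sed"
# prints 1, 3, 6, 9 (each on its own line)
```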

ed
When you open a file using ed, it displays the number of characters in the file and positions you at the last line.
By default, a command affects only the current line.
Commands:
p: print, d: delete
n: move to nth line
Move to line containing word: /word/
/regular/d, 1d
g/regular/d: delete all the lines that contain the regular expression
Substituting text:
[address]s/pattern/replacement/flag
s/regular/complex/g
This command changes all occurrences on the current line.
To make it apply to all lines, use the global command, putting g before the address.
g/regular/s/regular/complex/g
The "g" at the beginning is the global command that means make the changes on all lines matched by the address. The "g" at the end is a flag that means change each occurrence on a line, not just the first.
If the address and the pattern are the same, it can be written like:
g/regular/s//complex/g
ed test < ed-script
g/re/p stands for "global regular expression print."

Sed
SYNOPSIS
-n, --quiet, --silent suppress automatic printing of pattern space
-e script, --expression=script add the script to the commands to be executed
-f script-file, --file=script-file
add the contents of script-file to the commands to be executed
-i[suffix], --in-place[=suffix]
edit files in place (makes backup if extension supplied)
-l N, --line-length=N
specify the desired line-wrap length for the `l' command
-r, --regexp-extended
use extended regular expressions in the script.
-s, --separate
consider files as separate rather than as a single continuous long stream.
-u, --unbuffered
load minimal amounts of data from the input files and flush the output buffers more often
COMMAND SYNOPSIS
Zero-address “commands”
: label Label for b and t commands.
#comment The comment extends until the next newline (or the end of a -e script fragment).
} The closing bracket of a { } block.
Zero- or One- address commands
= Print the current line number.
a \
text
Append text, which has each embedded newline preceded by a backslash.
i \
text
Insert text, which has each embedded newline preceded by a backslash.
q
Immediately quit the sed script without processing any more input, except that if auto-print is not disabled the current pattern space will be printed.
Q
Immediately quit the sed script without processing any more input.
r filename Append text read from filename.
R filename Append a line read from filename.
Commands which accept address ranges
{ Begin a block of commands (end with a }).
b label Branch to label; if label is omitted, branch to end of script.
t label
If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script.
T label
If no s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script.
c \
text
Replace the selected lines with text, which has each embedded newline preceded by a backslash.
d
Delete pattern space. Start next cycle.
D
Delete up to the first embedded newline in the pattern space. Start next cycle, but skip reading from the input if there is still data in the pattern space.
h H
Copy/append pattern space to hold space.
g G
Copy/append hold space to pattern space.
x
Exchange the contents of the hold and pattern spaces.
l
List out the current line in a “visually unambiguous” form.
n N
Read/append the next line of input into the pattern space.
p
Print the current pattern space.
P(capital)
Print up to the first embedded newline of the current pattern space.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
w filename
Write the current pattern space to filename.
W filename
Write the first line of the current pattern space to filename.
y/source/dest/
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
Sed was meant to execute scripts exclusively and cannot be used interactively. Sed differs from ed primarily in that it is stream-oriented. By default, all of the input to sed passes through and goes to standard output. The input file itself is not changed.
Sed is stream-oriented, and ed is not. In ed, a command without an address affects only the current line. But Sed goes through the file, a line at a time, such that each line becomes the current line, and the commands are applied to it. The result is that sed applies a command without an address to every line in the file.
In ed you use addressing to expand the number of lines that are the object of a command; in sed, you use addressing to restrict the number of lines affected by a command.
Command-Line Syntax
sed [-e] 'instruction' file
The -e option is necessary only when you supply more than one instruction on the command line. It tells sed to interpret the next argument as an instruction. When there is a single instruction, sed is able to make that determination on its own.
sed 's/MA/Massachusetts/' list
To specify multiple instructions on the command line:
sed 's/ MA/, Massachusetts/; s/ PA/, Pennsylvania/' list
sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' list
sed -f scriptfile inputfile
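
The three ways of supplying instructions should give identical results. A minimal sketch; the two-line list file and the sedscr script file are created in a scratch directory purely for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Boston MA' 'Philadelphia PA' > "$workdir/list"
printf '%s\n' 's/ MA/, Massachusetts/' 's/ PA/, Pennsylvania/' > "$workdir/sedscr"

# Same two instructions: separated by a semicolon, by -e, and read from a file.
with_semicolon=$(sed 's/ MA/, Massachusetts/; s/ PA/, Pennsylvania/' "$workdir/list")
with_e=$(sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' "$workdir/list")
with_f=$(sed -f "$workdir/sedscr" "$workdir/list")

printf '%s\n' "$with_f"
rm -rf "$workdir"
```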
Scripting
In sed and awk, each instruction has two parts: a pattern and a procedure. The pattern is a regular expression delimited with slashes (/). A procedure specifies one or more actions to be performed.
Saving output
sed -f sedscr list > newlist
Do not redirect the output to the file you are editing or you will clobber it. (The “>” redirection operator truncates the file before the shell does anything else.) If you want the output file to replace the input file, you can do that as a separate step, using the mv command.
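
The two-step replacement described above might look like the following. It is only a sketch: the file names list and newlist live in a scratch directory created for the example.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Boston MA' > "$workdir/list"

# Write to a new file first, then replace the original only if sed succeeded.
sed 's/ MA/, Massachusetts/' "$workdir/list" > "$workdir/newlist" &&
    mv "$workdir/newlist" "$workdir/list"

result=$(cat "$workdir/list")
printf '%s\n' "$result"
rm -rf "$workdir"
```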
Suppressing automatic display of input lines
The default operation of sed is to output every input line. The -n option suppresses the automatic output. When specifying this option, each instruction intended to produce output must contain a print command, p.
sed -n -e 's/MA/Massachusetts/p' list
Mixing options (POSIX)
You can build up a script by combining both the -e and -f options on the command line. The script is the combination of all the commands in the order given.
Using sed and awk Together
awk -F, '{
print $4 ", " $0
}' $* |
sort |
awk -F, '
$1 == LastState {
print "\t" $2
}
$1 != LastState {
LastState = $1
print $1
print "\t" $2
}'

Basic principles how sed works:
• All editing commands in a script are applied in order to each line of input.
• Commands are applied to all lines (globally) unless line addressing restricts the lines affected by editing commands.
• The original input file is unchanged; the editing commands modify a copy of original input line and the copy is sent to standard output.

Sed is implicitly global, will apply commands to every input line.
Line addresses are used to supply context for, or restrict, an operation.
/Sebastopol/s/CA/California/g

A sed command can specify zero, one, or two addresses. An address can be a regular expression describing a pattern, a line number, or a line addressing symbol.
• If no address is specified, then the command is applied to each line.
• If there is only one address, the command is applied to any line matching the address.
• If two comma-separated addresses are specified, the command is performed on the first line matching the first address and all succeeding lines up to and including a line matching the second address.
• If an address is followed by an exclamation mark (!), the command is applied to all lines that do not match the address.
1d deletes the first line; $d deletes the last line.
The line number refers to an internal line count maintained by sed. This counter is not reset for multiple input files. Thus, no matter how many files were specified as input, there is only one line 1 and only one last line in the input stream. Last line can be specified using the addressing symbol $.
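/^$/d deletes only blank lines
50,$d deletes from line 50 through the last line; 1,/^$/d deletes from the first line through the first blank line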
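
The single line counter across input files can be observed with the = command on the last line. A sketch, assuming GNU sed for the -s option; the two scratch files are invented for the example.

```shell
# Assumes GNU sed for the -s (--separate) option.
workdir=$(mktemp -d)
printf '%s\n' a b   > "$workdir/first"
printf '%s\n' c d e > "$workdir/second"

# $= prints the line number of the last line.
combined=$(sed -n '$=' "$workdir/first" "$workdir/second")      # one stream: 5
separate=$(sed -s -n '$=' "$workdir/first" "$workdir/second")   # per file: 2, then 3

printf '%s\n' "$combined" "$separate"
rm -rf "$workdir"
```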

An exclamation mark (!) following an address reverses the sense of the match. The following script deletes all lines except those inside tbl input:
/^\.TS/,/^\.TE/!d
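
Run against a few sample lines, the negated range keeps only the table block. A minimal sketch with made-up input:

```shell
# Everything outside the .TS/.TE range is deleted; the range itself survives.
kept=$(printf '%s\n' 'before' '.TS' 'a;b' '.TE' 'after' |
    sed '/^\.TS/,/^\.TE/!d')

printf '%s\n' "$kept"
# prints .TS, a;b, .TE (each on its own line)
```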

Grouping Commands
Braces ({}) are used in sed to nest one address inside another or to apply multiple commands at the same address. You can nest addresses if you want to specify a range of lines and then, within that range, specify another address.
To delete blank lines only inside blocks of tbl input, and to change the .ps and .vs settings there, use the following commands:
/^\.TS/,/^\.TE/{
/^$/d
s/^\.ps 10/.ps 8/
s/^\.vs 12/.vs 10/
}
The opening curly brace must end a line and the closing curly brace must be on a line by itself. Be sure there are no spaces after the braces.
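
A grouped script of this shape can be passed to sed as a multi-line, single-quoted argument. The sketch below uses invented sample lines and a hypothetical helper named tidy_tbl; blank lines outside the table block should be left alone.

```shell
# tidy_tbl is a hypothetical helper wrapping a grouped sed script.
tidy_tbl() {
    sed '/^\.TS/,/^\.TE/{
/^$/d
s/^\.ps 10/.ps 8/
}'
}

printf '%s\n' 'intro' '' '.TS' '.ps 10' '' 'a;b' '.TE' '' 'outro' | tidy_tbl
# the blank line inside .TS/.TE is removed and .ps 10 becomes .ps 8;
# the blank lines after "intro" and before "outro" are kept
```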
Basic sed Commands: d (delete), a (append), i (insert), and c (change)
[address]command
[line-address]command
Placing multiple commands on the same line is highly discouraged because sed scripts are difficult enough to read even when each command is written on its own line.
Comment #
Substitution
[address]s/pattern/replacement/flags
where the flags that modify the substitution are:
n A number (1 to 512) indicating that a replacement should be made for only the nth occurrence of the pattern.
g Make changes globally on all occurrences in the pattern space. Normally only the first occurrence is replaced.
p Print the contents of the pattern space.
w file Write the contents of the pattern space to file.
In the replacement section, only the following characters have special meaning:
& Replaced by the string matched by the regular expression.
\n Matches the nth substring (n is a single digit) previously specified in the pattern using “\(” and “\)”.
\ Used to escape the ampersand (&), the backslash (\), and the substitution command’s delimiter when they are used literally in the replacement section. In addition, it can be used to escape the newline and create a multiline replacement string.

The print and write flags are typically used when the default output is suppressed (the -n option). In addition, if a script contains multiple substitute commands that match the same line, multiple copies of that line will be printed or written to file.
Append, Insert, and Change
The syntax of these commands is unusual for sed because they must be specified over multiple lines.
append [line-address]a\
text
insert [line-address]i\
text
change [address]c\
text

Each of these commands requires a backslash following it to escape the first end of line. The text must begin on the next line. To input multiple lines of text, each successive line must end with a backslash, with the exception of the very last line.
/<Larry's Address>/i\
4700 Cross Court\
French Lick, IN
The append and insert commands can be applied only to a single line address, not a range of lines. The change command, however, can address a range of lines. In this case, it replaces all addressed lines with a single copy of the text.
/^From /,/^$/c\
<Mail Header Removed>
removes the entire mail-message header and replaces it with the line “<Mail Header Removed>.” Note that you will see the opposite behavior when the change command is one of a group of commands, enclosed in braces, that act on a range of lines.
/^From /,/^$/{
s/^From //p
c\
<Mail Header Removed>
}
will output “<Mail Header Removed>” for each line in the range.
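
The multi-line syntax of insert and change can be tried directly from the shell. This is a sketch with invented input (the <address> marker is a stand-in); the single-copy behavior of c on a range is what GNU sed does, and other implementations may differ.

```shell
# i\ puts the text before the addressed line; every text line except the
# last ends with a backslash.
inserted=$(printf '%s\n' 'Dear Larry' '<address>' 'Hello' | sed '/<address>/i\
4700 Cross Court\
French Lick, IN')

# c\ on a range replaces the whole range with one copy of the text (GNU sed).
changed=$(printf '%s\n' 'From bob' 'Subject: hi' '' 'body' | sed '/^From /,/^$/c\
<Mail Header Removed>')

printf '%s\n' "$inserted" "$changed"
```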

The change command clears the pattern space, having the same effect on the pattern space as the delete command. No command following the change command in the script is applied.
The insert and append commands do not affect the contents of the pattern space. The supplied text will not match any address in subsequent commands in the script, nor can those commands affect the text. No matter what changes occur to alter the pattern space, the supplied text will still be output appropriately. This is also true when the default output is suppressed — the supplied text will be output even if the pattern space is not. Also, the supplied text does not affect sed’s internal line counter.

List
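The list command (l) displays the contents of the pattern space in an unambiguous form. GNU sed shows nonprinting characters as C-style escapes (such as \t) or three-digit octal codes, and marks the end of each line with $.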
Transform
[address]y/abc/xyz/
The replacement is made by character position, “a” is replaced by “x” anywhere on the line, regardless of whether or not it is followed by a “b”.
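
A short sample makes the position-by-position mapping visible. A minimal sketch with made-up input:

```shell
# Each character is mapped on its own: c->z, a->x, b->y.
positional=$(printf '%s\n' 'cab' | sed 'y/abc/xyz/')

# The same command is a common way to change case.
upper=$(printf '%s\n' 'hello' |
    sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

printf '%s\n' "$positional" "$upper"
```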
Print command (p)
/^\.Ah/{
p
s/ "//g
s/^\.Ah //p
}
The substitute command’s print flag differs from the print command in that it is conditional upon a successful substitution.

Print Line Number
An equal sign (=) following an address prints the line number of the matched line.
[line-address]=
This command cannot operate on a range of lines.
#n print line number and line with if statement
/ if/{
=
p
}
#n suppresses the default output of lines.
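
The same script can be run from the command line, using the -n option in place of the #n first line. A sketch with invented input lines:

```shell
# -n plays the role of the #n first line in a script file.
numbered=$(printf '%s\n' 'x = 1' ' if x' 'done' | sed -n '/ if/{
=
p
}')

printf '%s\n' "$numbered"
# prints the line number 2, then the matching line " if x"
```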
Next
The next command (n) outputs the contents of the pattern space and then reads the next line of input without returning to the top of the script. Its syntax is:
[address]n
The next command changes the normal flow of control: it causes the next line of input to replace the current line in the pattern space. Subsequent commands in the script are applied to the replacement line, not the current line. If the default output has not been suppressed, the current line is printed before the replacement takes place.
/^\.H1/{
n
/^$/d
}
Match any line beginning with the string ‘.H1’, then print that line and read in the next line. If that line is blank, delete it.
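
On sample input, only the blank line that directly follows a .H1 line should disappear. A minimal sketch with made-up text:

```shell
tidy=$(printf '%s\n' '.H1 Title' '' 'First paragraph.' '' 'Second.' | sed '/^\.H1/{
n
/^$/d
}')

printf '%s\n' "$tidy"
# the blank line after ".H1 Title" is gone; the later blank line is kept
```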
Reading and Writing Files
[line-address]r file
[address]w file
The read command queues the contents of file for output after the addressed line; the text is written at the end of the current cycle and does not become part of the pattern space. It cannot operate on a range of lines. The write command writes the contents of the pattern space to the file.
The read command will not complain if the file does not exist. The write command will create a file if it does not exist; if the file already exists, the write command will overwrite it each time the script is invoked. If there are multiple instructions writing to the same file in one script, then each write command appends to the file.
sed '$r closing' $* | pr | lp
/^<Company-list>/r company.list
/^<Company-list>/d
/Northeast$/{
s///
w region.northeast
}
The substitute command matches the same pattern as the address and removes it.
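
Both commands can be exercised with scratch files. This is a sketch: the file contents and sample lines are invented, and it assumes the scratch directory path returned by mktemp contains no spaces or newlines, because the file name after r and w runs to the end of the line.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Acme' 'Globex' > "$workdir/company.list"

# r queues the file for output after the marker line; d then drops the marker.
merged=$(printf '%s\n' 'Intro' '<Company-list>' 'Outro' |
    sed -e "/^<Company-list>/r $workdir/company.list" -e '/^<Company-list>/d')

# s/// with an empty pattern reuses the address regex; w saves what is left.
printf '%s\n' 'Adams, Northeast' 'Baker, South' |
    sed -n "/, Northeast\$/{
s///
w $workdir/region.northeast
}"
saved=$(cat "$workdir/region.northeast")

printf '%s\n' "$merged" "$saved"
rm -rf "$workdir"
```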
Quit(q)
The quit command (q) causes sed to stop reading new input lines (and stop sending them to the output). Its syntax is:
[line-address]q
sed '100q' test
sed -n "
/^\.de *$mac/,/^\.\.$/{
p
/^\.\.$/q
}" $file
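
A small check of the quit command on generated input; a sketch in which seq stands in for a large file:

```shell
# sed prints lines 1-5, then quits without reading the rest of the input.
head5=$(seq 1000 | sed '5q')

printf '%s\n' "$head5"
```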

Advanced sed Commands
Multiline Pattern Space
Sed has the ability to look at more than one line in the pattern space. This allows you to match patterns that extend over multiple lines. The three multiline commands (N,D,P) all correspond to lowercase basic commands (n,d,p).

Append Next Line:N
The multiline Next (N) command creates a multiline pattern space by reading a new line of input and appending it to the contents of the pattern space. The original contents of pattern space and the new input line are separated by a newline. The embedded newline character can be matched in patterns by the escape sequence “\n”. In a multiline pattern space, the metacharacter “^” matches the very first character of the pattern space, and not the character(s) following any embedded newline(s). Similarly, “$” matches only the end of the pattern space, and not any embedded newline(s). After the Next command is executed, control is then passed to subsequent commands in the script.
The Next command differs from the next command, which outputs the contents of the pattern space and then reads a new line of input. The next command does not create a multiline pattern space.
s/Owner and Operator Guide/Installation Guide/
/Owner/{
N
s/ *\n/ /
s/Owner and Operator Guide */Installation Guide\
/
}

$!N excludes the last line ($) from the Next command.
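
The script above, with $!N in place of N, can be run on a phrase that is split across two lines. A sketch with invented input text:

```shell
fixed=$(printf '%s\n' 'Consult the Owner and Operator' 'Guide for details.' | sed '
s/Owner and Operator Guide/Installation Guide/
/Owner/{
$!N
s/ *\n/ /
s/Owner and Operator Guide */Installation Guide\
/
}')

printf '%s\n' "$fixed"
# the phrase is joined, replaced, and the line is split again after "Guide"
```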

Multiline Delete
The delete command (d) deletes the contents of the pattern space and causes a new line of input to be read with editing resuming at the top of the script. The Delete command (D) works slightly differently: it deletes a portion of the pattern space, up to the first embedded newline. It does not cause a new line of input to be read; instead, it returns to the top of the script, applying these instructions to what remains in the pattern space.

# reduce multiple blank lines to one; version using d command
/^$/{
N
/^\n$/d
}
When a blank line is encountered, the next line is appended to the pattern space. Then we try to match the embedded newline. Note that the positional metacharacters, ^ and $, match the beginning and the end of the pattern space, respectively.

With the d version, where there was an even number of blank lines, all the blank lines were removed. Only when there was an odd number was a single blank line preserved. That is because the delete command clears the entire pattern space. Once the first blank line is encountered, the next line is read in, and both are deleted. If a third blank line is encountered, and the next line is not blank, the delete command is not applied, and thus a blank line is output. If we use the multiline Delete command (D rather than d), we get the result we want:
/^$/{
N
/^\n$/D
}
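
The difference between the two versions shows up on input with two consecutive blank lines. A sketch, with hypothetical helper names squeeze_with_d and squeeze_with_D; the results noted in the comments are what GNU sed produces.

```shell
squeeze_with_d() {
    sed '/^$/{
N
/^\n$/d
}'
}

squeeze_with_D() {
    sed '/^$/{
N
/^\n$/D
}'
}

# Two blank lines between a and b.
lower=$(printf '%s\n' a '' '' b | squeeze_with_d)   # both blank lines are lost
upper=$(printf '%s\n' a '' '' b | squeeze_with_D)   # exactly one blank line is kept
printf '%s\n---\n%s\n' "$lower" "$upper"
```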

Multiline Print:(P)
The multiline Print command outputs the first portion of a multiline pattern space, up to the first embedded newline.
The Print command(P) frequently appears after the Next command and before the Delete command. These three commands can set up an input/output loop that maintains a two-line pattern space yet outputs only one line at a time. The purpose of this loop is to output only the first line in the pattern space, then return to the top of the script to apply all commands to what had been the second line in the pattern space.
/UNIX$/{
N
/\nSystem/{
s// Operating &/
P
D
}
}
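
Running the script on two short lines shows the N, P, D loop at work. A sketch with invented input; note that the first output line ends with a space, because the replacement text puts " Operating " before the newline.

```shell
joined=$(printf '%s\n' 'Learn the UNIX' 'System today' | sed '/UNIX$/{
N
/\nSystem/{
s// Operating &/
P
D
}
}')

printf '%s\n' "$joined"
# first line: "Learn the UNIX Operating " (trailing space), second: "System today"
```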

Examples
# Scribe font change script.
s/@f1(\([^)]*\))/\\fB\1\\fR/g
/@f1(.*/{
N
s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
P
D
}
This can be translated as: after making a substitution across two lines, print the first line and then delete it from the pattern space. With the second portion remaining in the pattern space, control passes to the top of the script where we see if there is an “@f1(” remaining on the line.

Hold That Line(hold space)
The pattern space is a buffer that contains the current input line. There is also a set-aside buffer called the hold space. The contents of the pattern space can be copied to the hold space and the contents of the hold space can be copied to the pattern space. A group of commands allows you to move data between the hold space and the pattern space. The hold space is used for temporary storage.
The most frequent use of the hold space is to have it retain a duplicate of the current input line while you change the original in the pattern space.
Command   Abbreviation  Function
Hold      h or H        Copy or append contents of pattern space to hold space.
Get       g or G        Copy or append contents of hold space to pattern space.
Exchange  x             Swap contents of hold space and pattern space.

Each of these commands can take an address that specifies a single line or a range of lines.
# Reverse flip
/1/{
h
d
}
/2/{
G
}
The hold command followed by the delete command is a fairly common pairing. Without the delete command, control would reach the bottom of the script and the contents of the pattern space would be output.
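
With lines that alternate between 1s and 2s, the script swaps each pair. A minimal sketch with made-up input:

```shell
# Lines containing 1 are held and deleted; lines containing 2 get the held
# line appended after them.
flipped=$(printf '%s\n' 1 2 11 22 | sed '/1/{
h
d
}
/2/{
G
}')

printf '%s\n' "$flipped"
# prints 2, 1, 22, 11 (each on its own line)
```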

A Capital Transformation
# capitalize statement names
/the .* statement/{
h
s/.*the \(.*\) statement.*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
}
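
Applied to one sample sentence, the script uppercases only the statement name. A sketch with invented input; the comments trace the pattern space after each step.

```shell
capitalized=$(printf '%s\n' 'See the print statement for details.' | sed '/the .* statement/{
h
s/.*the \(.*\) statement.*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
}')
# after h and the first s: "print"; after y: "PRINT";
# after G: "PRINT" + newline + the original line; the last s splices them.

printf '%s\n' "$capitalized"
```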

Building Blocks of Text
The hold space can be used to collect a block of lines before outputting them.
/^$/!{
H
d
}
/^$/{
x
s/^\n/<p>/
s/$/<\/p>/
G
}
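
On input made of blank-line-separated paragraphs, the script wraps each block in paragraph tags. A sketch with invented text; note that the last block is only flushed because the input ends with a blank line.

```shell
html=$(printf '%s\n' 'line one' 'line two' '' 'line three' '' | sed '/^$/!{
H
d
}
/^$/{
x
s/^\n/<p>/
s/$/<\/p>/
G
}')

printf '%s\n' "$html"
# "<p>line one" / "line two</p>" / blank line / "<p>line three</p>"
```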

Examples:
sed -n '10,20p' access.log

Resources
sed & awk, 2nd Edition

Read full article from The Basics of Using the Sed Stream Editor to Manipulate Text in Linux | DigitalOcean
