Thursday, March 15, 2018

Linux Troubleshooting



https://www.linux.com/news/discover-possibilities-proc-directory
/proc/meminfo 
https://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
/proc is very special in that it is also a virtual filesystem. It's sometimes referred to as a process information pseudo-file system. It doesn't contain 'real' files but runtime system information (e.g. system memory, devices mounted, hardware configuration, etc). For this reason it can be regarded as a control and information centre for the kernel. In fact, quite a lot of system utilities simply read files in this directory. For example, 'lsmod' reports the contents of /proc/modules, and on older kernels 'lspci' showed the same data as 'cat /proc/pci' (that file was removed during the 2.6 series; current kernels expose PCI data under /proc/bus/pci and /sys/bus/pci). By reading or writing the files under /proc/sys you can even inspect and change kernel parameters (sysctl) while the system is running.
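As a small illustration of the sysctl remark above: each sysctl key corresponds to a file under /proc/sys, with the dots in the key replaced by slashes. The sketch below assumes a POSIX shell. The helper name sysctl_path is made up for these notes, and vm.swappiness is only an example of a key that is commonly present.

```shell
# Hypothetical helper: map a sysctl key to its file under /proc/sys.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Read the current value, if the file exists on this system.
key=vm.swappiness
path=$(sysctl_path "$key")
if [ -r "$path" ]; then
    printf '%s = %s\n' "$key" "$(cat "$path")"
fi
```

Writing a new value to the same file as root changes the parameter for the running system only; it does not survive a reboot.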
/proc/PID/cmdline
Command line arguments.
/proc/PID/cpu
Current and last CPU on which the process ran (2.4 SMP kernels only).
/proc/PID/cwd
Link to the current working directory.
/proc/PID/environ
Values of environment variables.
/proc/PID/exe
Link to the executable of this process.
/proc/PID/fd
Directory, which contains all file descriptors.
/proc/PID/maps
Memory maps to executables and library files.
/proc/PID/mem
Memory held by this process.
/proc/PID/root
Link to the root directory of this process.
/proc/PID/stat
Process status.
/proc/PID/statm
Process memory status information.
/proc/PID/status
Process status in human readable form.
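
A minimal sketch of reading a few of these entries for the current shell, assuming a Linux system with /proc mounted. The helper name status_field is hypothetical; it only extracts one "Key: value" line from a file in the /proc/PID/status format.

```shell
# Hypothetical helper: print the value of one field from a status-format file.
# Usage: status_field FIELD [FILE]; FILE defaults to this shell's own status.
status_field() {
    awk -v key="$1" -F ':[ \t]*' '$1 == key { print $2; exit }' \
        "${2:-/proc/$$/status}"
}

if [ -r "/proc/$$/status" ]; then
    status_field Name                         # command name of this shell
    status_field VmRSS                        # resident memory, reported in kB
    tr '\0' ' ' < "/proc/$$/cmdline"; echo    # arguments are NUL-separated
    readlink "/proc/$$/cwd"                   # current working directory
fi
```

Replace $$ with any PID you are allowed to inspect; some entries (environ, fd, cwd) are only readable by the process owner or root.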

https://www.cyberciti.biz/faq/linux-system-memory-utilization/
To update every 5 seconds, enter:
# free -s 5
Pass the -c option to limit the number of updates. For example, to update every 2 seconds and exit after 3 updates, enter:
# free -s 2 -c 3

Understanding vmstat memory options
  • swpd: the amount of virtual memory used.
  • free: the amount of idle memory.
  • buff: the amount of memory used as buffers.
  • cache: the amount of memory used as cache.
  • inact: the amount of inactive memory (see -a option).
  • active: the amount of active memory (see -a option).
$ vmstat -a
Passing the -s option to vmstat displays a table of various event counters and memory statistics:
# vmstat -s

https://unix.stackexchange.com/questions/352516/should-i-be-concerned-my-free-memory-is-so-low-or-is-the-free-memory-in-buffers
Should I be worried my free memory is only 46M or is the -/+ buffers/cache row value that says 351M free also available for whatever?
             total       used       free     shared    buffers     cached
Mem:          594M       548M        46M        76M        28M       277M
-/+ buffers/cache:       242M       351M
Swap:           0B         0B         0B

The -/+ buffers/cache row shows memory usage with buffers and page cache factored out: its used column is the Mem: used value minus buffers and cached, and its free column is the Mem: free value plus buffers and cached.
When you run free with the -m flag, -/+ buffers/cache is the most important row to look at. In your case the total free memory is not (351+46) MB, because the 46 MB is already included in the 351 MB. About 242 MB is used by processes, and 351 MB (the 46 MB that is idle plus the buffers and cache, most of which the kernel can reclaim on demand) is available for applications.
Linux always tries to use RAM to speed up disk operations by using available memory for buffers (file system metadata) and cache (pages with actual contents of files or block devices). As a result, on a system that has been running for a while, the free column of the Mem: row usually shows a small number.
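
The arithmetic behind the -/+ buffers/cache row can be checked with a few lines of shell. This is only a sketch: the helper name buffers_cache_row is hypothetical, and the inputs are the Mem: values from the output quoted above.

```shell
# Hypothetical helper: recompute the "-/+ buffers/cache" row from the Mem: row.
# Arguments: used free buffers cached (all in the same unit).
buffers_cache_row() {
    used=$1 free=$2 buffers=$3 cached=$4
    echo "$((used - buffers - cached)) $((free + buffers + cached))"
}

# Values from the Mem: row quoted above, in MB.
buffers_cache_row 548 46 28 277    # prints: 243 351
```

The result is off by one from the 242M that free printed, presumably because free rounds each column separately from the underlying kilobyte counts.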
https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
Under Linux, the number of megabytes of main memory currently used for the page cache is indicated in the Cached column of the report produced by the free -m command.
[root@testserver ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         15976      15195        781          0        167       9153
-/+ buffers/cache:       5874      10102
Swap:         2000          0       1999
The following example shows output of the cat /proc/meminfo command on a node of a running job. The output reports that 60,023,544 KB (~60 GB) of the node's memory is used by page cache:
% cat /proc/meminfo 
MemTotal:      132007564 kB 
MemFree:          605680 kB 
Buffers:               0 kB 
Cached:        60023544 kB 
The next example shows output from the top -b -n1 | head command, run on the same node as in the previous example. The output reports that the node has 128,913 MB total memory, that 128,314 MB of the total memory is used, and that 59,220 MB (~59 GB) of the used memory is occupied by page cache:
% top -b -n1 | head
top - 15:04:00 up 3 days, 16:22,  2 users,  load average: 1.29, 1.23, 1.38
Tasks: 546 total,   2 running, 543 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.5%us,  2.7%sy,  0.0%ni, 96.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    128913M total,   128314M used,      598M free,        0M buffers
Swap:         0M total,        0M used,        0M free,    59220M cached
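
To pull the page cache figure out of /proc/meminfo in a script, a short awk filter is enough. The sketch below uses a hypothetical helper named cached_gib and feeds it the four meminfo lines quoted above. It converts to GiB (1 GiB = 1048576 kB), so the result is lower than the "~60 GB" quoted, which is a decimal approximation.

```shell
# Hypothetical helper: report the Cached figure from meminfo-format input in GiB.
# Matches the "Cached:" line only, not "SwapCached:".
cached_gib() {
    awk '$1 == "Cached:" { printf "%.1f\n", $2 / 1048576 }'
}

# Sample input: the lines quoted from /proc/meminfo above.
cached_gib <<'EOF'
MemTotal:      132007564 kB
MemFree:          605680 kB
Buffers:               0 kB
Cached:        60023544 kB
EOF
# prints: 57.2

# On a live Linux system: cached_gib < /proc/meminfo
```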
https://unix.stackexchange.com/questions/20784/how-can-i-resolve-a-hostname-to-an-ip-address-in-a-bash-script
host hostname_or_ip_address

dig +short unix.stackexchange.com
If dig +short is unavailable, any one of the following should work. All of these query DNS directly and ignore other means of resolution:
host unix.stackexchange.com | awk '/has address/ { print $4 }'
nslookup unix.stackexchange.com | awk '/^Address: / { print $2 }'
dig unix.stackexchange.com | awk '/^;; ANSWER SECTION:$/ { getline ; print $5 }'
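
Inside a script it can be convenient to wrap the host filter above in a function that returns only the first address. A minimal sketch: first_ipv4 is a hypothetical name, and the sample input uses the placeholder name example.test and addresses from ranges reserved for documentation.

```shell
# Hypothetical helper: print the first IPv4 address found in `host` output on stdin.
first_ipv4() {
    awk '/has address/ { print $4; exit }'
}

# Sample input in the format `host` prints (placeholder values).
printf '%s\n' \
    'example.test has address 192.0.2.10' \
    'example.test has IPv6 address 2001:db8::10' \
    'example.test mail is handled by 10 mail.example.test.' | first_ipv4
# prints: 192.0.2.10

# Live use (needs network access):
#   ip=$(host unix.stackexchange.com | first_ipv4)
```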

ps show full command line
https://superuser.com/questions/486648/full-command-text-with-unix-ps

I found this in the ps man page on FreeBSD 9:
-w Use 132 columns to display information, instead of the default which is your window size. If the -w option is specified more than once, ps will use as many columns as necessary without regard for your window size. Note that this option has no effect if the “command” column is not the last column displayed.
So:
ps auxww

https://unix.stackexchange.com/questions/229541/view-full-commands-in-ps-output
ps auxfww

https://unix.stackexchange.com/questions/128953/how-to-display-top-results-sorted-by-memory-usage-in-real-time
Quick tip using the top command on Linux/Unix:
top
Press Shift+F, press N (without Shift) to select memory usage as the sort field, then press Enter. The process list is then ordered by memory usage.
Or you can just press Shift+M after starting top.
On OS X 10.10 the command top -o MEM seems to work.
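With procps ps, --sort orders ascending by default; prefix the key with - to list the largest consumers first:
ps aux --sort=-%mem
ps aux --sort=-%cpu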
https://askubuntu.com/questions/775851/how-can-i-select-default-version-of-netcat-in-ubuntu
sudo apt-get update
sudo apt-get install netcat
https://serverfault.com/questions/789442/how-can-you-distinguish-between-a-crash-and-a-reboot-on-rhel7

(1) auditd logs

auditd is amazing. You can see all the different event types it logs by checking ausearch -m. Apropos of the problem at hand, it logs system shutdown and system boot, so you can run ausearch -i -m system_boot,system_shutdown | tail -4. If this reports a SYSTEM_SHUTDOWN followed by a SYSTEM_BOOT, all is well; if it reports two SYSTEM_BOOT lines in a row, the system clearly did not shut down gracefully, as in the following example:
[root@a72 ~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:10:32.392:7) : pid=657 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success' 
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:11:41.134:7) : pid=656 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success' 

(2) last -x

Same as above, but with the simple last -n2 -x shutdown reboot command. Example where system crashed:
[root@a72 ~]# last -n2 -x shutdown reboot
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:11 - 01:20  (00:08)    
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:10 - 01:20  (00:09)    
Or where system had a graceful reboot:
[root@a72 ~]# last -n2 -x shutdown reboot
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:21 - 01:21  (00:00)    
shutdown system down  3.10.0-327.el7.x Tue Sep 20 01:21 - 01:21  (00:00)    
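
The same check can be scripted. The sketch below is an assumption-laden illustration, not part of the original answer: classify_last_boot is a hypothetical helper that reads last -x shutdown reboot output (newest entry first) on stdin and looks only at the two most recent events.

```shell
# Hypothetical helper: classify the most recent boot from `last -x` output.
# Two reboot records in a row mean no shutdown was logged in between.
classify_last_boot() {
    awk '
        $1 == "reboot" || $1 == "shutdown" { ev[++n] = $1 }
        n == 2 { exit }
        END {
            if (n == 2 && ev[1] == "reboot" && ev[2] == "shutdown")
                print "graceful"
            else if (n == 2 && ev[1] == "reboot" && ev[2] == "reboot")
                print "crash-suspected"
            else
                print "unknown"
        }'
}

# Sample input: the crash example quoted above.
classify_last_boot <<'EOF'
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:11 - 01:20  (00:08)
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:10 - 01:20  (00:09)
EOF
# prints: crash-suspected

# Live use: last -n2 -x shutdown reboot | classify_last_boot
```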



https://www.mkyong.com/linux/how-to-check-reboots-history-in-linux/
last reboot
last -x shutdown
https://serverfault.com/questions/352201/what-log-contains-server-crashes
Sadly, probably none of them. When there's a kernel panic, there's no logging subsystem left to write logs to, and no file handles to handle them.
These are the log files I've looked through:
  • /var/log/messages
  • /var/log/syslog
  • /var/log/debug
  • /var/log/kern.log





