Play With Linux

1 Terminal

1.1 Ubuntu: Could not get lock

Solution:

sudo rm /var/lib/apt/lists/lock
sudo rm /var/cache/apt/archives/lock
sudo rm /var/lib/dpkg/lock
sudo apt clean
sudo apt update

(2) Another common problem is a Hash Sum mismatch error when running sudo apt update:

In most cases this is caused by an added PPA source.

The usual fix is to remove the offending PPA; the alternative is to compile and install the software from source.

GPG error

E: GPG error: http://cn.archive.ubuntu.com trusty InRelease: Clearsigned file isn't valid, got 'NODATA' (does the network require authentication?)

Software update sources

/etc/apt/sources.list

While installing mysql-server on a deepin system, running:

sudo apt-get update
sudo apt-get install mysql-server

got stuck at the connecting... screen. Overwriting deepin's default sources with Ubuntu's got things moving, but it eventually failed with a broken package error… The correct fix is to find a mirror suited to deepin (such as the Tsinghua or 163 mirrors) rather than copying Ubuntu's sources directly.


While installing some Python dependencies, a typical problem came up:

Cannot compile 'Python.h' ...

Following online tutorials and trying to install the python-dev dependency then produced:

The following packages have unmet dependencies:
python-dev : Depends: python2.7-dev (>= 2.7.3) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Then came a round of trial and error:

sudo apt-get clean
sudo apt-get autoclean
sudo apt-get -f install
...

Switching to the Aliyun mirror finally fixed it.

deb http://mirrors.aliyun.com/ubuntu/ trusty main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ trusty-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ trusty-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ trusty-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-backports main restricted universe multiverse

Note: the problems above occurred on Ubuntu 14.04.

Switching mirrors can fix some problems, but it can also introduce package inconsistencies such as:

mysql-server :
Depends: mysql-server-5.5 but it is not going to be installed

Switching back to the original mirror resolved it again.

1.2 Extracting archives

unrar e file.rar /path/to/extract/to

1.3 Login files

Every time a terminal starts, this file is executed:

~/.bashrc

Installing Java

Shell

Flow Control

(1) If statement

if [ -r /bin/netstat ]; then
    /bin/netstat -an > $DATE_DIR/netstat.dump 2>&1
    echo -e ".\c"
fi

The if … else … form:

if [ "$1" = "restart" ]; then
    ./restart.sh
elif [ "$1" = "dump" ]; then
    ./dump.sh
else
    echo "ERROR: Please input argument: start or stop or debug or restart or dump"
    exit 1
fi

(2) For statement

echo -e "Dumping the $SERVER_NAME ...\c"
for PID in $PIDS ; do
    jstack $PID > $DATE_DIR/jstack-$PID.dump 2>&1
    echo -e ".\c"
done

(3) Switch (case) statement

function main() {
    case $1 in
    'scp')
        scp_to_server
        ;;
    'ssh')
        ssh_to_server
        ;;
    *)
        echo 'Usage: ./deploy [scp | ssh]'
        ;;
    esac
}

main $1

(4) Matching multiple patterns in one case branch

case $subdomain in
    'sv2'|'sv3'|'sv4'|'sv5')
        echo 'system'
        ;;
    *)
        echo 'unknown'
        exit 1
        ;;
esac

String

Statement Description
[[ $str1 == $str2 ]] tests whether the two strings are equal
[[ $str1 != $str2 ]] tests whether the two strings differ
[[ -z $str1 ]] tests whether $str1 is an empty string
[[ -n $str1 ]] tests whether $str1 is a non-empty string
[ $? -eq 0 ] tests whether the last command returned successfully
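A runnable sketch of the table above, using throwaway example strings:

```shell
#!/bin/bash
str1="hello"
str2="hello"
str3=""

[[ $str1 == $str2 ]] && echo "str1 equals str2"
[[ -z $str3 ]] && echo "str3 is empty"
[[ -n $str1 ]] && echo "str1 is non-empty"

true                        # a command that succeeds
[ $? -eq 0 ] && echo "last command returned 0"
```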

Sed - a stream editor

/pattern/action
  • pattern: a regular expression
  • action: p, d, s/pattern1/pattern2/

(1) Line addresses and ranges:

Command Description
4,10d deletes lines 4 through 10
4,+2d deletes lines 4, 5, and 6
2,5!d deletes every line except lines 2 through 5
1~3d deletes lines 1, 4, 7, 10, …
4,10p prints lines 4 through 10
4,d syntax error
,10d syntax error
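These address forms can be tried without touching a file, using seq as throwaway input:

```shell
seq 6 | sed '2,4d'     # delete lines 2-4: prints 1 5 6
seq 6 | sed '2,+2d'    # same range via "start plus 2 more": prints 1 5 6
seq 6 | sed '2,4!d'    # delete everything except lines 2-4: prints 2 3 4
seq 6 | sed '1~3d'     # GNU extension, deletes lines 1 and 4: prints 2 3 5 6
```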

(2) Search and replace:

sed -i -e 's/few/asd/g' hello.txt

Search and replace text containing a literal asterisk:

sed -i -e 's/few\*/asd\*/g' hello.txt

Use a different delimiter:

sed -i -e 's:few:asd:g' hello.txt

Replace with an empty string:

cat /etc/passwd | sed 's/root//g'

Replace on a single line only:

cat /etc/passwd | sed '10s/sh/quiet/g'

Replace on lines 1 through 5:

cat /etc/passwd | sed '1,5s/sh/quiet/g'
  • -i[SUFFIX]: edit the file in place
  • --in-place[=SUFFIX]: same as above
  • -e script: the expression/command to run
  • --expression=script: same as above
  • -E, -r, --regexp-extended: use extended regular expressions
  • s/: substitute
  • /g: replace every match on the line, not just the first

(3) Print a specific line:

sed -n '45p' file.txt
  • -n: suppress automatic printing; by default sed prints every line of input
  • p: print the addressed line

(4) Delete a range of lines:

Delete the given range and write the remaining lines to output.txt:

sed '30,35d' input.txt > output.txt
Using sed with shell variables to rewrite a Maven POM (the _ delimiter avoids having to escape the / in the closing XML tags):

task_name=$1
main_class=$2
rp_task_name_cmd='s_<artifactId>task</artifactId>_<artifactId>'$task_name'</artifactId>_g'
rp_main_class_cmd='s_<mainClass>.*</mainClass>_<mainClass>'$main_class'</mainClass>_g'
sed -i $rp_task_name_cmd $POM_FILE
sed -i $rp_main_class_cmd $POM_FILE

Filesystem tests

Command Description
[ -f $file_var ] tests whether the path is a regular file
[ -x $var ] tests whether the file is executable
[ -d $var ] tests whether the path is a directory (handy for checking that a folder exists)
[ -e $var ] tests whether the file exists
[ -c $var ] tests whether the file is a character device
[ -b $var ] tests whether the file is a block device
[ -w $var ] tests whether the path is writable
[ -r $var ] tests whether the path is readable
[ -L $var ] tests whether the path is a symlink
DUMP_DATE=`date +%Y%m%d%H%M%S`
DATE_DIR=$DUMP_DIR/$DUMP_DATE
if [ ! -d $DATE_DIR ]; then
    mkdir $DATE_DIR
fi

date

$ date +%Y%m%d%H%M%S
20170701165459
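A quick sanity check on the format string: it always yields a 14-digit timestamp (4 digits for the year, 2 each for month, day, hour, minute, and second):

```shell
ts=$(date +%Y%m%d%H%M%S)
echo "$ts"        # e.g. 20170701165459
echo "${#ts}"     # 14
```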

echo

echo -e "Dumping the $SERVER_NAME ...\c"
  • -e: enable interpretation of backslash escapes, such as \t (horizontal tab), \n (newline), and \c (suppress the trailing newline)

tr - translate or delete characters

tr [OPTION]... SET1 [SET2]
  • -d, --delete: delete the characters in SET1
  • -s, --squeeze-repeats: squeeze each repeated sequence of a character into a single occurrence

(1) Delete the character t

echo "the geek stuff" | tr -d 't'

Text files created on DOS/Windows machines have different line endings than files created on Unix/Linux. DOS uses carriage return and line feed ("\r\n") as a line ending, while Unix uses just a line feed ("\n"). To delete the \r characters:

tr -d '\r'

To strip spaces and tabs, tr is the right tool (AWK regex matching is unreliable for this):

tr -d ' \t'
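A couple of throwaway examples of the deletions above:

```shell
printf 'a b\tc\n' | tr -d ' \t'       # strips spaces and tabs: prints abc
printf 'dos line\r\n' | tr -d '\r'    # strips the DOS carriage return
```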

(2) Convert lowercase to uppercase; type the text to convert and press Enter to see the result

$ tr '[:lower:]' '[:upper:]'
thegeekstuff
THEGEEKSTUFF

(3) Keep one space, squeezing the extra whitespace

echo "This  is  for testing" | tr -s '[:space:]' ' '

2.8 File descriptors and redirection

Redirect standard error to standard output:

kill $PID > /dev/null 2>&1
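Order matters here: 2>&1 duplicates stderr onto whatever stdout currently points at, so it must come after the file redirection. A sketch using a path that is assumed not to exist:

```shell
# stdout goes to the file first, then stderr is duplicated onto it:
# the error message ends up in the file
ls /no/such/path > /tmp/all.log 2>&1 || true
cat /tmp/all.log        # contains the "No such file or directory" error

# with the order reversed (2>&1 before the file redirection), stderr
# would still reach the terminal, since stdout pointed there at the time
```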

2.9 Script parameters

  • $0: the script name
  • $1: the first argument
  • $2: the second argument
  • $@: all arguments as separate words: "arg1" "arg2" …
  • $*: all arguments as a single word: "arg1 arg2 …"
  • $#: the number of arguments passed

# From druidStat.sh in the Alibaba druid project
"$JAVA_HOME/bin/java" -Dfile.encoding="UTF-8" -cp "./druid-0.2.6.jar:$JAVA_HOME/lib/tools.jar" com.alibaba.druid.support.console.DruidStat $@

Validating the argument count:

if [ $# -ne 1 ]; then
    echo "illegal number of parameters"
fi

Passing arguments:

./deploy.sh test -y 2017 -m 6

To retrieve the -y 2017 -m 6 part inside deploy.sh:

shift
echo $@

This prints -y 2017 -m 6.
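The positional parameters and shift can be seen in a tiny throwaway script (the file name args.sh is made up):

```shell
#!/bin/bash
# Save as args.sh, then run: ./args.sh test -y 2017 -m 6
echo "name  : $0"     # ./args.sh
echo "first : $1"     # test
echo "count : $#"     # 5
shift                 # drop $1; the remaining arguments shift left
echo "rest  : $@"     # -y 2017 -m 6
```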

Numeric comparison

[ $var -eq 0 ] # true when $var equals 0
[ $var -ne 0 ] # true when $var does not equal 0
  • -gt: Greater than
  • -lt: Less than
  • -ge: Greater than or equal to
  • -le: Less than or equal to

Arithmetic

The let command can be used to perform basic operations directly:

# sum
let result=no1+no2
echo $result
# increment
let no1++

[] can achieve the same as let:

result=$[ no1 + no2 ]

A $ prefix on variable names is also allowed:

result=$[ $no1 + 5 ]

(( )) can be used as well:

result=$(( no1 + 50 ))

expr can also be used for basic operations:

result=`expr 3 + 4`
result=$(expr $no1 + 5)

All of the above methods operate on integers only; none of them supports floating-point numbers.
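When fractions are needed, awk or bc can stand in; a small sketch:

```shell
echo $(( 7 / 2 ))                        # integer arithmetic truncates: 3
awk 'BEGIN { printf "%.2f\n", 7 / 2 }'   # awk uses floating point: 3.50
echo 'scale=2; 7 / 2' | bc               # bc with 2 digits after the point: 3.50
```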

Multiple conditions in if

# AND: -a
[ $var1 -ne 0 -a $var2 -gt 2 ]

# OR: -o
[ $var1 -ne 0 -o $var2 -gt 2 ]

Iterating over files

for entry in "$search_dir"/*
do
    echo "$entry"
done

Function return values

A Bash function can return a value in three ways:

  • echo a string
  • return a status code
  • set a shared global variable
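A sketch showing the three styles side by side (the function names are made up):

```shell
#!/bin/bash
# 1) echo a string; the caller captures it with command substitution
get_name() { echo "linux"; }
name=$(get_name)

# 2) return a status code (0-255); the caller checks $? or uses && / ||
is_even() { return $(( $1 % 2 )); }   # status 0 (success) for even numbers
is_even 4 && echo "4 is even"

# 3) write to a global variable shared with the caller
compute() { RESULT=$(( $1 + $2 )); }
compute 2 3

echo "$name $RESULT"    # prints: linux 5
```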

Script portability

  • Defining functions with the function keyword makes a script less portable (it is not POSIX sh)

xxd - make a hexdump or do the reverse

# binary
xxd -b file
# hexadecimal
xxd file

od - dump files in octal and other formats

# print as ASCII characters
od -c file
# hexadecimal
od -cx file

grep

  • -v, --invert-match: invert the sense of matching

# List Redis client connections whose output buffer is non-zero
echo 'CLIENT LIST' | redis-cli | grep -v 'omem=0'
# Search only within files of the given types
grep 'void' -Rn . --include \*.java --include \*.c

Basic vs Extended Regular Expressions:

In basic regular expressions the meta-characters '?', '+', '{', '|', '(', and ')' lose their special meaning; instead use the backslashed versions '\?', '\+', '\{', '\|', '\(', and '\)'.

Traditional egrep did not support the '{' meta-character, and some egrep implementations support '\{' instead, so portable scripts should avoid '{' in 'grep -E' patterns and should use '[{]' to match a literal '{'.


By default, . matches any single character and * matches zero or more of the preceding character; both are always active. So the following command matches both abcdef and ab.def:

grep 'ab.def' -Rn .

To match a literal dot, escape it:

grep 'ab\.def' -Rn .

grep is case-sensitive by default; use -i to ignore case:

grep "ab.def" -Rni .

grep OR matching:

grep -e 'select' -e 'insert' aaa.log

curl

Always unset http_proxy before running curl, especially for curl http://localhost:8080.

(1) curl - adding a header

curl -H 'Accept:application/json' http://localhost:8080

(2) curl - fetching only the response headers

curl -I http://localhost:8080

(3) curl - making a POST request

curl -X POST -d '{"username":"xyz","password":"123"}' http://localhost:8080/api/login
  • -d: implies a POST request, so -X POST can be omitted

(4) curl - making an https request

curl: (35) error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

No solution for this one yet.

wc

# Count total lines (lines of file A + lines of file B + ...)
# Subdirectories are searched too
find . -name '*.java' | xargs wc -l
# Exclude subdirectories
find . -maxdepth 1 -name '*.java'
# Count the number of files
find . -name '*.java' | awk '{ print } END { print NR }'

Transferring large files

rsync all.sql zk@10.108.113.85:~/Desktop/

apt-get

Removing a package:

dpkg -l | grep sogoupinyin
sudo apt remove --purge sogoupinyin

netstat

There are a few netstat parameters that are useful for checking which ports are open or closed on a machine:

  • -l or --listening shows only the sockets currently listening for incoming connection.
  • -a or --all shows all sockets currently in use.
  • -t or --tcp shows the tcp sockets.
  • -u or --udp shows the udp sockets.
  • -n or --numeric shows the hosts and ports as numbers, instead of resolving in dns and looking in /etc/services.
  • -p or --program Show the PID and name of the program to which each socket belongs.
  • -e, --extend Display additional information. Use this option twice for maximum detail.
sudo netstat -peanut | grep 3306

Disk management

(1) Check remaining disk space

df -h

(2) Check how much space a directory or file uses

du -h target/

Passwordless SSH login

On the server:

cd .ssh
ssh-keygen -t rsa (hit return through prompts)
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa.pub

On the client:

Append the entire contents of your local ~/.ssh/id_rsa.pub to the server's authorized_keys file.

AWK

awk '$2 ~ /^org.*/ {print $2}' .java_process

Analyzing a combined_log log file like the following:

66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"
66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET / HTTP/1.0" 200 6433 "-" "Googlebot/2.1"

Print all IPs:

awk '{print $1}' combined_log

AWK splits on whitespace by default; change the separator to " to print all request lines:

awk -F\" '{print $2}' combined_log

Sort by the number of times each User Agent appears, descending:

awk -F\" '{print $6}' combined_log | sort | uniq -c | sort -fr
  • sort -r: reverse, i.e. descending order
  • sort -f: --ignore-case
  • uniq -c: --count, prefix each line with its number of occurrences
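The same pipeline can be tried on a couple of inline sample lines (the log entries here are invented); sorting numerically on the count with -rn is a slightly more robust variant of -fr:

```shell
printf '%s\n' \
  '1.1.1.1 - - [x] "GET / HTTP/1.0" 200 1 "-" "Googlebot/2.1"' \
  '2.2.2.2 - - [x] "GET /a HTTP/1.0" 200 1 "-" "Firefox/55"' \
  '3.3.3.3 - - [x] "GET /b HTTP/1.0" 200 1 "-" "Googlebot/2.1"' |
awk -F\" '{print $6}' | sort | uniq -c | sort -rn
# the Googlebot agent comes first, with a count of 2
```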

Reference: System: Analyzing Apache Log Files

Bash strings

(1) String concatenation

Note that there are no spaces between the concatenated pieces:

AWK_PROGRAM='$2 ~ /^'$PROCESS_NAME'.*/ { print $1 }'
echo `awk "$AWK_PROGRAM" .java_process`

cp

When using cp, pay attention to whether the destination directory already exists:

# auto_creat_folder does not exist yet; it will be created automatically and the layout becomes: auto_creat_folder/**
cp -r ../resources/ auto_creat_folder/
# exist_folder already exists, so the layout becomes: exist_folder/resources/**
cp -r ../resources exist_folder/

Launching several programs from a script

Written like this it does not work; each command waits for the previous one to finish:

# ./start.sh

java -jar a.jar
java -jar b.jar
java -jar c.jar

So append & to each command so that they all start together:

# ./start.sh

java -jar a.jar &
java -jar b.jar &
java -jar c.jar &
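If the script should not exit until all three are done, wait collects the background jobs; a sketch with sleep standing in for the java commands:

```shell
#!/bin/bash
start=$(date +%s)
sleep 1 &     # stands in for: java -jar a.jar &
sleep 1 &     # stands in for: java -jar b.jar &
sleep 1 &     # stands in for: java -jar c.jar &
wait          # block until every background job has exited
end=$(date +%s)
echo "all jobs finished in $(( end - start ))s"   # about 1s, not 3s: they ran in parallel
```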

kill vs kill -9

kill -9 meaning: the process will be killed by the kernel; this signal can be neither caught nor ignored.

Uses: the SIGKILL signal

kill meaning: the kill command without an explicit signal sends signal 15, which terminates the process the normal way.

Uses: the SIGTERM signal, which programs can catch and handle

/etc/init/ vs /etc/init.d/

/etc/init.d contains scripts used by the System V init tools (SysVinit). This is the traditional service management package for Linux, containing the init program (the first process that is run when the kernel has finished initializing¹) as well as some infrastructure to start and stop services and configure them. Specifically, files in /etc/init.d are shell scripts that respond to start, stop, restart, and (when supported) reload commands to manage a particular service. These scripts can be invoked directly or (most commonly) via some other trigger (typically the presence of a symbolic link in /etc/rc?.d/).

/etc/init contains configuration files used by Upstart. Upstart is a young service management package championed by Ubuntu. Files in /etc/init are configuration files telling Upstart how and when to start, stop, reload the configuration, or query the status of a service. As of lucid, Ubuntu is transitioning from SysVinit to Upstart, which explains why many services come with SysVinit scripts even though Upstart configuration files are preferred. In fact, the SysVinit scripts are processed by a compatibility layer in Upstart.

.d in directory names typically indicates a directory containing many configuration files or scripts for a particular situation (e.g. /etc/apt/sources.list.d contains files that are concatenated to make a virtual sources.list; /etc/network/if-up.d contains scripts that are executed when a network interface is activated). This structure is usually used when each entry in the directory is provided by a different source, so that each package can deposit its own plug-in without having to parse a single configuration file to reference itself. In this case, it just happens that "init" is a logical name for the directory; SysVinit came first and used init.d, and Upstart used plain init for a directory serving the same purpose (it would have been more "mainstream", and perhaps less arrogant, if they'd used /etc/upstart.d instead).

Reference: What is the difference between /etc/init/ and /etc/init.d/?

Listing all users

cat /etc/passwd
# or
awk -F':' '{ print $1}' /etc/passwd

nohup

nohup is a POSIX command to ignore the HUP (hangup) signal. The HUP signal is, by convention, the way a terminal warns dependent processes of logout.

$ nohup abcd &
$ exit

Note that nohupping backgrounded jobs is typically used to avoid terminating them when logging off from a remote SSH session. A different issue that often arises in this situation is that ssh refuses to exit ("hangs"), since it refuses to lose any data from/to the background job(s). This problem can also be overcome by redirecting all three I/O streams:

$ nohup ./myprogram > foo.out 2> foo.err < /dev/null &

Cron

The software utility Cron is a time-based job scheduler in Unix-like computer operating systems. People who set up and maintain software environments use cron to schedule jobs (commands or scripts) to run periodically at fixed times, dates, or intervals.

Cron is driven by a crontab (cron table) file, a configuration file that specifies shell commands to run periodically on a given schedule. The crontab files are stored where the lists of jobs and other instructions to the cron daemon are kept. Users can have their own individual crontab files and often there is a system-wide crontab file (usually in /etc or a subdirectory of /etc) that only system administrators can edit.

Each line of a crontab file represents a job, and looks like this:

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday;
│ │ │ │ │ 7 is also Sunday on some systems)
│ │ │ │ │
│ │ │ │ │
* * * * * command to execute

Some examples:

# Clear this log file's contents at 00:01 every day
1 0 * * * printf "" > /var/log/apache/error_log

# Run the dump script at 23:45 every Saturday
45 23 * * 6 /home/oracle/scripts/export_dump.sh

# Run at the first minute of every hour
@hourly /scripts/script.sh

# Run at the first minute of every day
@daily /scripts/script.sh

# Run at the first minute of every week
@weekly /bin/script.sh

# Run at the first minute of every month
@monthly /scripts/script.sh

# Run at the first minute of every year
@yearly /scripts/script.sh

# Run at reboot
@reboot /scripts/script.sh

Editing the crontab:

# Open the crontab for editing/adding entries as the current user
crontab -e

# sudo may be needed for root's crontab
sudo crontab -e

# List the scheduled jobs
crontab -l

Run a script every minute:

* * * * * script.sh

Run a script every 10 minutes:

*/10 * * * * script.sh

Viewing cron's log:

tail -f /var/log/syslog

This happens because your cron jobs are producing output and then the cron daemon tries to mail that output to you. If you don't need that output, the easiest way to solve this is to discard it in the crontab:

* * * * * yourCommand >/dev/null 2>&1

Cron environment issues:

Cron provides only this environment by default :

  • HOME user’s home directory
  • LOGNAME user’s login
  • PATH=/usr/bin:/usr/sbin
  • SHELL=/usr/bin/sh

If you need more you can source a script where you define your environment before the scheduling table in the crontab.

Or set the required paths explicitly inside your own script:

#!/bin/bash
export JAVA_HOME=/path/to/jdk

some-other-command

top produces no output under cron:

# Add the -b flag
# -b: batch-mode operation
# Starts top in batch mode, which is useful for sending output from top to other programs or to a file.
top -b

ls

A few important flags

  • -t: sort by modification time, newest first
  • -r: reverse the sort order

ps -ef vs ps aux

  • ps -ef: System V style; full-format listing of every process

  • ps aux: BSD style; additionally shows %CPU, %MEM, and process state columns

add-apt-repository

When installing something like jdk8, you may need the add-apt-repository command. If running it produces:

spider@hadoop-master:~$ sudo add-apt-repository ppa:webupd8team/java
sudo: add-apt-repository: command not found

then run the following:

spider@hadoop-master:~$ sudo apt-get install software-properties-common python-software-properties

If a proxy needs to be configured:

export http_proxy=http://<proxy>:<port>
export https_proxy=http://<proxy>:<port>
sudo -E add-apt-repository ppa:linaro-maintainers/toolchain
  • -E: preserve the environment variables (so the proxy settings survive sudo)

Boot entries

sudo vi /etc/default/grub

Then change the number after GRUB_DEFAULT=0 to the index of the desired boot entry.

Finally, don't forget to run:

sudo update-grub

Checking the Linux kernel version

uname -r

Do Linux commands account for pipe size, and what happens when a pipe is full?

Pipe capacity:

A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the O_NONBLOCK flag is set (see below). Different implementations have different limits for the pipe capacity. Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.

In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a system with a page size of 4096 bytes). Since Linux 2.6.35, the default pipe capacity is 16 pages, but the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations. See fcntl(2) for more information. The following ioctl(2) operation, which can be applied to a file descriptor that refers to either end of a pipe, places a count of the number of unread bytes in the pipe in the int buffer pointed to by the final argument of the call:

ioctl(fd, FIONREAD, &nbytes);

The FIONREAD operation is not specified in any standard, but is provided on many implementations.



Linux zero copy

“Zero-copy” describes computer operations in which the CPU does not perform the task of copying data from one memory area to another. This is frequently used to save CPU cycles and memory bandwidth when transmitting a file over a network. Also, zero-copy operations reduce the number of time-consuming mode switches between user space and kernel space.


The problem with not using zero copy:

read(file, tmp_buf, len);
write(socket, tmp_buf, len);

Looks simple enough; you would think there is not much overhead with only those two system calls. In reality, this couldn't be further from the truth. Behind those two calls, the data has been copied at least four times, and almost as many user/kernel context switches have been performed. (Actually this process is much more complicated, but I wanted to keep it simple). To get a better idea of the process involved, take a look at Figure 1. The top side shows context switches, and the bottom side shows copy operations.

1) First: the read system call causes a context switch from user mode to kernel mode. The first copy is performed by the DMA engine, which reads file contents from the disk and stores them into a kernel address space buffer.

2) Second: data is copied from the kernel buffer into the user buffer, and the read system call returns. The return from the call caused a context switch from kernel back to user mode. Now the data is stored in the user address space buffer, and it can begin its way down again.

3) Third: the write system call causes a context switch from user mode to kernel mode. A third copy is performed to put the data into a kernel address space buffer again. This time, though, the data is put into a different buffer, a buffer that is associated with sockets specifically.

4) Fourth: the write system call returns, creating our fourth context switch. Independently and asynchronously, a fourth copy happens as the DMA engine passes the data from the kernel buffer to the protocol engine. You are probably asking yourself, “What do you mean independently and asynchronously? Wasn't the data transmitted before the call returned?” Call return, in fact, doesn't guarantee transmission; it doesn't even guarantee the start of the transmission. It simply means the Ethernet driver had free descriptors in its queue and has accepted our data for transmission. There could be numerous packets queued before ours. Unless the driver/hardware implements priority rings or queues, data is transmitted on a first-in-first-out basis. (The forked DMA copy in Figure 1 illustrates the fact that the last copy can be delayed).

As you can see, a lot of the data duplication is not really necessary. Some of the duplication could be eliminated to decrease overhead and increase performance. As a driver developer, I work with hardware that has some pretty advanced features. Some hardware can bypass the main memory altogether and transmit data directly to another device. This feature eliminates a copy in the system memory and is a nice thing to have, but not all hardware supports it. There is also the issue of the data from the disk having to be repackaged for the network, which introduces some complications. To eliminate overhead, we could start by eliminating some of the copying between the kernel and user buffers.


One way to eliminate a copy is to skip calling read and instead call mmap. For example:

tmp_buf = mmap(file, len);
write(socket, tmp_buf, len);

To get a better idea of the process involved, take a look at Figure 2. Context switches remain the same.

1) First: the mmap system call causes the file contents to be copied into a kernel buffer by the DMA engine. The buffer is then shared with the user process, without any copy being performed between the kernel and user memory spaces.

2) Second: the write system call causes the kernel to copy the data from the original kernel buffers into the kernel buffers associated with sockets.

3) Third: the third copy happens as the DMA engine passes the data from the kernel socket buffers to the protocol engine.

By using mmap instead of read, we've cut in half the amount of data the kernel has to copy. This yields reasonably good results when a lot of data is being transmitted. However, this improvement doesn't come without a price; there are hidden pitfalls when using the mmap+write method. You will fall into one of them when you memory map a file and then call write while another process truncates the same file. Your write system call will be interrupted by the bus error signal SIGBUS, because you performed a bad memory access. The default behavior for that signal is to kill the process and dump core—not the most desirable operation for a network server. There are two ways to get around this problem.

The first way is to install a signal handler for the SIGBUS signal, and then simply call return in the handler. By doing this the write system call returns with the number of bytes it wrote before it got interrupted and the errno set to success. Let me point out that this would be a bad solution, one that treats the symptoms and not the cause of the problem. Because SIGBUS signals that something has gone seriously wrong with the process, I would discourage using this as a solution.

The second solution involves file leasing (which is called “opportunistic locking” in Microsoft Windows) from the kernel. This is the correct way to fix this problem. By using leasing on the file descriptor, you take a lease with the kernel on a particular file. You then can request a read/write lease from the kernel. When another process tries to truncate the file you are transmitting, the kernel sends you a real-time signal, the RT_SIGNAL_LEASE signal. It tells you the kernel is breaking your write or read lease on that file. Your write call is interrupted before your program accesses an invalid address and gets killed by the SIGBUS signal. The return value of the write call is the number of bytes written before the interruption, and the errno will be set to success. Here is some sample code that shows how to get a lease from the kernel:

if(fcntl(fd, F_SETSIG, RT_SIGNAL_LEASE) == -1) {
    perror("kernel lease set signal");
    return -1;
}
/* l_type can be F_RDLCK F_WRLCK */
if(fcntl(fd, F_SETLEASE, l_type)){
    perror("kernel lease set type");
    return -1;
}

You should get your lease before mmaping the file, and break your lease after you are done. This is achieved by calling fcntl F_SETLEASE with the lease type of F_UNLCK.


In kernel version 2.1, the sendfile system call was introduced to simplify the transmission of data over the network and between two local files. Introduction of sendfile not only reduces data copying, it also reduces context switches. Use it like this:

sendfile(socket, file, len);

To get a better idea of the process involved, take a look at Figure 3.

1) First: the sendfile system call causes the file contents to be copied into a kernel buffer by the DMA engine. Then the data is copied by the kernel into the kernel buffer associated with sockets.

2) Second: the third copy happens as the DMA engine passes the data from the kernel socket buffers to the protocol engine.

You are probably wondering what happens if another process truncates the file we are transmitting with the sendfile system call. If we don’t register any signal handlers, the sendfile call simply returns with the number of bytes it transferred before it got interrupted, and the errno will be set to success.

If we get a lease from the kernel on the file before we call sendfile, however, the behavior and the return status are exactly the same. We also get the RT_SIGNAL_LEASE signal before the sendfile call returns.

So far, we have been able to avoid having the kernel make several copies, but we are still left with one copy. Can that be avoided too? Absolutely, with a little help from the hardware. To eliminate all the data duplication done by the kernel, we need a network interface that supports gather operations. This simply means that data awaiting transmission doesn't need to be in consecutive memory; it can be scattered through various memory locations. In kernel version 2.4, the socket buffer descriptor was modified to accommodate those requirements—what is known as zero copy under Linux. This approach not only reduces multiple context switches, it also eliminates data duplication done by the processor. For user-level applications nothing has changed, so the code still looks like this:

sendfile(socket, file, len);

To get a better idea of the process involved, take a look at Figure 4.

1) First: the sendfile system call causes the file contents to be copied into a kernel buffer by the DMA engine.

2) Second: no data is copied into the socket buffer. Instead, only descriptors with information about the whereabouts and length of the data are appended to the socket buffer. The DMA engine passes data directly from the kernel buffer to the protocol engine, thus eliminating the remaining final copy.

Because data still is actually copied from the disk to the memory and from the memory to the wire, some might argue this is not a true zero copy. This is zero copy from the operating system standpoint, though, because the data is not duplicated between kernel buffers. When using zero copy, other performance benefits can be had besides copy avoidance, such as fewer context switches, less CPU data cache pollution and no CPU checksum calculations.

Now that we know what zero copy is, let’s put theory into practice and write some code. You can download the full source code from www.xalien.org/articles/source/sfl-src.tgz. To unpack the source code, type tar -zxvf sfl-src.tgz at the prompt. To compile the code and create the random data file data.bin, run make.

Looking at the code starting with header files:

/* sfl.c sendfile example program
Dragan Stancevic <
header name function / variable
-------------------------------------------------*/
#include <stdio.h> /* printf, perror */
#include <fcntl.h> /* open */
#include <unistd.h> /* close */
#include <errno.h> /* errno */
#include <string.h> /* memset */
#include <sys/socket.h> /* socket */
#include <netinet/in.h> /* sockaddr_in */
#include <sys/sendfile.h> /* sendfile */
#include <arpa/inet.h> /* inet_addr */
#define BUFF_SIZE (10*1024) /* size of the tmp buffer */

Besides the regular <sys/socket.h> and <netinet/in.h> required for basic socket operation, we need a prototype definition of the sendfile system call, which can be found in the <sys/sendfile.h> header. The server flag:

/* are we sending or receiving */
if(argv[1][0] == 's') is_server++;
/* open descriptors */
sd = socket(PF_INET, SOCK_STREAM, 0);
if(is_server) fd = open("data.bin", O_RDONLY);

The same program can act as either a server/sender or a client/receiver. We have to check one of the command-prompt parameters, and then set the flag is_server to run in sender mode. We also open a stream socket of the INET protocol family. As part of running in server mode we need some type of data to transmit to a client, so we open our data file. We are using the system call sendfile to transmit data, so we don’t have to read the actual contents of the file and store it in our program memory buffer. Here’s the server address:

/* clear the memory */
memset(&sa, 0, sizeof(struct sockaddr_in));
/* initialize structure */
sa.sin_family = PF_INET;
sa.sin_port = htons(1033);
sa.sin_addr.s_addr = inet_addr(argv[2]);

We clear the server address structure and assign the protocol family, port and IP address of the server. The address of the server is passed as a command-line parameter. The port number is hard coded to unassigned port 1033. This port number was chosen because it is above the port range requiring root access to the system.

Here is the server execution branch:

if(is_server){
    int client; /* new client socket */
    printf("Server binding to [%s]\n", argv[2]);
    if(bind(sd, (struct sockaddr *)&sa,
            sizeof(sa)) < 0){
        perror("bind");
        exit(errno);
    }

As a server, we need to assign an address to our socket descriptor. This is achieved by the system call bind, which assigns the socket descriptor (sd) a server address (sa):

if(listen(sd,1) < 0){
    perror("listen");
    exit(errno);
}

Because we are using a stream socket, we have to advertise our willingness to accept incoming connections and set the connection queue size. I've set the backlog queue to 1, but it is common to set the backlog a bit higher for established connections waiting to be accepted. In older versions of the kernel, the backlog queue was used to prevent syn flood attacks. Because the system call listen changed to set parameters for only established connections, the backlog queue feature has been deprecated for this call. The kernel parameter tcp_max_syn_backlog has taken over the role of protecting the system from syn flood attacks:

1
2
3
4
if((client = accept(sd, NULL, NULL)) < 0){
perror("accept");
exit(errno);
}

The system call accept creates a new connected socket from the first connection request on the pending connections queue. The return value from the call is a descriptor for a newly created connection; the socket is now ready for read, write or poll/select system calls:

1
2
3
4
5
6
7
if((cnt = sendfile(client,fd,&off,
BUFF_SIZE)) < 0){
perror("sendfile");
exit(errno);
}
printf("Server sent %d bytes.\n", cnt);
close(client);

A connection is established on the client socket descriptor, so we can start transmitting data to the remote system. We do this by calling the sendfile system call, which is prototyped under Linux in the following manner:

1
2
3
extern ssize_t
sendfile (int __out_fd, int __in_fd, off_t *offset,
size_t __count) __THROW;

The first two parameters are file descriptors. The third parameter points to an offset from which sendfile should start sending data. The fourth parameter is the number of bytes we want to transmit. In order for the sendfile transmit to use zero-copy functionality, you need memory gather operation support from your networking card. You also need checksum capabilities for protocols that implement checksums, such as TCP or UDP. If your NIC is outdated and doesn't support those features, you can still use sendfile to transmit files; the difference is that the kernel will merge the buffers before transmitting them.


Portability Issues:

One of the problems with the sendfile system call, in general, is the lack of a standard implementation, as there is for the open system call. Sendfile implementations in Linux, Solaris or HP-UX are quite different. This poses a problem for developers who wish to use zero copy in their network data transmission code.

One of the implementation differences is that Linux provides a sendfile that defines an interface for transmitting data between two file descriptors of any kind (both file-to-file and file-to-socket). HP-UX and Solaris, on the other hand, can use it only for file-to-socket transmission.

The second difference is that Linux doesn't implement vectored transfers. The Solaris and HP-UX sendfile implementations have extra parameters that eliminate the overhead associated with prepending headers to the data being transmitted.


Looking Ahead

The implementation of zero copy under Linux is far from finished and is likely to change in the near future. More functionality should be added. For example, the sendfile call doesn't support vectored transfers, and servers such as Samba and Apache have to use multiple sendfile calls with the TCP_CORK flag set. This flag tells the system that more data is coming through in the next sendfile calls. TCP_CORK is also incompatible with TCP_NODELAY and is used when we want to prepend or append headers to the data. This is a perfect example of where a vectored call would eliminate the need for multiple sendfile calls and the delays mandated by the current implementation.

One rather unpleasant limitation in the current sendfile is that it cannot be used when transferring files greater than 2GB. Files of such size are not all that uncommon today, and it's rather disappointing having to duplicate all that data on its way out. Because both the sendfile and mmap methods are unusable in this case, a sendfile64 would be really handy in a future kernel version.


Programmatic access:

The Linux kernel supports zero copy through various system calls, such as sendfile, sendfile64, and splice. Some of them are also present in the BSD kernels or IBM AIX; some are unique to the Linux kernel API.

Java input streams can support zero copy through java.nio.channels.FileChannel's transferTo() method, if the underlying operating system also supports zero copy.


References:

Linux kernel synchronization


References:

Copy-on-write

Copy-on-write in virtual memory management:

Copy-on-write finds its main use in the shared virtual memory of operating system processes, in the implementation of the fork system call. Typically, the new process does not modify any memory and immediately executes a new program, replacing the address space entirely. Thus, it would be wasteful to copy all of the process's memory during a fork, and instead the copy-on-write technique is used. It can be implemented efficiently using the page table, by marking certain pages of memory as read-only and keeping a count of the number of references to each page. When data is written to these pages, the kernel intercepts the write attempt and allocates a new physical page, initialized with the copy-on-write data, although the allocation can be skipped if there is only one reference. The kernel then updates the page table with the new (writable) page, decrements the number of references, and performs the write. The new allocation ensures that a change in the memory of one process is not visible in another's.

The copy-on-write technique can be extended to support efficient memory allocation by having a page of physical memory filled with zeros. When memory is allocated, all the pages returned refer to the zero page and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of virtual address space. The combined algorithm is similar to demand paging.

Copy-on-write pages are also used in the Linux kernel’s kernel same-page merging feature.

poll vs epoll vs select: the differences, the use cases for each, and the data structures epoll and select use under the hood

When designing a high performance networking application with non-blocking socket I/O, the architect needs to decide which polling method to use to monitor the events generated by those sockets. There are several such methods, and the use cases for each of them are different. Choosing the correct method may be critical to satisfy the application needs.

(1) Polling with select()

Old, trusted workhorse from the times when sockets were still called Berkeley sockets. It didn't make it into the first specification, though, since there was no concept of non-blocking I/O at that moment, but it did make it in around the 1980s, and nothing has changed in its interface since.

To use select, the developer needs to initialize and fill up several fd_set structures with the descriptors and the events to monitor, and then call select(). A typical workflow looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
fd_set fd_in, fd_out;
struct timeval tv;

// Reset the sets
FD_ZERO( &fd_in );
FD_ZERO( &fd_out );

// Monitor sock1 for input events
FD_SET( sock1, &fd_in );

// Monitor sock2 for output events
FD_SET( sock2, &fd_out );

// Find out which socket has the largest numeric value as select requires it
int largest_sock = sock1 > sock2 ? sock1 : sock2;

// Wait up to 10 seconds
tv.tv_sec = 10;
tv.tv_usec = 0;

// Call the select
int ret = select( largest_sock + 1, &fd_in, &fd_out, NULL, &tv );

// Check if select actually succeeded
if ( ret == -1 )
// report error and abort
else if ( ret == 0 )
// timeout; no event detected
else
{
if ( FD_ISSET( sock1, &fd_in ) )
// input event on sock1

if ( FD_ISSET( sock2, &fd_out ) )
// output event on sock2
}

When the select interface was designed and developed, nobody probably expected there would be multi-threaded applications serving many thousands of connections. Hence select carries quite a few design flaws which make it undesirable as a polling mechanism in modern networking applications. The major disadvantages include:

  • select modifies the passed fd_sets, so none of them can be reused. Even if you don't need to change anything – such as when one of the descriptors received data and needs to receive more – a whole set has to be either recreated (argh!) or restored from a backup copy via FD_COPY. And this has to be done each time select is called.
  • To find out which descriptors raised the events you have to manually iterate through all the descriptors in the set and call FD_ISSET on each one of them. When you have 2,000 of those descriptors and only one of them is active – and, likely, the last one – you're wasting CPU cycles each time you wait.
  • Did I just mention 2,000 descriptors? Well, select cannot support that many file descriptors, at least on Linux. The maximum number of supported descriptors is defined by the FD_SETSIZE constant, which Linux happily defines as 1024. And while some operating systems allow you to hack this restriction by redefining FD_SETSIZE before including sys/select.h, this is not portable. Indeed, Linux would just ignore this hack and the limit will stay the same.
  • You cannot modify the descriptor set from a different thread while waiting. Suppose a thread is executing the code above. Now suppose you have a housekeeping thread which decided that sock1 has been waiting too long for input data, and it is time to cut the cord. Since this socket could be reused to serve another paying client, the housekeeping thread wants to close the socket. However, the socket is in the fd_set which select is waiting on. Now what happens when this socket is closed? man select has the answer, and you won't like it. The answer is, "If a file descriptor being monitored by select() is closed in another thread, the result is unspecified".
  • The same problem arises if another thread suddenly decides to send something via sock1. It is not possible to start monitoring the socket for output events until select returns.
  • The choice of events to wait for is limited; for example, to detect whether the remote socket is closed you have to a) monitor it for input and b) actually attempt to read data from the socket to detect the closure (read will return 0). Which is fine if you want to read from this socket, but what if you're sending a file and do not care about any input right now?
  • select puts an extra burden on you when filling up the descriptor list: you have to calculate the largest descriptor number and provide it as a function parameter.

Of course the operating system developers recognized those drawbacks and addressed most of them when designing the poll method. Therefore you may ask, is there any reason to use select at all? Why not just store it on the shelf of the Computer Science Museum? Then you may be pleased to know that yes, there are two reasons, which may be either very important to you or not important at all.

The first reason is portability. select has been around for ages, and you can be sure that every single platform which has network support and non-blocking sockets will have a working select implementation, while it might not have poll at all. And unfortunately I'm not talking about the tubes and ENIAC here; poll is only available on Windows Vista and above, which excludes Windows XP – still used by a whopping 34% of users as of Sep 2013 despite Microsoft's pressure. Another option would be to still use poll on those platforms and emulate it with select on those which do not have it; it is up to you whether you consider that a reasonable investment.

The second reason is more exotic and is related to the fact that select can – theoretically – handle timeouts with microsecond precision (its timeout is a struct timeval of seconds and microseconds), while both poll and epoll can only handle one-millisecond precision. This is not likely to be a concern on a desktop or server system, whose clocks don't even run with such precision, but it may be necessary on a realtime embedded platform interacting with hardware components. Such as lowering control rods to shut down a nuclear reactor – in this case, please, use select to make sure we all stay safe!

The case above would probably be the only one where you would have to use select and could not use anything else. However, if you are writing an application which will never have to handle more than a handful of sockets (like, 200), the difference between poll and select will come down not to performance, but to personal preference or other factors.

(2) Polling with poll()

poll is a newer polling method, probably created immediately after someone actually tried to write a high performance networking server. It is much better designed and doesn't suffer from most of the problems select has. In the vast majority of cases you will be choosing between poll and epoll/libevent.

To use poll, the developer needs to initialize an array of struct pollfd structures with the descriptors and events to monitor, and call poll(). A typical workflow looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
// The structure for two events
struct pollfd fds[2];

// Monitor sock1 for input
fds[0].fd = sock1;
fds[0].events = POLLIN;

// Monitor sock2 for output
fds[1].fd = sock2;
fds[1].events = POLLOUT;

// Wait 10 seconds
int ret = poll( fds, 2, 10000 );
// Check if poll actually succeeded
if ( ret == -1 )
// report error and abort
else if ( ret == 0 )
// timeout; no event detected
else
{
// If we detect the event, zero it out so we can reuse the structure
if ( fds[0].revents & POLLIN )
fds[0].revents = 0;
// input event on sock1

if ( fds[1].revents & POLLOUT )
fds[1].revents = 0;
// output event on sock2
}

poll was mainly created to fix the pending problems select had, so it has the following advantages over it:

  • There is no hard limit on the number of descriptors poll can monitor, so the limit of 1024 does not apply here.
  • It does not modify the data passed in the struct pollfd array. Therefore it can be reused between poll() calls, as long as you set the revents member to zero for those descriptors which generated events. The IEEE specification states that "In each pollfd structure, poll() shall clear the revents member, except that where the application requested a report on a condition by setting one of the bits of events listed above, poll() shall set the corresponding bit in revents if the requested condition is true". However, in my experience at least one platform did not follow this recommendation, and man 2 poll on Linux does not make such a guarantee either (man 3p poll does, though).
  • It allows more fine-grained control of events compared to select. For example, it can detect a remote peer shutdown without monitoring for read events.

There are a few disadvantages as well, which were mentioned above at the end of the select section. Notably, poll is not present on Microsoft Windows older than Vista; on Vista and above it is called WSAPoll, although the prototype is the same, and it can be wrapped as simply as:

1
2
3
#if defined (WIN32)
static inline int poll( struct pollfd *pfd, int nfds, int timeout) { return WSAPoll ( pfd, nfds, timeout ); }
#endif

And, as mentioned above, the poll timeout has 1 ms precision, which again is very unlikely to be a concern in most scenarios. Nevertheless, poll still has a few issues which need to be kept in mind:

  • Like select, it is still not possible to find out which descriptors have events triggered without iterating through the whole list and checking the revents members. Worse, the same happens in kernel space as well: the kernel has to iterate through the list of file descriptors to find out which sockets are monitored, and iterate through the whole list again to set up the events.
  • Like select, it is not possible to dynamically modify the set or close a socket which is being polled (see above).

Please keep in mind, however, that those issues might be considered unimportant for most client networking applications – the only exception would be client software such as P2P which may require handling thousands of open connections. Those issues might not be important even for some server applications. Therefore poll should be your default choice over select unless you have the specific reasons mentioned above. Moreover, poll should be your preferred method even over epoll if the following is true:

  • You need to support more than just Linux, and do not want to use epoll wrappers such as libevent (epoll is Linux-only);
  • Your application needs to monitor fewer than 1,000 sockets at a time (you are not likely to see any benefit from using epoll);
  • Your application needs to monitor more than 1,000 sockets at a time, but the connections are very short-lived (this is a close case, but most likely in this scenario you will not see any benefit from epoll, because the speedup in event waiting would be wasted on adding those new descriptors into the set – see below);
  • Your application is not designed in a way that changes the events while another thread is waiting for them (i.e. you're not porting an app that uses kqueue or IO Completion Ports).

(3) Polling with epoll()

epoll is the latest, greatest, newest polling method in Linux (and Linux only). Well, it was actually added to the kernel in 2002, so it is not that new. It differs from both poll and select in that it keeps the information about the currently monitored descriptors and associated events inside the kernel, and exports an API to add/remove/modify those.

To use epoll, much more preparation is needed. A developer needs to:

  • Create the epoll descriptor by calling epoll_create;
  • Initialize a struct epoll_event structure with the wanted events and the context data pointer. The context could be anything; epoll passes this value directly in the returned events structure. We store there a pointer to our Connection class.
  • Call epoll_ctl( … EPOLL_CTL_ADD ) to add the descriptor into the monitoring set;
  • Call epoll_wait() to wait for 20 events, for which we reserve the storage space. Unlike the previous methods, this call receives an empty structure and fills it up only with the triggered events. For example, if there are 200 descriptors and 5 of them have events pending, epoll_wait will return 5, and only the first five members of the pevents structure will be initialized. If 50 descriptors have events pending, the first 20 would be copied and 30 would be left in the queue; they won't get lost.
  • Iterate through the returned items. This will be a short iteration, since the only events returned are those which were triggered.

A typical workflow looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// Create the epoll descriptor. Only one is needed per app, and is used to monitor all sockets.
// The size argument is ignored on modern kernels (it just has to be positive), so put your favorite number here
int pollingfd = epoll_create( 0xCAFE );

if ( pollingfd < 0 )
// report error

// Initialize the epoll structure in case more members are added in future
struct epoll_event ev = { 0 };

// Associate the connection class instance with the event. You can associate anything
// you want, epoll does not use this information. We store a connection class pointer, pConnection1
ev.data.ptr = pConnection1;

// Monitor for input, and do not automatically rearm the descriptor after the event
ev.events = EPOLLIN | EPOLLONESHOT;
// Add the descriptor into the monitoring list. We can do it even if another thread is
// waiting in epoll_wait - the descriptor will be properly added
if ( epoll_ctl( pollingfd, EPOLL_CTL_ADD, pConnection1->getSocket(), &ev ) != 0 )
// report error

// Wait for up to 20 events (assuming we have added maybe 200 sockets before that it may happen)
struct epoll_event pevents[ 20 ];

// Wait for 10 seconds
int ret = epoll_wait( pollingfd, pevents, 20, 10000 );
// Check if epoll_wait actually succeeded
if ( ret == -1 )
// report error and abort
else if ( ret == 0 )
// timeout; no event detected
else
{
// Check if any events detected
for ( int i = 0; i < ret; i++ )
{
if ( pevents[i].events & EPOLLIN )
{
// Get back our connection pointer
Connection * c = (Connection*) pevents[i].data.ptr;
c->handleReadEvent();
}
}
}

Just looking at the implementation should give you a hint of the disadvantages of epoll, which we will mention first. It is more complex to use, requires you to write more code, and requires more library calls compared to the other polling methods.

However, epoll has some significant advantages over select/poll, both in terms of performance and functionality:

  • epoll returns only the list of descriptors which triggered events. No need to iterate through 10,000 descriptors anymore to find the one which triggered an event!
  • You can attach meaningful context to the monitored event instead of the socket file descriptor. In our example we attached class pointers which could be called directly, saving you another lookup.
  • You can add sockets to or remove them from monitoring at any time, even if another thread is in the epoll_wait function. You can even modify the descriptor events. Everything will work properly; this behavior is supported and documented. This gives you much more flexibility in implementation.
  • Since the kernel knows all the monitored descriptors, it can register the events happening on them even when nobody is calling epoll_wait. This allows implementing interesting features such as edge triggering, which will be described in a separate article.
  • It is possible to have multiple threads waiting on the same epoll queue with epoll_wait(), something you cannot do with select/poll. In fact, this is not only possible with epoll, but the recommended method in edge triggering mode.

However, you need to keep in mind that epoll is not a "better poll"; it also has drawbacks when compared to poll:

  • Changing the event flags (e.g. from READ to WRITE) requires the epoll_ctl syscall, while with poll this is a simple bitmask operation done entirely in userspace. Switching 5,000 sockets from reading to writing with epoll would require 5,000 syscalls and hence context switches (as of 2014, calls to epoll_ctl still could not be batched, and each descriptor must be changed separately), while with poll it would require a single loop over the pollfd structures.
  • Each accept()ed socket needs to be added to the set, and same as above, with epoll this has to be done by calling epoll_ctl – which means two required syscalls per new connection socket instead of one for poll. If your server has many short-lived connections which send or receive little traffic, epoll will likely take longer than poll to serve them.
  • epoll is exclusively a Linux domain, and while other platforms have similar mechanisms, they are not exactly the same – edge triggering, for example, is pretty unique (FreeBSD's kqueue supports it too, though).
  • High performance processing logic is more complex and hence more difficult to debug, especially for edge triggering, which is prone to deadlocks if you miss an extra read/write.

Therefore you should only use epoll if all of the following is true:

  • Your application handles many network connections with a handful of threads (a thread pool). You would lose most of the epoll benefits in a single-threaded application, and most likely it won't outperform poll.
  • You expect to have a reasonably large number of sockets to monitor (at least 1,000); with a smaller number epoll is not likely to have any performance benefit over poll and may actually worsen performance;
  • Your connections are relatively long-lived; as stated above, epoll will be slower than poll in a situation where a new connection sends a few bytes of data and immediately disconnects, because of the extra system call required to add the descriptor into the epoll set;
  • Your app depends on other Linux-specific features (so in case a portability question suddenly pops up, epoll wouldn't be the only roadblock), or you can provide wrappers for the other supported systems. In the last case, you should strongly consider libevent.

If all the items above aren't true, you will be better served by using poll instead.

(4) Polling with libevent

libevent is a library which wraps the polling methods listed in this article (and some others) in a uniform API. Its main advantage is that it allows you to write the code once and compile and run it on many operating systems without changes. It is important to understand that libevent is just a wrapper built on top of the existing polling methods, and therefore it inherits the issues those polling methods have. It will not make select support more than 1024 sockets on Linux or allow epoll to modify the polling events without a syscall/context switch. Therefore it is still important to understand each method's pros and cons.

Having to provide access to the functionality of dramatically different methods, libevent has a rather complex API which is much more difficult to use than poll or even epoll. It is, however, easier to use libevent than to write two separate backends if you need to support both Linux and FreeBSD (epoll and kqueue). Hence it is a viable alternative which should be considered if:

  • Your application requirements indicate that you must use epoll, and just using poll would not be enough (if poll satisfies your needs, it is extremely unlikely libevent would offer you any benefit);
  • You need to support an OS other than Linux, or may expect such a need to arise in the future. Again, this depends on the other features of your application – if it is tied to many other Linux-specific things, you're not going to achieve anything by using libevent instead of epoll.

References

libevent internals: how I/O, timer, and signal events are unified into one framework

PPA

A Personal Package Archive (PPA) is a software repository for uploading source packages to be built and published as an Advanced Packaging Tool (APT) repository by Launchpad.

Extracting archives

1
tar -xvf filename.tar.xz

Directory structure

/usr/lib/x86_64-linux-gnu is the new /usr/lib64. This changed when Ubuntu 12.04 came out.

Searching for packages

Use apt search to look up packages; the dependency names suggested in example documentation are sometimes inexact:

1
sudo apt search libaio

man sections

View section 8 of the manual:

1
man 8 apt-get

From this you can learn what the -y option commonly passed to apt-get means:

1
2
3
4
5
-y, --yes, --assume-yes
Automatic yes to prompts; assume "yes" as answer to all prompts and run
non-interactively. If an undesirable situation, such as changing a held package,
trying to install a unauthenticated package or removing an essential package occurs
then apt-get will abort.

Create a directory only if it does not already exist

1
mkdir -p crawler/log

Using telnet correctly

1
telnet 10.108.112.218 1099

telnet can connect to any TCP server, including HTTP servers:

1
2
3
4
5
6
7
8
9
10
11
12
13
zk@zk-pc:~/Documents$ telnet www.kunzhao.org 80
Trying 59.110.168.119...
Connected to www.kunzhao.org.
Escape character is '^]'.
GET / HTTP/1.1 <- type this line
<- then press Enter on an empty line
HTTP/1.1 200 OK <- the server starts responding
X-Powered-By: Hexo
Content-Type: text/html
Date: Mon, 11 Sep 2017 01:37:21 GMT
Connection: keep-alive
Transfer-Encoding: chunked
...

Switching to the root user correctly

The root account is disabled by default in Ubuntu, so there is no root password, that’s why su fails with an authentication error.

Use sudo -i to become root:

1
2
3
4
5
zk@zk-pc:~/Documents$ su root
Password:
su: Authentication failure
zk@zk-pc:~/Documents$ sudo -i
root@zk-pc:~#

Configuring the Ubuntu boot menu

1
2
3
sudo add-apt-repository ppa:danielrichter2007/grub-customizer
sudo apt-get update
sudo apt-get install grub-customizer

Uninstalling other systems from Ubuntu

1
2
sudo add-apt-repository ppa:yannubuntu/boot-repair
sudo apt-get update; sudo apt-get install -y os-uninstaller && os-uninstaller
Find the real path behind a command:

1
readlink -f `which emacs`

Create a command shortcut via a symlink:

1
2
cd /usr/local/bin
sudo ln -s /usr/lib/intellij_idea/idea-IC-172.3968.16/bin/idea.sh idea

mount

1) Determine what device a directory is located on:

Use df -h:

1
2
3
zk@zk-pc:/dev$ df -h /media/zk/Documents/
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 209G 17G 192G 9% /media/zk/Documents

2) Unmount:

1
umount Documents/

You must wait until the device is idle before umount, otherwise you will get the following error:

1
2
3
4
zk@zk-pc:/media/zk$ umount Documents/
Error unmounting block device 8:3: GDBus.Error:org.freedesktop.UDisks2.Error.DeviceBusy: Error unmounting /dev/sda3: Command-line `umount "/media/zk/Documents"' exited with non-zero exit status 32: umount: /media/zk/Documents: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)

3) Mount:

1
2
3
cd /media/zk
sudo mkdir Documents/ # the mount point directory must be created first
sudo mount /dev/sda3 Documents

Essential software

1
2
3
4
5
6
7
8
9
10
# partition tool
sudo apt install gparted

# screenshot tool
sudo add-apt-repository ppa:shutter/ppa
sudo apt-get update && sudo apt-get install shutter

# office suite: http://community.wps.cn/download/

# PDF reader: https://www.foxitsoftware.cn/downloads/

1) Shutter configuration:

Disable the original screenshot shortcut:

Add a Shutter screenshot shortcut:

Configure the filename of Shutter screenshots, automatic path copying, etc.:

Getting file information

1) Show the directory part of a path:

1
dirname $file

2) Show the file name:

1
basename $file

Hiding activity records

Installing QQ

Installing Ubuntu from the hard drive under Windows

  • Download and install EasyBCD:

  • Configure it as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
# NeoSmart NeoGrub Bootloader Configuration File
#
# This is the NeoGrub configuration file, and should be located at C:\NST\menu.lst
# Please see the EasyBCD Documentation for information on how to create/modify entries:
# http://neosmart.net/wiki/display/EBCD/

title Install Ubuntu
root (hd0,0)
kernel (hd0,0)/vmlinuz.efi boot=casper iso-scan/filename=/ubuntu-16.04.3-desktop-amd64.iso ro quiet splash locale=zh_CN.UTF-8
initrd (hd0,0)/initrd.lz
title reboot
reboot
title halt
halt
  • Extract two files from ubuntu.iso:

  • Put the three important files in the root of the C: drive:

  • Reboot, enter the Ubuntu boot loader, and install

References:

openssh-server

1
2
3
4
5
sudo apt-get update
sudo apt-get install openssh-server
sudo ufw allow 22
sudo iptables -L
sudo iptables -A INPUT -p tcp --dport ssh -j ACCEPT

Converting Windows line endings to UNIX

1
2
3
4
5
sudo apt install dos2unix
# Remove all CRs from all lines, in-place operation.
dos2unix file.txt
# To save the output in a different file
dos2unix -n file.txt output.txt

Rescanning boot entries on Ubuntu

1
sudo update-grub

zip

Excluding the .git folder

1
zip -r bitvolution.zip bitvolution -x \*.git\*

Adding a new user

1
2
3
4
5
6
7
8
# create the user
sudo useradd -d /home/tom -m tom -s /bin/bash

# set the password
sudo passwd tom

# grant sudo privileges
sudo adduser tom sudo

Installing a virtual machine on Linux

Search for virtualbox rather than vmware; you will also need a Windows 32-bit VHD file.

Managing groups on Linux

1
2
3
4
5
6
7
8
9
10
11
12
13
# add a group zk
sudo groupadd zk
# add a user to the group
sudo usermod -a -G zk zk
# change the group of a file / folder
chgrp zk sample_file.txt
chgrp -R zk sample_folder/
# list the groups a user belongs to
groups zk
# show the primary group
id -g zk
# change a user's primary group
usermod -g primaryGroupName zk

1
2
3
4
# change the group owner
chgrp zk file.txt
# change the owner
chown zk file.txt

Looping over all files

1
2
3
4
for file in *
do
echo -e `basename $file`"\t"
done

Merging all files

1
cat * > merged-file

tmux

Create a new session:

1
tmux new -s <name-of-my-session>

Reattach to a previously opened session:

1
tmux a -t <name-of-my-session>

Detach from the session and return to the desktop:

1
Ctrl-b d

Kill the session:

1
tmux kill-session -t <name-of-my-session>

Note: inside a tmux session, typing exit kills the session and exits directly.

ftp

Connect:

1
ftp 10.108.118.176 2121

Transfer a local file to the phone:

1
put /home/zk/Desktop/xxx.pdf ./xxx.pdf

Note: here you must specify the file name to use on the phone side

Disconnect:

1
bye

You can use ls, cd, pwd, etc. to navigate the phone's directories.


To get a progress bar, use ncftp:

1
2
3
sudo apt install ncftp
ncftp -P 2121 10.108.112.24
put /home/zk/Desktop/xxx.pdf ./

List of ftp commands:

ftp_command_list

Note that ncftp's get behaves slightly differently:

1
2
3
# this does not copy a.zip from the remote and save it locally as b.zip
# it copies both a.zip and b.zip from the remote to the local machine
ncftp> get a.zip b.zip

unzip

Garbled file names

Use the command unzip -O GBK xxx.zip


If you get the prompt replace jsp/extension/add-aspect.jsp? [y]es, [n]o, [A]ll, [N]one, [r]ename: y, use -o to overwrite without prompting:

1
unzip -o /path/to/archive.zip

Entering the MySQL log directory

At this point, run:

1
sudo su -

and then you can enter the directory.

split and merge

1
2
3
4
5
6
7
# split a large file into smaller ones
# this produces WinXP_img_aa
# and WinXP_img_ab
split --bytes=2048m WinXP.img WinXP_img_

# merge the small files back into the large one
cat WinXP_img_a* > WinXP.img

Computing a file's MD5 checksum

1
md5sum xxx.iso

Querying DNS servers

1
nmcli device show eno1 | grep IP4.DNS

Fixing garbled characters in bash

List all locales the system supports: locale -a


Add the following to ~/.bashrc:

1
2
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'

Typing locale both locally and on the server should now print the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

Creating a Wi-Fi hotspot on Ubuntu

To create a hotspot you need a wireless card, and the card must support AP mode. Check with:

sudo apt-get install iw
iw list

If the output contains the AP keyword, AP mode is supported:

AP

NetworkManager on Ubuntu makes it easy to create an ad-hoc Wi-Fi hotspot, but Android does not support ad-hoc Wi-Fi by default.

Installing the following lets you switch the hotspot from ad-hoc mode to access-point mode:

sudo apt-get install plasma-nm
kde5-nm-connection-editor

Android phones can then see the hotspot.

tcpdump

Capture TCP segments:

sudo tcpdump -ntx -i eth0 port 54321

Get the PID of a just-started Java process

If startup is automated by a shell script, you can save the PID of the just-started process, which is held in the variable $!:

java ...... &
echo "$!" > myjavaprogram.pid

When you need to kill it, just do:

kill `cat myjavaprogram.pid`
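Before killing, it is worth checking that the recorded PID still belongs to a live process; kill -0 sends no signal and only tests whether one could be delivered. A sketch with sleep standing in for the Java program (the pid-file path is illustrative):

```shell
# Start a stand-in long-running process and record its PID
sleep 60 &
echo "$!" > /tmp/myjavaprogram.pid

pid=$(cat /tmp/myjavaprogram.pid)
if kill -0 "$pid" 2>/dev/null; then   # signal 0 = existence check only
    kill "$pid"                       # actually terminate it
    echo "killed $pid"
else
    echo "process $pid is already gone"
fi
```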

Implementing a progress bar on Linux

You can implement this by overwriting a line. Use \r to go back to the beginning of the line without writing \n to the terminal.

Write \n when you’re done to advance the line.

Use echo -ne to:

  • not print \n, and
  • recognize escape sequences like \r.

Here's a demo:

echo -ne '#####                     (33%)\r'
sleep 1
echo -ne '############# (66%)\r'
sleep 1
echo -ne '####################### (100%)\r'
echo -ne '\n'
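The same idea generalizes to a loop; printf handles \r portably (the bar width and step delay here are arbitrary):

```shell
# Draw a growing bar on one line, overwriting it with \r each iteration.
total=20
for i in $(seq 1 "$total"); do
    bar=$(printf '#%.0s' $(seq 1 "$i"))   # prints one '#' per argument, i.e. i hash marks
    printf '\r%-20s (%3d%%)' "$bar" $(( i * 100 / total ))
    sleep 0.05
done
printf '\n'   # advance the line once the bar is complete
```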

Detect a file's type

file -i your_file

gzip

If GZIP compression is enabled, the server adds this header to its response:

content-encoding:gzip

The gzip tool can decompress such files (the file needs a recognized suffix such as .gz):

gzip -d html.gz

Sort processes by memory usage

ps aux --sort=-%mem | awk 'NR<=10{print $0}'

Booleans in shell

  • A function returning 0 means true
  • A function returning 1 (or any non-zero status) means false
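A sketch of the convention — if branches on the function's exit status, not on its output (is_even is a made-up example):

```shell
# Exit status 0 = true, non-zero = false; `if` tests that status directly.
is_even() {
    [ $(( $1 % 2 )) -eq 0 ]   # the status of the last command becomes the return value
}

if is_even 4; then echo "4 is even"; fi
if ! is_even 7; then echo "7 is odd"; fi
```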

(1) Create the folder; if that succeeds, enter it and print the current path:

mkdir tmp && cd tmp && pwd

(2) If the folder does not exist, tell the user and terminate the current script:

[ -d tmp ] || { echo 'folder tmp does not exist!'; exit 1; }

(Braces, not parentheses: ( ... ) would run a subshell, so exit would not terminate the calling script.)
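The grouping syntax matters here: ( ... ) runs a subshell, so an exit inside it only leaves that subshell, while { ...; } runs in the current shell. A sketch (check_dir is a made-up helper; inside a function, return plays the role of exit so the demo itself survives):

```shell
# exit inside ( ... ) terminates only the subshell...
( exit 1 )
echo "still running after the subshell exited"   # this line IS reached

# ...so for the real check, use { ...; } (or let the caller handle the status)
check_dir() {
    [ -d "$1" ] || { echo "folder $1 does not exist!"; return 1; }
}
check_dir /tmp && echo "/tmp exists"
```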

Logical negation

! expression

# OR
[ ! expression ]

# OR
if test ! condition
then
    command1
    command2
fi

# OR
if [ ! condition ]
then
    command1
    command2
fi

Arrays

## declare an array variable
declare -a arr=("element1" "element2" "element3")

## now loop through the above array
for i in "${arr[@]}"
do
    echo "$i"
    # or do whatever with individual element of the array
done

# You can access them using echo "${arr[0]}", "${arr[1]}" also

Also works for multi-line array declaration

declare -a arr=("element1" 
"element2" "element3"
"element4"
)

Multi-line strings

long_arg="my very long string \
which does not fit \
on the screen"

(Keep a space before each backslash: line continuation joins the lines with nothing in between, so without the spaces the words run together.)

String contains

if [[ $agent_id = *"$host_name"* ]]; then
    echo 'contains'
else
    echo 'not contains'
fi

Read the first line of a file

agent_id=`head -n 1 $APPLICATION_CONF_FILE`

Read CPU information

lscpu

# OR
cat /proc/cpuinfo

Browsing the web from a text console

# links is a text mode WWW browser with ncurses interface, supporting colors, correct table rendering, background downloading, menu driven configuration interface and slim code.
sudo apt install links

uptime output format

14:51:56 up 27 days,  3:11,  3 users,  load average: 0.14, 0.08, 0.01

  • Current time: 14:51:56
  • Uptime: 27 days, 3 hours 11 minutes
  • Logged-in users: 3
  • Load average: averages over the last 1, 5, and 15 minutes

Definition of load average

The average number of processes in the runnable (R) or uninterruptible (D) state per unit of time.

What counts as reasonable

# Count the CPUs
grep 'model name' /proc/cpuinfo | wc -l

When the load average exceeds 70% of the CPU count, the load is considered high.
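That rule of thumb can be checked directly from /proc/loadavg and nproc; the 0.7 factor follows the text above, and awk does the floating-point comparison:

```shell
# Compare the 1-minute load average against 70% of the CPU count.
load=$(awk '{print $1}' /proc/loadavg)
ncpu=$(nproc)

awk -v l="$load" -v n="$ncpu" 'BEGIN {
    if (l > 0.7 * n) print "load is high (" l " on " n " CPUs)";
    else             print "load is OK (" l " on " n " CPUs)";
}'
```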

What drives the load up

# Simulate one CPU at 100% utilization
stress --cpu 1 --timeout 600

# Simulate I/O load: call sync repeatedly
stress -i 1 --timeout 600

# Simulate a large number of processes
stress -c 8 --timeout 600

How to analyse the cause of high load

# Many processes: output one set of data every 5 seconds; check the %wait column, and the PID column for the offending process
pidstat -u 5 1

# Show metrics for every CPU; check the %iowait column
mpstat -P ALL 5 1

Note: install sysstat first; mpstat, pidstat and iostat are all part of it.

telnet to a remote port fails to connect

ufw is a command that makes iptables easier to manage: sudo ufw [status|disable|enable]

  • List rules: sudo iptables -L
  • Reset the default policies to ACCEPT:

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT

On the local machine:

# Test whether TCP can connect: act as the TCP server
nc -l 10.108.114.17 13307

# Test whether UDP can connect: act as the UDP server
nc -lu 10.108.114.17 13307

On another machine:

# Act as the TCP client
nc -v 10.108.114.17 13307

# Act as the UDP client
nc -u 10.108.114.17 13307

Use netstat -anp | grep 13307 to inspect the current state of the port.

Note: nc -l accepts only one connection at a time; once a client has connected, connections from other clients are not accepted.

Context switches

# Output one set of data every 5 seconds
$ vmstat 5

# cs: context switches per second
# in: interrupts per second
# r: run-queue length — processes running or waiting for a CPU
# b: number of processes in uninterruptible sleep
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff   cache   si   so    bi    bo    in    cs us sy id wa st
 0  1      0 525516 203844 3236436    0    0   104   364     5    15 10  4 73 13  0
 1  1      0 508080 203844 3253676    0    0    48  1710  1585  4153  3  1 84 11  0
 1  0      0 524708 203844 3236940    0    0    67   858  1525  3986  3  1 87  9  0

Per-process view:

# Output one set of data every 5 seconds; -u adds CPU metrics
pidstat -wu 5

# cswch: voluntary context switches per second — the process blocks waiting for resources such as I/O or memory
# nvcswch: involuntary context switches per second — the time slice expires; frequent when many processes compete for the CPU
11:30:05  UID  PID  cswch/s  nvcswch/s  Command
11:30:08    0    3     0.67       0.00  ksoftirqd/0
11:30:08    0    7    68.67       0.00  rcu_sched

/proc is a virtual file system in Linux, used for communication between kernel space and user space. See man proc for more.
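The same context-switch counters can be read per process straight from /proc — for example for the current shell itself:

```shell
# Per-process context-switch counters from the /proc virtual filesystem:
# voluntary_ctxt_switches and nonvoluntary_ctxt_switches, matching
# pidstat's cswch and nvcswch columns.
grep ctxt_switches /proc/self/status
```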

Find a process's parent

pstree | grep stress

If zombie processes (state Z) keep increasing they will exhaust the process-ID space; it means some parent process is not cleaning up its children's resources.
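Zombies can be listed, together with their parent PIDs, using ps — the parent is the process that should be reaping them:

```shell
# List zombie processes (state Z) with their parent PIDs
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

# Count them; 0 on a healthy system (grep -c exits non-zero on no match, hence || true)
ps -eo stat= | grep -c '^Z' || true
```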

Buffered I/O vs Direct I/O

Buffered I/O (standard I/O)

Direct I/O

References

Recommended articles