The select function in Linux

I was recently investigating why one user's data sync was extremely slow. Running perf trace -S -p $pid showed that the process spent most of its time in select:

# perf trace -S -p 9091
...
 12665.204 (35.778 ms): select(n: 18, inp: 0x28c13a0, outp: 0x28c1160                         ) = 1
 12700.995 ( 0.023 ms): read(fd: 17<socket:[169296244]>, buf: 0x2af3600, count: 32768         ) = 2896
^C
 Summary of events:

 p4d (9091), 13844 events, 100.0%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   select              2032 12389.683     0.000     6.097   124.075      7.78%
   write               2858    53.719     0.008     0.019     0.252      1.09%
   read                2032    38.850     0.004     0.019     0.259      1.35%

My gut feeling was that something was off. Here is what the Linux manual says about select:

  int select(int nfds, fd_set *readfds, fd_set *writefds,             
             fd_set *exceptfds, struct timeval *timeout);  

select() allows a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become “ready” for some class of I/O operation (e.g., input possible).
A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking. […]

Three independent sets of file descriptors are watched. Those listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block; in particular, a file descriptor is also ready on end-of-file), those in writefds will be watched to see if a write will not block, and those in exceptfds will be watched for exceptions.
On exit, the sets are modified in place to indicate which file descriptors actually changed status. Each of the three file descriptor sets may be specified as NULL if no file descriptors are to be watched for the corresponding class of events.
[…]

Return Value
On success, select() and pselect() return the number of file descriptors contained in the three returned descriptor sets (that is, the total number of bits that are set in readfds, writefds, exceptfds) which may be zero if the timeout expires before anything interesting happens. On error, -1 is returned, and errno is set appropriately; the sets and timeout become undefined, so do not rely on their contents after an error.

perf trace only shows the addresses of the fd sets passed to select, not which descriptors they contain, so I used strace to see them:

# strace -p 9091
[…]
write(19, "\271i\355\312\340\225\324\177\371\304$r\317\3511rc\304\353\360U\324\277e[\26\271\273\314\260\353"…, 4096) = 4096
select(18, [17], [], NULL, NULL)        = 1 (in [17])
read(17, "\275\26hs\314\311\33\324\277\315B\357\26\250,q\362\5A)\257\231\21\225\273V\321\225\211\241\371\177"…, 32768) = 2896
write(19, "\355\325\255\222\222\22\0232\322\322\257\375\263\315\337S\v\252\342\323\23S3\376\33626~\310\3776\266"…, 4096) = 4096
select(18, [17], [], NULL, NULL)        = 1 (in [17])
read(17, "\34306x\262'\17\342\277#\357\306\324\314*\300\207\340u\250X\331\220\214^\274\361\272\267\300\27"…, 32768) = 5792
write(19, "\331;r\371\221\\355\5\1\320\177\323S\353\310\215\7\247\346\n\0\2275\356\320\227\334\275O\264\237\334"…, 4096) = 4096
select(18, [17], [], NULL, NULL
^Cstrace: Process 9091 detached

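As you can see, the application passes only readfds and writefds to select; exceptfds and timeout are NULL, and writefds is an empty set. I was not sure how an empty set differs from NULL and guessed that it simply means no descriptor is checked for writability. That matches the man page quoted above: either way, no descriptors are watched for that class of events. A NULL timeout means select blocks until at least one descriptor is ready.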

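fd 17 is a socket through which the process downloads data from a remote server, which makes it the prime suspect. My current hypothesis is that the data arrives slowly, so select waits a long time before the socket becomes readable; as a result most of the time is spent in select rather than in read.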

Today I ran perf trace on two processes whose transfers were running at normal speed. The results:

  8523.185 ( 0.006 ms): read(fd: 24, buf: 0x2b30459, count: 4096                              ) = 4096
^C
 Summary of events:

 p4d (10316), 218542 events, 100.0%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   select              2498  7035.923     0.000     2.817   112.853      2.74%
   read              104269   613.876     0.002     0.006     1.424      0.69%
   write               2493   174.014     0.022     0.070     0.607      0.99%
   open                   5     0.109     0.017     0.022     0.028      8.48%
   close                  5     0.015     0.002     0.003     0.004      8.22%
   flock                  2     0.012     0.003     0.006     0.008     40.06%

  11718.425 ( 0.004 ms): read(fd: 24, buf: 0x2b21699, count: 4096                              ) = 4096
^C
 Summary of events:

 p4d (30890), 545667 events, 99.3%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   select              6116  9339.862     0.000     1.527   190.234      3.67%
   read              261559  1052.123     0.001     0.004     1.099      0.48%
   write               6099   279.031     0.019     0.046     1.633      1.14%
   open                  16     0.447     0.011     0.028     0.054     11.16%
   close                 15     0.038     0.001     0.003     0.006     12.28%
   flock                  4     0.021     0.003     0.005     0.008     20.89%
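Professional Linux/Unix/Windows system administrator and open-source enthusiast, with a strong interest in operating-system internals, the TCP/IP protocol stack, and information security. Outside of computing: calligraphy, classical Chinese poetry, digital photography, and backpacking.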