Source-code analysis of the TCP keep alive implementation on Linux
Source: 程序员人生 Published: 2015-01-23 08:45:25
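Keep Alive in TCP
TCP keep alive, as the term is commonly used, exists to verify that a connection is still valid: once the connection has been idle for a set period, a probe packet is sent, and the reply (or the lack of one) determines whether the connection is still usable. Upper-layer applications usually provide their own heartbeat mechanism, but the Linux kernel also offers a kernel-level way to check that a connection is alive.
Keep alive is switched on per socket through setsockopt; a newly created socket has it turned off by default. The code is as follows: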
int optval = 1;                      /* 1 = enable keep alive */
socklen_t optlen = sizeof(optval);

/* s is an already created TCP socket descriptor */
if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &optval, optlen) < 0) {
    perror("setsockopt()");
    close(s);
    exit(EXIT_FAILURE);
}
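Keep Alive control parameters
tcp_keepalive_time parameter
Sets the maximum time a connection may stay idle before the kernel starts sending keep alive probes (default: 7200 seconds).
tcp_keepalive_probes parameter
Once the maximum idle time has been exceeded, the kernel sends probe packets to check whether the peer is still alive; this parameter sets how many unanswered probes are sent before the kernel gives up (default: 9).
tcp_keepalive_intvl parameter
Once the maximum idle time has been exceeded and a probe has gone unanswered, this parameter sets how long the kernel waits before sending the next probe (default: 75 seconds).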
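On Linux the system-wide values of these three parameters are exposed as files under /proc/sys/net/ipv4/. The following is a minimal sketch of reading them from C; read_sysctl_long and print_keepalive_sysctls are hypothetical helper names introduced here for illustration, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl-style text file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
long read_sysctl_long(const char *path)
{
    FILE *f;
    long value = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: print the three system-wide keep alive settings. */
void print_keepalive_sysctls(void)
{
    printf("tcp_keepalive_time   = %ld\n",
           read_sysctl_long("/proc/sys/net/ipv4/tcp_keepalive_time"));
    printf("tcp_keepalive_probes = %ld\n",
           read_sysctl_long("/proc/sys/net/ipv4/tcp_keepalive_probes"));
    printf("tcp_keepalive_intvl  = %ld\n",
           read_sysctl_long("/proc/sys/net/ipv4/tcp_keepalive_intvl"));
}
```

Calling print_keepalive_sysctls() from main prints whatever the running system is configured with; a value of -1 means the file could not be read, for example on a non-Linux system.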
How Linux implements Keep Alive
The timer_list in the sock structure
The sock structure contains a member sk_timer of type struct timer_list, as shown in the structures below:
struct sock {
    ...
    struct timer_list sk_timer;
    ...
};
struct timer_list {
    struct list_head entry;
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
    struct tvec_base *base;
#ifdef CONFIG_TIMER_STATS
    void *start_site;
    char start_comm[16];
    int start_pid;
#endif
#ifdef CONFIG_LOCKDEP
    struct lockdep_map lockdep_map;
#endif
};
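The timer_list structure is the timer object used throughout the sock code. entry links the timer into the kernel's list of pending timers, expires is the time at which the timer fires, function is the callback executed at that point, and data is the argument passed to that callback.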
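The roles of these fields can be illustrated with a small user-space analogue. This is only a sketch under simplifying assumptions: mini_timer, mini_setup_timer, mini_add_timer, mini_run_timers and count_fire are hypothetical names, the list is a plain singly linked list rather than the kernel's timer wheel, and callbacks are assumed not to re-arm timers while the list is being scanned.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct timer_list (not the kernel type). */
struct mini_timer {
    struct mini_timer *next;            /* plays the role of `entry` */
    unsigned long expires;              /* time at which the timer fires */
    void (*function)(unsigned long);    /* callback to run on expiry */
    unsigned long data;                 /* argument handed to the callback */
};

/* Bind a callback and its argument to a timer, as setup_timer() does. */
void mini_setup_timer(struct mini_timer *t,
                      void (*fn)(unsigned long), unsigned long data)
{
    assert(t != NULL && fn != NULL);
    t->next = NULL;
    t->expires = 0;
    t->function = fn;
    t->data = data;
}

/* Arm the timer: set its expiry and link it into the pending list. */
void mini_add_timer(struct mini_timer **head, struct mini_timer *t,
                    unsigned long expires)
{
    t->expires = expires;
    t->next = *head;
    *head = t;
}

/* Unlink and fire every timer whose expiry time has been reached.
 * Returns the number of callbacks that ran. */
int mini_run_timers(struct mini_timer **head, unsigned long now)
{
    int fired = 0;
    struct mini_timer **pp = head;

    while (*pp != NULL) {
        struct mini_timer *t = *pp;
        if (t->expires <= now) {
            *pp = t->next;      /* unlink before calling the handler */
            t->next = NULL;
            t->function(t->data);
            fired++;
        } else {
            pp = &t->next;
        }
    }
    return fired;
}

/* Example handler: treats data as a pointer to an int counter. */
void count_fire(unsigned long data)
{
    (*(int *)data)++;
}
```

The cast of a pointer to unsigned long mirrors how the kernel code in this article passes the sock pointer as the timer's data argument.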
Registering the keepalive handler
void inet_csk_init_xmit_timers(struct sock *sk,
                               void (*retransmit_handler)(unsigned long),
                               void (*delack_handler)(unsigned long),
                               void (*keepalive_handler)(unsigned long))
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    setup_timer(&icsk->icsk_retransmit_timer, retransmit_handler,
                (unsigned long)sk);
    setup_timer(&icsk->icsk_delack_timer, delack_handler,
                (unsigned long)sk);
    /* registers the keepalive handler in sk_timer */
    setup_timer(&sk->sk_timer, keepalive_handler, (unsigned long)sk);
    icsk->icsk_pending = icsk->icsk_ack.pending = 0;
}
When a connection is established (that is, once the handshake succeeds), the keepalive_handler function is registered in the sk_timer of the newly created sock.
void tcp_init_xmit_timers(struct sock *sk)
{
    inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
                              &tcp_keepalive_timer);
}
The keepalive_handler passed in here is the function tcp_keepalive_timer:
static void tcp_keepalive_timer (unsigned long data)
{
    ......
    if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
        goto out;
    elapsed = keepalive_time_when(tp);
    /* It is alive without keepalive 8) */
    if (tp->packets_out || tcp_send_head(sk))
        goto resched;
    elapsed = tcp_time_stamp - tp->rcv_tstamp;
    if (elapsed >= keepalive_time_when(tp)) {
        if (icsk->icsk_probes_out >= keepalive_probes(tp)) {
            tcp_send_active_reset(sk, GFP_ATOMIC);
            tcp_write_err(sk);
            goto out;
        }
        if (tcp_write_wakeup(sk) <= 0) {
            icsk->icsk_probes_out++;
            elapsed = keepalive_intvl_when(tp);
        } else {
            /* If keepalive was lost due to local congestion,
             * try harder.
             */
            elapsed = TCP_RESOURCE_PROBE_INTERVAL;
        }
    } else {
        /* It is tp->rcv_tstamp + keepalive_time_when(tp) */
        elapsed = keepalive_time_when(tp) - elapsed;
    }
    .....
}
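The above is only an excerpt; the point is how the parameters described earlier are implemented. The code first checks whether SO_KEEPALIVE has been set on the socket, which the sock records as the flag SOCK_KEEPOPEN.
Only if SO_KEEPALIVE is set does it go on to check timestamps: it takes the difference between the current timestamp and the timestamp of the last received packet and compares it with keepalive_time. If the idle time has been exceeded, it checks how many unanswered probes have already been sent; if that count has reached keepalive_probes, it sends a reset, reports the error on the socket, and closes the sock.
If the count is still below the configured number of probes, it sends a probe and arms the timer to fire again after keepalive_intvl rather than after keepalive_time.
Note: keepalive_intvl here only controls when the next check is triggered.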
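Calculating the time N after which a dead connection is torn down
Once probing has started, nothing arrives from the peer, so rcv_tstamp does not change and the idle time remains at or above keepalive_time at every later check. Each firing of the timer therefore sends another probe and re-arms the timer with keepalive_intvl, until keepalive_probes probes have gone unanswered. This holds whether keepalive_intvl is larger or smaller than keepalive_time:
N = keepalive_time + keepalive_intvl * keepalive_probes
With the default settings (keepalive_time 7200, keepalive_intvl 75, keepalive_probes 9) this gives 7200 + 75 * 9 = 7875 seconds, so a dead connection is dropped only after roughly 2 hours and 11 minutes.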
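As a cross-check on the arithmetic, the re-arming logic of tcp_keepalive_timer shown earlier can be traced in user space. The following is a minimal sketch, not kernel code: simulate_keepalive_teardown is a hypothetical helper, times are plain seconds rather than jiffies, and it assumes the peer never answers and every probe is sent successfully.

```c
#include <assert.h>

/* Hypothetical helper: returns the number of seconds after the last
 * received packet at which the connection would be reset.
 * Assumes the keep alive timer was first armed keepalive_time seconds
 * after that packet and that no reply ever arrives. */
long simulate_keepalive_teardown(long keepalive_time, long keepalive_intvl,
                                 int keepalive_probes)
{
    const long rcv_tstamp = 0;      /* time of the last received packet */
    long now = keepalive_time;      /* first firing of the timer */
    int probes_out = 0;             /* unanswered probes sent so far */

    assert(keepalive_time >= 0 && keepalive_intvl >= 0);
    assert(keepalive_probes >= 0);

    for (;;) {
        long elapsed = now - rcv_tstamp;

        if (elapsed >= keepalive_time) {
            if (probes_out >= keepalive_probes)
                return now;                 /* reset sent, sock closed */
            probes_out++;                   /* probe sent */
            now += keepalive_intvl;         /* timer re-armed with intvl */
        } else {
            now += keepalive_time - elapsed; /* not idle long enough yet */
        }
    }
}
```

Calling simulate_keepalive_teardown(7200, 75, 9) returns 7875, that is, 7200 + 75 * 9.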
Setting keepalive_time, keepalive_probes and keepalive_intvl in code
setsockopt(s, SOL_TCP, TCP_KEEPIDLE, &val, sizeof(int));
setsockopt(s, SOL_TCP, TCP_KEEPINTVL, &val, sizeof(int));
setsockopt(s, SOL_TCP, TCP_KEEPCNT, &val, sizeof(int));
The three options map to the parameters as follows: TCP_KEEPIDLE --> keepalive_time, TCP_KEEPINTVL --> keepalive_intvl, TCP_KEEPCNT --> keepalive_probes
Each option also has an upper bound: TCP_KEEPIDLE at most 32767, TCP_KEEPINTVL at most 32767, TCP_KEEPCNT at most 127
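Putting the calls together, a per-socket override might look like the sketch below. enable_keepalive and get_tcp_option are hypothetical helper names, the code is Linux-specific, and IPPROTO_TCP is used as the option level, which on Linux has the same value as SOL_TCP.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn keep alive on for fd and override the three
 * per-socket parameters. Returns 0 on success, or -1 as soon as one
 * setsockopt call fails (errno is left set by that call). */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probe_count)
{
    int on = 1;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* idle time before the first probe (keepalive_time) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    /* gap between unanswered probes (keepalive_intvl) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_secs, sizeof(intvl_secs)) < 0)
        return -1;
    /* unanswered probes before the reset (keepalive_probes) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probe_count, sizeof(probe_count)) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: read back one integer TCP-level option,
 * or -1 if getsockopt fails. */
int get_tcp_option(int fd, int optname)
{
    int value = -1;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, optname, &value, &len) < 0)
        return -1;
    return value;
}
```

For example, enable_keepalive(fd, 60, 10, 5) asks the kernel to start probing after 60 idle seconds and to give up after 5 unanswered probes sent 10 seconds apart; get_tcp_option(fd, TCP_KEEPIDLE) can then be used to confirm the value took effect.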
Java, as of this writing, does not provide a way to modify these parameters.
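Timers in TCP processing
A TCP connection uses many timers over its lifetime to carry out periodic checks, for example the timer that cleans up the accept queue (mentioned in an earlier post on this blog) and the ACK timer. Interested readers can refer to tcp_timer.c.
The main job of a timer is to invoke code at a set time; for keep alive, that means sending probe packets on schedule to confirm that the connection is still valid.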
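Setting Keep Alive in Java
Java only lets you turn keep alive on; it does not let you set the related parameters. On the client side, keep alive is enabled by calling Socket.setKeepAlive directly. On the server side, ServerSocket has no keep alive switch; you can only set it on each new socket returned by accept.
How, then, can the parameters above be set from Java? What follows is only an outline of one approach.
In the JNI call at the bottom of Java's socket option handling, a constant array acts as a guard: only the options listed in the array are allowed to be set on the socket.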
const opts[] = {
    { java_net_SocketOptions_TCP_NODELAY, IPPROTO_TCP, TCP_NODELAY },
    { java_net_SocketOptions_SO_OOBINLINE, SOL_SOCKET, SO_OOBINLINE },
    { java_net_SocketOptions_SO_LINGER, SOL_SOCKET, SO_LINGER },
    { java_net_SocketOptions_SO_SNDBUF, SOL_SOCKET, SO_SNDBUF },
    { java_net_SocketOptions_SO_RCVBUF, SOL_SOCKET, SO_RCVBUF },
    { java_net_SocketOptions_SO_KEEPALIVE, SOL_SOCKET, SO_KEEPALIVE },
    { java_net_SocketOptions_SO_REUSEADDR, SOL_SOCKET, SO_REUSEADDR },
    { java_net_SocketOptions_SO_BROADCAST, SOL_SOCKET, SO_BROADCAST },
    { java_net_SocketOptions_IP_TOS, IPPROTO_IP, IP_TOS },
    { java_net_SocketOptions_IP_MULTICAST_IF, IPPROTO_IP, IP_MULTICAST_IF },
    { java_net_SocketOptions_IP_MULTICAST_IF2, IPPROTO_IP, IP_MULTICAST_IF },
    { java_net_SocketOptions_IP_MULTICAST_LOOP, IPPROTO_IP, IP_MULTICAST_LOOP },
};
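The effect of such a guard table can be sketched as a simple lookup. This is an illustration only, not the JDK's code: the DEMO_* command IDs, struct demo_opt, demo_opts and demo_map_option are all hypothetical names, and the table is deliberately shortened to three entries.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical command IDs standing in for the Java-side option constants. */
enum demo_cmd {
    DEMO_TCP_NODELAY = 1,
    DEMO_SO_KEEPALIVE,
    DEMO_SO_LINGER,
    DEMO_TCP_KEEPIDLE       /* deliberately absent from the table below */
};

struct demo_opt {
    int cmd;        /* option ID requested by the caller */
    int level;      /* setsockopt level */
    int optname;    /* setsockopt option name */
};

/* Only options listed here can ever reach setsockopt. */
static const struct demo_opt demo_opts[] = {
    { DEMO_TCP_NODELAY,  IPPROTO_TCP, TCP_NODELAY },
    { DEMO_SO_KEEPALIVE, SOL_SOCKET,  SO_KEEPALIVE },
    { DEMO_SO_LINGER,    SOL_SOCKET,  SO_LINGER },
};

/* Map a command ID to (level, optname). Returns 0 if the option is in
 * the table, -1 if it is not and must therefore be rejected. */
int demo_map_option(int cmd, int *level, int *optname)
{
    size_t i;

    assert(level != NULL && optname != NULL);
    for (i = 0; i < sizeof(demo_opts) / sizeof(demo_opts[0]); i++) {
        if (demo_opts[i].cmd == cmd) {
            *level = demo_opts[i].level;
            *optname = demo_opts[i].optname;
            return 0;
        }
    }
    return -1;
}
```

In this sketch a request for DEMO_SO_KEEPALIVE maps to SOL_SOCKET and SO_KEEPALIVE, while DEMO_TCP_KEEPIDLE is rejected because the table has no entry for it, which mirrors why the keep alive parameters cannot be set through the table shown above.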
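Setting the parameters directly is therefore not possible, but the capability can be added with your own JNI code. The socket holds a FileDescriptor fd, and the int fd inside it corresponds to the socket's descriptor in the kernel; once you have that fd, you can call your own native method to set the parameters you need. Alternatively, you can write your own socket class that wraps a native function calling setsockopt.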