The Listen Queue Length Problem in PostgreSQL

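Both MySQL and PostgreSQL (PG) communicate with clients by listening on an IP address and port, or on a Unix domain socket.
This raises the question of how long the listen queue (the backlog) can be.

MySQL handles this itself: the back_log option in my.cnf sets the length of its listen queue.

PG, on the other hand, does not appear to offer any setting for the listen queue length.

This limit seemed to surface during a high-concurrency stress test with pgbench.

Command: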
pgbench -n -r -c 250 -j 250 -T 2 -f update_smallrange.sql

Error message:
Connection to database "" failed:
could not connect to server: Resource temporarily unavailable
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

However, "Resource temporarily unavailable" does not say which resource ran out.
After some investigation, I found the following thread:
http://www.postgresql.org/message-id/20130617141622.GH5875@alap2.anarazel.de

[code]
From: Andres Freund
To: pgsql-hackers(at)postgresql(dot)org
Subject: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-17 14:16:22
Message-ID: 20130617141622.GH5875@alap2.anarazel.de
Lists: pgsql-hackers

Hi,

When postgres on linux receives connection on a high rate client
connections sometimes error out with:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected

To reproduce start something like on a server with sufficiently high
max_connections:
pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql

Now that's strange since that error should happen at connect(2) time,
not when sending the startup packet. Some investigation led me to
fe-secure.c's PQConnectPoll:

if (connect(conn->sock, addr_cur->ai_addr,
            addr_cur->ai_addrlen) < 0)
{
    if (SOCK_ERRNO == EINPROGRESS ||
        SOCK_ERRNO == EWOULDBLOCK ||
        SOCK_ERRNO == EINTR ||
        SOCK_ERRNO == 0)
    {
        /*
         * This is fine - we're in non-blocking mode, and
         * the connection is in progress.  Tell caller to
         * wait for write-ready on socket.
         */
        conn->status = CONNECTION_STARTED;
        return PGRES_POLLING_WRITING;
    }
    /* otherwise, trouble */
}

So, we're accepting EWOULDBLOCK as a valid return value for
connect(2). Which it isn't. EAGAIN in contrast is on some BSDs and on
linux. Unfortunately POSIX allows those two to share the same value...

My manpage tells me:
EAGAIN No more free local ports or insufficient entries in the routing cache.  For
   AF_INET see the description of
   /proc/sys/net/ipv4/ip_local_port_range ip(7)
   for information on how to increase the number of local
   ports.

So, the problem is that we took a failed connection as having been
initially successfull but in progress.

Not accepting EWOULDBLOCK in the above if() results in:
could not connect to server: Resource temporarily unavailable
   Is the server running locally and accepting
   connections on Unix domain socket "/tmp/.s.PGSQL.5440"?

which makes more sense.

Trivial patch attached.

Now, the question is why we cannot complete connections on unix sockets?
Some code reading reading shows net/unix/af_unix.c:unix_stream_connect()
shows:
if (unix_recvq_full(other)) {
    err = -EAGAIN;
    if (!timeo)
        goto out_unlock;
So, if we're in nonblocking mode - which we are - and the receive queue
is full we return EAGAIN. The receive queue for unix sockets is defined
as
static inline int unix_recvq_full(struct sock const *sk)
{
    return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
}
Where sk_max_ack_backlog is whatever has been passed to the
listen(backlog) on the listening side.

Question: But postgres does listen(fd, MaxBackends * 2), how can that be
a problem?
Answer:
   If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn,
   then  it  is  silently  truncated to that value; the default value in this file is
   128.  In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with
   the value 128.

Setting somaxconn to something higher indeed makes the problem go away.

I'd guess that pretty much the same holds true for tcp connections,
although I didn't verify that which would explain some previous reports
on the lists.

TLDR: Increase /proc/sys/net/core/somaxconn

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

[/code]

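So the cause was that the PG server's listen backlog, which is capped by the kernel parameter somaxconn, was too small. somaxconn defaulted to 128 on this system; after raising it and restarting PG, the test ran without errors.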

   /proc/sys/net/core/somaxconn
            This  file  defines a ceiling value for the backlog argument of listen(2); see the listen(2) manual page
            for details.
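
To make the truncation concrete, here is a minimal shell sketch of the arithmetic. It assumes, as the quoted mail states, that the server calls listen(fd, MaxBackends * 2), and it uses max_connections as a rough stand-in for MaxBackends (the real value is somewhat larger). The function name effective_backlog, the value 300, and the fallback of 128 are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: the backlog the kernel grants is the smaller of
# what the server requested and the net.core.somaxconn ceiling.
effective_backlog() {
    requested=$1   # value passed to listen(2)
    ceiling=$2     # value of net.core.somaxconn
    if [ "$requested" -gt "$ceiling" ]; then
        echo "$ceiling"
    else
        echo "$requested"
    fi
}

# Assumed example value; substitute the max_connections of your server.
max_connections=300
# Read the live ceiling on Linux; fall back to 128 (an assumption) elsewhere.
somaxconn=$(cat /proc/sys/net/core/somaxconn 2>/dev/null || echo 128)
requested=$((max_connections * 2))
echo "requested: $requested, granted: $(effective_backlog "$requested" "$somaxconn")"
```

With a ceiling of 128, a burst of 250 clients connecting at once, as in the pgbench run above, can fill the queue before the server accepts them, whatever backlog PG asked for.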

At this point the fix is clear:

echo 256 > /proc/sys/net/core/somaxconn

Then restart PG and rerun the test; it now completes normally.
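
Writing to /proc only lasts until the next reboot. The sketch below shows one way to check the current value and list the commands that would raise it persistently. It only prints the privileged commands instead of running them. The function name somaxconn_plan and the file name /etc/sysctl.d/99-somaxconn.conf are assumptions, and the location of persistent sysctl settings may differ on your distribution.

```shell
#!/bin/sh
# Hypothetical helper: print the commands needed to raise net.core.somaxconn.
# Nothing is changed here; review the output and run it as root.
somaxconn_plan() {
    current=$1
    wanted=$2
    if [ "$current" -ge "$wanted" ]; then
        echo "ok: net.core.somaxconn is already $current"
        return 0
    fi
    # Takes effect immediately, but is lost on reboot.
    echo "sysctl -w net.core.somaxconn=$wanted"
    # Persists across reboots (the file name is an assumption).
    echo "echo 'net.core.somaxconn = $wanted' > /etc/sysctl.d/99-somaxconn.conf"
}

# Read the live value on Linux; fall back to 128 (an assumption) elsewhere.
current=$(cat /proc/sys/net/core/somaxconn 2>/dev/null || echo 128)
somaxconn_plan "$current" 256
```

PG still has to be restarted afterwards, because the backlog is fixed at the moment the server calls listen(2).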
