Celery worker stuck after several days

----- perrin

2019-08-25 22:53


Use the Celery distributed task queue to crawl periodically

My site is built with Python and Django. Celery is used to crawl stock data periodically, but it always ends up stuck and Celery stops working.
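
For context, the periodic crawl is driven by Celery beat. Below is a minimal sketch of such a setup, assuming a Redis broker; the task names match the celery inspect active output further down, but the intervals, broker URL and module layout are illustrative assumptions, not my exact configuration.

# celery_app.py - minimal sketch of a Celery app plus a beat schedule
from celery import Celery
from celery.schedules import crontab

app = Celery('mysite', broker='redis://localhost:6379/0')  # assumed broker

app.conf.beat_schedule = {
    'update-concepts': {
        'task': 'stockinfo.tasks.update_concept_task',
        'schedule': crontab(minute=0),            # assumed: hourly
    },
    'clear-django-sessions': {
        'task': 'stockinfo.tasks.clear_django_sessions',
        'schedule': crontab(minute=15, hour=3),   # assumed: once a day
    },
}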

By inspecting the logs, I found that the Celery worker got stuck several days ago, while the Celery beat producer triggered a task only a few minutes ago. Looking at the processes, both the Celery worker and Celery beat appear to be running fine.

A deadlock causes the Celery worker to get stuck

This problem annoyed me for half a year. I even complained that the Python libraries involved are not robust and planned to change my technology stack to Spring Boot. But issues are a programmer's daily fare, and getting rid of them is the basic requirement of the job, so I just had to find a way to solve it. After searching with Google, I found a similar issue and its solution in this article on Medium:

https://medium.com/squad-engineering/two-years-with-celery-in-production-bug-fix-edition-22238669601d

A deadlock caused the Celery worker to get stuck and stop consuming tasks from the queue. The root cause in that article is that Postgres (via psycopg2) sets an SSL callback, but the same callback is also used by other code; Postgres releases the lock when it unloads, while the other users do not know about this and never release it. That is the problem. Upgrading psycopg2 to 2.6 solves it. But my psycopg2 is version 2.8, so it does not exactly match my case. Still, the article gave me a clue: the keyword is deadlock, and a similar problem might exist in my project.

Use strace or cat /proc/{pid}/stack to inspect why the worker is stuck

In the article, strace is used, but that utility is not installed on my VPS, so I used cat /proc/{pid}/stack to get the call stack information instead.

[root@nfvbfqi9 mysite]# celery inspect active
-> celery@nfvbfqi9: OK
    * {'id': '38765e0a-b752-4582-9b3e-cf2ea5cf0c4c', 'name': 'stockinfo.tasks.update_concept_task', 'args': '()', 'kwargs': '{}', 'type': 'stockinfo.tasks.update_concept_task', 'hostname': 'celery@nfvbfqi9', 'time_start': 1566133200.1085725, 'acknowledged': True, 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': False}, 'worker_pid': 2939}
    * {'id': '7cf4a044-bf15-4fc8-a168-c552b393a87b', 'name': 'stockinfo.tasks.clear_django_sessions', 'args': '()', 'kwargs': '{}', 'type': 'stockinfo.tasks.clear_django_sessions', 'hostname': 'celery@nfvbfqi9', 'time_start': 1566252300.1116867, 'acknowledged': True, 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': False}, 'worker_pid': 2940}

The two tasks have been running for two days, as their time_start timestamps are two days old, yet neither task should take more than an hour. So they are stuck. But why, and where?
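
To double-check how stale they are, the time_start values can be converted from epoch seconds into dates. A quick sketch using the timestamps from the output above (truncated to whole seconds):

from datetime import datetime, timezone

for ts in (1566133200, 1566252300):
    print(datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC'))
# 2019-08-18 13:00:00 UTC
# 2019-08-19 22:05:00 UTC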

[root@nfvbfqi9 mysite]# cat /proc/2939/stack
[<ffffffff8144b9b9>] sk_wait_data+0xd9/0xe0
[<ffffffff814a422b>] tcp_recvmsg+0x2cb/0xe80
[<ffffffff814c58ca>] inet_recvmsg+0x5a/0x90
[<ffffffff81449ab3>] sock_recvmsg+0x133/0x160
[<ffffffff81449c2e>] sys_recvfrom+0xee/0x180
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@nfvbfqi9 mysite]# cat /proc/2940/stack
[<ffffffff811939cb>] pipe_wait+0x5b/0x80
[<ffffffff81194476>] pipe_read+0x3e6/0x4e0
[<ffffffff81188dba>] do_sync_read+0xfa/0x140
[<ffffffff811896a5>] vfs_read+0xb5/0x1a0
[<ffffffff811897e1>] sys_read+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Set timeouts to avoid infinite waiting

After inspecting the call stacks, I found that one process is blocked on a TCP receive and the other is blocked on a pipe read. So there is no real deadlock here; blocking operations that can wait forever are the source of the problem. Setting timeouts on them resolved both issues.
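
What the fix looks like depends on where the blocking calls live. In my case the likely culprits are network requests made by the crawl task without a timeout, plus the worker having no task time limit as a safety net. The sketch below shows both ideas; it is an assumption about the shape of the code, not a copy of my project, and the URL and numbers are illustrative.

import requests
from celery import Celery
from celery.exceptions import SoftTimeLimitExceeded

app = Celery('mysite', broker='redis://localhost:6379/0')  # assumed broker

# Time limits act as a safety net: a task that blocks forever is
# interrupted or killed instead of wedging the worker for days.
app.conf.task_soft_time_limit = 600   # SoftTimeLimitExceeded after 10 minutes
app.conf.task_time_limit = 900        # worker kills the task after 15 minutes

@app.task
def update_concept_task():
    try:
        # Always pass a timeout to blocking network calls. Without one,
        # recv() can block forever, which matches the sk_wait_data /
        # tcp_recvmsg stack of PID 2939 above.
        resp = requests.get('https://example.com/stock-concepts', timeout=(5, 30))
        resp.raise_for_status()
        # ... parse and store the stock data ...
    except SoftTimeLimitExceeded:
        # The task ran too long; give up so the worker stays responsive.
        pass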

