Skip to content

Random crash every ~30 minutes (BitcoinSV) #966

@angelix

Description

@angelix

I am experience a very weird situation where electrumX is shutting down every ~30 minutes of operation with the following log/trace.

Coin: BitcoinSV
ElectrumX: latest
Services: RPC, WebSockets
OS: Ubuntu 19.04 (in Docker)

2020-02-23 16:59:57,961:INFO:MemPool:9,722 txs 4.95 MB touching 16,060 addresses
2020-02-23 17:00:14,817:INFO:SessionManager:closing stale session ids 1188, 1189, 1191
2020-02-23 17:00:58,422:INFO:MemPool:10,057 txs 5.13 MB touching 16,714 addresses
2020-02-23 17:00:58,427:INFO:MemPool:compact fee histogram: [(1, 296155), (0, 4830318)]
2020-02-23 17:01:08,984:INFO:ElectrumX:[1355] WSS XXX.XXX.XXX.XXX:64322, 62 total
2020-02-23 17:01:14,818:INFO:SessionManager:closing stale session ids 1205, 1207, 1208
2020-02-23 17:01:17,312:INFO:ElectrumX:[1356] WSS XXX.XXX.XXX.XXX:58360, 56 total
2020-02-23 17:01:18,479:INFO:Prefetcher:new block height 623,338 hash 000000000000000001d98fc86bc30b4e213868b4ccbac220c22a72c99953d038
2020-02-23 17:01:19,646:INFO:DB:flush #981 took 0.1s.  Height 623,338 txs: 360,376,515 (+3,083)
2020-02-23 17:01:19,646:INFO:BlockProcessor:processed 1 block size 2.66 MB in 1.0s
2020-02-23 17:01:20,507:INFO:ElectrumX:[1359] WSS XXX.XXX.XXX.XXX:59456, 59 total
2020-02-23 17:01:24,603:INFO:ElectrumX:[1283] notified of 1 address
2020-02-23 17:01:24,603:INFO:ElectrumX:[1308] notified of 1 address
2020-02-23 17:01:24,604:INFO:ElectrumX:[1314] notified of 1 address
2020-02-23 17:01:24,604:INFO:ElectrumX:[1336] notified of 1 address
2020-02-23 17:01:24,604:INFO:ElectrumX:[1340] notified of 1 address
2020-02-23 17:01:25,481:INFO:SessionManager:closing down server for ws://0.0.0.0:5104
2020-02-23 17:01:25,481:INFO:SessionManager:closing down server for rpc://0.0.0.0:2104
2020-02-23 17:01:25,481:INFO:SessionManager:closing down server for wss://0.0.0.0:6104
2020-02-23 17:01:29,907:INFO:Prefetcher:cancelled; prefetcher stopping
2020-02-23 17:01:29,908:INFO:BlockProcessor:flushing to DB for a clean shutdown...
2020-02-23 17:01:29,908:INFO:Controller:shutting down
2020-02-23 17:01:29,909:INFO:Controller:shutdown complete
Task exception was never retrieved
future: <Task finished coro=<WebSocketCommonProtocol.close() done, defined at /usr/local/lib/python3.7/dist-packages/websockets/protocol.py:653> exception=AttributeError("'WebSocketServerProtocol' object has no attribute 'transfer_data_task'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 698, in close
    self.transfer_data_task,
AttributeError: 'WebSocketServerProtocol' object has no attribute 'transfer_data_task'
Task exception was never retrieved
future: <Task finished coro=<WebSocketCommonProtocol.close() done, defined at /usr/local/lib/python3.7/dist-packages/websockets/protocol.py:653> exception=AttributeError("'WebSocketServerProtocol' object has no attribute 'transfer_data_task'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 698, in close
    self.transfer_data_task,
AttributeError: 'WebSocketServerProtocol' object has no attribute 'transfer_data_task'
Task exception was never retrieved
future: <Task finished coro=<WebSocketCommonProtocol.close() done, defined at /usr/local/lib/python3.7/dist-packages/websockets/protocol.py:653> exception=AttributeError("'WebSocketServerProtocol' object has no attribute 'transfer_data_task'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 698, in close
    self.transfer_data_task,
AttributeError: 'WebSocketServerProtocol' object has no attribute 'transfer_data_task'
Task exception was never retrieved
future: <Task finished coro=<WebSocketCommonProtocol.close() done, defined at /usr/local/lib/python3.7/dist-packages/websockets/protocol.py:653> exception=AttributeError("'WebSocketServerProtocol' object has no attribute 'transfer_data_task'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 698, in close
    self.transfer_data_task,
AttributeError: 'WebSocketServerProtocol' object has no attribute 'transfer_data_task'
Task exception was never retrieved
future: <Task finished coro=<WebSocketCommonProtocol.close() done, defined at /usr/local/lib/python3.7/dist-packages/websockets/protocol.py:653> exception=AttributeError("'WebSocketServerProtocol' object has no attribute 'transfer_data_task'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 698, in close
    self.transfer_data_task,
AttributeError: 'WebSocketServerProtocol' object has no attribute 'transfer_data_task'
Task exception was never retrieved
future: <Task finished coro=<WebSocketCommonProtocol.close() done, defined at /usr/local/lib/python3.7/dist-packages/websockets/protocol.py:653> exception=AttributeError("'WebSocketServerProtocol' object has no attribute 'transfer_data_task'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 698, in close
    self.transfer_data_task,
AttributeError: 'WebSocketServerProtocol' object has no attribute 'transfer_data_task'
Task exception was never retrieved
future: <Task finished coro=<WebSocketCommonProtocol.close() done, defined at /usr/local/lib/python3.7/dist-packages/websockets/protocol.py:653> exception=AttributeError("'WebSocketServerProtocol' object has no attribute 'transfer_data_task'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 698, in close
    self.transfer_data_task,
AttributeError: 'WebSocketServerProtocol' object has no attribute 'transfer_data_task'
Error in connection handler
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/electrumx/lib/server_base.py", line 125, in run
    await server_task
  File "/electrumx/lib/server_base.py", line 98, in serve
    await self.serve(shutdown_event)
  File "/electrumx/server/controller.py", line 134, in serve
    await group.spawn(wait_for_catchup())
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/electrumx/server/mempool.py", line 335, in keep_synchronized
    await group.spawn(self._logging(synchronized_event))
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/electrumx/server/mempool.py", line 232, in _refresh_hashes
    await self.api.on_mempool(touched, height)
  File "/electrumx/server/controller.py", line 67, in on_mempool
    await self._maybe_notify()
  File "/electrumx/server/controller.py", line 55, in _maybe_notify
    await self.notify(height, touched)
  File "/electrumx/server/session.py", line 766, in _notify_sessions
    await self._task_group.spawn(session.notify, touched, height_changed)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 148, in spawn
    self._add_task(task)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 123, in _add_task
    raise RuntimeError('task group is closed')
RuntimeError: task group is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/server.py", line 191, in handler
    await self.ws_handler(self, path)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/websocket.py", line 54, in ws_server
    await transport.process_messages()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/websocket.py", line 72, in process_messages
    await self.session.process_messages(self.recv_message)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/session.py", line 217, in process_messages
    await self._group.spawn(self._process_messages, recv_message)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 195, in join
    task = await self.next_done()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 161, in next_done
    await self._done_event.wait()
  File "/usr/lib/python3.7/asyncio/locks.py", line 293, in wait
    await fut
concurrent.futures._base.CancelledError
Error in connection handler
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/electrumx/lib/server_base.py", line 125, in run
    await server_task
  File "/electrumx/lib/server_base.py", line 98, in serve
    await self.serve(shutdown_event)
  File "/electrumx/server/controller.py", line 134, in serve
    await group.spawn(wait_for_catchup())
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/electrumx/server/mempool.py", line 335, in keep_synchronized
    await group.spawn(self._logging(synchronized_event))
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/electrumx/server/mempool.py", line 232, in _refresh_hashes
    await self.api.on_mempool(touched, height)
  File "/electrumx/server/controller.py", line 67, in on_mempool
    await self._maybe_notify()
  File "/electrumx/server/controller.py", line 55, in _maybe_notify
    await self.notify(height, touched)
  File "/electrumx/server/session.py", line 766, in _notify_sessions
    await self._task_group.spawn(session.notify, touched, height_changed)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 148, in spawn
    self._add_task(task)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 123, in _add_task
    raise RuntimeError('task group is closed')
RuntimeError: task group is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/server.py", line 191, in handler
    await self.ws_handler(self, path)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/websocket.py", line 54, in ws_server
    await transport.process_messages()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/websocket.py", line 72, in process_messages
    await self.session.process_messages(self.recv_message)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/session.py", line 217, in process_messages
    await self._group.spawn(self._process_messages, recv_message)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 195, in join
    task = await self.next_done()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 161, in next_done
    await self._done_event.wait()
  File "/usr/lib/python3.7/asyncio/locks.py", line 293, in wait
    await fut
concurrent.futures._base.CancelledError
Error in connection handler
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/electrumx/lib/server_base.py", line 125, in run
    await server_task
  File "/electrumx/lib/server_base.py", line 98, in serve
    await self.serve(shutdown_event)
  File "/electrumx/server/controller.py", line 134, in serve
    await group.spawn(wait_for_catchup())
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/electrumx/server/mempool.py", line 335, in keep_synchronized
    await group.spawn(self._logging(synchronized_event))
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/electrumx/server/mempool.py", line 232, in _refresh_hashes
    await self.api.on_mempool(touched, height)
  File "/electrumx/server/controller.py", line 67, in on_mempool
    await self._maybe_notify()
  File "/electrumx/server/controller.py", line 55, in _maybe_notify
    await self.notify(height, touched)
  File "/electrumx/server/session.py", line 766, in _notify_sessions
    await self._task_group.spawn(session.notify, touched, height_changed)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 148, in spawn
    self._add_task(task)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 123, in _add_task
    raise RuntimeError('task group is closed')
RuntimeError: task group is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/websockets/server.py", line 191, in handler
    await self.ws_handler(self, path)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/websocket.py", line 54, in ws_server
    await transport.process_messages()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/websocket.py", line 72, in process_messages
    await self.session.process_messages(self.recv_message)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/session.py", line 217, in process_messages
    await self._group.spawn(self._process_messages, recv_message)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 195, in join
    task = await self.next_done()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 161, in next_done
    await self._done_event.wait()
  File "/usr/lib/python3.7/asyncio/locks.py", line 293, in wait
    await fut
concurrent.futures._base.CancelledError
2020-02-23 17:01:40,036:ERROR:electrumx:ElectrumX server terminated abnormally
Traceback (most recent call last):
  File "electrumx_server", line 35, in main
    asyncio.run(controller.run())
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/electrumx/lib/server_base.py", line 125, in run
    await server_task
  File "/electrumx/lib/server_base.py", line 98, in serve
    await self.serve(shutdown_event)
  File "/electrumx/server/controller.py", line 134, in serve
    await group.spawn(wait_for_catchup())
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/electrumx/server/mempool.py", line 335, in keep_synchronized
    await group.spawn(self._logging(synchronized_event))
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 242, in __aexit__
    await self.join()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 211, in join
    raise task.exception()
  File "/electrumx/server/mempool.py", line 232, in _refresh_hashes
    await self.api.on_mempool(touched, height)
  File "/electrumx/server/controller.py", line 67, in on_mempool
    await self._maybe_notify()
  File "/electrumx/server/controller.py", line 55, in _maybe_notify
    await self.notify(height, touched)
  File "/electrumx/server/session.py", line 766, in _notify_sessions
    await self._task_group.spawn(session.notify, touched, height_changed)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 148, in spawn
    self._add_task(task)
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/curio.py", line 123, in _add_task
    raise RuntimeError('task group is closed')
RuntimeError: task group is closed

Also getting randomly the following exceptions, don't know if related.

2020-02-23 16:56:23,241:INFO:ElectrumX:[1296] WSS XXX.XXX.XXX.XXX:52886, 76 total
Task exception was never retrieved
future: <Task finished coro=<RPCSession._process_messages() done, defined at /usr/local/lib/python3.7/dist-packages/aiorpcx/session.py:432> exception=ConnectionClosedOK('code = 1000 (OK), no reason')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/session.py", line 435, in _process_messages
    message = await recv_message()
  File "/usr/local/lib/python3.7/dist-packages/aiorpcx/websocket.py", line 63, in recv_message
    message = await self.websocket.recv()
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 509, in recv
    await self.ensure_open()
  File "/usr/local/lib/python3.7/dist-packages/websockets/protocol.py", line 812, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedOK: code = 1000 (OK), no reason
2020-02-23 16:56:25,560:INFO:ElectrumX:[1259] notified of 1 address

I'll continue investigating the issue. Any insights are welcome.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions