
Urgent: Oplog Resolver Intermittently Doesn't Complete Resolving #277

@corey-hammerton

Description


On our backup server we experience cases where backups fail to complete successfully because the Oplog Resolver from a previous backup didn't complete. This happens on many of our backup configurations using default oplog configuration settings.

The ResolverThreads appear to complete successfully, as shown by the logs below (hostnames redacted). The resolver, however, does not free the threads and does not continue with the backup. Manually killing the defunct processes, which we identify as the ones on the server with a high virtual memory size, releases the locks, after which the MainProcess thread logs that oplog resolving completed.

[2018-09-15 00:23:33,726] [INFO] [PoolWorker-11] [ResolverThread:run:36] Resolving oplog for XXXXXXXXXXXXXXXXXXXXX:10000 to max ts: Timestamp(1536970998, 0)
[2018-09-15 00:23:33,796] [INFO] [PoolWorker-10] [ResolverThread:run:36] Resolving oplog for XXXXXXXXXXXXXXXXXXXXX:10001 to max ts: Timestamp(1536970998, 0)
[2018-09-15 00:23:33,828] [INFO] [PoolWorker-11] [ResolverThread:run:60] Applied 56 oplog changes to XXXXXXXXXXXXXXXXXXXXX:10000 oplog, end ts: Timestamp(1536970999, 1)
[2018-09-15 00:23:34,029] [INFO] [PoolWorker-10] [ResolverThread:run:60] Applied 205 oplog changes to XXXXXXXXXXXXXXXXXXXXX:10001 oplog, end ts: Timestamp(1536970997, 50)
[2018-09-17 02:17:23,739] [INFO] [MainProcess] [Resolver:run:142] Oplog resolving completed in 0.00 seconds
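The size of the stall can be read off the log timestamps: a little over two days pass between the last `PoolWorker` line and the `MainProcess` line, even though the latter reports 0.00 seconds. Below is a minimal sketch that computes this gap. `parse_line` and `resolver_stall` are hypothetical helper names, and the log messages in the fixture are shortened (timestamps and thread names are copied unchanged).

```python
from datetime import datetime

# Format of the leading timestamp field, e.g. "[2018-09-15 00:23:34,029]".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_line(line):
    """Return (timestamp, thread name) from the first and third log fields."""
    fields = line.split("] [")
    ts = datetime.strptime(fields[0].lstrip("["), LOG_TS_FORMAT)
    return ts, fields[2]


def resolver_stall(lines):
    """Time from the last PoolWorker line to the next MainProcess line, or None."""
    last_worker = None
    for line in lines:
        ts, thread = parse_line(line)
        if thread.startswith("PoolWorker"):
            last_worker = ts
        elif thread == "MainProcess" and last_worker is not None:
            return ts - last_worker
    return None


# Excerpt of the log above (messages shortened).
LOG = [
    "[2018-09-15 00:23:33,726] [INFO] [PoolWorker-11] [ResolverThread:run:36] Resolving oplog",
    "[2018-09-15 00:23:33,796] [INFO] [PoolWorker-10] [ResolverThread:run:36] Resolving oplog",
    "[2018-09-15 00:23:33,828] [INFO] [PoolWorker-11] [ResolverThread:run:60] Applied 56 oplog changes",
    "[2018-09-15 00:23:34,029] [INFO] [PoolWorker-10] [ResolverThread:run:60] Applied 205 oplog changes",
    "[2018-09-17 02:17:23,739] [INFO] [MainProcess] [Resolver:run:142] Oplog resolving completed in 0.00 seconds",
]

if __name__ == "__main__":
    print(resolver_stall(LOG))  # 2 days, 1:53:49.710000
```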

In our testing and debugging we have narrowed the hang down to https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/mongodb_consistent_backup/Oplog/Resolver/Resolver.py#L105. Please advise.

Generic server information:

OS: CentOS 7.5.1804
YUM Package Version: 1.3.0
Python Version: 2.7.5
CPU Count: 8
RAM Size: 15G
