Skip to content

Modifying file in place corrupts directory records #122

@Goshin

Description

@Goshin

The bug was recorded back in #71 (comment). After some debugging, I figured out the cause.

When modifying the file in place, the library will try to update the file length in its directory record:

pycdlib/pycdlib/pycdlib.py

Lines 4501 to 4528 in bc5187a

# Finally write out the directory record entry.
# This is a little tricky because of what things mean. First of all,
# child.extents_to_here represents the total number of extents up to
# this child in the parent. Thus, to get the absolute extent offset,
# we start with the parent's extent location, add on the number of
# extents to here, and remove 1 (since our offset will be zero-based).
# Second, child.offset_to_here is the *last* byte that the child uses,
# so to get the start of it we subtract off the length of the child.
# Then we can multiply the extent location by the logical block size,
# add on the offset, and get to the absolute location in the file.
first_joliet = True
for record, is_pvd_unused in child.inode.linked_records:
if isinstance(record, dr.DirectoryRecord):
if self.joliet_vd is not None and id(record.vd) == id(self.joliet_vd) and first_joliet:
first_joliet = False
self.joliet_vd.remove_from_space_size(record.get_data_length())
self.joliet_vd.add_to_space_size(length)
if record.parent is None:
raise pycdlibexception.PyCdlibInternalError('Modifying file with empty parent')
abs_extent_loc = record.parent.extent_location() + record.extents_to_here - 1
offset = record.offset_to_here - record.dr_len
abs_offset = abs_extent_loc * self.logical_block_size + offset
elif isinstance(record, udfmod.UDFFileEntry):
abs_offset = record.extent_location() * self.logical_block_size
record.set_data_length(length)
self._cdfp.seek(abs_offset)
self._cdfp.write(record.record())

The absolute offset of the directory record is calculated based on the parent's extent location and the offset within the parent's directory records:

pycdlib/pycdlib/pycdlib.py

Lines 4520 to 4522 in bc5187a

abs_extent_loc = record.parent.extent_location() + record.extents_to_here - 1
offset = record.offset_to_here - record.dr_len
abs_offset = abs_extent_loc * self.logical_block_size + offset

Here, the record.offset_to_here value is calculated by summing up previous siblings' lengths up to itself:

pycdlib/pycdlib/dr.py

Lines 703 to 735 in bc5187a

def _recalculate_extents_and_offsets(self, index, logical_block_size):
# type: (int, int) -> Tuple[int, int]
"""
Internal method to recalculate the extents and offsets associated with
children of this directory record.
Parameters:
index - The index at which to start the recalculation.
logical_block_size - The block size to use for comparisons.
Returns:
A tuple where the first element is the total number of extents required
by the children and where the second element is the offset into the
last extent currently being used.
"""
if index == 0:
dirrecord_offset = 0
num_extents = 1
else:
dirrecord_offset = self.children[index - 1].offset_to_here
num_extents = self.children[index - 1].extents_to_here
for i in range(index, len(self.children)):
c = self.children[i]
dirrecord_len = c.dr_len
if (dirrecord_offset + dirrecord_len) > logical_block_size:
num_extents += 1
dirrecord_offset = 0
dirrecord_offset += dirrecord_len
c.extents_to_here = num_extents
c.offset_to_here = dirrecord_offset
c.index_in_parent = i
return num_extents, dirrecord_offset

However, Pycdlib maintains the children list in sorted order when walking directory records and adding children:

pycdlib/pycdlib/dr.py

Lines 760 to 775 in bc5187a

# First ensure that this is not a duplicate. For speed purposes, we
# recognize that bisect_left will always choose an index to the *left*
# of a duplicate child. Thus, to check for duplicates we only need to
# see if the child to be added is a duplicate with the entry that
# bisect_left returned.
index = bisect.bisect_left(self.children, child)
if index != len(self.children) and self.children[index].file_ident == child.file_ident:
if not self.children[index].is_associated_file() and not child.is_associated_file():
if not (self.rock_ridge is not None and self.file_identifier() == b'RR_MOVED'):
if not allow_duplicate:
raise pycdlibexception.PyCdlibInvalidInput('Failed adding duplicate name to parent')
self.children[index].data_continuation = child
self.children[index].file_flags |= (1 << self.FILE_FLAG_MULTI_EXTENT_BIT)
index += 1
self.children.insert(index, child)

In this case, the index of a directory record will likely differ from its actual index in the ISO file. Therefore, the directory record offset will be wrong when writing the length back to the ISO file, and some random sibling records will be corrupted.


The solution I can come up with is to rewrite the parent's directory records together. As a temporary fix, one can simply invoke self._write_directory_records() to override all the directory records.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions