Commit 677f3f0

Fix hasSingleTagInsideElement method
It would fail for, e.g., `<div> <p>foo</p> </div>`. mozilla/readability uses `children` for the tag lookup, which returns only elements. PHP does not have a `children` property, so b580cf2 mistakenly used `childNodes` instead, but that can return any node type (here, also the whitespace text nodes around `<p>`). Let’s filter the children ourselves. Also add comments from mozilla/readability’s `_hasSingleTagInsideElement`.
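The difference between `childNodes` and element-only children can be seen by parsing the example from the message with PHP's `DOMDocument`. This is a minimal illustration, not code from the commit, and it assumes libxml's HTML parser keeps the whitespace-only text nodes inside the `<div>` (which is what makes the old length check fail).

```php
<?php
// Parse the example markup from the commit message.
$doc = new DOMDocument();
$doc->loadHTML('<div> <p>foo</p> </div>');
$div = $doc->getElementsByTagName('div')->item(0);

// childNodes yields every node type, including the whitespace text nodes
// around <p>, so a check like `1 !== $node->childNodes->length` rejects this markup.
$childNodes = iterator_to_array($div->childNodes);

// Keeping only DOMElement instances mirrors what `children` returns in a browser DOM.
$children = array_filter($childNodes, fn ($childNode) => $childNode instanceof DOMElement);

printf("childNodes: %d, element children: %d\n", count($childNodes), count($children));
```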
1 parent 2912276 commit 677f3f0


1 file changed, +11 -2 lines changed


src/Readability.php

Lines changed: 11 additions & 2 deletions
```diff
@@ -1477,14 +1477,23 @@ private function isPhrasingContent($node): bool
         );
     }
 
+    /**
+     * Checks if `$node` has only whitespace and a single element with `$tag` for the tag name.
+     * Returns false if `$node` contains non-empty text nodes
+     * or if it contains no element with given tag or more than 1 element.
+     */
     private function hasSingleTagInsideElement(\DOMElement $node, string $tag): bool
     {
-        if (1 !== $node->childNodes->length || $node->childNodes->item(0)->nodeName !== $tag) {
+        $childNodes = iterator_to_array($node->childNodes);
+        $children = array_filter($childNodes, fn ($childNode) => $childNode instanceof \DOMElement);
+
+        // There should be exactly 1 element child with given tag
+        if (1 !== \count($children) || $children[0]->nodeName !== $tag) {
             return false;
         }
 
         $a = array_filter(
-            iterator_to_array($node->childNodes),
+            $childNodes,
             fn ($childNode) => $childNode instanceof \DOMText && preg_match($this->regexps['hasContent'], $this->getInnerText($childNode))
         );
 
```
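One detail worth checking in the new condition: `array_filter()` preserves the keys of `$childNodes`, so for `<div> <p>foo</p> </div>` the single `<p>` ends up at key 1, not 0, and `$children[0]` is undefined. The standalone function below is a hypothetical sketch, not the project's code: it re-indexes with `array_values()`, and it approximates the class's `hasContent` regexp and `getInnerText()` helper with `trim()` on `textContent`, which is an assumption.

```php
<?php
// Hypothetical standalone version of hasSingleTagInsideElement(), for illustration only.
// Assumption: a "non-empty" text node is one with non-whitespace content; trim() on
// textContent stands in for the class's 'hasContent' regexp and getInnerText() helper.
function hasSingleTagInsideElement(DOMElement $node, string $tag): bool
{
    $childNodes = iterator_to_array($node->childNodes);

    // Keep only element children, then re-index: array_filter() preserves keys,
    // so without array_values() the single child may not be at index 0.
    $children = array_values(array_filter(
        $childNodes,
        fn ($childNode) => $childNode instanceof DOMElement
    ));

    // There should be exactly 1 element child with the given tag.
    if (1 !== count($children) || $children[0]->nodeName !== $tag) {
        return false;
    }

    // Reject $node if any direct text child contains non-whitespace text.
    foreach ($childNodes as $childNode) {
        if ($childNode instanceof DOMText && '' !== trim($childNode->textContent)) {
            return false;
        }
    }

    return true;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<div> <p>foo</p> </div>'         // whitespace around a single <p>
    . '<div>bar <p>foo</p></div>'     // non-empty text next to the <p>
    . '<div><p>x</p><p>y</p></div>'   // more than one element child
);
$divs = $doc->getElementsByTagName('div');

foreach ($divs as $i => $div) {
    printf("div %d: %s\n", $i, var_export(hasSingleTagInsideElement($div, 'p'), true));
}
```

Re-indexing keeps the `$children[0]` lookup valid wherever the whitespace nodes sit; `reset($children)` on the filtered array would be an equivalent way to fetch the first element.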
