-
Notifications
You must be signed in to change notification settings - Fork 188
Description
Problem Description
Queries that search for artifacts containing specific package versions are slow on large GUAC databases. This performance issue stems from a missing index on a key table, forcing the database to perform a full Sequential Scan on a table with millions of rows. This significantly impacts the usability of GUAC for dependency and vulnerability lookups at scale.
During performance analysis on a database with over 6 million entries in bill_of_materials_included_software_packages, we identified a primary bottleneck in the query execution plan.
Analysis & Evidence
The EXPLAIN plan for a typical package search query revealed a highly inefficient operation: a Seq Scan on bill_of_materials_included_software_packages, which is the largest bottleneck, scanning over 6.7 million rows.
See EXPLAIN plan BEFORE adding indexes
QUERY PLAN
Gather (cost=1300.19..219481.05 rows=105 width=25)
Workers Planned: 2
-> Hash Join (cost=300.19..218470.55 rows=44 width=25)
Hash Cond: (bom.artifact_id = a.id)
-> Nested Loop (cost=298.49..218463.58 rows=2006 width=41)
-> Hash Join (cost=298.06..217456.86 rows=2006 width=41)
Hash Cond: (bom_pkg.package_version_id = pv.id)
-> Parallel Seq Scan on bill_of_materials_included_software_packages bom_pkg (cost=0.00..191915.30 rows=6725830 width=32)
-> Hash (cost=298.02..298.02 rows=3 width=41)
-> Hash Join (cost=22.32..298.02 rows=3 width=41)
Hash Cond: (pv.name_id = pn.id)
Join Filter: ((((pn.name)::text = 'stdlib'::text) AND ((pv.version)::text = '1.20'::text)) OR ((pn.name)::text = 'openssl'::text) OR (((pn.name)::text = 'ant'::text) AND ((pv.version)::text = '1.10.14'::text)))
-> Seq Scan on package_versions pv (cost=0.00..251.23 rows=9323 width=43)
-> Hash (cost=22.28..22.28 rows=3 width=30)
-> Bitmap Heap Scan on package_names pn (cost=12.86..22.28 rows=3 width=30)
Recheck Cond: (((name)::text = 'stdlib'::text) OR ((name)::text = 'openssl'::text) OR ((name)::text = 'ant'::text))
-> BitmapOr (cost=12.86..12.86 rows=3 width=0)
-> Bitmap Index Scan on packagename_name_namespace_type (cost=0.00..4.29 rows=1 width=0)
Index Cond: ((name)::text = 'stdlib'::text)
-> Bitmap Index Scan on packagename_name_namespace_type (cost=0.00..4.29 rows=1 width=0)
Index Cond: ((name)::text = 'openssl'::text)
-> Bitmap Index Scan on packagename_name_namespace_type (cost=0.00..4.29 rows=1 width=0)
Index Cond: ((name)::text = 'ant'::text)
-> Index Scan using bill_of_materials_pkey on bill_of_materials bom (cost=0.43..0.50 rows=1 width=32)
Index Cond: (id = bom_pkg.bill_of_materials_id)
-> Hash (cost=1.69..1.69 rows=1 width=16)
-> Seq Scan on artifacts a (cost=0.00..1.69 rows=1 width=16)
Filter: (((algorithm)::text = 'sha256'::text) AND ((digest)::text = '2b3dd23a9614ad08a6af9faab29e2e9685c5bfd3088e769b7a5cdb9d76018349'::text))See EXPLAIN plan AFTER adding indexes
QUERY PLAN
Hash Join (cost=24.89..3447.70 rows=107 width=25)
Hash Cond: (bom.artifact_id = a.id)
-> Nested Loop (cost=23.18..3433.02 rows=4943 width=41)
-> Nested Loop (cost=22.75..959.28 rows=4943 width=41)
-> Hash Join (cost=22.32..298.02 rows=3 width=41)
Hash Cond: (pv.name_id = pn.id)
Join Filter: ((((pn.name)::text = 'stdlib'::text) AND ((pv.version)::text = '1.20'::text)) OR ((pn.name)::text = 'openssl'::text) OR (((pn.name)::text = 'ant'::text) AND ((pv.version)::text = '1.10.14'::text)))
-> Seq Scan on package_versions pv (cost=0.00..251.23 rows=9323 width=43)
-> Hash (cost=22.28..22.28 rows=3 width=30)
-> Bitmap Heap Scan on package_names pn (cost=12.86..22.28 rows=3 width=30)
Recheck Cond: (((name)::text = 'stdlib'::text) OR ((name)::text = 'openssl'::text) OR ((name)::text = 'ant'::text))
-> BitmapOr (cost=12.86..12.86 rows=3 width=0)
-> Bitmap Index Scan on packagename_name_namespace_type (cost=0.00..4.29 rows=1 width=0)
Index Cond: ((name)::text = 'stdlib'::text)
-> Bitmap Index Scan on packagename_name_namespace_type (cost=0.00..4.29 rows=1 width=0)
Index Cond: ((name)::text = 'openssl'::text)
-> Bitmap Index Scan on packagename_name_namespace_type (cost=0.00..4.29 rows=1 width=0)
Index Cond: ((name)::text = 'ant'::text)
-> Index Scan using idx_bom_included_packages_pkg_ver_id on bill_of_materials_included_software_packages bom_pkg (cost=0.43..162.12 rows=5830 width=32)
Index Cond: (package_version_id = pv.id)
-> Index Scan using bill_of_materials_pkey on bill_of_materials bom (cost=0.43..0.50 rows=1 width=32)
Index Cond: (id = bom_pkg.bill_of_materials_id)
-> Hash (cost=1.69..1.69 rows=1 width=16)
-> Seq Scan on artifacts a (cost=0.00..1.69 rows=1 width=16)
Filter: (((algorithm)::text = 'sha256'::text) AND ((digest)::text =
'2b3dd23a9614ad08a6af9faab29e2e9685c5bfd3088e769b7a5cdb9d76018349'::text))Proposed Solution
To resolve this performance issue, we propose adding an index on the package_version_id column of the bill_of_materials_included_software_packages table.
CREATE INDEX idx_bom_included_packages_pkg_ver_id
ON bill_of_materials_included_software_packages (package_version_id);
Impact
Adding this index provides a significant performance improvement for a fundamental GUAC use case. The estimated query cost dropped from ~219,000 to ~3,400 (a ~60x improvement).