Use better indexes for job history table

## Background

The `bgw_job_history` table currently have the following indexes:

- `bgw_job_stat_history_pkey` is the primary key used to index the record id (which is different from the job id)
- `bgw_job_stat_history_job_id_idx` is an index used for queries on the job id (there can be many records with the same job).

## Problem

Many queries of interest on the history table will involve the execution start time, for example, to get the last $1 entries for all jobs starting execution after a particular time you can use this query, which involves a qualifier on both `job_id` and `execution_start`. This kind of query is common when you want to find last N entries but ignore old jobs (to speed up the query, for example).

```sql
select
    id,
    job_id,
    execution_start,
    execution_finish,
    execution_finish - execution_start as execution_time,
    succeeded
from (
    select *,
        row_number() over (partition by job_id order by execution_start desc) as row_number
    from job_stat_history
    where
        job_id = any($2) and execution_start > $3
) t
where row_number <= $1
order by job_id, execution_time;
```

Other alternatives are a query only having a qualifier on `job_id`

```sql
select
    id,
    job_id,
    execution_start,
    execution_finish,
    execution_finish - execution_start as execution_time,
    succeeded
from (
    select *,
        row_number() over (partition by job_id order by execution_start desc) as row_number
    from job_stat_history
    where
        job_id = any($2)
) t
where row_number <= $1
order by job_id, execution_time;
```

and a query that has qualifier on only `execution_time`

```sql
select
    id,
    job_id,
    execution_start,
    execution_finish,
    execution_finish - execution_start as execution_time,
    succeeded
from (
    select *,
        row_number() over (partition by job_id order by execution_start desc) as row_number
    from job_stat_history
   where execution_start > $2
) t
where row_number <= $1
order by job_id, execution_time;
```

Looking for information for a particular `job_id` is expected to be a common ad-hoc query:

```sql
select
    id,
    job_id,
    execution_start,
    execution_finish,
    execution_finish - execution_start as execution_time,
    succeeded
from job_stat_history
where job_id = 4
order by execution_finish desc
limit 5;
```

## Proposal

Drop the `bgw_job_stat_history_job_id_idx` and recreate one that involves the execution start time to allow efficient queries involving the start time.

The `job_id` index is not very useful, even when looking for information on a particular `job_id`, since the number of jobs is not very high and a sequence scan with a filter is about as efficient for this query. An index would allow a Bitmap Heap Scan, but it is marginally faster in tests.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use better indexes for job history table #8573

Background

Problem

Proposal

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Use better indexes for job history table #8573

Description

Background

Problem

Proposal

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions