Koito/thoughts/plans/20260325_bulk_import_optimization.md
safierinx-a 8ce6ec494d Add bulk import optimization: track_lookup cache, batch inserts, BulkSubmitter
Adopts ListenBrainz-inspired patterns to speed up imports from ~24h to
under 30 minutes for 49k scrobbles.

Phase 1 - track_lookup cache table:
- New migration (000006) adds persistent entity lookup cache
- Maps normalized (artist, track, album) → (artist_id, album_id, track_id)
- SubmitListen fast path: cache hit skips 18 DB queries → 2 queries
- Cache populated after entity resolution, invalidated on merge/delete
- Benefits both live scrobbles and imports

Phase 2 - SaveListensBatch:
- New batch listen insert using pgx CopyFrom → temp table → INSERT ON CONFLICT
- Thousands of inserts per second vs one-at-a-time

Phase 3 - BulkSubmitter:
- Reusable import accelerator for all importers
- Pre-deduplicates scrobbles by (artist, track, album) in memory
- Worker pool (4 goroutines) for parallel entity creation on cache miss
- Batch listen insertion via SaveListensBatch

Phase 4 - Migrate importers:
- Maloja, Spotify, LastFM, ListenBrainz importers use BulkSubmitter
- Koito importer left as-is (already fast with pre-resolved IDs)

Phase 5 - Skip image lookups during import:
- GetArtistImage/GetAlbumImage calls fully skipped when SkipCacheImage=true
- Background tasks (FetchMissingArtistImages/FetchMissingAlbumImages) backfill

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 04:17:50 +05:30


# Bulk Import Optimization — Implementation Plan
## Overview
Optimize Koito's import pipeline from ~20 listens/min to thousands/min by adopting ListenBrainz-inspired patterns: a persistent entity lookup cache, batch DB writes, pre-deduplication, and deferred enrichment. The core insight from the ecosystem research is **write raw first, enrich async** — and the persistent lookup table benefits all scrobbles (live + import), not just bulk imports.
## Current State Analysis
### The Problem
Importing 49,050 Maloja scrobbles takes ~24 hours. The import is stable (our PR 1 fixes eliminated panics) but each scrobble runs through the full `SubmitListen` path:
- **`GetArtist`** issues 6 DB queries per lookup (including rank computation via window functions)
- **`GetAlbum`** issues 6 DB queries per lookup
- **`GetTrack`** issues 5-6 DB queries per lookup
- **`GetArtistImage` / `GetAlbumImage`** make HTTP calls even when all image providers are disabled
- **`SaveListen`** is a single INSERT — the only fast part
Per unique scrobble: ~18 DB round-trips + 2 image lookups. Per repeated scrobble: ~18 DB round-trips (no caching). With 5,589 unique artists, 2,628 unique albums, and 49,050 total scrobbles, this is massively redundant.
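The redundancy is easy to put in numbers. A back-of-envelope sketch (figures from the paragraph above; the helper name is illustrative):

```go
package main

import "fmt"

// queryLoad estimates total DB round-trips for an import where every
// scrobble pays the full entity-resolution cost (no caching).
func queryLoad(scrobbles, queriesPerScrobble int) int {
	return scrobbles * queriesPerScrobble
}

func main() {
	total := queryLoad(49050, 18) // ~18 round-trips per scrobble
	fmt.Println(total)            // 882900 round-trips for one import
	// With only ~5,589 unique artists and ~2,628 unique albums, most of
	// those round-trips re-resolve entities that were already seen.
}
```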
### Key Discoveries
- `SubmitListenOpts.SkipSaveListen` (`catalog.go:43`) can be used to create entities without recording a listen — useful for entity pre-creation
- `SubmitListenOpts.SkipCacheImage` (`catalog.go:46`) controls image download but NOT image URL resolution — the HTTP calls still happen
- The Koito native importer (`importer/koito.go`) already bypasses `SubmitListen` and does direct DB calls — a precedent for a faster import path
- `pgxpool.Pool` is goroutine-safe — concurrent DB operations are safe at the pool level
- `SaveListen` SQL uses `ON CONFLICT DO NOTHING` — re-importing is idempotent
- No batch insert methods exist anywhere in the codebase
- `GetArtist`/`GetAlbum`/`GetTrack` compute full stats (listen count, time listened, rank) on every call — unnecessary during import
### Ecosystem Patterns (from research)
- **ListenBrainz**: Stores raw scrobbles immediately, resolves MBIDs asynchronously via background worker + Typesense index. Uses MessyBrainz as a stable `(artist, track, release) → ID` mapping.
- **Maloja**: Runs every import through the full normalize → dedup → cache-invalidate cycle. Works for live scrobbles, kills bulk import. **This is exactly Koito's current problem.**
- **Last.fm**: Resolves metadata at write time (corrections), batches up to 50 scrobbles per request.
- **General**: DB-level dedup via unique constraint + `ON CONFLICT` is the industry standard.
## Desired End State
1. A `track_lookup` table provides O(1) entity resolution for any `(artist, track, album)` tuple — both live and import scrobbles benefit
2. All 5 importers use a shared `BulkSubmitter` that pre-deduplicates, creates entities in parallel, and batch-inserts listens
3. Image/MBZ enrichment is fully deferred to existing background tasks during import
4. 49k Maloja import completes in **under 30 minutes** (vs 24 hours currently)
5. Live scrobbles are faster too — cache hit skips 18 DB queries, goes straight to 1 SELECT + 1 INSERT
### Verification
- `go build ./...` compiles
- `go test ./...` passes (existing + new tests)
- Manual: import 49k Maloja scrobbles in under 30 minutes on vo-pc
- Manual: verify live scrobbles from multi-scrobbler still work correctly
- Manual: verify album art appears after background image backfill runs
## What We're NOT Doing
- **Replacing the DB engine** (no TimescaleDB, no Redis) — Postgres is fine for self-hosted scale
- **Local MusicBrainz mirror or Typesense index** — overkill for single-user; the live API + background enrichment is sufficient
- **Changing the live `SubmitListen` API path** — the lookup cache makes it faster, but the logic stays the same
- **Parallelizing live scrobbles** — only imports use the worker pool; live scrobbles remain single-threaded through `SubmitListen`
- **Changing the ListenBrainz/Last.fm relay** — multi-scrobbler handles that independently
## Implementation Approach
Adopt ListenBrainz's "MessyBrainz" pattern as a persistent Postgres table: a normalized `(artist, track, album)` tuple maps to resolved `(artist_id, album_id, track_id)`. This is the foundational optimization — everything else builds on it.
```
Before (per scrobble):
  GetArtist (6 queries) → GetAlbum (6 queries) → GetTrack (5-6 queries) → SaveListen (1 query)
  = 18-19 queries

After (cache hit):
  SELECT FROM track_lookup (1 query) → SaveListen (1 query)
  = 2 queries

After (bulk import, cache hit):
  In-memory map lookup (0 queries) → batched SaveListen
  = amortized ~0.01 queries per scrobble
```
---
## Phase 1: `track_lookup` Cache Table
### Overview
Add a persistent lookup table that maps normalized `(artist_name, track_title, release_title)` to resolved entity IDs. Integrate into `SubmitListen` so both live and import scrobbles benefit.
### Changes Required
#### 1. New Migration
**File**: `db/migrations/000006_track_lookup.sql`
```sql
-- +goose Up
CREATE TABLE track_lookup (
    lookup_key TEXT NOT NULL PRIMARY KEY,
    artist_id  INT NOT NULL REFERENCES artists(id) ON DELETE CASCADE,
    album_id   INT NOT NULL REFERENCES releases(id) ON DELETE CASCADE,
    track_id   INT NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_track_lookup_track_id ON track_lookup(track_id);
CREATE INDEX idx_track_lookup_artist_id ON track_lookup(artist_id);
CREATE INDEX idx_track_lookup_album_id ON track_lookup(album_id);

-- +goose Down
DROP TABLE IF EXISTS track_lookup;
```
The `lookup_key` is a normalized string built in application code: `strings.ToLower(artist) + "\x00" + strings.ToLower(track) + "\x00" + strings.ToLower(album)`. A single TEXT key with a null-byte separator is simpler and faster than a multi-column composite key with `citext`, and since a null byte cannot appear in real metadata, distinct tuples cannot collide.
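The separator choice can be sanity-checked in isolation. A minimal sketch (mirroring the `TrackLookupKey` helper introduced later in this phase) shows why a printable separator would be unsafe:

```go
package main

import (
	"fmt"
	"strings"
)

// lookupKey mirrors the planned TrackLookupKey helper: lowercase the three
// fields and join them with a null byte, which cannot occur in track metadata.
func lookupKey(artist, track, album string) string {
	return strings.ToLower(artist) + "\x00" + strings.ToLower(track) + "\x00" + strings.ToLower(album)
}

func main() {
	// With a printable separator like '-', ("AC-DC", "T.N.T") and
	// ("AC", "DC-T.N.T") would flatten to the same string. The null
	// byte keeps the tuples distinct.
	a := lookupKey("AC-DC", "T.N.T", "High Voltage")
	b := lookupKey("AC", "DC-T.N.T", "High Voltage")
	fmt.Println(a == b) // false
}
```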
#### 2. New SQL Queries
**File**: `db/queries/track_lookup.sql`
```sql
-- name: GetTrackLookup :one
SELECT artist_id, album_id, track_id
FROM track_lookup
WHERE lookup_key = $1;
-- name: InsertTrackLookup :exec
INSERT INTO track_lookup (lookup_key, artist_id, album_id, track_id)
VALUES ($1, $2, $3, $4)
ON CONFLICT (lookup_key) DO UPDATE SET
artist_id = EXCLUDED.artist_id,
album_id = EXCLUDED.album_id,
track_id = EXCLUDED.track_id;
-- name: DeleteTrackLookupByArtist :exec
DELETE FROM track_lookup WHERE artist_id = $1;
-- name: DeleteTrackLookupByAlbum :exec
DELETE FROM track_lookup WHERE album_id = $1;
-- name: DeleteTrackLookupByTrack :exec
DELETE FROM track_lookup WHERE track_id = $1;
```
#### 3. Regenerate sqlc
Run `sqlc generate` to create the Go bindings in `internal/repository/`.
#### 4. DB Interface + Psql Implementation
**File**: `internal/db/db.go` — Add to interface:
```go
// Track Lookup Cache
GetTrackLookup(ctx context.Context, key string) (*TrackLookupResult, error)
SaveTrackLookup(ctx context.Context, opts SaveTrackLookupOpts) error
InvalidateTrackLookup(ctx context.Context, opts InvalidateTrackLookupOpts) error
```
**File**: `internal/db/opts.go` — Add types:
```go
type TrackLookupResult struct {
    ArtistID int32
    AlbumID  int32
    TrackID  int32
}

type SaveTrackLookupOpts struct {
    Key      string
    ArtistID int32
    AlbumID  int32
    TrackID  int32
}

type InvalidateTrackLookupOpts struct {
    ArtistID int32
    AlbumID  int32
    TrackID  int32
}
```
**File**: `internal/db/psql/track_lookup.go` — New file implementing the three methods.
#### 5. Lookup Key Helper
**File**: `internal/catalog/lookup_key.go` — New file:
```go
package catalog

import "strings"

// TrackLookupKey builds a normalized cache key for entity resolution.
func TrackLookupKey(artist, track, album string) string {
    return strings.ToLower(artist) + "\x00" + strings.ToLower(track) + "\x00" + strings.ToLower(album)
}
```
#### 6. Integrate into SubmitListen
**File**: `internal/catalog/catalog.go` — Add fast path at the top of `SubmitListen`:
```go
func SubmitListen(ctx context.Context, store db.DB, opts SubmitListenOpts) error {
    l := logger.FromContext(ctx)
    if opts.Artist == "" || opts.TrackTitle == "" {
        return errors.New("track name and artist are required")
    }
    opts.Time = opts.Time.Truncate(time.Second)

    // Fast path: check lookup cache for known entity combo
    if !opts.SkipSaveListen {
        key := TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle)
        cached, err := store.GetTrackLookup(ctx, key)
        if err == nil && cached != nil {
            l.Debug().Msg("Track lookup cache hit — skipping entity resolution")
            return store.SaveListen(ctx, db.SaveListenOpts{
                TrackID: cached.TrackID,
                Time:    opts.Time,
                UserID:  opts.UserID,
                Client:  opts.Client,
            })
        }
    }

    // ... existing SubmitListen logic (unchanged) ...

    // After successful entity resolution, populate the cache
    store.SaveTrackLookup(ctx, db.SaveTrackLookupOpts{
        Key:      TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle),
        ArtistID: artists[0].ID,
        AlbumID:  rg.ID,
        TrackID:  track.ID,
    })

    // ... rest of existing logic ...
}
```
Note: The cache only applies when we have a direct artist+track+album text match. Scrobbles with MBZ IDs that resolve to different text representations will still go through the full path on first encounter, then be cached.
#### 7. Invalidation on Merge/Delete
**File**: `internal/db/psql/artist.go` — In `DeleteArtist` and `MergeArtists`, add:
```go
d.q.DeleteTrackLookupByArtist(ctx, id)
```
**File**: `internal/db/psql/album.go` — In `DeleteAlbum` and `MergeAlbums`, add:
```go
d.q.DeleteTrackLookupByAlbum(ctx, id)
```
**File**: `internal/db/psql/track.go` — In `DeleteTrack` and `MergeTracks`, add:
```go
d.q.DeleteTrackLookupByTrack(ctx, id)
```
### Success Criteria
- [ ] `go build ./...` compiles
- [ ] `go test ./...` passes
- [ ] New test: `TestSubmitListen_LookupCacheHit` — second identical scrobble uses cache
- [ ] New test: `TestSubmitListen_LookupCacheInvalidateOnDelete` — deleting entity clears cache
- [ ] Migration applies cleanly on fresh and existing databases
- [ ] Live scrobbles from multi-scrobbler populate the cache on first hit, use it on second
---
## Phase 2: `SaveListensBatch` DB Method
### Overview
Add a batch listen insert method using pgx's `CopyFrom` for high-throughput listen insertion. This is the DB foundation for the BulkSubmitter.
### Changes Required
#### 1. New SQL + DB Interface
**File**: `internal/db/db.go` — Add to interface:
```go
SaveListensBatch(ctx context.Context, opts []SaveListenOpts) (int64, error)
```
Returns the number of rows actually inserted (excluding `ON CONFLICT` duplicates).
#### 2. Psql Implementation
**File**: `internal/db/psql/listen.go` — New method:
```go
func (d *Psql) SaveListensBatch(ctx context.Context, opts []db.SaveListenOpts) (int64, error) {
    if len(opts) == 0 {
        return 0, nil
    }
    // Use a transaction with a temp table + INSERT ... ON CONFLICT pattern
    // since CopyFrom doesn't support ON CONFLICT directly
    tx, err := d.conn.BeginTx(ctx, pgx.TxOptions{})
    if err != nil {
        return 0, fmt.Errorf("SaveListensBatch: BeginTx: %w", err)
    }
    defer tx.Rollback(ctx)

    // Create temp table
    _, err = tx.Exec(ctx, `
        CREATE TEMP TABLE tmp_listens (
            track_id    INT,
            listened_at TIMESTAMPTZ,
            user_id     INT,
            client      TEXT
        ) ON COMMIT DROP
    `)
    if err != nil {
        return 0, fmt.Errorf("SaveListensBatch: create temp table: %w", err)
    }

    // CopyFrom into temp table
    rows := make([][]any, len(opts))
    for i, o := range opts {
        var client any
        if o.Client != "" {
            client = o.Client
        }
        rows[i] = []any{o.TrackID, o.Time, o.UserID, client}
    }
    _, err = tx.CopyFrom(ctx,
        pgx.Identifier{"tmp_listens"},
        []string{"track_id", "listened_at", "user_id", "client"},
        pgx.CopyFromRows(rows),
    )
    if err != nil {
        return 0, fmt.Errorf("SaveListensBatch: CopyFrom: %w", err)
    }

    // Insert from temp table with dedup
    tag, err := tx.Exec(ctx, `
        INSERT INTO listens (track_id, listened_at, user_id, client)
        SELECT track_id, listened_at, user_id, client FROM tmp_listens
        ON CONFLICT DO NOTHING
    `)
    if err != nil {
        return 0, fmt.Errorf("SaveListensBatch: insert: %w", err)
    }
    if err := tx.Commit(ctx); err != nil {
        return 0, fmt.Errorf("SaveListensBatch: Commit: %w", err)
    }
    return tag.RowsAffected(), nil
}
```
This uses the standard `CopyFrom → temp table → INSERT ON CONFLICT` pattern, which is the fastest bulk insert approach with pgx while still supporting deduplication.
### Success Criteria
- [ ] `go build ./...` compiles
- [ ] `go test ./...` passes
- [ ] New test: `TestSaveListensBatch` — insert 1000 listens, verify count
- [ ] New test: `TestSaveListensBatch_Dedup` — insert duplicates, verify no double-counting
- [ ] New test: `TestSaveListensBatch_Empty` — empty input returns 0, no error
---
## Phase 3: `BulkSubmitter` Helper
### Overview
A reusable import accelerator that all importers can use. Pre-deduplicates scrobbles in memory, resolves entities via the `track_lookup` cache (falling back to `SubmitListen` on cache miss), and batch-inserts listens.
### Design
```
BulkSubmitter
├── Accept(SubmitListenOpts) — buffer a scrobble
├── Flush() (int, error) — process all buffered scrobbles
│ ├── Phase A: Deduplicate by (artist, track, album) key
│ ├── Phase B: Resolve entities
│ │ ├── Check track_lookup cache (single SELECT)
│ │ ├── On miss: call SubmitListen(SkipSaveListen=true) to create entities
│ │ ├── Worker pool: N goroutines for parallel entity creation
│ │ └── Populate track_lookup cache after creation
│ ├── Phase C: Map all scrobbles to track_ids via resolved cache
│ └── Phase D: SaveListensBatch
└── Progress callback for logging
```
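Phase B's bounded concurrency can be sketched independently of the DB. The shape below (all names illustrative, not the final API) is the pattern the BulkSubmitter uses: a buffered channel caps in-flight goroutines while a mutex guards the shared result map:

```go
package main

import (
	"fmt"
	"sync"
)

// resolveAll processes each unique key with at most `workers` goroutines in
// flight, collecting results under a mutex — the shape Phase B uses for
// parallel entity creation on cache miss.
func resolveAll(keys []string, workers int, resolve func(string) int32) map[string]int32 {
	resolved := make(map[string]int32, len(keys))
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, workers) // semaphore: caps concurrency
	for _, k := range keys {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func(k string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			id := resolve(k)         // stand-in for entity creation
			mu.Lock()
			resolved[k] = id
			mu.Unlock()
		}(k)
	}
	wg.Wait()
	return resolved
}

func main() {
	got := resolveAll([]string{"a", "b", "c"}, 2, func(k string) int32 { return int32(len(k)) })
	fmt.Println(len(got)) // 3
}
```

Acquiring the semaphore before spawning the goroutine also provides backpressure: the loop itself stalls once `workers` resolutions are in flight.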
### Changes Required
#### 1. New Package
**File**: `internal/importer/bulk.go`
```go
package importer

import (
    "context"
    "fmt"
    "sync"
    "time"

    "github.com/gabehf/koito/internal/catalog"
    "github.com/gabehf/koito/internal/db"
    "github.com/gabehf/koito/internal/logger"
    "github.com/gabehf/koito/internal/mbz"
)

type BulkSubmitter struct {
    store      db.DB
    mbzc       mbz.MusicBrainzCaller
    ctx        context.Context
    buffer     []catalog.SubmitListenOpts
    workers    int
    onProgress func(imported, total int)
}

type BulkSubmitterOpts struct {
    Store      db.DB
    Mbzc       mbz.MusicBrainzCaller
    Workers    int                       // default 4
    OnProgress func(imported, total int) // called every 500 items
}

func NewBulkSubmitter(ctx context.Context, opts BulkSubmitterOpts) *BulkSubmitter {
    workers := opts.Workers
    if workers <= 0 {
        workers = 4
    }
    return &BulkSubmitter{
        store:      opts.Store,
        mbzc:       opts.Mbzc,
        ctx:        ctx,
        workers:    workers,
        onProgress: opts.OnProgress,
    }
}

func (bs *BulkSubmitter) Accept(opts catalog.SubmitListenOpts) {
    bs.buffer = append(bs.buffer, opts)
}

func (bs *BulkSubmitter) Flush() (int, error) {
    l := logger.FromContext(bs.ctx)
    if len(bs.buffer) == 0 {
        return 0, nil
    }
    l.Info().Msgf("BulkSubmitter: Processing %d scrobbles", len(bs.buffer))

    // Phase A: Deduplicate — find unique (artist, track, album) tuples
    type entityKey = string
    unique := make(map[entityKey]catalog.SubmitListenOpts)
    for _, opts := range bs.buffer {
        key := catalog.TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle)
        if _, exists := unique[key]; !exists {
            unique[key] = opts
        }
    }
    l.Info().Msgf("BulkSubmitter: %d unique entity combos from %d scrobbles", len(unique), len(bs.buffer))

    // Phase B: Resolve entities — check cache, create on miss
    resolved := make(map[entityKey]int32) // key → trackID
    var mu sync.Mutex
    var wg sync.WaitGroup
    sem := make(chan struct{}, bs.workers)
    for key, opts := range unique {
        // Check track_lookup cache first
        cached, err := bs.store.GetTrackLookup(bs.ctx, key)
        if err == nil && cached != nil {
            mu.Lock()
            resolved[key] = cached.TrackID
            mu.Unlock()
            continue
        }
        // Cache miss — create entities via SubmitListen (with worker pool)
        wg.Add(1)
        sem <- struct{}{} // acquire worker slot
        go func(k entityKey, o catalog.SubmitListenOpts) {
            defer wg.Done()
            defer func() { <-sem }() // release worker slot
            o.SkipSaveListen = true
            o.SkipCacheImage = true
            if err := catalog.SubmitListen(bs.ctx, bs.store, o); err != nil {
                l.Err(err).Msgf("BulkSubmitter: Failed to create entities for '%s' by '%s'", o.TrackTitle, o.Artist)
                return
            }
            // Re-check cache (SubmitListen populates it in Phase 1's integration)
            if cached, err := bs.store.GetTrackLookup(bs.ctx, k); err == nil && cached != nil {
                mu.Lock()
                resolved[k] = cached.TrackID
                mu.Unlock()
            }
        }(key, opts)
    }
    wg.Wait()
    l.Info().Msgf("BulkSubmitter: Resolved %d/%d entity combos", len(resolved), len(unique))

    // Phase C: Build listen batch
    batch := make([]db.SaveListenOpts, 0, len(bs.buffer))
    skipped := 0
    for i, opts := range bs.buffer {
        key := catalog.TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle)
        trackID, ok := resolved[key]
        if !ok {
            skipped++
            continue
        }
        batch = append(batch, db.SaveListenOpts{
            TrackID: trackID,
            Time:    opts.Time.Truncate(time.Second),
            UserID:  opts.UserID,
            Client:  opts.Client,
        })
        if bs.onProgress != nil && (i+1)%500 == 0 {
            bs.onProgress(i+1, len(bs.buffer))
        }
    }
    if skipped > 0 {
        l.Warn().Msgf("BulkSubmitter: Skipped %d scrobbles with unresolved entities", skipped)
    }

    // Phase D: Batch insert listens
    inserted, err := bs.store.SaveListensBatch(bs.ctx, batch)
    if err != nil {
        return 0, fmt.Errorf("BulkSubmitter: SaveListensBatch: %w", err)
    }
    bs.buffer = bs.buffer[:0] // reset so the submitter can be reused
    l.Info().Msgf("BulkSubmitter: Inserted %d listens (%d duplicates skipped)", inserted, int64(len(batch))-inserted)
    return int(inserted), nil
}
```
#### 2. TOCTOU Safety for Parallel Entity Creation
The worker pool creates entities via `SubmitListen(SkipSaveListen=true)`. Two workers could race on the same artist name. The existing code uses a get-then-save pattern. Mitigations:
- Pre-dedup in Phase A ensures each unique tuple is processed by exactly one goroutine — **no TOCTOU within the worker pool**
- The only remaining race is between the import workers and live scrobbles from multi-scrobbler hitting the same `SubmitListen` path. This is already handled by the DB's unique constraints + `ON CONFLICT` clauses on join tables.
### Success Criteria
- [ ] `go build ./...` compiles
- [ ] `go test ./...` passes
- [ ] New test: `TestBulkSubmitter_BasicImport` — buffer 100 scrobbles, flush, verify all imported
- [ ] New test: `TestBulkSubmitter_Dedup` — buffer 100 scrobbles with 10 unique combos, verify 10 entity creations
- [ ] New test: `TestBulkSubmitter_CacheHit` — pre-populate track_lookup, verify no SubmitListen calls
- [ ] New test: `TestBulkSubmitter_PartialFailure` — one entity creation fails, rest still imported
---
## Phase 4: Migrate All Importers
### Overview
Wire all 5 importers to use BulkSubmitter instead of calling `SubmitListen` directly in a loop.
### Changes Required
#### 1. Maloja Importer
**File**: `internal/importer/maloja.go`
Replace the per-item `catalog.SubmitListen` loop with:
```go
func ImportMalojaFile(ctx context.Context, store db.DB, mbzc mbz.MusicBrainzCaller, filename string) error {
    l := logger.FromContext(ctx)
    // ... file reading and JSON parsing (unchanged) ...

    bs := NewBulkSubmitter(ctx, BulkSubmitterOpts{
        Store: store,
        Mbzc:  mbzc,
        OnProgress: func(imported, total int) {
            l.Info().Msgf("Maloja import progress: %d/%d", imported, total)
        },
    })
    for _, item := range items {
        // ... existing artist parsing, time window check (unchanged) ...
        bs.Accept(catalog.SubmitListenOpts{
            MbzCaller:      mbzc,
            Artist:         item.Track.Artists[0],
            ArtistNames:    artists,
            TrackTitle:     item.Track.Title,
            ReleaseTitle:   releaseTitle,
            Time:           ts.Local(),
            Client:         "maloja",
            UserID:         1,
            SkipCacheImage: true,
        })
    }
    count, err := bs.Flush()
    if err != nil {
        return fmt.Errorf("ImportMalojaFile: %w", err)
    }
    return finishImport(ctx, filename, count)
}
```
#### 2. Spotify Importer
**File**: `internal/importer/spotify.go` — Same pattern: Accept into BulkSubmitter, Flush at end.
#### 3. LastFM Importer
**File**: `internal/importer/lastfm.go` — Same pattern. Note: LastFM scrobbles include MBZ IDs, which will pass through to `SubmitListen` on cache miss for proper entity resolution.
#### 4. ListenBrainz Importer
**File**: `internal/importer/listenbrainz.go` — Same pattern. ListenBrainz data is the richest (full MBZ IDs, MBID mappings) — cache hits will be common after first import.
#### 5. Koito Importer
**File**: `internal/importer/koito.go` — This one currently bypasses `SubmitListen` with direct DB calls. Two options:
- **Option A**: Migrate to BulkSubmitter (consistent, benefits from cache)
- **Option B**: Leave as-is (it's already fast, Koito exports have pre-resolved IDs)
Recommend **Option A** for consistency, with the Koito importer becoming the simplest BulkSubmitter user since its data is pre-resolved.
### Success Criteria
- [ ] `go build ./...` compiles
- [ ] `go test ./...` passes — all existing import tests still pass
- [ ] `TestImportMaloja` — 38 listens imported correctly
- [ ] `TestImportMaloja_NullAlbum` — null album handled
- [ ] `TestImportMaloja_ApiFormat` — list format works
- [ ] `TestImportSpotify` — duration data preserved
- [ ] `TestImportLastFM` — MBZ IDs resolved
- [ ] `TestImportListenBrainz` — MBID mappings applied
- [ ] `TestImportKoito` — aliases preserved
- [ ] Manual: 49k Maloja import on vo-pc completes in under 30 minutes
---
## Phase 5: Skip Image Lookups During Import
### Overview
Short-circuit `GetArtistImage` and `GetAlbumImage` calls when `SkipCacheImage` is true. Currently these functions still make HTTP calls (or call providers that return "no providers enabled") even when the result won't be used. The existing background tasks (`FetchMissingArtistImages`, `FetchMissingAlbumImages`) will backfill images after import.
### Changes Required
#### 1. Early Return in Associate Functions
**File**: `internal/catalog/associate_artists.go`
In `resolveAliasOrCreateArtist` (line ~248) and `matchArtistsByNames` (line ~304):
```go
// Before:
imgUrl, err := images.GetArtistImage(ctx, images.ArtistImageOpts{...})
if err == nil && imgUrl != "" {
    imgid = uuid.New()
    if !opts.SkipCacheImage {
        // download image
    }
}

// After:
var imgUrl string
if !opts.SkipCacheImage {
    var err error
    imgUrl, err = images.GetArtistImage(ctx, images.ArtistImageOpts{...})
    if err == nil && imgUrl != "" {
        imgid = uuid.New()
        // download image
    }
}
```
**File**: `internal/catalog/associate_album.go`
Same pattern in `createOrUpdateAlbumWithMbzReleaseID` (line ~125) and `matchAlbumByTitle` (line ~220).
### Success Criteria
- [ ] `go build ./...` compiles
- [ ] `go test ./...` passes
- [ ] No `GetArtistImage`/`GetAlbumImage` calls during import (verify via log: no "No image providers" warnings)
- [ ] Background tasks still fetch images after import completes
- [ ] Live scrobbles (SkipCacheImage=false) still fetch images normally
---
## Performance Estimates
| Scenario | Current | After Phase 1 | After All Phases |
|---|---|---|---|
| Repeated live scrobble | ~19 queries | 2 queries (cache hit) | 2 queries |
| New live scrobble | ~19 queries | ~19 queries + 1 cache write | ~19 queries + 1 cache write |
| 49k Maloja import | ~24 hours | ~12 hours (cache helps repeats) | ~15-30 minutes |
| 49k import (second run) | ~24 hours | ~20 minutes (all cache hits) | ~5 minutes |
## Implementation Order
1. **Phase 1** (track_lookup) — standalone, immediate benefit for all scrobbles
2. **Phase 5** (skip image lookups) — standalone, no dependencies, quick win
3. **Phase 2** (SaveListensBatch) — DB layer, needed by Phase 3
4. **Phase 3** (BulkSubmitter) — the core, depends on Phase 1 + 2
5. **Phase 4** (migrate importers) — depends on Phase 3
Phases 1 and 5 can be done first as independent PRs. Phases 2-4 are one PR.
## References
- ListenBrainz architecture: https://listenbrainz.readthedocs.io/en/latest/developers/architecture.html
- ListenBrainz MBID mapping: https://listenbrainz.readthedocs.io/en/latest/developers/mapping.html
- MusicBrainz rate limiting: https://musicbrainz.org/doc/MusicBrainz_API/Rate_Limiting
- PR 1 (importer fixes): https://github.com/gabehf/Koito/pull/228
- PR 2 (MBZ search): https://github.com/gabehf/Koito/pull/229
- Koito native importer (bypass pattern): `internal/importer/koito.go`
- Current SubmitListen: `internal/catalog/catalog.go:70`
- pgx CopyFrom docs: https://pkg.go.dev/github.com/jackc/pgx/v5#Conn.CopyFrom