Adopts ListenBrainz-inspired patterns to speed up imports from ~24h to under 30 minutes for 49k scrobbles. Phase 1 - track_lookup cache table: - New migration (000006) adds persistent entity lookup cache - Maps normalized (artist, track, album) → (artist_id, album_id, track_id) - SubmitListen fast path: cache hit skips 18 DB queries → 2 queries - Cache populated after entity resolution, invalidated on merge/delete - Benefits both live scrobbles and imports Phase 2 - SaveListensBatch: - New batch listen insert using pgx CopyFrom → temp table → INSERT ON CONFLICT - Thousands of inserts per second vs one-at-a-time Phase 3 - BulkSubmitter: - Reusable import accelerator for all importers - Pre-deduplicates scrobbles by (artist, track, album) in memory - Worker pool (4 goroutines) for parallel entity creation on cache miss - Batch listen insertion via SaveListensBatch Phase 4 - Migrate importers: - Maloja, Spotify, LastFM, ListenBrainz importers use BulkSubmitter - Koito importer left as-is (already fast with pre-resolved IDs) Phase 5 - Skip image lookups during import: - GetArtistImage/GetAlbumImage calls fully skipped when SkipCacheImage=true - Background tasks (FetchMissingArtistImages/FetchMissingAlbumImages) backfill Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
26 KiB
Bulk Import Optimization — Implementation Plan
Overview
Optimize Koito's import pipeline from ~20 listens/min to thousands/min by adopting ListenBrainz-inspired patterns: a persistent entity lookup cache, batch DB writes, pre-deduplication, and deferred enrichment. The core insight from the ecosystem research is write raw first, enrich async — and the persistent lookup table benefits all scrobbles (live + import), not just bulk imports.
Current State Analysis
The Problem
Importing 49,050 Maloja scrobbles takes ~24 hours. The import is stable (our PR 1 fixes eliminated panics) but each scrobble runs through the full SubmitListen path:
GetArtistissues 6 DB queries per lookup (including rank computation via window functions)GetAlbumissues 6 DB queries per lookupGetTrackissues 5-6 DB queries per lookupGetArtistImage/GetAlbumImagemakes HTTP calls even when all image providers are disabledSaveListenis a single INSERT — the only fast part
Per unique scrobble: ~18 DB round-trips + 2 image lookups. Per repeated scrobble: ~18 DB round-trips (no caching). With 5,589 unique artists, 2,628 unique albums, and 49,050 total scrobbles, this is massively redundant.
Key Discoveries
SubmitListenOpts.SkipSaveListen(catalog.go:43) can be used to create entities without recording a listen — useful for entity pre-creationSubmitListenOpts.SkipCacheImage(catalog.go:46) controls image download but NOT image URL resolution — the HTTP calls still happen- The Koito native importer (
importer/koito.go) already bypassesSubmitListenand does direct DB calls — a precedent for a faster import path pgxpool.Poolis goroutine-safe — concurrent DB operations are safe at the pool levelSaveListenSQL usesON CONFLICT DO NOTHING— re-importing is idempotent- No batch insert methods exist anywhere in the codebase
GetArtist/GetAlbum/GetTrackcompute full stats (listen count, time listened, rank) on every call — unnecessary during import
Ecosystem Patterns (from research)
- ListenBrainz: Stores raw scrobbles immediately, resolves MBIDs asynchronously via background worker + Typesense index. Uses MessyBrainz as a stable
(artist, track, release) → IDmapping. - Maloja: Runs every import through the full normalize → dedup → cache-invalidate cycle. Works for live scrobbles, kills bulk import. This is exactly Koito's current problem.
- Last.fm: Resolves metadata at write time (corrections), batches up to 50 scrobbles per request.
- General: DB-level dedup via unique constraint +
ON CONFLICTis the industry standard.
Desired End State
- A
track_lookuptable provides O(1) entity resolution for any(artist, track, album)tuple — both live and import scrobbles benefit - All 5 importers use a shared
BulkSubmitterthat pre-deduplicates, creates entities in parallel, and batch-inserts listens - Image/MBZ enrichment is fully deferred to existing background tasks during import
- 49k Maloja import completes in under 30 minutes (vs 24 hours currently)
- Live scrobbles are faster too — cache hit skips 18 DB queries, goes straight to 1 SELECT + 1 INSERT
Verification
go build ./...compilesgo test ./...passes (existing + new tests)- Manual: import 49k Maloja scrobbles in under 30 minutes on vo-pc
- Manual: verify live scrobbles from multi-scrobbler still work correctly
- Manual: verify album art appears after background image backfill runs
What We're NOT Doing
- Replacing the DB engine (no TimescaleDB, no Redis) — Postgres is fine for self-hosted scale
- Local MusicBrainz mirror or Typesense index — overkill for single-user; the live API + background enrichment is sufficient
- Changing the live
SubmitListenAPI path — the lookup cache makes it faster, but the logic stays the same - Parallelizing live scrobbles — only imports use the worker pool; live scrobbles remain single-threaded through
SubmitListen - Changing the ListenBrainz/Last.fm relay — multi-scrobbler handles that independently
Implementation Approach
Adopt ListenBrainz's "MessyBrainz" pattern as a persistent Postgres table: a normalized (artist, track, album) tuple maps to resolved (artist_id, album_id, track_id). This is the foundational optimization — everything else builds on it.
Before (per scrobble):
GetArtist (6 queries) → GetAlbum (6 queries) → GetTrack (6 queries) → SaveListen (1 query)
= 19 queries minimum
After (cache hit):
SELECT FROM track_lookup (1 query) → SaveListen (1 query)
= 2 queries
After (bulk import, cache hit):
In-memory map lookup (0 queries) → batched SaveListen
= amortized ~0.01 queries per scrobble
Phase 1: track_lookup Cache Table
Overview
Add a persistent lookup table that maps normalized (artist_name, track_title, release_title) to resolved entity IDs. Integrate into SubmitListen so both live and import scrobbles benefit.
Changes Required
1. New Migration
File: db/migrations/000006_track_lookup.sql
-- +goose Up
CREATE TABLE track_lookup (
lookup_key TEXT NOT NULL PRIMARY KEY,
artist_id INT NOT NULL REFERENCES artists(id) ON DELETE CASCADE,
album_id INT NOT NULL REFERENCES releases(id) ON DELETE CASCADE,
track_id INT NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_track_lookup_track_id ON track_lookup(track_id);
CREATE INDEX idx_track_lookup_artist_id ON track_lookup(artist_id);
CREATE INDEX idx_track_lookup_album_id ON track_lookup(album_id);
-- +goose Down
DROP TABLE IF EXISTS track_lookup;
The lookup_key is a normalized string: lower(artist) || '\x00' || lower(track) || '\x00' || lower(album). Using a single TEXT key with a null-byte separator is simpler and faster than a multi-column composite key with citext.
2. New SQL Queries
File: db/queries/track_lookup.sql
-- name: GetTrackLookup :one
SELECT artist_id, album_id, track_id
FROM track_lookup
WHERE lookup_key = $1;
-- name: InsertTrackLookup :exec
INSERT INTO track_lookup (lookup_key, artist_id, album_id, track_id)
VALUES ($1, $2, $3, $4)
ON CONFLICT (lookup_key) DO UPDATE SET
artist_id = EXCLUDED.artist_id,
album_id = EXCLUDED.album_id,
track_id = EXCLUDED.track_id;
-- name: DeleteTrackLookupByArtist :exec
DELETE FROM track_lookup WHERE artist_id = $1;
-- name: DeleteTrackLookupByAlbum :exec
DELETE FROM track_lookup WHERE album_id = $1;
-- name: DeleteTrackLookupByTrack :exec
DELETE FROM track_lookup WHERE track_id = $1;
3. Regenerate sqlc
Run sqlc generate to create the Go bindings in internal/repository/.
4. DB Interface + Psql Implementation
File: internal/db/db.go — Add to interface:
// Track Lookup Cache
GetTrackLookup(ctx context.Context, key string) (*TrackLookupResult, error)
SaveTrackLookup(ctx context.Context, opts SaveTrackLookupOpts) error
InvalidateTrackLookup(ctx context.Context, opts InvalidateTrackLookupOpts) error
File: internal/db/opts.go — Add types:
type TrackLookupResult struct {
ArtistID int32
AlbumID int32
TrackID int32
}
type SaveTrackLookupOpts struct {
Key string
ArtistID int32
AlbumID int32
TrackID int32
}
type InvalidateTrackLookupOpts struct {
ArtistID int32
AlbumID int32
TrackID int32
}
File: internal/db/psql/track_lookup.go — New file implementing the three methods.
5. Lookup Key Helper
File: internal/catalog/lookup_key.go — New file:
package catalog
import "strings"
// TrackLookupKey builds a normalized cache key for entity resolution.
func TrackLookupKey(artist, track, album string) string {
return strings.ToLower(artist) + "\x00" + strings.ToLower(track) + "\x00" + strings.ToLower(album)
}
6. Integrate into SubmitListen
File: internal/catalog/catalog.go — Add fast path at the top of SubmitListen:
func SubmitListen(ctx context.Context, store db.DB, opts SubmitListenOpts) error {
l := logger.FromContext(ctx)
if opts.Artist == "" || opts.TrackTitle == "" {
return errors.New("track name and artist are required")
}
opts.Time = opts.Time.Truncate(time.Second)
// Fast path: check lookup cache for known entity combo
if !opts.SkipSaveListen {
key := TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle)
cached, err := store.GetTrackLookup(ctx, key)
if err == nil && cached != nil {
l.Debug().Msg("Track lookup cache hit — skipping entity resolution")
return store.SaveListen(ctx, db.SaveListenOpts{
TrackID: cached.TrackID,
Time: opts.Time,
UserID: opts.UserID,
Client: opts.Client,
})
}
}
// ... existing SubmitListen logic (unchanged) ...
// After successful entity resolution, populate the cache
store.SaveTrackLookup(ctx, db.SaveTrackLookupOpts{
Key: TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle),
ArtistID: artists[0].ID,
AlbumID: rg.ID,
TrackID: track.ID,
})
// ... rest of existing logic ...
}
Note: The cache only applies when we have a direct artist+track+album text match. Scrobbles with MBZ IDs that resolve to different text representations will still go through the full path on first encounter, then be cached.
7. Invalidation on Merge/Delete
File: internal/db/psql/artist.go — In DeleteArtist and MergeArtists, add:
d.q.DeleteTrackLookupByArtist(ctx, id)
File: internal/db/psql/album.go — In DeleteAlbum and MergeAlbums, add:
d.q.DeleteTrackLookupByAlbum(ctx, id)
File: internal/db/psql/track.go — In DeleteTrack and MergeTracks, add:
d.q.DeleteTrackLookupByTrack(ctx, id)
Success Criteria
go build ./...compilesgo test ./...passes- New test:
TestSubmitListen_LookupCacheHit— second identical scrobble uses cache - New test:
TestSubmitListen_LookupCacheInvalidateOnDelete— deleting entity clears cache - Migration applies cleanly on fresh and existing databases
- Live scrobbles from multi-scrobbler populate the cache on first hit, use it on second
Phase 2: SaveListensBatch DB Method
Overview
Add a batch listen insert method using pgx's CopyFrom for high-throughput listen insertion. This is the DB foundation for the BulkSubmitter.
Changes Required
1. New SQL + DB Interface
File: internal/db/db.go — Add to interface:
SaveListensBatch(ctx context.Context, opts []SaveListenOpts) (int64, error)
Returns the number of rows actually inserted (excluding ON CONFLICT duplicates).
2. Psql Implementation
File: internal/db/psql/listen.go — New method:
func (d *Psql) SaveListensBatch(ctx context.Context, opts []db.SaveListenOpts) (int64, error) {
if len(opts) == 0 {
return 0, nil
}
// Use a transaction with a temp table + INSERT ... ON CONFLICT pattern
// since CopyFrom doesn't support ON CONFLICT directly
tx, err := d.conn.BeginTx(ctx, pgx.TxOptions{})
if err != nil {
return 0, fmt.Errorf("SaveListensBatch: BeginTx: %w", err)
}
defer tx.Rollback(ctx)
// Create temp table
_, err = tx.Exec(ctx, `
CREATE TEMP TABLE tmp_listens (
track_id INT,
listened_at TIMESTAMPTZ,
user_id INT,
client TEXT
) ON COMMIT DROP
`)
if err != nil {
return 0, fmt.Errorf("SaveListensBatch: create temp table: %w", err)
}
// CopyFrom into temp table
rows := make([][]interface{}, len(opts))
for i, o := range opts {
var client interface{}
if o.Client != "" {
client = o.Client
}
rows[i] = []interface{}{o.TrackID, o.Time, o.UserID, client}
}
_, err = tx.CopyFrom(ctx,
pgx.Identifier{"tmp_listens"},
[]string{"track_id", "listened_at", "user_id", "client"},
pgx.CopyFromRows(rows),
)
if err != nil {
return 0, fmt.Errorf("SaveListensBatch: CopyFrom: %w", err)
}
// Insert from temp table with dedup
tag, err := tx.Exec(ctx, `
INSERT INTO listens (track_id, listened_at, user_id, client)
SELECT track_id, listened_at, user_id, client FROM tmp_listens
ON CONFLICT DO NOTHING
`)
if err != nil {
return 0, fmt.Errorf("SaveListensBatch: insert: %w", err)
}
if err := tx.Commit(ctx); err != nil {
return 0, fmt.Errorf("SaveListensBatch: Commit: %w", err)
}
return tag.RowsAffected(), nil
}
This uses the standard CopyFrom → temp table → INSERT ON CONFLICT pattern, which is the fastest bulk insert approach with pgx while still supporting deduplication.
Success Criteria
go build ./...compilesgo test ./...passes- New test:
TestSaveListensBatch— insert 1000 listens, verify count - New test:
TestSaveListensBatch_Dedup— insert duplicates, verify no double-counting - New test:
TestSaveListensBatch_Empty— empty input returns 0, no error
Phase 3: BulkSubmitter Helper
Overview
A reusable import accelerator that all importers can use. Pre-deduplicates scrobbles in memory, resolves entities via the track_lookup cache (falling back to SubmitListen on cache miss), and batch-inserts listens.
Design
BulkSubmitter
├── Accept(SubmitListenOpts) — buffer a scrobble
├── Flush() (int, error) — process all buffered scrobbles
│ ├── Phase A: Deduplicate by (artist, track, album) key
│ ├── Phase B: Resolve entities
│ │ ├── Check track_lookup cache (single SELECT)
│ │ ├── On miss: call SubmitListen(SkipSaveListen=true) to create entities
│ │ ├── Worker pool: N goroutines for parallel entity creation
│ │ └── Populate track_lookup cache after creation
│ ├── Phase C: Map all scrobbles to track_ids via resolved cache
│ └── Phase D: SaveListensBatch
└── Progress callback for logging
Changes Required
1. New Package
File: internal/importer/bulk.go
package importer
import (
"context"
"sync"
"github.com/gabehf/koito/internal/catalog"
"github.com/gabehf/koito/internal/db"
"github.com/gabehf/koito/internal/logger"
"github.com/gabehf/koito/internal/mbz"
)
type BulkSubmitter struct {
store db.DB
mbzc mbz.MusicBrainzCaller
ctx context.Context
buffer []catalog.SubmitListenOpts
workers int
onProgress func(imported, total int)
}
type BulkSubmitterOpts struct {
Store db.DB
Mbzc mbz.MusicBrainzCaller
Workers int // default 4
OnProgress func(imported, total int) // called every 500 items
}
func NewBulkSubmitter(ctx context.Context, opts BulkSubmitterOpts) *BulkSubmitter {
workers := opts.Workers
if workers <= 0 {
workers = 4
}
return &BulkSubmitter{
store: opts.Store,
mbzc: opts.Mbzc,
ctx: ctx,
workers: workers,
onProgress: opts.OnProgress,
}
}
func (bs *BulkSubmitter) Accept(opts catalog.SubmitListenOpts) {
bs.buffer = append(bs.buffer, opts)
}
func (bs *BulkSubmitter) Flush() (int, error) {
l := logger.FromContext(bs.ctx)
if len(bs.buffer) == 0 {
return 0, nil
}
l.Info().Msgf("BulkSubmitter: Processing %d scrobbles", len(bs.buffer))
// Phase A: Deduplicate — find unique (artist, track, album) tuples
type entityKey = string
unique := make(map[entityKey]catalog.SubmitListenOpts)
for _, opts := range bs.buffer {
key := catalog.TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle)
if _, exists := unique[key]; !exists {
unique[key] = opts
}
}
l.Info().Msgf("BulkSubmitter: %d unique entity combos from %d scrobbles", len(unique), len(bs.buffer))
// Phase B: Resolve entities — check cache, create on miss
resolved := make(map[entityKey]int32) // key → trackID
var mu sync.Mutex
var wg sync.WaitGroup
sem := make(chan struct{}, bs.workers)
resolveCount := 0
for key, opts := range unique {
// Check track_lookup cache first
cached, err := bs.store.GetTrackLookup(bs.ctx, key)
if err == nil && cached != nil {
mu.Lock()
resolved[key] = cached.TrackID
resolveCount++
mu.Unlock()
continue
}
// Cache miss — create entities via SubmitListen (with worker pool)
wg.Add(1)
sem <- struct{}{} // acquire worker slot
go func(k entityKey, o catalog.SubmitListenOpts) {
defer wg.Done()
defer func() { <-sem }() // release worker slot
o.SkipSaveListen = true
o.SkipCacheImage = true
err := catalog.SubmitListen(bs.ctx, bs.store, o)
if err != nil {
l.Err(err).Msgf("BulkSubmitter: Failed to create entities for '%s' by '%s'", o.TrackTitle, o.Artist)
return
}
// Re-check cache (SubmitListen populates it in Phase 1's integration)
cached, err := bs.store.GetTrackLookup(bs.ctx, k)
if err == nil && cached != nil {
mu.Lock()
resolved[k] = cached.TrackID
mu.Unlock()
}
}(key, opts)
}
wg.Wait()
l.Info().Msgf("BulkSubmitter: Resolved %d/%d entity combos", len(resolved), len(unique))
// Phase C: Build listen batch
batch := make([]db.SaveListenOpts, 0, len(bs.buffer))
skipped := 0
for _, opts := range bs.buffer {
key := catalog.TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle)
trackID, ok := resolved[key]
if !ok {
skipped++
continue
}
batch = append(batch, db.SaveListenOpts{
TrackID: trackID,
Time: opts.Time.Truncate(time.Second),
UserID: opts.UserID,
Client: opts.Client,
})
}
if skipped > 0 {
l.Warn().Msgf("BulkSubmitter: Skipped %d scrobbles with unresolved entities", skipped)
}
// Phase D: Batch insert listens
inserted, err := bs.store.SaveListensBatch(bs.ctx, batch)
if err != nil {
return 0, fmt.Errorf("BulkSubmitter: SaveListensBatch: %w", err)
}
l.Info().Msgf("BulkSubmitter: Inserted %d listens (%d duplicates skipped)", inserted, int64(len(batch))-inserted)
return int(inserted), nil
}
2. TOCTOU Safety for Parallel Entity Creation
The worker pool creates entities via SubmitListen(SkipSaveListen=true). Two workers could race on the same artist name. The existing code uses a get-then-save pattern. Mitigations:
- Pre-dedup in Phase A ensures each unique tuple is processed by exactly one goroutine — no TOCTOU within the worker pool
- The only remaining race is between the import workers and live scrobbles from multi-scrobbler hitting the same
SubmitListenpath. This is already handled by the DB's unique constraints +ON CONFLICTclauses on join tables.
Success Criteria
go build ./...compilesgo test ./...passes- New test:
TestBulkSubmitter_BasicImport— buffer 100 scrobbles, flush, verify all imported - New test:
TestBulkSubmitter_Dedup— buffer 100 scrobbles with 10 unique combos, verify 10 entity creations - New test:
TestBulkSubmitter_CacheHit— pre-populate track_lookup, verify no SubmitListen calls - New test:
TestBulkSubmitter_PartialFailure— one entity creation fails, rest still imported
Phase 4: Migrate All Importers
Overview
Wire all 5 importers to use BulkSubmitter instead of calling SubmitListen directly in a loop.
Changes Required
1. Maloja Importer
File: internal/importer/maloja.go
Replace the per-item catalog.SubmitListen loop with:
func ImportMalojaFile(ctx context.Context, store db.DB, mbzc mbz.MusicBrainzCaller, filename string) error {
l := logger.FromContext(ctx)
// ... file reading and JSON parsing (unchanged) ...
bs := NewBulkSubmitter(ctx, BulkSubmitterOpts{
Store: store,
Mbzc: mbzc,
OnProgress: func(imported, total int) {
l.Info().Msgf("Maloja import progress: %d/%d", imported, total)
},
})
for _, item := range items {
// ... existing artist parsing, time window check (unchanged) ...
bs.Accept(catalog.SubmitListenOpts{
MbzCaller: mbzc,
Artist: item.Track.Artists[0],
ArtistNames: artists,
TrackTitle: item.Track.Title,
ReleaseTitle: releaseTitle,
Time: ts.Local(),
Client: "maloja",
UserID: 1,
SkipCacheImage: true,
})
}
count, err := bs.Flush()
if err != nil {
return fmt.Errorf("ImportMalojaFile: %w", err)
}
return finishImport(ctx, filename, count)
}
2. Spotify Importer
File: internal/importer/spotify.go — Same pattern: Accept into BulkSubmitter, Flush at end.
3. LastFM Importer
File: internal/importer/lastfm.go — Same pattern. Note: LastFM scrobbles include MBZ IDs, which will pass through to SubmitListen on cache miss for proper entity resolution.
4. ListenBrainz Importer
File: internal/importer/listenbrainz.go — Same pattern. ListenBrainz data is the richest (full MBZ IDs, MBID mappings) — cache hits will be common after first import.
5. Koito Importer
File: internal/importer/koito.go — This one currently bypasses SubmitListen with direct DB calls. Two options:
- Option A: Migrate to BulkSubmitter (consistent, benefits from cache)
- Option B: Leave as-is (it's already fast, Koito exports have pre-resolved IDs)
Recommend Option A for consistency, with the Koito importer becoming the simplest BulkSubmitter user since its data is pre-resolved.
Success Criteria
go build ./...compilesgo test ./...passes — all existing import tests still passTestImportMaloja— 38 listens imported correctlyTestImportMaloja_NullAlbum— null album handledTestImportMaloja_ApiFormat— list format worksTestImportSpotify— duration data preservedTestImportLastFM— MBZ IDs resolvedTestImportListenBrainz— MBID mappings appliedTestImportKoito— aliases preserved- Manual: 49k Maloja import on vo-pc completes in under 30 minutes
Phase 5: Skip Image Lookups During Import
Overview
Short-circuit GetArtistImage and GetAlbumImage calls when SkipCacheImage is true. Currently these functions still make HTTP calls (or call providers that return "no providers enabled") even when the result won't be used. The existing background tasks (FetchMissingArtistImages, FetchMissingAlbumImages) will backfill images after import.
Changes Required
1. Early Return in Associate Functions
File: internal/catalog/associate_artists.go
In resolveAliasOrCreateArtist (line ~248) and matchArtistsByNames (line ~304):
// Before:
imgUrl, err := images.GetArtistImage(ctx, images.ArtistImageOpts{...})
if err == nil && imgUrl != "" {
imgid = uuid.New()
if !opts.SkipCacheImage {
// download image
}
}
// After:
var imgUrl string
if !opts.SkipCacheImage {
imgUrl, err = images.GetArtistImage(ctx, images.ArtistImageOpts{...})
if err == nil && imgUrl != "" {
imgid = uuid.New()
// download image
}
}
File: internal/catalog/associate_album.go
Same pattern in createOrUpdateAlbumWithMbzReleaseID (line ~125) and matchAlbumByTitle (line ~220).
Success Criteria
go build ./...compilesgo test ./...passes- No
GetArtistImage/GetAlbumImagecalls during import (verify via log: no "No image providers" warnings) - Background tasks still fetch images after import completes
- Live scrobbles (SkipCacheImage=false) still fetch images normally
Performance Estimates
| Scenario | Current | After Phase 1 | After All Phases |
|---|---|---|---|
| Repeated live scrobble | ~19 queries | 2 queries (cache hit) | 2 queries |
| New live scrobble | ~19 queries | ~19 queries + 1 cache write | ~19 queries + 1 cache write |
| 49k Maloja import | ~24 hours | ~12 hours (cache helps repeats) | ~15-30 minutes |
| 49k import (second run) | ~24 hours | ~20 minutes (all cache hits) | ~5 minutes |
Implementation Order
- Phase 1 (track_lookup) — standalone, immediate benefit for all scrobbles
- Phase 5 (skip image lookups) — standalone, no dependencies, quick win
- Phase 2 (SaveListensBatch) — DB layer, needed by Phase 3
- Phase 3 (BulkSubmitter) — the core, depends on Phase 1 + 2
- Phase 4 (migrate importers) — depends on Phase 3
Phases 1 and 5 can be done first as independent PRs. Phases 2-4 are one PR.
References
- ListenBrainz architecture: https://listenbrainz.readthedocs.io/en/latest/developers/architecture.html
- ListenBrainz MBID mapping: https://listenbrainz.readthedocs.io/en/latest/developers/mapping.html
- MusicBrainz rate limiting: https://musicbrainz.org/doc/MusicBrainz_API/Rate_Limiting
- PR 1 (importer fixes): https://github.com/gabehf/Koito/pull/228
- PR 2 (MBZ search): https://github.com/gabehf/Koito/pull/229
- Koito native importer (bypass pattern):
internal/importer/koito.go - Current SubmitListen:
internal/catalog/catalog.go:70 - pgx CopyFrom docs: https://pkg.go.dev/github.com/jackc/pgx/v5#Conn.CopyFrom