Add bulk import optimization: track_lookup cache, batch inserts, BulkSubmitter

Adopts ListenBrainz-inspired patterns to speed up imports from ~24h to
under 30 minutes for 49k scrobbles.

Phase 1 - track_lookup cache table:
- New migration (000006) adds persistent entity lookup cache
- Maps normalized (artist, track, album) → (artist_id, album_id, track_id)
- SubmitListen fast path: a cache hit cuts the DB work from 18 queries to 2
- Cache populated after entity resolution, invalidated on merge/delete
- Benefits both live scrobbles and imports
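
A minimal sketch of how a normalized lookup key could be built, for illustration only. The name lookupKey, the normalization rules (trim, collapse whitespace, lowercase), and the unit-separator delimiter are assumptions; the actual TrackLookupKey in this commit may normalize differently.

```go
package main

import (
	"fmt"
	"strings"
)

// keySep is an assumed delimiter (ASCII unit separator) that is unlikely to
// appear in artist, track, or album names.
const keySep = "\x1f"

// normalize trims the string, collapses runs of whitespace to one space,
// and lowercases the result.
func normalize(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

// lookupKey is a hypothetical stand-in for TrackLookupKey: it joins the
// normalized artist, track, and album into a single cache key.
func lookupKey(artist, track, album string) string {
	return strings.Join([]string{normalize(artist), normalize(track), normalize(album)}, keySep)
}

func main() {
	a := lookupKey("  Radiohead ", "CREEP", "Pablo  Honey")
	b := lookupKey("radiohead", "creep", "pablo honey")
	// Both spellings map to the same key, so the second is a cache hit.
	fmt.Println(a == b)
}
```

Keeping a delimiter between the fields means ("a", "b", "") and ("a", "", "b") produce different keys, which a plain concatenation would not guarantee.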

Phase 2 - SaveListensBatch:
- New batch listen insert using pgx CopyFrom → temp table → INSERT ON CONFLICT
- Thousands of inserts per second, compared with inserting listens one at a time

Phase 3 - BulkSubmitter:
- Reusable import accelerator for all importers
- Pre-deduplicates scrobbles by (artist, track, album) in memory
- Worker pool (4 goroutines) for parallel entity creation on cache miss
- Batch listen insertion via SaveListensBatch
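
The pre-deduplication and worker-pool steps above can be sketched roughly as follows. This is not the BulkSubmitter code from this commit: the Scrobble type, uniqueCombos, resolveAll, and the resolve callback are hypothetical names, and the real implementation presumably deduplicates on the normalized lookup key rather than on raw strings.

```go
package main

import (
	"fmt"
	"sync"
)

// Scrobble is a hypothetical, simplified stand-in for an imported listen.
type Scrobble struct {
	Artist, Track, Album string
}

// uniqueCombos returns each distinct (artist, track, album) combination
// once, in first-seen order.
func uniqueCombos(scrobbles []Scrobble) []Scrobble {
	seen := make(map[Scrobble]struct{}, len(scrobbles))
	var out []Scrobble
	for _, s := range scrobbles {
		if _, ok := seen[s]; ok {
			continue
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}

// resolveAll calls resolve once per combo on a fixed pool of goroutines and
// collects the returned IDs. resolve stands in for entity creation on a
// cache miss.
func resolveAll(combos []Scrobble, workers int, resolve func(Scrobble) int) map[Scrobble]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan Scrobble)
	ids := make(map[Scrobble]int, len(combos))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				id := resolve(c)
				mu.Lock() // the result map is shared between workers
				ids[c] = id
				mu.Unlock()
			}
		}()
	}
	for _, c := range combos {
		jobs <- c
	}
	close(jobs)
	wg.Wait()
	return ids
}

func main() {
	scrobbles := []Scrobble{
		{"Radiohead", "Creep", "Pablo Honey"},
		{"Radiohead", "Creep", "Pablo Honey"},
		{"Portishead", "Roads", "Dummy"},
	}
	combos := uniqueCombos(scrobbles)
	ids := resolveAll(combos, 4, func(s Scrobble) int { return len(s.Track) })
	fmt.Println(len(scrobbles), len(combos), len(ids))
}
```

Resolving each unique combination once means repeated plays of the same track cost a single entity resolution, after which every listen can be mapped to its track ID and inserted as a batch.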

Phase 4 - Migrate importers:
- Maloja, Spotify, LastFM, ListenBrainz importers use BulkSubmitter
- Koito importer left as-is (already fast with pre-resolved IDs)

Phase 5 - Skip image lookups during import:
- GetArtistImage/GetAlbumImage calls are skipped entirely when SkipCacheImage=true
- Background tasks (FetchMissingArtistImages/FetchMissingAlbumImages) backfill
  the missing images afterwards

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
safierinx-a 2026-03-25 04:17:50 +05:30
parent c92e93484e
commit 8ce6ec494d
21 changed files with 1294 additions and 129 deletions

@@ -77,6 +77,21 @@ func SubmitListen(ctx context.Context, store db.DB, opts SubmitListenOpts) error
	// bandaid to ensure new activity does not have sub-second precision
	opts.Time = opts.Time.Truncate(time.Second)
	// Fast path: if this (artist, track, album) combination is already in the
	// lookup cache, save the listen directly and skip entity resolution.
	// A lookup error or a cache miss falls through to the full path below.
	if !opts.SkipSaveListen {
		key := TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle)
		cached, err := store.GetTrackLookup(ctx, key)
		if err == nil && cached != nil {
			l.Debug().Msg("Track lookup cache hit — skipping entity resolution")
			return store.SaveListen(ctx, db.SaveListenOpts{
				TrackID: cached.TrackID,
				Time:    opts.Time,
				UserID:  opts.UserID,
				Client:  opts.Client,
			})
		}
	}
	artists, err := AssociateArtists(
		ctx,
		store,
@@ -183,6 +198,16 @@ func SubmitListen(ctx context.Context, store db.DB, opts SubmitListenOpts) error
}
}
	// Populate the lookup cache so the next listen of this combination takes
	// the fast path. The write is best-effort: its result is not checked here,
	// so a failed cache write only costs a later cache miss.
	if len(artists) > 0 {
		store.SaveTrackLookup(ctx, db.SaveTrackLookupOpts{
			Key:      TrackLookupKey(opts.Artist, opts.TrackTitle, opts.ReleaseTitle),
			ArtistID: artists[0].ID,
			AlbumID:  rg.ID,
			TrackID:  track.ID,
		})
	}
	if opts.IsNowPlaying {
		if track.Duration == 0 {
			memkv.Store.Set(strconv.Itoa(int(opts.UserID)), track.ID)