parser-kodik integration guide

This page is the single entry point for any downstream service that wants to integrate with orinuno — primarily parser-kodik, but the contract below applies to any consumer that submits work and consumes results.

If you only have 30 seconds: run the four pre-flight checks below, submit via POST /api/v1/parse/requests, and consume completion via GET /api/v1/export/ready?updatedSince=…. Do not poll /parse/requests/{id}.

Before any consumer sends its first POST /parse/requests, hit these four endpoints. They surface the failure modes behind nearly all “orinuno is broken” incidents.

| Order | Endpoint | Pass condition | What it tells you |
| --- | --- | --- | --- |
| 1 | `GET /api/v1/health` | `{status: UP, service: orinuno}` | Spring context booted |
| 2 | `GET :8081/actuator/health` | `{status: UP}` (with `db.status=UP`) | DB pool is connected and Liquibase migrations applied |
| 3 | `GET /api/v1/health/tokens` | `liveCount > 0` | At least one Kodik token exists in `stable\|unstable\|legacy`. Common gotcha: a fresh checkout has `data/kodik_tokens.json` empty — every Kodik call will fail with 503 until you seed via `KODIK_TOKEN` env or manual edit. See getting started → quick start. |
| 4 | `GET /api/v1/health/schema-drift` | `status: CLEAN` (or known drift you’ve vetted) | Kodik response shape matches what `KodikResponseMapper` was compiled for |

For a one-shot aggregate of all four, use the dedicated endpoint:

```sh
curl -sS http://localhost:8085/api/v1/health/integration | jq
```

It returns a single document with status: READY|DEGRADED|BLOCKED plus the per-check details so a consumer can implement a single readiness probe instead of fanning out across four URLs.
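
That aggregate folds naturally into a single consumer-side readiness gate. A Python sketch, assuming only what this guide states (the endpoint URL and the top-level `status` field); the timeout and transport details are illustrative:

```python
import json
import urllib.request

def parse_readiness(doc: dict) -> bool:
    """Interpret the aggregate health document: ready only on status READY.

    DEGRADED and BLOCKED (and any missing/unknown status) count as not ready.
    """
    return doc.get("status") == "READY"

def orinuno_ready(base_url: str = "http://localhost:8085") -> bool:
    """Fetch /api/v1/health/integration and interpret the result."""
    url = f"{base_url}/api/v1/health/integration"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_readiness(json.load(resp))
```

Splitting fetch from interpretation keeps the readiness rule testable without a running instance.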

```mermaid
sequenceDiagram
    autonumber
    participant Consumer as Consumer (parser-kodik)
    participant Submit as POST /parse/requests
    participant Backpressure as GET /parse/requests?limit=0
    participant Export as GET /export/ready?updatedSince=
    participant Worker as RequestWorker (internal)

    Consumer->>Backpressure: GET ?status=PENDING&limit=0
    Backpressure-->>Consumer: 200 + X-Total-Count: 47
    alt X-Total-Count >= consumer threshold
        Note over Consumer: pause submission, retry later
    else queue has headroom
        Consumer->>Submit: POST {title|kinopoiskId|...}
        Note over Consumer,Submit: header X-Created-By: parser-kodik (required)
        Submit-->>Consumer: 201 Created (new) | 200 OK (idempotent hit)
        Note over Worker: claims row, processes async (no Consumer involvement)
        loop polling every N minutes
            Consumer->>Export: GET ?updatedSince=<last_seen_iso>
            Export-->>Consumer: PageResponse<ContentExportDto> with new/changed rows
            Consumer->>Consumer: persist, advance updatedSince watermark
        end
    end
```
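
The polling half of this flow reduces to one rule: never advance the `updatedSince` watermark past data you have not persisted. A Python sketch under stated assumptions (the `content` field of `PageResponse` and the per-row `updatedAt` field are assumptions here; check the actual `ContentExportDto`, and note that string comparison of timestamps is only valid when they share one ISO-8601 format):

```python
import json
import urllib.request

def advance_watermark(rows: list, watermark: str) -> str:
    """Return the max updatedAt across persisted rows; never moves backwards."""
    return max([watermark] + [r["updatedAt"] for r in rows])

def poll_export(base_url: str, watermark: str) -> tuple:
    """One polling tick: fetch changed rows, return them with the new watermark."""
    url = f"{base_url}/api/v1/export/ready?updatedSince={watermark}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        page = json.load(resp)
    rows = page.get("content", [])
    # persist(rows) belongs HERE, before the watermark moves, so a crash
    # between persist and the next poll only causes re-delivery, not loss.
    return rows, advance_watermark(rows, watermark)
```
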
```http
POST /api/v1/parse/requests HTTP/1.1
Host: orinuno:8085
Content-Type: application/json
X-API-KEY: <your key, when configured>
X-Created-By: parser-kodik

{
  "kinopoiskId": "326",
  "decodeLinks": true
}
```
  • X-Created-By is required (non-blank). Empty / missing returns 400. This header is the rate-limit key (see §3) and shows up in every metric and log line that involves the request.
  • decodeLinks: true triggers per-variant decode after search. false ingests metadata only.
  • Payload must contain at least one of: title, id, playerLink, kinopoiskId, imdbId, mdlId, worldartAnimationId, worldartCinemaId, worldartLink, shikimoriId. Empty payload returns 400.

The submit hash is SHA-256(canonical-json(dto)) over a normalised view (trimmed/lowercased title, blank ids stripped). Hitting submit twice with the same payload while the prior row is still PENDING or RUNNING returns the existing row with 200 OK and created=false.
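A consumer can precompute an equivalent dedup key to avoid redundant submits. A minimal Python sketch, assuming the normalisation is exactly the rules stated above (title trimmed and lowercased, blank ids stripped, sorted-key canonical JSON). The authoritative implementation lives in orinuno; treat this as a local-dedup aid, not a byte-for-byte match of the server hash:

```python
import hashlib
import json

def submit_hash(dto: dict) -> str:
    """SHA-256 over a canonical JSON view of the submit DTO (sketch)."""
    normalised = {}
    for key, value in dto.items():
        if isinstance(value, str):
            value = value.strip()
            if key == "title":
                value = value.lower()
            if not value:        # blank ids are stripped from the hash input
                continue
        normalised[key] = value
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```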

Do not call GET /parse/requests/{id} in a loop to detect completion. The authoritative completion signal is GET /api/v1/export/ready?updatedSince=…, which already powers all live integration tests. The parse-request log exists for observability and idempotency, not for state-machine driving.

The only allowed list-endpoint call is the backpressure probe: GET /parse/requests?status=PENDING&limit=0 returns an empty body with header X-Total-Count: <n>.
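
A sketch of that probe in Python; the endpoint and the `X-Total-Count` header come from this guide, while the threshold value is consumer policy, not part of the contract:

```python
import urllib.request

def pending_depth(base_url: str = "http://orinuno:8085") -> int:
    """Read queue depth from the X-Total-Count header; body is empty by design."""
    url = f"{base_url}/api/v1/parse/requests?status=PENDING&limit=0"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return int(resp.headers.get("X-Total-Count", "0"))

def should_pause(depth: int, threshold: int = 100) -> bool:
    """Backpressure decision: pause submission once the queue hits the threshold."""
    return depth >= threshold
```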

These are the production-relevant knobs. Tune cautiously — the defaults were chosen to coexist with Kodik’s tolerance for traffic from a single IP.

| Knob | Default | Where | What it bounds |
| --- | --- | --- | --- |
| `orinuno.parse.rate-limit-per-minute` | 30 | `OrinunoProperties.ParseProperties` | Outbound calls to kodik-api.com. Token bucket via `KodikApiRateLimiter`. Exhaustion → 2 s wait, then up to 30 s blocking acquire, then `RuntimeException`. |
| `orinuno.parse.inbound-rate-limit-per-minute` | 60 | `OrinunoProperties.ParseProperties` | Inbound submissions per `X-Created-By` value. Bucket4j-backed. Exhaustion → 429 Too Many Requests with `Retry-After`. |
| `orinuno.kodik.request-delay-ms` | 500 | `KodikProperties` | Inter-decode pacing inside the per-content decode loop. |
| `orinuno.requests.worker-poll-ms` | 2000 | `RequestsProperties` | `RequestWorker.tick()` cadence. |
| `orinuno.requests.stale-after-ms` | 300000 | `RequestsProperties` | Heartbeat age that flips a RUNNING row back to PENDING. |
| `orinuno.kodik.token-failover-max-attempts` | 3 | `KodikProperties` | Retries with the next eligible token when Kodik returns “invalid token”. |
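
The property keys above follow Spring relaxed binding, so an `application.yml` override is a natural place to tune them. A sketch (values shown are the documented defaults; the exact nesting is inferred from the key prefixes, so verify it against `OrinunoProperties` before relying on it):

```yaml
orinuno:
  parse:
    rate-limit-per-minute: 30
    inbound-rate-limit-per-minute: 60
  kodik:
    request-delay-ms: 500
    token-failover-max-attempts: 3
  requests:
    worker-poll-ms: 2000
    stale-after-ms: 300000
```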

Hard topology constraints (today, single-instance only — see §6):

  • One @Scheduled(2s) worker thread per JVM, isolated on the orinuno-sched- pool.
  • One decoder maintenance thread on the isolated orinuno-decoder-maint- pool (see TD-PR-5).
  • HikariCP maximum-pool-size = 10 (Spring Boot default — not yet evaluated under sustained parser-kodik load).

| Stage | P95 |
| --- | --- |
| `POST /parse/requests` round-trip | < 200 ms |
| PENDING → SEARCHING transition | < 4 s (worker poll = 2 s) |
| Single-content search (no decode) | < 5 s |
| Single-variant decode (warm Playwright) | 2–10 s |
| 220-episode serial decode | 30–90 min |
| Stale RUNNING recovery | < 60 s |

For the long-tail jobs the consumer must never hold an HTTP connection open. Submit, drop, watch /export/ready.

Source: parse-requests.md → SLA targets.

What the consumer sees / what actually happened / what to do.

| Status | Body / header signal | What’s wrong | What to do |
| --- | --- | --- | --- |
| 400 from `POST /parse/requests` | `error: id or title required` | Empty payload | Add at least one id field or `title`. |
| 400 from `POST /parse/requests` | `error: X-Created-By header is required` | Missing/blank `X-Created-By` | Set the header to your service name. |
| 400 from `/embed/{idType}/{id}` | `error: Unknown idType…` | Wrong path/value | Use one of the seven supported slugs (see API → Embed). |
| 401 | none | `X-API-KEY` missing/wrong (when api-key auth is configured) | Set the `X-API-KEY` header. |
| 404 from `/embed/{type}/{id}` | `error: Kodik has no player for…` | Kodik returned `found:false` | The id genuinely doesn’t exist on Kodik — don’t retry. |
| 429 from `POST /parse/requests` | header `Retry-After: <seconds>` | Inbound rate limit hit (`X-Created-By` consumed budget) | Back off `Retry-After` seconds. Tune `orinuno.parse.inbound-rate-limit-per-minute` if legitimate. |
| 502 from `/embed/*` | `error: Kodik /get-player error: …` | Kodik returned a non-token error in the body | Check `/health/schema-drift`. Often transient; safe to retry with backoff. |
| 503 from `/embed/*` | `error: registry empty\|all tokens dead` | Token registry has nothing usable | Check `/health/tokens`, seed a fresh `KODIK_TOKEN`. |
| Timeout on long decode | none — request returned 202 minutes ago | Worker pinned on a slow / VPN-induced decode | Check `orinuno_parse_request_processing_seconds` quantiles. See TD-PR-5 for the deadlock-fix history. |
| `Warning: 199` header on `/api/v1/kodik/list` | header present | Schema drift detected during this call | Continue — body is still usable. Log the warning, expect a follow-up `/health/schema-drift` check. |
| Stuck PENDING queue | `X-Total-Count` keeps climbing | Worker not draining (crashed, deadlocked, or token-rejected) | Check `orinuno_parse_request_worker_tick_seconds` (no recent samples ⇒ worker dead) and `/health/tokens`. |
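
The 429 row implies a small client-side contract: read `Retry-After`, sleep, resubmit. A Python sketch (the attempt cap and the lack of jitter are simplifications for brevity, not recommendations):

```python
import time
import urllib.error

def retry_after_seconds(headers, default: int = 60) -> int:
    """Parse Retry-After in its delta-seconds form, with a safe fallback."""
    try:
        return int(headers.get("Retry-After", default))
    except (TypeError, ValueError):
        return default

def submit_with_backoff(do_submit, max_attempts: int = 3):
    """Call do_submit() (which raises urllib.error.HTTPError on 429) politely."""
    for attempt in range(max_attempts):
        try:
            return do_submit()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_after_seconds(err.headers))
```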

What to graph, what to alert on. All series are scraped at :8081/actuator/prometheus.

| Series | Purpose | Alert when |
| --- | --- | --- |
| `orinuno_parse_requests{status="PENDING"}` | Queue depth | sustained > N (consumer-defined) for > 10 min |
| `rate(orinuno_parse_requests_completed_total{outcome="DONE"}[5m])` | Throughput | drops to 0 while PENDING > 0 |
| `rate(orinuno_parse_requests_completed_total{outcome="FAILED"}[5m])` | Error rate | > 10 % of total completions for > 5 min |
| `orinuno_parse_request_worker_tick_seconds{quantile="0.95"}` | Worker latency | > 5 s sustained |
| `orinuno_inbound_throttle_total{consumer="parser-kodik"}` | Inbound throttling | non-zero (means the consumer is being slowed down) |
| `orinuno_kodik_calendar_fetch_total{outcome="error"}` | Calendar health | sustained > 0 |

The repo ships a Grafana dashboard for the parse-request flow: observability/grafana/dashboards/orinuno-parse-requests.json.

Bring up the local stack:

```sh
docker compose --profile observability up -d prometheus grafana
```

URLs in monitoring → local Grafana stack.

Log markers worth grepping:
  • [SCHEMA DRIFT] (WARN) — Kodik changed something
  • KodikApiRateLimiter (INFO) — outbound bucket exhausted
  • KodikEmbedController (WARN) — embed-link resolve failed
  • RequestWorker (INFO/WARN) — claim/process/recover events
  • KodikTokenRegistry (INFO/WARN) — token tier transitions, dead-token quarantine

orinuno is currently single-instance only. Running multiple replicas against the same DB will not corrupt data, but several behaviours are not horizontally safe:

| Subsystem | Replication behaviour | Workaround |
| --- | --- | --- |
| `RequestWorker.tick()` | `FOR UPDATE SKIP LOCKED` works correctly across replicas — this part is safe | None needed. |
| `RequestWorker.recoverStale()` | Idempotent UPDATE — safe to run on every replica | None needed. |
| `KodikTokenLifecycle.scheduledRevalidation()` | Each replica probes Kodik with every token every 6 h — wasteful and may trip Kodik anti-abuse | Run on exactly one replica via leader election, or accept the duplicated probe traffic. |
| `DecoderMaintenanceScheduler` | Each replica picks the same expired-mp4 batch and decodes redundantly | Same — run on one replica until a distributed lock arrives (see TECH_DEBT.md TD-PR-1). |
| `KodikApiRateLimiter` | Per-process semaphore — N replicas multiply the outbound rate by N | Set `orinuno.parse.rate-limit-per-minute` to `total_budget / N` on every replica. |

This will be revisited when TD-PR-1 (worker pool) lands.