Video Download
Orinuno resolves Kodik video content into a local file through three paths,
picked in priority order inside VideoDownloadService.downloadWithStrategy:
- Fast-path: direct CDN pull of an already-decoded mp4_link.
- Playwright HLS: headless Chromium replays the player, captures the .m3u8, and pulls segments in parallel.
- WebClient direct MP4: reactive HTTP client, used when Playwright is unavailable or fails.
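The priority order can be pictured as a small planning function. The sketch below is only an illustration built from the eligibility rules stated on this page (a cached link must start with http, an expired cached link falls back to Path 3, and Path 3 also backs up Playwright); the names DownloadPath and StrategyPlanner are hypothetical and are not taken from VideoDownloadService.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration; the real logic lives in
// VideoDownloadService.downloadWithStrategy and may differ in detail.
enum DownloadPath { FAST_PATH, PLAYWRIGHT_HLS, WEBCLIENT_MP4 }

final class StrategyPlanner {

    /**
     * Returns the paths to attempt, in order.
     * cachedMp4Link may be null when nothing has been decoded yet.
     */
    static List<DownloadPath> plan(String cachedMp4Link, boolean playwrightEnabled) {
        List<DownloadPath> order = new ArrayList<>();
        if (cachedMp4Link != null && cachedMp4Link.startsWith("http")) {
            // Path 1: a usable cached URL skips Playwright entirely.
            order.add(DownloadPath.FAST_PATH);
        } else if (playwrightEnabled) {
            // Path 2: no usable cached URL, so replay the player and pull HLS.
            order.add(DownloadPath.PLAYWRIGHT_HLS);
        }
        // Path 3 is always the last resort (expired URL, Playwright off or failed).
        order.add(DownloadPath.WEBCLIENT_MP4);
        return order;
    }
}
```

A caller would walk the returned list and stop at the first path that completes. A value such as "true" in mp4_link fails the http check, so it never selects the fast-path.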
Path 1 — Cached mp4_link fast-path
If kodik_episode_variant.mp4_link is already populated and starts with
http, we skip Playwright entirely and go straight to the CDN:

variant.mp4_link (cached) → fetchWithRedirects → bodyToFlux(DataBuffer) → DownloadProgress.addBytes(...) → disk

Typical latency to first byte: ~2 seconds. Byte-level progress is
reported from the very first chunk (see expectedTotalBytes below).
If the cached URL has expired (CDN 403/404 or truncated body), the service transparently falls back to Path 3 — a fresh decode plus WebClient download.
Path 2 — Playwright + HLS
Used when mp4_link is not yet decoded. Playwright replays the player,
captures the .m3u8 manifest, and Java HttpClient fetches segments in
parallel — this path is ~9× faster than direct MP4 download on typical
Kodik content (parallelism 16 vs. a single TCP stream).
Pipeline:
- Navigate: headless Chromium loads the player URL (kodikplayer.com/seria/{id}/{hash}/720p) inside a BrowserContext created by newStealthContext().
- Trigger playback: simulate a click on the play button and the centre of the viewport. The player POSTs to /ftor, processes VAST ads, and starts loading the video.
- Intercept the manifest: page.onResponse() captures the request to solodcdn.com/s/m/.... We do not call response.body() on it, because the file is too large for a single .body() call.
- Download via APIRequestContext: context.request().get(videoUrl) is a server-side Playwright call. It bypasses CORS (unlike fetch() from page.evaluate(...)) and inherits cookies from the BrowserContext. The CDN sees valid cookies and returns real bytes.
- HLS in parallel + remux: if the body starts with #EXTM3U, parse the list of .ts segments (often 200–1300 of them), extract cookies from the BrowserContext, and pull segments in parallel via java.net.http.HttpClient (8–16 threads, configurable via hls-concurrency). We do not reuse APIRequestContext here, because it shares a single WebSocket and is not thread-safe. Segments are concatenated in order into a .ts file, then remuxed to .mp4 via ffmpeg -c copy -movflags +faststart.
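The last step of the pipeline (parse, fetch in parallel, concatenate in order, remux) can be sketched with the standard library alone. This is a minimal illustration, not the code in PlaywrightVideoFetcher: the class name HlsSketch is hypothetical, the fetch function is injected so the example runs without a network (in the service it would be a java.net.http.HttpClient call carrying the BrowserContext cookies), and segments are joined in memory here rather than written to a .ts file.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper for illustration only.
final class HlsSketch {

    /** Collects segment URIs from a media playlist, resolved against the manifest URL. */
    static List<URI> parseSegments(String manifest, URI manifestUrl) {
        if (!manifest.startsWith("#EXTM3U")) {
            throw new IllegalArgumentException("not an HLS manifest");
        }
        List<URI> segments = new ArrayList<>();
        for (String line : manifest.split("\\R")) {
            String trimmed = line.trim();
            // Lines starting with '#' are tags or comments; the rest are segment URIs.
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                segments.add(manifestUrl.resolve(trimmed));
            }
        }
        return segments;
    }

    /** Fetches segments on a bounded pool and joins them in manifest order. */
    static byte[] fetchInOrder(List<URI> segments, int concurrency,
                               Function<URI, byte[]> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<CompletableFuture<byte[]>> pending = new ArrayList<>();
            for (URI segment : segments) {
                pending.add(CompletableFuture.supplyAsync(() -> fetch.apply(segment), pool));
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (CompletableFuture<byte[]> future : pending) {
                byte[] bytes = future.join(); // waits for this segment, keeps order
                out.write(bytes, 0, bytes.length);
            }
            return out.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    /** Argument list for a stream-copy remux, suitable for ProcessBuilder. */
    static List<String> remuxCommand(String tsPath, String mp4Path) {
        return List.of("ffmpeg", "-i", tsPath, "-c", "copy",
                "-movflags", "+faststart", mp4Path);
    }
}
```

Downloads may finish in any order, but joining the futures in submission order keeps the output in manifest order. The remux command could be run with new ProcessBuilder(HlsSketch.remuxCommand(in, out)).inheritIO().start().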
Stealth shim
PlaywrightVideoFetcher.newStealthContext() patches the most common
headless-detection signals via context.addInitScript(...) before any
navigation, so the shim also applies to nested iframes:
- navigator.webdriver → undefined
- navigator.languages → ['en-US', 'en']
- navigator.plugins → non-empty mock array
- window.chrome → object with a runtime stub
- Notification.permission returned through the Permissions API
- Context defaults: Chrome/135 UA, 1280×720 viewport, en-US locale, Europe/London timezone
This does not solve IP-based geo-blocking — Kodik’s player refuses to
start from blocked regions regardless of browser fingerprint. For that,
rotate egress through kodik_proxy (see the Geo-block handling section
below).
Path 3 — WebClient direct MP4 (fallback)
Used when Playwright is disabled (orinuno.playwright.enabled=false) or
times out. Pipeline:
- KodikVideoDecoderService.decode(kodik_link) resolves fresh quality URLs via the public /ftor endpoint.
- pickBestQualityUrl(...) picks the highest numeric quality that is an http URL; defensive filters drop _geo_blocked sentinels and any value not starting with http.
- fetchWithRedirects(...) follows up to 5 redirects through the reactive kodikCdnWebClient. On the terminal 2xx response, the Content-Length header populates expectedTotalBytes.
- bodyToFlux(DataBuffer) streams the payload; each DataBuffer updates totalBytes via progress.addBytes(buf.readableByteCount()) and is then written to disk.
This path works on CDNs that accept plain HTTP clients with a realistic
User-Agent and follows Kodik’s redirect chain. It is slower than Path 2
because it is a single TCP stream, not a segment-parallel HLS pull.
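The quality selection and the manual redirect loop are the two pieces of this path that are easy to get wrong, so here is a rough sketch of both. It assumes the decoder yields a map from quality label to URL, which is a guess about the data shape; DirectMp4Sketch, Hop, and resolveFinalUrl are hypothetical names, and the probe function stands in for the real WebClient call so the example runs offline.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper for illustration only.
final class DirectMp4Sketch {

    /** Picks the URL of the highest numeric quality, ignoring non-http values. */
    static Optional<String> pickBestQualityUrl(Map<String, String> qualityToUrl) {
        String bestUrl = null;
        int bestQuality = -1;
        for (Map.Entry<String, String> entry : qualityToUrl.entrySet()) {
            String url = entry.getValue();
            if (url == null || !url.startsWith("http")) {
                continue; // drops sentinel values such as "true"
            }
            int quality;
            try {
                quality = Integer.parseInt(entry.getKey());
            } catch (NumberFormatException e) {
                continue; // drops non-numeric keys such as _geo_blocked
            }
            if (quality > bestQuality) {
                bestQuality = quality;
                bestUrl = url;
            }
        }
        return Optional.ofNullable(bestUrl);
    }

    /** Status code plus Location header (null when absent) of one response. */
    static final class Hop {
        final int status;
        final String location;

        Hop(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Follows up to maxRedirects 3xx responses and returns the terminal URL. */
    static URI resolveFinalUrl(URI start, int maxRedirects, Function<URI, Hop> probe) {
        URI current = start;
        for (int followed = 0; ; followed++) {
            Hop hop = probe.apply(current);
            boolean redirect = hop.status >= 300 && hop.status < 400 && hop.location != null;
            if (!redirect) {
                return current;
            }
            if (followed == maxRedirects) {
                throw new IllegalStateException("too many redirects from " + start);
            }
            // Location may be relative, so resolve it against the current URL.
            current = current.resolve(hop.location);
        }
    }
}
```

With a limit of 5, the loop follows at most five Location headers and fails on a sixth, matching the redirect budget described above.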
Why not a plain HTTP client? (historical)
| Approach | Result | Reason |
|---|---|---|
| Single-pass exchangeToFlux without redirects | 0 bytes | Kodik CDN responds with a 302 that must be followed manually |
| WebClient + manual redirect handling | Works | Current Path 3 |
| Playwright + response.body() | Timeout | body() waits for the full stream; video is too large |
| page.evaluate(fetch(...)) | CORS error | Browser fetch blocks cross-origin CDN calls |
| Playwright APIRequestContext | Works | Server-side call with cookies from BrowserContext |
| APIRequestContext multi-threaded | Errors | Not thread-safe: single WebSocket |
| Playwright cookies + Java HttpClient | Works, fast | Cookies from BrowserContext, native parallelism |
| .ts → .mp4 via ffmpeg stream copy | Instant | Browsers cannot play MPEG-TS natively |
Progress tracking
VideoDownloadService.DownloadProgress keeps an in-memory record with
atomic counters:
| Field | Populated by | Meaning |
|---|---|---|
| totalSegments | Playwright HLS path | Total .ts segments in the manifest |
| downloadedSegments | Playwright HLS path | Segments completed so far |
| totalBytes | Both Playwright and WebClient paths | Bytes written so far |
| expectedTotalBytes | WebClient path | Content-Length of the final 2xx response |
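As a rough picture of what such a record might look like, the sketch below mirrors the four fields with java.util.concurrent.atomic types. Only the field names and addBytes come from this page; the class name DownloadProgressSketch, the other methods, and the use of -1 for an unknown Content-Length are assumptions made for the example.

```java
import java.util.OptionalDouble;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in for VideoDownloadService.DownloadProgress.
final class DownloadProgressSketch {
    private final AtomicInteger totalSegments = new AtomicInteger();
    private final AtomicInteger downloadedSegments = new AtomicInteger();
    private final AtomicLong totalBytes = new AtomicLong();
    // Assumption: -1 marks "Content-Length not known yet".
    private final AtomicLong expectedTotalBytes = new AtomicLong(-1);

    void setTotalSegments(int count) { totalSegments.set(count); }
    void segmentDone() { downloadedSegments.incrementAndGet(); }
    void setExpectedTotalBytes(long bytes) { expectedTotalBytes.set(bytes); }

    /** Safe to call from many download threads at once. */
    void addBytes(long count) { totalBytes.addAndGet(count); }

    int totalSegments() { return totalSegments.get(); }
    int downloadedSegments() { return downloadedSegments.get(); }
    long totalBytes() { return totalBytes.get(); }
    long expectedTotalBytes() { return expectedTotalBytes.get(); }

    /** Fraction of bytes done, or empty while the expected size is unknown. */
    OptionalDouble byteFraction() {
        long expected = expectedTotalBytes.get();
        if (expected <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(totalBytes.get() / (double) expected);
    }
}
```

Atomic counters let the parallel segment workers and the status endpoint touch the same record without locks; each read is a consistent snapshot of a single counter, though not of all four together.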
The REST surface:
- POST /api/v1/download/{variantId}: fire-and-forget, returns IN_PROGRESS immediately.
- GET /api/v1/download/{variantId}/status: polls the counters.
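For a client written in Java, the two calls could be built with java.net.http as shown below. This is a sketch under assumptions: the base URL is whatever host the service runs on, variantId is treated as a long, and the class name DownloadApiRequests is hypothetical. The response body format is not specified on this page, so the example stops at building the requests.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper; only the two paths come from the docs.
final class DownloadApiRequests {

    /** POST that starts a download; the service answers immediately. */
    static HttpRequest start(String baseUrl, long variantId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/download/" + variantId))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** GET that reads the current progress counters. */
    static HttpRequest status(String baseUrl, long variantId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/download/" + variantId + "/status"))
                .GET()
                .build();
    }
}
```

Either request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); a poller would repeat the status call on a fixed delay until the download is no longer in progress.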
The demo UI picks one of three progress modes depending on which counters are populated:
- Segments: shows XX% · M/N segments · Y MB (HLS path).
- Bytes: shows XX% · Y MB / Z MB when expectedTotalBytes is known (WebClient path with Content-Length).
- Indeterminate: shows Initializing… with an animating pulse bar and a phaseHint explaining what is happening (Browser handshake, Playwright timed out — falling back to direct MP4, or Decoding fresh CDN URL (fallback)).
In every mode an elapsed timer (e.g. 12s or 2m 07s) is shown next to
the caption so it is always obvious that the download is making progress.
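The mode choice and the timer format are simple enough to express in a few lines. The demo UI's own code is not shown on this page, so the following is a Java rendering of the described behaviour rather than the actual front-end logic; ProgressView and its methods are hypothetical names, and checking the segment counter before the byte counter is an assumption.

```java
// Hypothetical helper mirroring the three progress modes described above.
final class ProgressView {

    enum Mode { SEGMENTS, BYTES, INDETERMINATE }

    /** Chooses a display mode from whichever counters are populated. */
    static Mode pickMode(int totalSegments, long expectedTotalBytes) {
        if (totalSegments > 0) {
            return Mode.SEGMENTS;      // HLS path knows the segment count
        }
        if (expectedTotalBytes > 0) {
            return Mode.BYTES;         // WebClient path with Content-Length
        }
        return Mode.INDETERMINATE;     // nothing measurable yet
    }

    /** Formats elapsed seconds as "12s" or "2m 07s". */
    static String formatElapsed(long seconds) {
        if (seconds < 60) {
            return seconds + "s";
        }
        return String.format("%dm %02ds", seconds / 60, seconds % 60);
    }
}
```

For example, ProgressView.formatElapsed(127) returns "2m 07s", and pickMode(0, 0) yields INDETERMINATE until one of the counters is set.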
Streaming
GET /api/v1/stream/{variantId} serves the local file with full Range
support. If the file is missing, the stream endpoint kicks off a fresh
Playwright download before returning bytes. Useful for ad-hoc playback
without having to pre-download.
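To make "full Range support" concrete, here is a small sketch of how a single-range header maps onto byte offsets in the local file. It is not the StreamController implementation (a framework may well handle this for the service); the ByteRange class is hypothetical and multi-range requests are deliberately rejected to keep the example short.

```java
// Hypothetical value type: an inclusive byte range within a file.
final class ByteRange {
    final long start;
    final long end;

    ByteRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Parses "bytes=0-499", "bytes=500-" or "bytes=-500" against a file length.
     * Returns null when the header is malformed or cannot be satisfied.
     */
    static ByteRange parse(String header, long fileLength) {
        if (header == null || !header.startsWith("bytes=") || header.contains(",")) {
            return null;
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) {
            return null;
        }
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        try {
            long start;
            long end;
            if (first.isEmpty()) {
                // Suffix form: the last N bytes of the file.
                if (last.isEmpty()) {
                    return null;
                }
                long suffix = Long.parseLong(last);
                if (suffix <= 0) {
                    return null;
                }
                start = Math.max(0, fileLength - suffix);
                end = fileLength - 1;
            } else {
                start = Long.parseLong(first);
                end = last.isEmpty()
                        ? fileLength - 1
                        : Math.min(Long.parseLong(last), fileLength - 1);
            }
            if (start < 0 || start > end) {
                return null;
            }
            return new ByteRange(start, end);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    long length() {
        return end - start + 1;
    }

    /** Value for the Content-Range header of a 206 response. */
    String contentRange(long fileLength) {
        return "bytes " + start + "-" + end + "/" + fileLength;
    }
}
```

A null result would typically translate to a 416 response (or to serving the whole file when no Range header was sent), while a parsed range becomes a 206 with the matching Content-Range and Content-Length.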
Geo-block handling
Kodik IP-blocks the player in some regions (Kazakhstan is the observed example). Symptoms:
- decode() still returns valid CDN URLs (the decode API is subject to a separate IP policy).
- Playwright loads the player page but the video request never fires, so the call times out after videoWaitMs (30s by default).
- mp4_link saved from /search is literally the string "true", the _geo_blocked sentinel. Orinuno defensively filters these out in four places (KodikVideoDecoderService.parseVideoResponse, ParserService.selectBestQuality, VideoDownloadService.pickBestQualityUrl, and StreamController.pickBestQuality). A Liquibase migration (20260425010000_cleanup_invalid_mp4_link.sql) nulls out pre-existing bad values on first boot.
Mitigations available today:
- Run the service from an unaffected region.
- Keep the current strategy order — the fast-path works the moment a decode has succeeded under a compatible egress.
Planned mitigation (tracked in BACKLOG.md as
IDEA-DOWNLOAD-PROXY): route each BrowserContext through a rotated
kodik_proxy entry (new Browser.NewContextOptions().setProxy(...)). The
proxy pool and ProxyProviderService already exist — PlaywrightVideoFetcher
just needs to consume them.
Fallback
If Playwright is disabled (orinuno.playwright.enabled=false) or fails at
launch, Path 3 (WebClient) runs directly. Byte-level progress still
populates via expectedTotalBytes / totalBytes.
Configuration
All Playwright-related properties live under orinuno.playwright.*. See
Configuration.