You added a <video> element to a page, dropped in a <track> tag pointing at the captions file you already had, and the browser politely ignored it. The video plays. There is no "CC" button. DevTools shows the file loaded fine — HTTP 200, content-type application/x-subrip. The captions are right there. The browser just refuses to use them.
The reason isn't a bug. The <track> element on HTML5 video, by spec, only accepts one subtitle format: WebVTT. Not SRT. Not SSA, not ASS, not TTML, not SBV. WebVTT only. If the file you have is captions.srt, the browser will load it and silently discard every cue.
What's actually different between SRT and VTT
The two formats are almost identical, which is why this catches people. They're both plain text. They both list numbered cues with a start timestamp, an arrow, and an end timestamp, followed by the caption text. The differences fit on a postcard:
- VTT requires a header line. The very first line of the file must be the literal string WEBVTT, optionally followed by a space and a comment on the same line. If that header is missing, the browser parser treats the whole file as invalid and shows nothing.
- Timestamps use a dot, not a comma. SRT writes milliseconds as 00:00:04,500. VTT writes 00:00:04.500. That's the single character that breaks every cue if you don't fix it.
- Cue numbering is optional in VTT. SRT requires each cue to start with a 1-based integer ID. VTT doesn't care. You can keep the numbers; the parser ignores them.
- VTT is UTF-8 only. SRT files in the wild are often Windows-1252 or Latin-1 because legacy DVD ripping tools defaulted to it. Drop one of those into a browser and any accent or em-dash becomes garbage.
Here's the same two-cue file in both formats, side by side.
SRT
1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,000 --> 00:00:09,000
This is the second cue.
VTT
WEBVTT

1
00:00:01.000 --> 00:00:04.000
Hello, world.

2
00:00:05.000 --> 00:00:09.000
This is the second cue.
That's it. Add WEBVTT\n\n at the top, swap commas for dots in the timestamp lines, re-save as UTF-8, and the file works. A real "SRT to VTT converter" is about fifteen lines of regex.
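As a rough illustration of how small that transform is, here is a minimal sketch in TypeScript. The function name srtToVtt is made up for this example and is not the converter's actual code. It assumes the input has already been decoded to a string, and it only rewrites timing lines so that commas inside caption text are left alone.

```typescript
// Hypothetical helper: a minimal SRT -> WebVTT string transform.
// Assumes the SRT has already been decoded into a JavaScript string.
function srtToVtt(srt: string): string {
  const body = srt
    .replace(/^\uFEFF/, "") // drop a leading byte-order mark, if present
    .replace(/\r\n?/g, "\n") // normalize CRLF and bare CR line endings to LF
    // Swap the millisecond comma for a dot, but only on timing lines,
    // so "Hello, world." keeps its comma.
    .replace(
      /^(\d{2,}:\d{2}:\d{2}),(\d{3})( --> )(\d{2,}:\d{2}:\d{2}),(\d{3})/gm,
      "$1.$2$3$4.$5"
    )
    .trim();

  // The header must come first, followed by a blank line.
  return `WEBVTT\n\n${body}\n`;
}

const sampleSrt =
  "1\n00:00:01,000 --> 00:00:04,000\nHello, world.\n\n" +
  "2\n00:00:05,000 --> 00:00:09,000\nThis is the second cue.\n";

console.log(srtToVtt(sampleSrt));
```

Cue numbers are passed through untouched, since VTT accepts them. Frame-number timestamps and non-UTF-8 input are out of scope for this sketch.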
The <track> element that goes with it
Once you have the .vtt file, the markup is one line. Put the file in the same directory as the page (or anywhere CORS-reachable) and add a <track> child inside the <video>:
<video controls width="640">
  <source src="lecture.mp4" type="video/mp4">
  <track
    kind="subtitles"
    src="lecture.en.vtt"
    srclang="en"
    label="English"
    default>
</video>
The default attribute is what makes the captions appear without the user clicking the CC button. srclang is what the browser uses to pick the right track when the user has a language preference set. label is what shows up in the CC menu. If you serve the .vtt from a different origin than the page, two things are required: the <video> element needs a crossorigin attribute, and the response needs an Access-Control-Allow-Origin header that permits the page's origin. The browser treats subtitle files as a CORS-restricted resource even though they're text.
Why VTT is genuinely better than SRT for the web
VTT was designed for the web; SRT is a byproduct of the SubRip ripping tool from the early 2000s, built to ship alongside DivX rips. The web-native features show:
- Positioning cues. line:, position:, align:, and size: let you place a caption anywhere on the frame, which is useful when a chyron already occupies the bottom third.
- Styling hooks. <c.classname> tags inside cues plus a STYLE block at the top of the file apply CSS to specific cues. A speaker label can be a different color than the dialog.
- Inline timing. <00:00:02.500> mid-cue timestamps enable karaoke-style word highlighting.
- NOTE comments. Translator or QA notes can live in the file without breaking the parse. SRT silently absorbs stray text into the next cue.
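To make those four features concrete, here is a small sample file held in a TypeScript string, with one throwaway helper. Everything in it is illustrative: the speaker class, the cue settings values, the NOTE text, and the extractNotes function are invented for this example.

```typescript
// Illustrative WebVTT file showing a STYLE block, a NOTE comment,
// cue settings (line/position/align/size), a class span, and inline timestamps.
const styledVtt = `WEBVTT

STYLE
::cue(.speaker) {
  color: yellow;
}

NOTE QA: confirm the speaker name before publishing.

1
00:00:01.000 --> 00:00:04.000 line:10% position:50% align:center size:80%
<c.speaker>Host:</c> Hello, world.

2
00:00:05.000 --> 00:00:09.000
This <00:00:06.000>is <00:00:07.000>the <00:00:08.000>second cue.
`;

// Hypothetical helper: pull out NOTE blocks so reviewers can read them
// separately. Blocks in a VTT file are separated by blank lines.
function extractNotes(vtt: string): string[] {
  return vtt
    .split(/\n{2,}/)
    .filter((block) => /^NOTE(\s|$)/.test(block))
    .map((block) => block.replace(/^NOTE\s*/, "").trim());
}

console.log(extractNotes(styledVtt));
```

The STYLE block has to appear before the first cue, while NOTE blocks can sit anywhere between cues. How much of the styling actually renders varies by browser, so treat the sample as a syntax reference and test in the browsers you target.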
For a static .srt shipping with a tutorial video, none of this matters. For a serious video pipeline with editorial styling and accessibility QA, it matters a lot.
The 30-second walkthrough
- Open freefileconverter.ai/srt-to-vtt.
- Drag the .srt file onto the drop zone. Multi-select works — drop ten at once for a full season run.
- The conversion happens the instant the file lands. There's no "convert" button to press because there's nothing to wait for. The offline-pill at the bottom of the page stays lit; the file never leaves your tab.
- Download each .vtt with the per-row button, or hit Download all (ZIP) for batch runs. A typical S01E01–S01E10 conversion finishes before the download dialog opens.
- Drop the .vtt next to your video, point a <track> at it, ship.
The whole thing runs as a string transform in the browser. We didn't write a server endpoint for this because there's nothing for a server to do. A 200-byte text file going up to a stranger's box for a regex replace is the most obviously wasteful pattern in the "free online converter" category — and it's the default for almost every result on the first page of Google.
Why every "online SRT to VTT" tool is upload-based, and why that's wrong
SEO, mostly. An ad-supported converter site wants you to upload because the upload is the conversion event they track, and the longer they hold the file the more ads they can serve next to a "preparing your download…" spinner. There is no technical reason for any of it — subtitle conversion is byte-for-byte deterministic and fits inside the JS engine your browser already runs.
The cost of uploading: a caption file for an unreleased episode, an internal training video, or a client project ends up in someone else's S3 bucket with no retention policy you can verify. For an NDA workflow that's the only reason that matters. See why "free" converters upload your files for the longer argument.
Going the other direction, or extracting plain text
The reverse case is almost as common: a vendor handed you a .vtt but your editing tool (Premiere, DaVinci Resolve, most YouTube workflows) wants .srt. Same converter family, opposite direction — VTT to SRT strips the WEBVTT header, swaps dots back to commas, and re-numbers cues if needed.
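The reverse transform can be sketched along the same lines. This is a minimal illustration, not the converter's real code: vttToSrt is a made-up name, cue settings such as align: are simply dropped because SRT has nowhere to put them, and inline tags like <c.speaker> are left in the text as-is.

```typescript
// Hypothetical helper: a minimal WebVTT -> SRT string transform.
function vttToSrt(vtt: string): string {
  // VTT timestamps may omit the hours field (mm:ss.ttt); SRT always has it.
  const timing =
    /^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})/;

  const toSrtTime = (stamp: string): string => {
    const [clock, ms] = stamp.split(".");
    const full = clock.split(":").length === 2 ? `00:${clock}` : clock;
    return `${full},${ms}`; // dot back to comma
  };

  const blocks = vtt
    .replace(/^\uFEFF/, "")
    .replace(/\r\n?/g, "\n")
    .trim()
    .split(/\n{2,}/);

  const cues: string[] = [];
  for (const block of blocks) {
    const lines = block.split("\n");
    // Find the timing line; blocks without one (the WEBVTT header,
    // STYLE, NOTE, REGION) have no SRT equivalent and are skipped.
    let at = -1;
    for (let i = 0; i < lines.length; i++) {
      if (timing.test(lines[i])) {
        at = i;
        break;
      }
    }
    if (at === -1) continue;
    const match = timing.exec(lines[at]);
    if (match === null) continue;

    const text = lines.slice(at + 1).join("\n");
    // Re-number from 1, ignoring whatever identifier the VTT cue carried.
    cues.push(
      `${cues.length + 1}\n${toSrtTime(match[1])} --> ${toSrtTime(match[2])}\n${text}`
    );
  }
  return cues.join("\n\n") + "\n";
}
```

Re-numbering unconditionally is a design choice in this sketch. VTT cue identifiers can be arbitrary text, so discarding them and counting from 1 is the simplest way to guarantee valid SRT.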
If you want just the transcript text — no timestamps, no cue numbers, just the lines — SRT to plain text drops the timing and concatenates the cue bodies. Useful for searchable transcripts, blog post drafts from a podcast, or feeding the dialog into a translation tool.
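Transcript extraction is the simplest of the three. The sketch below is one way it might look; srtToPlainText is a hypothetical name, each cue is flattened to one output line, and any markup inside the cue text (such as <i> tags) is left untouched.

```typescript
// Hypothetical helper: drop cue numbers and timing, keep only the caption text.
function srtToPlainText(srt: string): string {
  const timing = /^\d{2,}:\d{2}:\d{2}[,.]\d{3} --> /;

  const blocks = srt
    .replace(/^\uFEFF/, "")
    .replace(/\r\n?/g, "\n")
    .trim()
    .split(/\n{2,}/);

  const out: string[] = [];
  for (const block of blocks) {
    const lines = block.split("\n");
    let i = 0;
    if (/^\d+$/.test(lines[i])) i++; // skip the numeric cue ID, if present
    // Only blocks with a timing line count as cues; anything else is ignored.
    if (i < lines.length && timing.test(lines[i])) {
      const text = lines.slice(i + 1).join(" ").trim();
      if (text.length > 0) out.push(text);
    }
  }
  return out.join("\n");
}
```

Multi-line cues are joined with a space so each cue becomes one line of transcript. Swap the final join for a space if a single running paragraph is a better fit.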
Edge cases worth knowing
- BOM at the start of the SRT. Some Windows tools save .srt with a UTF-8 byte-order mark. The converter strips it so that the WEBVTT header is the first thing in the output.
- Latin-1 encoded SRT. If the original is Windows-1252, the converter detects it and re-emits as UTF-8. Without that step every é would render as a replacement character (U+FFFD) in the browser.
- Overlapping cues. VTT allows them (two captions on screen at once); SRT technically doesn't but most parsers tolerate it. We pass overlaps through unchanged.
- Frame-number timestamps. A few legacy SRT files use frame numbers instead of HH:MM:SS,mmm. Non-standard; the converter leaves them as they are, because they need to be re-exported with real timestamps at the source.
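The first two cases come down to how the bytes are decoded before any conversion runs. Below is a minimal sketch using the standard TextDecoder API. The function name and the fallback rule are assumptions made for illustration: real encoding detection is fuzzier than "not valid UTF-8 means Windows-1252", and this is not a description of how the converter actually detects encodings.

```typescript
// Hypothetical helper: turn raw subtitle bytes into a clean string.
function decodeSubtitleBytes(bytes: Uint8Array): string {
  let text: string;
  try {
    // fatal: true makes invalid UTF-8 throw instead of silently
    // producing U+FFFD replacement characters.
    text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    // Assumption for this sketch: anything that is not valid UTF-8
    // is treated as Windows-1252.
    text = new TextDecoder("windows-1252").decode(bytes);
  }
  // TextDecoder already drops a leading UTF-8 BOM by default; this extra
  // strip covers strings that reach us with U+FEFF still attached.
  return text.replace(/^\uFEFF/, "");
}

// "café" saved as Windows-1252: the é is the single byte 0xE9.
const legacyBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
console.log(decodeSubtitleBytes(legacyBytes));
```

Trying strict UTF-8 first is the safer order. Almost any byte sequence decodes "successfully" as Windows-1252, so running that decoder first would mangle files that were valid UTF-8 all along.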
If you control the export side
If you're cutting captions in a tool that lets you pick the output format (Premiere, Resolve, Final Cut, Descript, Kapwing), pick WebVTT directly. It skips the conversion step and unlocks positioning and styling if you ever want them. SRT is only the right export when the downstream tool demands it: legacy YouTube uploads, some Vimeo paths, a handful of LMS platforms. For anything inside an HTML5 <video>, WebVTT is the format.