How to Convert SRT to VTT for HTML5 Video, Locally

You added a <video> element to a page, dropped in a <track> tag pointing at the captions file you already had, and the browser politely ignored it. The video plays. There is no "CC" button. DevTools shows the file loaded fine — HTTP 200, content-type application/x-subrip. The captions are right there. The browser just refuses to use them.

This isn't a bug. In every major browser, the <track> element on HTML5 video accepts exactly one subtitle format: WebVTT. Not SRT. Not SSA, not ASS, not TTML, not SBV. WebVTT only. If the file you have is captions.srt, the browser will load it, fail to find the WEBVTT header, and silently discard every cue.


What's actually different between SRT and VTT

The two formats are almost identical, which is why this catches people. They're both plain text. They both list numbered cues with a start timestamp, an arrow, and an end timestamp, followed by the caption text. The differences fit on a postcard:

VTT requires a header line. The very first line of the file must start with the literal string WEBVTT, optionally followed by a space or tab and a comment on the same line. If that header is missing, the browser parser treats the whole file as invalid and shows nothing.
Timestamps use a dot, not a comma. SRT writes milliseconds as 00:00:04,500. VTT writes 00:00:04.500. That's the single character that breaks every cue if you don't fix it.
Cue numbering is optional in VTT. SRT expects each cue to start with a 1-based integer ID. VTT doesn't care. You can keep the numbers; the parser reads them as optional cue identifiers and they have no effect on timing or display.
VTT is UTF-8 only. SRT files in the wild are often Windows-1252 or Latin-1 because legacy DVD ripping tools defaulted to it. Drop one of those into a browser and any accent or em-dash becomes garbage.

Here's the same two-cue file in both formats.

SRT

1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,000 --> 00:00:09,000
This is the second cue.

VTT

WEBVTT

1
00:00:01.000 --> 00:00:04.000
Hello, world.

2
00:00:05.000 --> 00:00:09.000
This is the second cue.

That's it. Add WEBVTT\n\n at the top, swap commas for dots in the timestamp lines, re-save as UTF-8, and the file works. A real "SRT to VTT converter" is about fifteen lines of regex.
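As a rough illustration of those three steps, here is a minimal Python sketch. The function name srt_to_vtt and the exact regular expression are assumptions made for this example, not the converter's actual code, and the input is assumed to be already decoded to a string.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING_LINE = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3})(\s*-->\s*)(\d{2,}:\d{2}:\d{2}),(\d{3})(.*)$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text (hypothetical helper for illustration)."""
    # Drop a leading byte-order mark and normalise line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    out = []
    for line in text.split("\n"):
        # Only timing lines are rewritten, so commas in caption text survive.
        out.append(TIMING_LINE.sub(r"\g<1>.\g<2>\g<3>\g<4>.\g<5>\g<6>", line))
    # The header must come first, followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(out).strip("\n") + "\n"
```

Restricting the comma swap to timing lines is the one design choice that matters here: a blanket replace would also turn "Hello, world." into "Hello. world.". Write the result to disk with encoding="utf-8" to cover the re-save step.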

The `<track>` element that goes with it

Once you have the .vtt file, the markup is one line. Put the file in the same directory as the page (or anywhere CORS-reachable) and add a <track> child inside the <video>:

<video controls width="640">
  <source src="lecture.mp4" type="video/mp4">
  <track
    kind="subtitles"
    src="lecture.en.vtt"
    srclang="en"
    label="English"
    default>
</video>

The default attribute is what makes the captions appear without the user clicking the CC button. srclang declares the track's language, which the browser can use to pick a track when the user has a language preference set. label is what shows up in the CC menu. If you serve the .vtt from a different origin than the page, the <video> element needs a crossorigin attribute and the response needs a matching Access-Control-Allow-Origin header. The browser treats subtitle files as a CORS-restricted resource even though they're text.
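If the captions do live on another origin, the server side might look something like this sketch, built on Python's standard http.server. The class name CaptionHandler, the helper make_server, and the origin https://www.example.com are placeholders invented for this example; a production setup would more likely set the same headers in its web server or CDN configuration.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CaptionHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves .vtt as text/vtt with a CORS header."""

    # Map the .vtt extension to the WebVTT media type.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".vtt": "text/vtt"}

    # Hypothetical origin of the page that embeds the video.
    allowed_origin = "https://www.example.com"

    def end_headers(self) -> None:
        # Add the CORS header to every response before the headers are flushed.
        self.send_header("Access-Control-Allow-Origin", self.allowed_origin)
        super().end_headers()


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a server for the current directory."""
    return ThreadingHTTPServer(("127.0.0.1", port), CaptionHandler)
```

Calling make_server().serve_forever() from the directory that holds lecture.en.vtt would start it; the sketch stops short of that so it can be imported without blocking.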

Why VTT is genuinely better than SRT for the web

VTT was designed for the web; SRT comes from SubRip, a DVD subtitle ripper that dates to around 2000, and it spread alongside DivX rips. The web-native features show:

Positioning cues. line:, position:, align:, and size: let you place a caption anywhere on the frame — useful when a chyron already occupies the bottom third.
Styling hooks. <c.classname> tags inside cues plus a STYLE block at the top of the file apply CSS to specific cues. A speaker label can be a different color than the dialog.
Inline timing. <00:00:02.500> mid-cue timestamps enable karaoke-style word highlighting.
NOTE comments. Translator or QA notes can live in the file without breaking the parse. SRT silently absorbs stray text into the next cue.
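To make those four features concrete, the sketch below holds a small hand-written WebVTT file in a Python string and writes it out as UTF-8. The cue text, the speaker class name, and the helper write_vtt are all made up for this example.

```python
# A hypothetical WebVTT file that uses a NOTE comment, a STYLE block,
# cue settings for positioning, a class span, and inline timestamps.
STYLED_VTT = """\
WEBVTT

NOTE QA: confirm the speaker name before publishing.

STYLE
::cue(.speaker) {
  color: yellow;
}

1
00:00:01.000 --> 00:00:04.000 line:10% position:50% align:center size:80%
<c.speaker>HOST:</c> Welcome back.

2
00:00:05.000 --> 00:00:09.000
Every <00:00:06.000>word <00:00:07.000>lands <00:00:08.000>on time.
"""


def write_vtt(path: str, text: str = STYLED_VTT) -> None:
    """Write the cue text as UTF-8, the only encoding WebVTT allows."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)
```

The first cue sits near the top of the frame because of line:10%, and the STYLE block has to appear before the first cue for the browser to apply it.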

For a static caption file shipping with a tutorial video, none of this matters. For a serious video pipeline with editorial styling and accessibility QA, it matters a lot.

The 30-second walkthrough

Open freefileconverter.ai/srt-to-vtt.
Drag the .srt file onto the drop zone. Multi-select works — drop ten at once for a full season run.
The conversion happens the instant the file lands. There's no "convert" button to press because there's nothing to wait for. The offline-pill at the bottom of the page stays lit; the file never leaves your tab.
Download each .vtt with the per-row button, or hit Download all (ZIP) for batch runs. A typical S01E01–S01E10 conversion finishes before the download dialog opens.
Drop the .vtt next to your video, point a <track> at it, ship.

The whole thing runs as a string transform in the browser. We didn't write a server endpoint for this because there's nothing for a server to do. A 200-byte text file going up to a stranger's box for a regex replace is the most obviously wasteful pattern in the "free online converter" category — and it's the default for almost every result on the first page of Google.

Why every "online SRT to VTT" tool is upload-based, and why that's wrong

SEO, mostly. An ad-supported converter site wants you to upload because the upload is the conversion event they track, and the longer they hold the file the more ads they can serve next to a "preparing your download…" spinner. There is no technical reason for any of it — subtitle conversion is byte-for-byte deterministic and fits inside the JS engine your browser already runs.

The cost of uploading: a caption file for an unreleased episode, an internal training video, or a client project ends up in someone else's S3 bucket with no retention policy you can verify. For an NDA workflow that's the only reason that matters. See why "free" converters upload your files for the longer argument.

Going the other direction, or extracting plain text

The reverse case is almost as common: a vendor handed you a .vtt but your editing tool (Premiere, DaVinci Resolve, most YouTube workflows) wants .srt. Same converter family, opposite direction — VTT to SRT strips the WEBVTT header, swaps dots back to commas, and re-numbers cues if needed.
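The reverse transform can be sketched in a few lines of Python as well. The function name vtt_to_srt is hypothetical, and the sketch makes simplifying assumptions: blocks are separated by truly empty lines, cue settings such as align:center are dropped, and inline tags in the cue text are passed through untouched.

```python
import re

# A WebVTT timing line; the hours field is optional in VTT but expected in SRT.
VTT_TIMING = re.compile(
    r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
)


def vtt_to_srt(vtt_text: str) -> str:
    """Convert WebVTT text to SRT text (hypothetical helper for illustration)."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n{2,}", text.strip("\n")):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = VTT_TIMING.match(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = match.groups()
                # Swap dots for commas and fill in a missing hours field.
                timing = (
                    f"{h1 or '00'}:{m1}:{s1},{ms1} --> {h2 or '00'}:{m2}:{s2},{ms2}"
                )
                cues.append((timing, lines[i + 1:]))
                break
        # Blocks with no timing line (header, NOTE, STYLE) are skipped.
    out = []
    for number, (timing, body) in enumerate(cues, start=1):
        # Renumber from 1, ignoring any identifiers the VTT file carried.
        out.append("\n".join([str(number), timing, *body]))
    return "\n\n".join(out) + "\n"
```

Skipping every block without a timing line is what removes the WEBVTT header along with any NOTE or STYLE blocks, none of which SRT has a place for.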

If you want just the transcript text — no timestamps, no cue numbers, just the lines — SRT to plain text drops the timing and concatenates the cue bodies. Useful for searchable transcripts, blog post drafts from a podcast, or feeding the dialog into a translation tool.
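A plain-text extraction along those lines might look like the following sketch. The name srt_to_text is hypothetical, and stripping simple inline tags such as <i> is an extra assumption of this example rather than something the description above promises.

```python
import re

# A timing line in either SRT (comma) or VTT (dot) form.
TIMING = re.compile(r"^\d{2,}:\d{2}:\d{2}[,.]\d{3}\s*-->")
# Simple inline tags such as <i>, </i>, or <b>.
TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def srt_to_text(srt_text: str) -> str:
    """Return the caption lines only, one per line (hypothetical helper)."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    lines_out = []
    for block in re.split(r"\n{2,}", text.strip("\n")):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if TIMING.match(line):
                # Everything after the timing line is caption text.
                for body_line in lines[i + 1:]:
                    cleaned = TAG.sub("", body_line).strip()
                    if cleaned:
                        lines_out.append(cleaned)
                break
    return "\n".join(lines_out) + "\n"
```

Keying on the timing line rather than the cue number means a caption that happens to be a bare number is still kept as text.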


Edge cases worth knowing

BOM at the start of the SRT. Some Windows tools save .srt with a UTF-8 byte-order mark. The converter strips it; the WEBVTT header has to be the literal first bytes.
Latin-1 encoded SRT. If the original is Windows-1252, the converter detects it and re-emits as UTF-8. Without that step every é would render as Ã© in the browser.
Overlapping cues. VTT allows them (two captions on screen at once); SRT technically doesn't but most parsers tolerate it. We pass overlaps through unchanged.
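Frame-number timestamps. A few legacy SRT files use frame numbers instead of HH:MM:SS,mmm. That's non-standard, and converting frames to time requires the video's frame rate, which the file doesn't carry. The converter leaves them unchanged; they need to be re-exported with time-based timestamps at the source.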
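The BOM and encoding cases can be handled with a decode step like the sketch below. How the converter actually detects Windows-1252 isn't described here, so the try-UTF-8-then-fall-back approach and the name decode_subtitle_bytes are assumptions made for illustration.

```python
def decode_subtitle_bytes(raw: bytes) -> str:
    """Decode subtitle bytes to str: UTF-8 first, Windows-1252 as a fallback."""
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Windows-1252.
        # Bytes that cp1252 leaves undefined become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")
```

The fallback works because accented Windows-1252 text is almost never valid UTF-8 by accident, so the strict UTF-8 attempt fails quickly on legacy files. Encoding the returned string with "utf-8" gives bytes that are safe to put after the WEBVTT header.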

If you control the export side

If you're cutting captions in a tool that lets you pick the output format (Premiere, Resolve, Final Cut, Descript, Kapwing), pick WebVTT directly. It skips the conversion step and unlocks positioning and styling if you ever want them. SRT is only the right export when the downstream tool demands it: legacy YouTube uploads, some Vimeo paths, a handful of LMS platforms. For anything inside an HTML5 <video>, WebVTT is the format.
