DOCX to HTML and plain text

Extracting readable content from Word files with mammoth — what to expect from the output.

May 23, 2026 · 2 min read · #documents

What you can do

Open a Word (.docx) file and download either:

HTML — for viewing in a browser or light editing, or
Plain text — paragraphs without formatting.

How it works (simple)

The .docx file is read as a ZIP of XML parts (that is how Office Open XML works).
mammoth walks the document XML and maps Word structures to HTML or text.
For HTML export, the result is wrapped in a minimal HTML page; for text export, it is written out as plain lines.
You download the output. Every step above runs on your machine.
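The steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily how this tool is implemented. The function names (wrapInHtmlPage, convertDocx, downloadAs) are hypothetical, and mammothLib is assumed to be the mammoth.js browser build, passed in explicitly so the sketch does not depend on how the library is loaded.

```javascript
// Sketch only: all function names here are made up for illustration.
// `mammothLib` is assumed to be mammoth.js (for example the global `mammoth`
// provided by its browser build).

// Wrap a converted HTML fragment in a minimal standalone page.
function wrapInHtmlPage(bodyHtml, title) {
  // Escape the title so a file name cannot inject markup.
  const safeTitle = String(title)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    '<meta charset="utf-8">',
    `<title>${safeTitle}</title>`,
    "</head>",
    "<body>",
    bodyHtml,
    "</body>",
    "</html>",
  ].join("\n");
}

// Convert the bytes of a .docx file to either a full HTML page or plain text.
async function convertDocx(arrayBuffer, mammothLib, format, title) {
  if (format === "text") {
    const result = await mammothLib.extractRawText({ arrayBuffer });
    return { content: result.value, mimeType: "text/plain", messages: result.messages };
  }
  const result = await mammothLib.convertToHtml({ arrayBuffer });
  return {
    content: wrapInHtmlPage(result.value, title),
    mimeType: "text/html",
    messages: result.messages,
  };
}

// Browser-only: offer the converted content as a local download.
function downloadAs(filename, content, mimeType) {
  const blob = new Blob([content], { type: mimeType });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  // Release the object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// Possible wiring in a page, given a File from an <input type="file">:
//   const out = await convertDocx(await file.arrayBuffer(), mammoth, "html", file.name);
//   downloadAs("document.html", out.content, out.mimeType);
```

Nothing in this sketch makes a network request: the file is read into memory, converted, and handed back as a Blob download.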

What runs in your browser

mammoth.js focuses on semantic content — headings, paragraphs, lists, links — rather than pixel-perfect layout. It is a popular choice for “good enough” Word extraction in the browser.
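Because mammoth maps Word styles to semantic HTML elements, documents with custom style names can be handled with a style map. The sketch below assumes a document that uses styles called "Section Title" and "Subsection Title" (example names, not something this tool is known to configure). convertWithStyleMap is a hypothetical helper, and mammothLib is again assumed to be the mammoth.js library passed in by the caller.

```javascript
// Example style map: Word paragraph styles on the left, HTML elements on the right.
// The style names are assumptions; use the names that appear in your own documents.
const exampleStyleMap = [
  "p[style-name='Section Title'] => h1:fresh",
  "p[style-name='Subsection Title'] => h2:fresh",
];

// Hypothetical helper: convert with a custom style map and collect warnings.
async function convertWithStyleMap(arrayBuffer, mammothLib, styleMap) {
  const result = await mammothLib.convertToHtml({ arrayBuffer }, { styleMap });
  // mammoth reports content it could not map (for example unknown styles)
  // as messages alongside the converted value.
  const warnings = result.messages
    .filter((m) => m.type === "warning")
    .map((m) => m.message);
  return { html: result.value, warnings };
}
```

Checking the warnings after a conversion is a cheap way to find out which parts of a document were not recognised.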

Tradeoffs and limits

Layout fidelity: Complex templates, text boxes, and exact positioning are not preserved.
Images: Embedded images may be omitted or simplified in HTML output, depending on the document; plain text output never contains images.
Macros & forms: Not supported; only document body content is targeted.
Legacy .doc: Only modern .docx is supported, not the older binary .doc format.
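A file extension alone does not reveal whether a file is really a modern .docx. One way to give a clearer error for legacy files is to look at the first bytes before converting: .docx files are ZIP archives, while the old binary .doc format uses the OLE compound file container. The helper below is a hypothetical sketch and is not part of mammoth. A ZIP signature only shows that the file is a ZIP container, not that it is a valid Word document.

```javascript
// Hypothetical helper: classify a file by its leading "magic" bytes.
// Returns "zip" (possible .docx), "legacy-ole" (possible old .doc), or "unknown".
function detectWordContainer(arrayBuffer) {
  const length = Math.min(8, arrayBuffer.byteLength);
  const bytes = new Uint8Array(arrayBuffer, 0, length);
  const startsWith = (signature) =>
    signature.length <= bytes.length &&
    signature.every((byte, i) => bytes[i] === byte);

  // ZIP local file header: "PK" followed by 0x03 0x04.
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip";
  // OLE compound file header used by the binary .doc format.
  if (startsWith([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1])) return "legacy-ole";
  return "unknown";
}
```

A page could call this on the uploaded bytes and show a "please save as .docx" message for "legacy-ole" instead of a generic conversion failure.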

See also

How Ask Jeeves converts files in your browser