What you can do
Open a Word (.docx) file and download either:
- HTML — for viewing in a browser or light editing, or
- Plain text — paragraphs without formatting.
How it works (simple)
- The .docx file is read as a ZIP of XML parts (that is how Office Open XML works).
- mammoth walks the document XML and maps Word structures to HTML or text.
- The result is wrapped in a minimal HTML page (for HTML export) or plain lines (for text export).
- You download the output — still entirely on your machine.
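The steps above can be sketched in TypeScript. The sketch assumes mammoth.js's `convertToHtml` and `extractRawText` functions, which take `{ arrayBuffer }` and resolve to an object whose `value` field holds the HTML or text. The `DocxConverter` interface, `convertDocx`, and the `wrapInHtmlPage` template are hypothetical names for illustration, not part of mammoth or of this tool. Passing the converter in as a parameter keeps the sketch runnable against a stub.

```typescript
// Hypothetical minimal interface for the two mammoth.js calls used here.
// In the browser you would pass the real `mammoth` object; a stub works for testing.
interface DocxConverter {
  convertToHtml(input: { arrayBuffer: ArrayBuffer }): Promise<{ value: string }>;
  extractRawText(input: { arrayBuffer: ArrayBuffer }): Promise<{ value: string }>;
}

type ExportFormat = "html" | "text";

// Escape characters that are special in HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap converted body HTML in a minimal standalone page (illustrative template).
function wrapInHtmlPage(bodyHtml: string, title: string): string {
  return [
    "<!DOCTYPE html>",
    '<html><head><meta charset="utf-8">',
    `<title>${escapeHtml(title)}</title></head>`,
    `<body>${bodyHtml}</body></html>`,
  ].join("\n");
}

// Convert the bytes of a .docx file into the chosen export format.
async function convertDocx(
  converter: DocxConverter,
  arrayBuffer: ArrayBuffer,
  format: ExportFormat,
  title: string,
): Promise<string> {
  if (format === "html") {
    const html = await converter.convertToHtml({ arrayBuffer });
    return wrapInHtmlPage(html.value, title);
  }
  const text = await converter.extractRawText({ arrayBuffer });
  // Normalise Windows line endings and finish with exactly one newline.
  return text.value.replace(/\r\n/g, "\n").replace(/\n+$/, "") + "\n";
}
```

In a page you might call `convertDocx(mammoth, await file.arrayBuffer(), "html", file.name)` on a `File` chosen through an `<input type="file">`, then offer the resulting string for download through a `Blob` and `URL.createObjectURL`, so the document never leaves the machine.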
What runs in your browser
mammoth.js focuses on semantic content — headings, paragraphs, lists, links — rather than pixel-perfect layout. It is a popular choice for “good enough” Word extraction in the browser.
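To make "semantic content rather than layout" concrete, here is a toy model of the idea. It is not mammoth's internal code. Each paragraph carries only its Word style name and its text. Heading styles become `<h1>` to `<h6>`, and everything else becomes `<p>`. Fonts, sizes, and positions never enter the model, which is why layout does not survive. The `Paragraph` type, `tagFor`, and `toSemanticHtml` are hypothetical names. mammoth itself exposes a `styleMap` option for customising this kind of mapping.

```typescript
// Toy document model: only the style name and text survive, with no layout data.
interface Paragraph {
  styleName: string; // e.g. "Heading 1", "Normal"
  text: string;
}

// Escape characters that are special in HTML text content.
function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Map a Word style name to an HTML tag: "Heading 1".."Heading 6" -> h1..h6, else p.
function tagFor(styleName: string): string {
  const match = /^Heading ([1-6])$/.exec(styleName);
  return match ? `h${match[1]}` : "p";
}

// Emit one semantic element per paragraph, ignoring any visual formatting.
function toSemanticHtml(paragraphs: Paragraph[]): string {
  return paragraphs
    .map(({ styleName, text }) => {
      const tag = tagFor(styleName);
      return `<${tag}>${escapeText(text)}</${tag}>`;
    })
    .join("");
}
```

Falling back to `<p>` for unrecognised styles is the "good enough" tradeoff in miniature: the text is kept, while custom styling is dropped.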
Tradeoffs and limits
- Layout fidelity: Complex templates, text boxes, and exact positioning are not preserved.
- Images: Embedded images may be omitted or simplified depending on the document.
- Macros & forms: Not supported; only document body content is targeted.
- Legacy .doc: Only modern .docx is supported, not the older binary .doc format.
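A .docx file is a ZIP archive, so its first bytes are the ZIP local-file-header signature `50 4B 03 04` ("PK\x03\x04"). A legacy .doc file is an OLE2 compound file that starts with `D0 CF 11 E0 A1 B1 1A E1`. A check like the sketch below could reject a .doc before conversion and show a clearer error. `detectWordFormat` is a hypothetical helper. A ZIP signature alone does not prove the archive is a Word document.

```typescript
type WordFormat = "docx" | "doc" | "unknown";

// ZIP local file header signature ("PK\x03\x04"), found at the start of .docx files.
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];
// OLE2 Compound File Binary signature used by legacy .doc files.
const OLE2_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// True if `bytes` begins with every byte of `magic`.
function hasPrefix(bytes: Uint8Array, magic: number[]): boolean {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Heuristic: classify by leading bytes only. A ZIP header means "possibly .docx";
// a real Word file would also need a word/document.xml part inside the archive.
function detectWordFormat(bytes: Uint8Array): WordFormat {
  if (hasPrefix(bytes, ZIP_MAGIC)) return "docx";
  if (hasPrefix(bytes, OLE2_MAGIC)) return "doc";
  return "unknown";
}
```

In the browser, `new Uint8Array(await file.arrayBuffer())` supplies the bytes for a user-selected `File`.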