sophia2152/DocumentsParser

 
 


DOCX file parser

Parses a DOCX file and returns an HTML string. Any help or criticism is appreciated.

Supported elements:

  • paragraphs (w:p),
  • images (pic:pic),
  • links (w:hyperlink),
  • tables (w:tbl),
  • bookmarks (w:bookmarkStart),
  • lists.
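
To see which of these elements a given document actually contains, the main document part can be inspected directly. The following is a minimal sketch, not part of DocumentsParser: the function name `countDocxElements` is hypothetical, and it assumes the `zip` and `dom` PHP extensions are available. It relies only on the facts that a DOCX file is a ZIP archive and that the body is stored in `word/document.xml`.

```php
<?php
// Hypothetical helper (not part of DocumentsParser): counts some
// WordprocessingML elements in the main part of a DOCX file.
function countDocxElements(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a ZIP archive");
    }
    // The document body lives in word/document.xml inside the archive.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);

    // Standard Office Open XML namespaces for the w: and pic: prefixes.
    $w   = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $pic = 'http://schemas.openxmlformats.org/drawingml/2006/picture';

    return array(
        'paragraphs' => $dom->getElementsByTagNameNS($w, 'p')->length,
        'images'     => $dom->getElementsByTagNameNS($pic, 'pic')->length,
        'links'      => $dom->getElementsByTagNameNS($w, 'hyperlink')->length,
        'tables'     => $dom->getElementsByTagNameNS($w, 'tbl')->length,
        'bookmarks'  => $dom->getElementsByTagNameNS($w, 'bookmarkStart')->length,
    );
}
```

Calling `countDocxElements('test_document.docx')` returns an array of counts keyed by element kind, which can help when checking whether a document uses anything the parser does not yet support.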

TODO:

  • shapes support
  • table cell styles
  • add links filter
  • image optimization (remove duplicate images)
  • add i18n
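
One possible approach to the duplicate-image item is to compare images by a hash of their content. This is only a sketch of the idea, not the parser's implementation: the function `deduplicateImages` and its input shape (file name mapped to raw bytes) are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of DocumentsParser).
// Input:  array of file name => raw image bytes.
// Output: array of file name => file name of the first image with the
//         same content, so later duplicates can reuse the first copy.
function deduplicateImages(array $images): array
{
    $firstByHash = array(); // content hash => first file name seen
    $aliases     = array(); // every file name => file name to keep

    foreach ($images as $name => $bytes) {
        $hash = sha1($bytes);
        if (!isset($firstByHash[$hash])) {
            $firstByHash[$hash] = $name;
        }
        $aliases[$name] = $firstByHash[$hash];
    }

    return $aliases;
}
```

With such a map, only the images that map to themselves would need to be written to the destination folder, and the generated HTML could point every duplicate at the retained file.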

Known issues:

  • high memory consumption
  • for large files, parsing can take longer than 30 seconds (PHP's default max_execution_time)
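
Until these issues are addressed, a caller can work around them by raising PHP's limits before parsing. The snippet below is a sketch under the assumption that the hosting environment permits changing these settings at runtime; the values 120 seconds and 512M are arbitrary examples, not recommendations from the author.

```php
<?php
// Assumption: runtime changes to these limits are allowed by the host.
// The concrete values are illustrative only.

// Allow this script to run for up to 120 seconds instead of the default 30.
set_time_limit(120);

// Raise the memory ceiling for this script.
ini_set('memory_limit', '512M');
```

These calls should be placed before `parseFile()` is invoked so that the new limits apply to the parsing step.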

Usage example:

<?php
	// load the library
	require_once('DocumentsParser.php');

	// init parser
	$parserSettings = array(
		'filesDestinationFolder' => 'images',
	);

	$defaultStyles = array();

	$parser = new DocumentsParser($parserSettings, $defaultStyles);

	// parse DOCx
	$html = $parser->parseFile('test_document.docx');

	// save content to file
	file_put_contents('test_document.html', $html);
?>

Reminder links:
