Revival of the JTidy Project updated to work with HTML5 new tags. Along with new option to not remove unknown tags:
Tidy tidy = new Tidy();
tidy.setDropProprietaryTags(false);
JTidy is a Java port of HTML Tidy, a HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document that is being processed, which effectively makes you able to use JTidy as a DOM parser for real-world HTML.
You can use JTidy as an html checker/pretty-printer or as a DOM parser.
First of all, you will need to download a JTidy distribution. Inside it you will find a jtidy.jar
containing all
classes, no other libraries are needed.
Now that you have JTidy you can use it in different ways.
Run java -jar jtidy.jar {options}
to access the JTidy command line interface.
java -jar jtidy.jar -h
will output a short help on JTidy command line with a few examples.
java -jar jtidy.jar -help-config
outputs all the available configuration options and
java -jar jtidy.jar -show-config
the current (default) values.
Detailed instructions on how to use the JTidy ant task can be found in JTidyTask
JavaDoc.
To use JTidy embedded in your program, you best set up a Maven dependency to the official release at maven-central:
<dependency>
<groupId>com.github.jtidy</groupId>
<artifactId>jtidy</artifactId>
<version>1.0.5</version>
</dependency>
If you require a Java-6-compatible version, you can use the back-ported artifact:
<dependency>
<groupId>com.github.jtidy</groupId>
<artifactId>jtidy-java6</artifactId>
<version>1.0.4</version>
</dependency>
The entry point for accessing JTidy functionalities is the org.w3c.tidy.Tidy
class. This is a simple usage example:
Tidy tidy = new Tidy(); // obtain a new Tidy instance
tidy.setXHTML(boolean xhtml); // set desired config options using tidy setters
... // (equivalent to command line options)
tidy.parse(inputStream, System.out); // run tidy, providing an input and output stream
Using parseDOM(java.io.InputStream in, java.io.OutputStream out)
instead of parse()
you will also obtain a DOM
document you can parse and print out later using pprint(org.w3c.dom.Document doc, java.io.OutputStream out)
(note that the JTidy DOM implementation is not fully-featured, and many DOM methods are not supported).
JTidy also provides a MessageListener
interface you can implement to be notified about warnings and errors in your
HTML code. For details on advanced uses refer to the JTidy JavaDoc.
JTidy was initially written by Andy Quick. The project has been maintained at sourceforge.net by Fabrizio Giustina from 2004 to 2010. Since the JTidy project on SourceForge.net seemed to fall into disrepair years ago and had not been updated for years. A few had forked it on Github. William L. Thomson Jr. came along and created a fork of others forks with a tag for his packaging needs as a dependency for JMeter. Then another came along, Dell Green who noticed some issues, tests failing, and undertook fixing both.
Since the code belonged to neither, William decided to create a JTidy organization and revive the project via community support. Which you are welcome to join in. Eventually this should become the official new home for JTidy.
Thanks to all past authors and developers. Those of which who could be found on Github have been invited to join this project. Along with those that this repository was forked from.
You are welcome to contribute issues and pull-request. Please have a look at the coding conventions and test cases.
This project is licensed under the zlib/libpng license. More information is available in the LICENSE.txt file
The project is looking for new contributors and project maintainers.
Checkout v.Nu validator for a possible modern replacement.