JTidy is a Java port of HTML Tidy (http://www.w3.org/People/Raggett/tidy/). Like its non-Java counterpart, JTidy can be used to clean up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so you can use JTidy as a DOM parser for real-world HTML.
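As a rough illustration of the DOM-parser use case, the sketch below feeds a small, deliberately malformed HTML string (made up for this example) to JTidy and queries the resulting `org.w3c.dom.Document`. It assumes the JTidy jar (for example the r938 release) is on the classpath. The calls used (`org.w3c.tidy.Tidy`, `setQuiet`, `setShowWarnings`, `parseDOM(InputStream, OutputStream)`) come from that API, while the class name `JTidyDomExample` and the helper `parse` are hypothetical names chosen for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

// Hypothetical example class; assumes the JTidy jar is on the classpath.
public class JTidyDomExample {

    // Hypothetical helper: parse an HTML string into a W3C DOM Document with JTidy.
    static Document parse(String html) {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);         // suppress informational messages
        tidy.setShowWarnings(false); // suppress warnings about the malformed markup

        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.US_ASCII));
        // A null output stream means only the DOM is wanted, not the cleaned markup.
        return tidy.parseDOM(in, null);
    }

    public static void main(String[] args) {
        // Malformed input: the <p> and <b> elements are never closed.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>First<p>Second <b>bold</body></html>";

        Document doc = parse(html);

        // Standard DOM calls work on the repaired tree.
        NodeList paragraphs = doc.getElementsByTagName("p");
        System.out.println("paragraphs: " + paragraphs.getLength());
    }
}
```

Passing `null` as the output stream is a common way to skip writing the tidied markup when only the DOM is needed. Supply a real `OutputStream` instead if you also want JTidy's cleaned-up HTML.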