Michael Iarrobino, Product Manager at Copyright Clearance Center, explains the pitfalls of converting full-text PDFs to XML for text mining. To get the best results ...