Information Extraction from Natural Document Formats with David Rosenberg - TWiML Talk #126

Research #llm | Blog | Analyzed: Dec 29, 2025 08:28

Published: Apr 9, 2018 17:23


Analysis

This article summarizes a podcast episode featuring David Rosenberg, a data scientist at Bloomberg, about extracting data from unstructured financial documents such as PDFs. The discussion centers on a deep learning pipeline built to extract data from tables and charts efficiently. Topics covered include how the pipeline was constructed, where the training data came from, the use of LaTeX as an intermediate representation, and optimizing for pixel-perfect accuracy. The episode offers a practical look at applying deep learning to information extraction in the financial industry.