Romanian language. Our text analysis is performed in two steps: document structure
detection and text normalization. The output is a tree-based representation of the processed
data. Parsing is made efficient by the Boost Spirit LL parser [1], which also allows
greater flexibility in the source code and in the output representation.
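
As a rough illustration of the two-step analysis described above, the following is a minimal sketch in standard C++ only. It does not use Boost Spirit and is not the system's actual implementation. The `Node` layout, the function names, the blank-line paragraph rule, and the digit-tagging "normalization" are all assumptions made for illustration.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node; the real system's node layout is not specified here.
struct Node {
    std::string kind;            // "document", "paragraph", "token" or "number"
    std::string text;            // payload for leaf nodes, empty otherwise
    std::vector<Node> children;  // child nodes in document order
};

// Step 1 (structure detection, assumed rule): blank lines separate paragraphs.
std::vector<std::string> detect_paragraphs(const std::string& input) {
    std::vector<std::string> paragraphs;
    std::istringstream in(input);
    std::string line, current;
    while (std::getline(in, line)) {
        if (line.empty()) {
            if (!current.empty()) {
                paragraphs.push_back(current);
                current.clear();
            }
        } else {
            if (!current.empty()) current += ' ';
            current += line;
        }
    }
    if (!current.empty()) paragraphs.push_back(current);
    return paragraphs;
}

// Step 2 (normalization, assumed rule): split on whitespace and tag
// all-digit tokens as "number" so a later stage could expand them.
Node normalize_paragraph(const std::string& paragraph) {
    Node p{"paragraph", "", {}};
    std::istringstream in(paragraph);
    std::string tok;
    while (in >> tok) {
        bool digits = true;
        for (unsigned char c : tok) {
            if (!std::isdigit(c)) { digits = false; break; }
        }
        p.children.push_back(Node{digits ? "number" : "token", tok, {}});
    }
    return p;
}

// Runs both steps and returns the tree-based representation.
Node analyze(const std::string& input) {
    Node doc{"document", "", {}};
    for (const auto& para : detect_paragraphs(input)) {
        doc.children.push_back(normalize_paragraph(para));
    }
    return doc;
}
```

Calling `analyze` on a two-paragraph input yields a `document` node with two `paragraph` children, each holding its tokens as leaves. A grammar-based parser would replace the hand-written splitting in both steps.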