Document categorization and indexing is usually either purely rule-based, with rules written by hand, or purely statistical. In real-world scenarios, the shortcomings of each approach prove problematic: rule-based approaches require considerable resources and domain insight, while statistical approaches require substantial training data, which may not be available.
Document categorization and indexing in Cogito Studio lets users employ powerful machine learning methods that produce human-readable rules, ready for inspection and further manual refinement.
Building on a rich stack of morphological, syntactic and semantic components, users can dive selectively into specific rules where the results of the automatic training procedure appear unsatisfactory, start from scratch altogether, or simply accept the results of the automatic training without spending additional effort.
Packaged in an intuitive application, the approach has delivered competitive results in real-world client evaluations. Today it is a key component of Expert System's portfolio, enabling both internal and external users to benefit from semantic processing in their projects.
Computational Linguistics in Erlangen-Nürnberg. Researcher at CMU in Pittsburgh, then seven years working on NLP projects at the IBM Scientific Center in Heidelberg, before co-founding TEMIS and serving as managing director of its German branch. Since the merger of TEMIS and Expert System, managing director of Expert System in Germany.