Despite recent efforts to achieve a high level of interoperability for Machine Learning (ML) experiments, which contributes directly to Reproducible Research, we still run into problems caused by the existence of different ML platforms, each of which has its own conceptualization or schema for representing data and metadata. This leads to extra coding effort to achieve the desired interoperability, a better level of provenance, and a more automated environment for obtaining the generated results. Hence, when using ML libraries, it is common to redesign specific data models (schemata) and to develop wrappers to manage the produced outputs. In this article, we address this gap, focusing on the question: "What is the cleanest and lowest-impact solution, i.e., the minimal effort, to achieve higher levels of both interoperability and provenance metadata in the context of Integrated Development Environments (IDEs), and how can the inherent data querying task be facilitated?". We introduce a novel, low-impact methodology specifically designed for code built in that context. It combines Semantic Web concepts and reflection to minimize the effort of exporting ML metadata in a structured manner, allowing embedded code annotations that are converted at run time into one of the state-of-the-art ML schemas for the Semantic Web: the MEX Vocabulary.
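The annotation-plus-reflection idea can be illustrated with a minimal sketch in Python. It is not the authors' implementation and does not use the real MEX Vocabulary: the decorator `ml_metadata`, the functions `export_triples` and `run_and_export`, and the namespace `http://example.org/mex-sketch#` are hypothetical placeholders. The sketch attaches annotations to a training function, runs it, and then uses reflection (`inspect.signature`) to read back the annotations and the argument values actually used, emitting N-Triples-style lines.

```python
import inspect
from typing import Any, Callable

# Hypothetical placeholder namespace; these are NOT real MEX Vocabulary IRIs.
EX = "http://example.org/mex-sketch#"


def ml_metadata(**annotations: str) -> Callable:
    """Hypothetical decorator: store metadata annotations on the function object."""
    def decorator(func: Callable) -> Callable:
        func.__ml_metadata__ = dict(annotations)
        return func
    return decorator


def export_triples(func: Callable, run_id: str,
                   hyperparams: dict[str, Any],
                   results: dict[str, Any]) -> list[str]:
    """Read annotations via reflection and emit N-Triples-style lines."""
    subject = f"<{EX}{run_id}>"
    triples = [f'{subject} <{EX}implementedBy> "{func.__qualname__}" .']
    # Annotations declared in the code, read back from the function object.
    for key, value in sorted(getattr(func, "__ml_metadata__", {}).items()):
        triples.append(f'{subject} <{EX}{key}> "{value}" .')
    # Hyperparameters actually used in this run (explicit or default).
    for key, value in sorted(hyperparams.items()):
        triples.append(f'{subject} <{EX}hyperparameter_{key}> "{value}" .')
    # Outputs returned by the run.
    for key, value in sorted(results.items()):
        triples.append(f'{subject} <{EX}{key}> "{value}" .')
    return triples


def run_and_export(func: Callable, run_id: str, **kwargs: Any) -> list[str]:
    """Run the annotated function, then export its metadata at run time."""
    bound = inspect.signature(func).bind(**kwargs)
    bound.apply_defaults()  # also record default hyperparameter values
    results = func(*bound.args, **bound.kwargs)
    return export_triples(func, run_id, dict(bound.arguments), results)


@ml_metadata(algorithm="DecisionTree", dataset="iris")
def train(max_depth: int = 3, min_samples: int = 2) -> dict[str, Any]:
    # Placeholder "training": returns a dummy score, not a real result.
    return {"accuracy": 0.0}


if __name__ == "__main__":
    for line in run_and_export(train, "run1", max_depth=5):
        print(line)
```

The point of the design is that the experiment code only gains a one-line annotation; everything else is recovered by reflection when the code runs. A real exporter would map the keys to actual MEX classes and properties and serialize with an RDF library rather than string formatting.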
Diego Esteves is currently a PhD student at the AKSW - Smart Data Analytics Research Group. Before joining the AKSW group in Leipzig, he worked for over 10 years at large companies such as Accenture, B2W Inc., Wilson Sons and BTG Pactual Investment Bank. He has extensive work experience in system integration, data analysis, applied machine learning, anti-money laundering models, stock markets, supply chain management and logistics systems. In 2002, Esteves was admitted to a Federal Technical High School, where he obtained a degree in Data Processing (2004) while working part-time as a trainee at IBOPE (Brazilian Institute of Public Opinion and Statistics). He also received a Bachelor's degree in Information Systems from CEFET-Rio (2009), an MBA in Software Engineering from UFRJ (Federal University of Rio de Janeiro) in 2010, and a Master of Science in Machine Learning applied to the Stock Market from IME (Military Institute of Engineering), Brazil, in 2014. His main research topics are fact-finding algorithms and machine learning interoperability and metadata generation.