Mentor, data infrastructure architect, data governance practitioner, software engineer, independent scientist.
I have worked in data warehousing and data governance for over 16 years as an architect, software engineer and consultant. I have worked for leading tech companies in Silicon Valley and elsewhere, such as Facebook, Squarespace and Anaconda. I have also advised and worked for various non-governmental organizations, introducing industry practices into the space of open government data.
Currently on sabbatical. Doing independent research in the cross-disciplinary field of complex systems, virtual evolution, unconventional computing and non-linear languages. Developing a biochemistry-inspired programming language.
I am an open-source creator and contributor, a conference speaker, and the creator and lead developer of the open-source data warehouse toolkit Data Brewery and its flagship multi-dimensional analytical server, Cubes.
My core expertise is in business intelligence and data warehousing. I consult in the areas of data warehouse architecture, data governance and data quality, and I design metadata repositories and unified sources of organizational truth.
I help organisations decommission suffering-causing, usually legacy, systems.
- Data infrastructure architecture
- Systems for data governance and data quality management
- Metadata systems, metadata-based processing and metadata modelling
- Relational algebra
- Design of domain-specific languages
Values: system adaptability, technology-agnostic ecosystems and transparency of data quality.
- Researching the possibilities for, and engineering the architecture of, an ecosystem to develop a universal body of knowledge about complex systems
- Combining discoveries in biochemistry for the purpose of conceptual simulation of complex systems
- Design and development of domain-specific languages and compilers
- Facebook: Tech lead for data warehouse architecture for revenue data streams. Designed and developed a declarative, technology-agnostic ETL/data framework for building metadata-driven pipelines for multi-dimensional metrics.
- Squarespace: Tech lead for data warehouse migration; designed a new warehouse architecture. Introduced and integrated a metadata-driven OLAP server into the data ecosystem to ensure reporting consistency.
- Knowerce: Founder of a data consultancy serving multinational clientele (Anaconda, Raiffeisen, Open Knowledge International, Pfizer, Transparency International, …)
- Orange Slovakia: Customer intelligence systems, datamart design and development, data integration.
- Open Knowledge International: Defined a foundation of data processing pipeline concepts for School of Data.
- Open Knowledge Labs: Provided business intelligence and data warehousing expertise and helped to form open-data standards.
- Transparency International Slovakia: Built the very first analytical open-data business intelligence portal in Slovakia (and the CEE region) for Open Public Procurements.
- Fair Play Alliance Slovakia: Developed the first open data portal with data quality management elements in Slovakia.
A multi-dimensional conceptual data framework and server. The main features are OLAP and aggregated browsing with a relational database as the default backend, multi-dimensional analysis, a logical view of the analysed data, and hierarchical conceptual dimensions. The purpose is to focus on how analysts look at data and how they think of data, not on how the data are physically implemented in the data stores. The framework is SQL-dialect agnostic and uses relational algebra with dialect-specific compilers to generate concrete database queries.
Links: Project Home, Github sources, Documentation
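The idea of a dialect-agnostic logical query compiled into concrete SQL can be illustrated with a minimal sketch. This is not the Cubes API; the names `Aggregate`, `QUOTERS` and `compile_aggregate` are hypothetical and only show how one logical aggregation might be rendered for two dialects that differ in identifier quoting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical logical request: sum one measure grouped by one dimension."""
    fact_table: str
    measure: str
    dimension: str


def quote_ansi(name: str) -> str:
    # ANSI SQL quotes identifiers with double quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_mysql(name: str) -> str:
    # MySQL quotes identifiers with backticks.
    return "`" + name.replace("`", "``") + "`"


# Hypothetical registry of dialect-specific identifier quoting rules.
QUOTERS = {"ansi": quote_ansi, "mysql": quote_mysql}


def compile_aggregate(query: Aggregate, dialect: str) -> str:
    """Render the same logical query as SQL text for the given dialect."""
    q = QUOTERS[dialect]
    return (
        f"SELECT {q(query.dimension)}, SUM({q(query.measure)}) AS {q(query.measure + '_sum')} "
        f"FROM {q(query.fact_table)} GROUP BY {q(query.dimension)}"
    )


query = Aggregate(fact_table="sales", measure="amount", dimension="year")
print(compile_aggregate(query, "ansi"))
print(compile_aggregate(query, "mysql"))
```

The analyst-facing description (`Aggregate`) never mentions quoting or syntax; only the compiler step depends on the target database.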
An experimental Python framework for data processing and data quality measurement. The basic concepts are abstract data objects, operations and dynamic operation dispatch.
Links: Project Home, Github sources, Documentation
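A rough sketch of what dynamic operation dispatch over abstract data objects could look like follows. It is an illustration under assumptions, not the framework's actual API: `DataObject`, `operation`, `dispatch` and the `sample` operation are hypothetical names.

```python
from typing import Callable, Dict, Tuple


class DataObject:
    """Hypothetical abstract data object; subclasses name their representation."""
    representation = "abstract"


class RowsObject(DataObject):
    representation = "rows"

    def __init__(self, rows):
        self.rows = list(rows)


class SQLObject(DataObject):
    representation = "sql"

    def __init__(self, statement: str):
        self.statement = statement


# Maps (operation name, representation) to the implementing function.
_registry: Dict[Tuple[str, str], Callable] = {}


def operation(name: str, representation: str):
    """Decorator registering an implementation for one representation."""
    def register(func: Callable) -> Callable:
        _registry[(name, representation)] = func
        return func
    return register


def dispatch(name: str, obj: DataObject, *args):
    """Pick the implementation at call time from the object's representation."""
    key = (name, obj.representation)
    if key not in _registry:
        raise NotImplementedError(f"no '{name}' for representation '{obj.representation}'")
    return _registry[key](obj, *args)


@operation("sample", "rows")
def _sample_rows(obj: RowsObject, count: int) -> RowsObject:
    # In-memory rows: slice the list.
    return RowsObject(obj.rows[:count])


@operation("sample", "sql")
def _sample_sql(obj: SQLObject, count: int) -> SQLObject:
    # SQL-backed object: compose a new statement instead of fetching data.
    return SQLObject(f"SELECT * FROM ({obj.statement}) AS s LIMIT {int(count)}")
```

Calling `dispatch("sample", obj, 10)` works the same way for either kind of object; the caller does not need to know where the data lives.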
A small utility library for embedding an arithmetic expression parser and compiler into other libraries and applications.
Links: Github (contains documentation)
A Smalltalk implementation on top of the Objective-C runtime. It is used as a scripting framework for creating scriptable servers or applications. StepTalk, when combined with the dynamism that the Objective-C language provides, goes way beyond mere scripting. It is written using GNUstep.
Formerly a toolkit for multi-agent simulations written in Objective-C. It featured an iterative simulator, a simulation server, a data probing and collecting mechanism, and a virtual laboratory application (Farmer) to control and visualize the simulation.
Other minor libraries from the past:
- XY - two-dimensional plotting in Objective-C/OpenStep/early Cocoa
- Development Kit - Objective-C source code generator
- Various contributions to the GNUstep project.
Biochemistry-inspired programming language. Experimental research project.
Links: Slides, Document, Github
- Data Natives 2019, Berlin: Forces and Threats in a Data Warehouse or why Architecture and Metadata Matters. (slides)
- PyData NYC 2014: Panel: Python in Business Intelligence
- PyData NYC 2014: Cubes 1.0 (Python OLAP) – new features (video, slides)
- Transparency Camp 2014, Washington DC, USA – Open-source OLAP
- Data Harvest 2014, Brussels – panel Journos and Codes cooperating; talk Lessons from Business Intelligence for Open Data; talk Data Governance – why and how?
- PyCon 2014, Montreal – Cubes – Distributed Data Warehouse (Scribd document/pdf)
- PyData 2012, New York – talk Python in Business Intelligence (video, slides); lightning talk Cubes - Lightweight OLAP (video); lightning talk PyData Academy (video, slides)
- PyTexas 2012, College Station, TX – lightning talk – Cubes OLAP
- Data Harvest, May 2012, Brussels – Open Public Procurements of Slovakia
- EuroPython 2012, Florence, Italy – talk and training: Cubes – lightweight Python OLAP (video, slides)
- BigClean 2011, Prague, Czech Republic – Open Data Data Quality (slides)
- Transparency Camp 2011, Washington DC, USA – Slovak Open Public Procurements; Cubes - open-source OLAP
- Open Knowledge Conference 2010, London, UK – Screen-scraping Slovak Public Procurements
- Transparency Camp 2010, Washington DC, USA – Data Camp and Data Camp ETL – first Slovak Open Data projects
- E-Democracy 2009, Berlin, Germany – Open Data in Slovakia and Data Camp - a Data publishing application
- Znalosti 2004 (Knowledge 2004) – “The Trust – Evolutionary Simulation and Modelling”, February 2004
- ESUG 2003 – Smalltalk Conference – “StepTalk”, August 2003
- Cognition, Artificial Life and Computer Intelligence, Stará Lesná, Vysoké Tatry – “Learning of a System Using Simulation with Minimal Assumptions”, May 2003