Building a Data Catalog That Meets Your Needs
Most of the well-established data catalog tools – such as IBM IGC, Informatica EDC, and Collibra – were created over a decade ago and, not surprisingly, are somewhat out of sync with current data cataloging requirements. For example, most of them focus on describing (critical) data elements in great detail, which suited the early maturity stages of data governance.
Today’s requirements center on enabling data self-service and self-serve business intelligence (BI) and analytics. They include (but are not limited to) the following capabilities:
- Describe a dataset (i.e. a set of data elements forming an informational unit)
- Track its provenance (i.e. show where it comes from)
- Control authenticity of data values (i.e. whether any values borrowed from a preceding dataset are changed at the time of the new dataset publishing)
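To make the three capabilities above concrete, here is a minimal sketch in Python. It is not modeled on any vendor's product or API: the `CatalogEntry` class, the `fingerprint` helper, and the sample dataset names are all hypothetical, chosen only for illustration. It assumes that "authenticity" can be approximated by comparing a hash of the values a dataset borrows from its predecessor.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(values):
    """Return an order-sensitive SHA-256 digest of a sequence of values."""
    digest = hashlib.sha256()
    for value in values:
        digest.update(repr(value).encode("utf-8"))
        digest.update(b"\x1f")  # separator so ["ab"] and ["a", "b"] differ
    return digest.hexdigest()


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: description, data, and provenance."""
    name: str
    description: str
    columns: dict  # column name -> list of values
    derived_from: list = field(default_factory=list)  # upstream entries

    def lineage(self):
        """Names of all upstream datasets, nearest first."""
        names = []
        for parent in self.derived_from:
            names.append(parent.name)
            names.extend(parent.lineage())
        return names

    def borrowed_values_unchanged(self, parent, column):
        """True if a column borrowed from `parent` has identical values."""
        return fingerprint(self.columns[column]) == fingerprint(parent.columns[column])


# Illustrative data only; the names and values are made up.
source = CatalogEntry(
    name="crm_customers",
    description="Customer master records exported from the CRM",
    columns={"customer_id": [101, 102, 103], "country": ["DE", "FR", "US"]},
)
report = CatalogEntry(
    name="sales_by_customer",
    description="Revenue per customer, built on crm_customers",
    columns={"customer_id": [101, 102, 103], "revenue": [1200, 850, 430]},
    derived_from=[source],
)

print(report.description)                                       # describe
print(report.lineage())                                         # provenance
print(report.borrowed_values_unchanged(source, "customer_id"))  # authenticity
```

A real catalog would store fingerprints at publishing time rather than recompute them from live data, and would track lineage at the column level; the sketch only shows how the three requirements relate to one another.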
“Legacy” vendors are enhancing their offerings: IBM has added integration with Watson Knowledge Catalog to its Information Governance Catalog, Collibra has added machine learning capabilities to its Data Governance Catalog, and Informatica has added Axon to enhance its Enterprise Information Catalog. However, all these enhancements seem to share the typical symptoms of “catching up”: integration problems, complicated product offerings, and unclear product strategy.
Since product integration problems exist even in the “original” vendor offerings, third-party software vendors are stepping in with add-ons to the existing suites, e.g. Compact Solutions, whose offering is compatible with Collibra DGC, IBM IGC, and Informatica EDC.
Note: Some master data management and ETL platforms have fairly rich data cataloging capabilities built in, e.g. Ataccama ONE, TIBCO EBX (a.k.a. Orchestra), Talend Data Fabric, and Anzo (by Cambridge Semantics).
If you are looking for a full set of data cataloging functions in a seamlessly packaged suite, then you should look at the new generation of products, such as TopBraid Enterprise Data Governance, Alation Data Catalog, and Waterline Data Catalog.
Bottom Line
Whether you already have a “legacy” data catalog or are looking to establish a new one, do not compromise on the required functionalities – there are lots of technology options.
Put your requirements first: a user-friendly, comprehensive catalog of your organizational data assets with business descriptions, rich metadata, and complete provenance information.
“Search & browse” are today’s “staple” requirements – “select & connect” are the essentials to enable true data and analytics self-service.