Imagine a world where your data is a ship sailing on the vast ocean of information, where its origins and journey are unknown and its destination uncertain. Without a proper navigation system, this ship could easily get lost or end up in the wrong port. This is where data lineage comes in.
Bigeye is a data observability company that gives data quality engineering teams the ability to measure, improve, and communicate data quality. Underbelly partnered with Bigeye to lead the discovery and design for a multifaceted lineage feature that would provide value to customers and differentiate Bigeye in the market.
What is data lineage?
Data lineage is the process of tracking data from its origin or source to the point where it is used or consumed, including all the transformations, movements, and modifications that occur to the data as it flows through various systems and processes. It's like a GPS for your data, helping you understand where your data came from, where it's going, and what's been done to it. When it comes to businesses, data lineage is crucial for ensuring data quality, compliance, and governance.
Look both ways
Upstream and downstream lineage are like two sides of a coin, each providing a unique perspective on the data's journey. Upstream lineage traces a dataset back from the point where it is used toward its origin or source, covering all the transformations, movements, and modifications applied to the data as it flowed through various systems and processes. It helps organizations identify the root cause of an issue and understand why the problem occurred. Downstream lineage, on the other hand, tracks data from the point where it is used to its final destination. Through impact analysis, it helps organizations understand how a change to the data at one point in the lineage will affect final outcomes and decision making. Together, upstream and downstream lineage provide a comprehensive understanding of the data's journey and help organizations make informed decisions based on accurate and reliable information.
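To make the two directions concrete, here is a minimal sketch that treats a pipeline as a directed graph and walks it both ways. The table names and the `EDGES` mapping are hypothetical and are not taken from Bigeye's product; the point is only that upstream and downstream lineage are the same traversal run over opposite edge directions.

```python
from collections import deque

# Hypothetical pipeline: each key feeds the entities listed in its value.
EDGES = {
    "raw_orders": ["stg_orders"],
    "raw_customers": ["stg_customers"],
    "stg_orders": ["fct_sales"],
    "stg_customers": ["fct_sales"],
    "fct_sales": ["revenue_dashboard", "churn_model"],
}


def reverse(edges):
    """Flip every edge so that consumers point back at their sources."""
    flipped = {}
    for source, targets in edges.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped


def reachable(start, edges):
    """Breadth-first walk returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def upstream(node):
    """Everything the node depends on (root cause analysis)."""
    return reachable(node, reverse(EDGES))


def downstream(node):
    """Everything that depends on the node (impact analysis)."""
    return reachable(node, EDGES)
```

In this sketch, `upstream("fct_sales")` returns the four raw and staging tables that feed it, while `downstream("stg_orders")` returns `fct_sales` plus the dashboard and model built on top of it.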
When it came to the visual design for data lineage, it was important to consider how the data's journey would be represented. The visual design needed to be easy to understand, with a clear representation of the data's origin, destination, and any transformations or modifications made to the data. It also needed to be easy to navigate, with clear labels and a clear hierarchy of information.
Another important factor in the design process was how data lineage would be integrated into existing systems and processes, and how the metadata would be collected and stored. Addressing this kept the lineage feature user-friendly and well integrated with users' workflows. By considering all these factors, we could create an effective data lineage design that helps organizations make informed decisions based on accurate and reliable information.
Our unique approach
Our approach to designing complex lineage was to show just the right amount of information to help the user understand a data issue, while also giving a clear recommendation on what action to take next.
In upstream lineage, the lineage graph presents recommendations and navigation links to what Bigeye infers are the root causes of a particular issue. The user can jump to those root causes and continue the root cause analysis from there, confident that resolving issues at those points will be the most impactful.
In downstream lineage, the lineage graph highlights the impacted nodes in the data pipeline. By highlighting the path to affected data entities, Bigeye reduces the effort required to understand the impact of an issue on the entities a user cares about. A user can search and filter by business intelligence tool and export the results as a CSV.
Additionally, the lineage designs at Bigeye are understandable to a broader audience than experienced data engineers. We added tooltips to explain the different lineage table states instead of assuming the user would be familiar with them. We also conducted user research to ensure the terms we used in our designs were industry standard.
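As a rough illustration of the upstream recommendations and the downstream CSV export described above, the sketch below uses one simple heuristic: a root cause candidate is an unhealthy table whose direct parents are all healthy. This heuristic, the table names, the `example_bi` tool label, and every function name are assumptions made for the example; how Bigeye actually infers root causes is not described here.

```python
import csv
import io

# Hypothetical lineage: table -> list of its direct upstream tables.
PARENTS = {
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "fct_sales": ["stg_orders", "stg_customers"],
    "revenue_dashboard": ["fct_sales"],
}

# Hypothetical health state: tables currently failing a quality check.
UNHEALTHY = {"raw_orders", "stg_orders", "fct_sales", "revenue_dashboard"}


def ancestors(node, parents):
    """Collect every table upstream of node with a depth-first walk."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def root_cause_candidates(node, parents, unhealthy):
    """Assumed heuristic: unhealthy tables with no unhealthy direct parent."""
    scope = ancestors(node, parents) | {node}
    return sorted(
        table
        for table in scope
        if table in unhealthy
        and not any(p in unhealthy for p in parents.get(table, []))
    )


def impacted_csv(impacted, tool_of, tool=None):
    """Render impacted entities as CSV text, optionally filtered by BI tool."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entity", "bi_tool"])
    for name in sorted(impacted):
        entity_tool = tool_of.get(name, "")
        if tool is None or entity_tool == tool:
            writer.writerow([name, entity_tool])
    return buffer.getvalue()
```

With this sample data, `root_cause_candidates("revenue_dashboard", PARENTS, UNHEALTHY)` narrows four unhealthy tables down to `raw_orders`, which mirrors the idea of pointing the user at the place where a fix is likely to matter most.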
With every lineage feature and iteration, we considered how scaling would affect the look and function of lineage and how users interact with it. To make large pipelines more readable and easier to view, we introduced zooming features, clear hover interactions, and stoplight colors to represent table health.
We collaborated with engineering teams to develop a system of accessibility features that support screen readers and allow users to tab through lineage while keeping the graph’s integrity. Building for accessibility will make product analytics more consistent.
The lineage feature not only aligns with the existing Bigeye product but also sets Bigeye apart from competitors by offering a unique take on lineage: giving users more targeted recommendations for resolving issues. Lineage brought significant business value to Bigeye and helped differentiate it as a leader in lineage visualization.