Red Pill Analytics Builds a Modern Data Platform for the National Bone Marrow Program

The National Marrow Donor Program (NMDP) is a nonprofit organization based in Minneapolis, Minnesota. NMDP operates the Be The Match (BTM) Registry of volunteer hematopoietic cell donors and umbilical cord blood units in the United States. Be The Match is the world’s largest bone marrow donor registry, providing cures for patients with blood cancers like leukemia and lymphoma and other life-threatening diseases. In addition to providing access to the registry, they provide support resources to donors, financial assistance to patients, and funding for research opportunities. In 2018, Be The Match matched 6200 transplants and provided $4.3 million in financial assistance. With the phenomenal success of their life-saving work came challenges.

A BI & Analytics Bottleneck

While Be The Match enjoyed many successes with their BI platform in the past, the gap between ongoing enhancement requests and the capabilities of the existing system became increasingly wider. Further, due to the inflexible nature of the existing toolset, the BI team could not quickly respond to customer requests for bug fixes or enhancements which became continuously more frustrating for both the BI team and their customers. These deficits typically resulted in two outcomes: waterfall-style projects or the business creating their own solutions; neither of which was entirely desirable. Be The Match faced a decision point: upgrade the existing tools or reimagine the technology stack.

With the support of senior leadership, Be The Match decided to reimagine the analytics toolset and hired Red Pill Analytics to help create a modern data platform. The ensuing journey included: evaluating and selecting new technologies, executing proofs-of-concept, procurement, implementation, development, and various degrees of change management.

Evaluating a new BI Stack

During the initial phase of the project, a Red Pill Analytics Solution Architect assisted Be The Match in an objective evaluation of all components of a modern BI & Analytics stack. This included data integration/transformation tools and a data warehouse. The front-end reporting tool evaluation was pushed out to allow representatives from the user community to have more time to review in order to provide valuable input. An exhaustive fact-finding effort resulted in several documents which, combined with discussions between the BI team and various stakeholders, were used to rate and compare tools. While there were several categories and sub-categories, at a high level, the tools were selected based on these driving factors:

  • The data replication tool should have native connectivity to all of the source database platforms and the target warehouse and should replicate the data in real time, using change-log reading mechanisms whenever possible to ensure that data was consumed without interfering with core database workloads.
  • The data transformation tool needed to have native connectivity to the target data warehouse and should use “push-down” data processing. It is also preferred to have robust track-back and documentation functionality.
  • The data warehouse should require low up-front costs and be able to scale up as the environment grows. The data warehouse should also be SQL-based and be able to accommodate relational as well as semi-structured data.
  • All tools must be within budget.

Data Integration & Transformation

After thoughtful consideration, Attunity Replicate and Fivetran were selected as the replication tools, WhereScape RED as the transformation tool, and Snowflake as the data warehouse.

  • Attunity Replicate performs an initial full load and then uses database log mining (whenever possible) to process CDC and has connectors for the required source databases.

  • Fivetran was included to replicate Salesforce and positioned for other applications not supported by Attunity (e.g. JIRA, Google Analytics).

  • Snowflake uses a consumption-based pricing model so there was not a large up-front investment needed to start the project. The environment can grow over time as more data is introduced and user adoption goes up.

  • WhereScape RED has a Snowflake connector, uses push-down (ELT) processing that leverages the Snowflake investment, and provides auto-generation of documentation.

After the tools were selected, a proof-of-concept was conducted by building out a pilot use case that was representative of the larger vision. With the POC successful, Red Pill assisted Be The Match in the next steps of procurement and implementation of the new tools. Implementation activities included architecture design, set-up/configuration, and development for each component of the platform. Throughout the entire process, Red Pill and the BI team worked closely with an Information Security Architect to ensure that the environment adhered to all of Be The Match security policies.

It is worth noting that the overall guiding principle was to replicate the entirety of Be The Match’s databases that contain business data in Snowflake, as opposed to replicating data based on user requirements. This strategy resulted in the BI team being better equipped for conversations with business users in the future as any requested data would already be available in the new platform. It was also mutually beneficial as queries run by analysts and data integration processes could be shifted over to Snowflake and given dedicated compute resources that could be appropriately sized and those queries were no longer running on source system databases, consuming potentially valuable resources for the core application.

Data Viz

With the effort to replicate source data into Snowflake well underway, the remaining task was to select and implement a front-end reporting tool. On par with the evaluation for the rest of the tools in the stack, Red Pill worked with Be The Match to gather information for various reporting tools and compiled a list of potential candidates. A POC was carried out for each of the three finalists and after several iterations involving demos, hands-on development, and discussion, Looker was selected to round out the BI & Analytics platform. Ultimately, Looker was selected for its ability to deliver consistent results across the enterprise, drill-down functionality, and relative ease of use.

Recapping the Tools

  • Attunity Replicate runs on-premises, reads the database changelogs for CDC, and pushes data to Snowflake. It also includes data transformation functionality using SQLite syntax.
  • Fivetran runs in the cloud and has connectors for replicating both on-premises and cloud-based databases and applications. Fivetran includes a light data transformation layer as well.
  • WhereScape RED provides a graphical user interface for development and push-down (ELT) processing which leveraged Be The Match’s investment in Snowflake.
  • Snowflake is a cloud-native data warehouse that separates compute from storage, allowing the environment to grow over time as more data is introduced and user adoption increases.
  • Looker uses a semantic model and native version-control to aid in promoting consistency and data governance and offers an easy-to-use report building interface complete with full API coverage.

The Final Results

Attunity, Fivetran, WhereScape, Snowflake, and Looker collectively created a feature-rich, flexible, and robust Business Intelligence & Analytics platform to effectively and efficiently serve Be The Match and their customers. 

Since Be The Match was new to Looker, Red Pill and the BI team took an iterative approach to dashboard design and development. Following a rapid-delivery/rapid-feedback cadence, each iteration progressed toward more refined reports and dashboards as the BI team learned requirements from the business and the business grew more familiar with Looker.

To align with Be The Match’s vision to shift towards a more data-driven culture, Red Pill Analytics and the BI team leveraged dashboard development as a means to train power users on Looker in an effort to enable discovery and self-service analytics culture throughout the enterprise.

Highlights of the new platform included:

  • A single data source for business data, replicated in real-time
  • Federated Looker data models
  • Vertical-specific Looker Explores
  • Pre-built Looker dashboards for consumption by less-than-power users. An org-wide KPI Looker dashboard displayed on every user’s laptop each morning

While Red Pill Analytics met Be The Match’s requirements for the project, it should be noted that the implementation is ongoing.

One More Thing

Adjusting to a modern data platform can be challenging. Fortunately, Be the Match's data science team was already using tools like R, SAS, and various third-party SQL query clients. RPA collaborated with the team to connect these tools to Snowflake, which allowed them to continue using their tools of choice in the new platform. Red Pill Analytics also assisted with converting queries and processes that had been running on the source databases to ensure continuity. In the end, the successful collaboration between both teams ensured a smooth transition to a modern data platform.

WHAT DO THEY THINK?

“Filler Text for a quote about how amazing we are.”
I.M. Portant, CFO

THE GOALS

  • Build a modern data platform 
  • Avoid a BI & Analytics bottleneck
  • Provide a single source of data truth
  • Promote data governance
  • Allow users to focus on analysis, rather than procurement and curation of data

THE APPROACH

  • Replicate data from various transactional source systems into Snowflake using Attunity and Fivetran
  • Focus on replicating all data right away as opposed to user-driven data requirements
  • Use WhereScape to transform and load data into a data marketplace and data warehouse
  • Use an iterative approach to define user requirements for reports and dashboards
  • Train power users for self-sufficiency

THE RESULTS

  • A single source of truth from a modern data platfor
  • A Data Lake with replicated raw business data
  • A Data Marketplace with lightly-curated data sets for easier consumption
  • A Data Warehouse with dimensional data models, federated across business verticals
  • Looker Explores along with “starter” reports and dashboards delivered by the BI team
  • A dashboard with org-wide KPIs, automatically displayed on each user’s laptop every day