Legacy Chemoinformatics Modernization

Monolith → Microservices + Graph DB • ETL to AWS • PMI R&D

Weeks → Hours • Two Excellence Awards • Distributed Systems


The Challenge

A large monolithic chemoinformatics application had become a major bottleneck. Processing that should have taken hours was taking weeks. The codebase was effectively unmaintainable, data was locked in silos, and every new requirement meant risky, time-consuming changes to the monolith. ETL processes were manual, fragile, and on-premises only.

The teams (Chemoinformatics and Bioinformatics) needed a modern, distributed, scalable platform that could handle complex chemical knowledge bases while dramatically reducing processing time and enabling cloud-native ETL pipelines.

Solution Architecture

Microservices Architecture
Graph Database (Neo4j)
ETL Pipelines
AWS Cloud
Distributed Systems
Legacy Modernization

Microservices Architecture: Decomposed the monolith into independently deployable services for data ingestion, processing, knowledge management, and APIs. Each service can now be changed and released on its own, which improved maintainability and deployment speed.

Graph Database: Pioneered adoption of graph databases (e.g., Neo4j) for chemical knowledge-base platforms. Enabled multi-hop relationship queries across compounds, reactions, assays, and literature that were impractical in the old relational model.
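To make the idea of a relationship query concrete, here is a minimal sketch in plain Python, not the project's actual data model. The `KnowledgeGraph` class, the node labels, the edge types, and the sample ids ("aspirin", "rxn-1", "assay-7", "paper-42") are all hypothetical and exist only for illustration. It finds every publication within two hops of a compound, whatever path leads there.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Tiny in-memory graph: nodes carry a label, edges carry a type.

    Hypothetical stand-in for a real graph database, for illustration only.
    """

    def __init__(self):
        self.labels = {}                # node id -> label
        self.edges = defaultdict(list)  # node id -> [(edge type, neighbor id)]

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def add_edge(self, source, edge_type, target):
        # Store both directions so a traversal can follow a relationship either way.
        self.edges[source].append((edge_type, target))
        self.edges[target].append((edge_type, source))

    def reachable(self, start, target_label, max_hops):
        """Return ids of nodes labeled target_label within max_hops of start."""
        seen = {start}
        queue = deque([(start, 0)])
        found = set()
        while queue:
            node, hops = queue.popleft()
            if node != start and self.labels[node] == target_label:
                found.add(node)
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for _edge_type, neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
        return found


# Hypothetical sample data: one compound linked to literature via two paths.
graph = KnowledgeGraph()
graph.add_node("aspirin", "Compound")
graph.add_node("rxn-1", "Reaction")
graph.add_node("assay-7", "Assay")
graph.add_node("paper-42", "Publication")
graph.add_edge("aspirin", "PRODUCT_OF", "rxn-1")
graph.add_edge("aspirin", "TESTED_IN", "assay-7")
graph.add_edge("assay-7", "REPORTED_IN", "paper-42")
graph.add_edge("rxn-1", "DESCRIBED_IN", "paper-42")

publications = graph.reachable("aspirin", "Publication", max_hops=2)
print(publications)  # {'paper-42'}
```

In a relational schema the same question typically needs one join per hop and a separate query per path shape. A graph database expresses it as a single path pattern; in Neo4j's Cypher, for example, a variable-length pattern such as `(c:Compound)-[*1..2]-(p:Publication)` plays the role of `reachable` above.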

ETL Pipelines + AWS: Replaced manual on-premises ETL with robust, scalable cloud-native pipelines (SDI-to-AWS and similar). The pipelines unify data from heterogeneous sources and run with reliable scheduling and monitoring.
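The data-unification step can be sketched as a small extract-transform-load loop. This is an illustrative outline using only the Python standard library, not the production pipeline: the source formats, field names, sample records, and the in-memory `warehouse` dictionary are assumptions standing in for the real sources and the AWS-hosted target.

```python
import csv
import io
import json


def extract_csv(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json_lines(text):
    """Parse newline-delimited JSON into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def transform(record, field_map):
    """Rename source-specific fields to the shared schema and tidy the values."""
    unified = {target: str(record[source]).strip()
               for source, target in field_map.items()}
    unified["compound_id"] = unified["compound_id"].upper()
    return unified


def load(records, store):
    """Upsert by compound_id so that re-running the pipeline is idempotent."""
    for record in records:
        store[record["compound_id"]] = record
    return store


def run_pipeline(sources, store):
    """Run extract -> transform -> load for every configured source."""
    for extract, raw_text, field_map in sources:
        rows = extract(raw_text)
        load((transform(row, field_map) for row in rows), store)
    return store


# Hypothetical inputs: two sources describing compounds with different schemas.
LIMS_CSV = "id,name\ncmp-001, Aspirin\ncmp-002,Caffeine\n"
REGISTRY_JSONL = (
    '{"compoundId": "CMP-002", "label": "Caffeine"}\n'
    '{"compoundId": "cmp-003", "label": "Ibuprofen"}\n'
)

SOURCES = [
    (extract_csv, LIMS_CSV, {"id": "compound_id", "name": "name"}),
    (extract_json_lines, REGISTRY_JSONL, {"compoundId": "compound_id", "label": "name"}),
]

warehouse = run_pipeline(SOURCES, {})
print(sorted(warehouse))  # ['CMP-001', 'CMP-002', 'CMP-003']
```

Keying the load step on a normalized identifier is one simple way to make scheduled re-runs safe: a retried or repeated batch overwrites the same records instead of duplicating them.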

Distributed Systems: Built for scale and resilience. Services communicate via well-defined APIs, with clear boundaries and independent scaling.

Delivery Approach (Phased Modernization)

Architecture & Proof of Concept: Defined the target microservices architecture and graph DB model. Built a small proof of concept demonstrating the weeks-to-hours processing improvement.
Core Migration (Monolith → Microservices): Gradually extracted key domains (data ingestion, processing, knowledge base) into independent services while keeping the system running.
Graph DB & Knowledge Layer: Migrated critical chemical knowledge relationships into the graph database. Enabled new types of queries and cross-domain insights.
ETL to Cloud + Distributed Ops: Built and deployed scalable ETL pipelines on AWS. Established distributed systems practices, monitoring, and CI/CD.
Ongoing: Continuous support, further service extraction, and knowledge transfer to internal teams.

The migration was phased and low-risk: each phase delivered value on its own while the core platform was being modernized.
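Extracting domains one at a time while the system keeps running is often done with a routing layer in front of the monolith, an approach commonly called the strangler fig pattern. The sketch below assumes that approach for illustration; it is not a record of how this project implemented it, and `MigrationRouter`, `LegacyMonolith`, and `IngestionService` are hypothetical names.

```python
class LegacyMonolith:
    """Stand-in for the existing system, which still handles every domain."""

    def handle(self, domain, payload):
        return f"monolith:{domain}:{payload}"


class IngestionService:
    """Stand-in for the first extracted service."""

    def handle(self, domain, payload):
        return f"ingestion-service:{payload}"


class MigrationRouter:
    """Send each domain to its extracted service, else fall back to the monolith."""

    def __init__(self, legacy):
        self.legacy = legacy
        self.extracted = {}  # domain -> service that has taken it over

    def extract(self, domain, service):
        """Cut a single domain over to its new service."""
        self.extracted[domain] = service

    def rollback(self, domain):
        """Return a domain to the monolith if the new service misbehaves."""
        self.extracted.pop(domain, None)

    def handle(self, domain, payload):
        handler = self.extracted.get(domain, self.legacy)
        return handler.handle(domain, payload)


router = MigrationRouter(LegacyMonolith())
before = router.handle("ingestion", "batch-1")   # served by the monolith
router.extract("ingestion", IngestionService())
after = router.handle("ingestion", "batch-1")    # served by the new service
untouched = router.handle("knowledge", "q-1")    # not yet migrated
print(before, after, untouched)
```

Because callers only ever talk to the router, each cutover is a one-line change that can be reverted with `rollback`, which is what keeps the risk of any single phase small.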

Investment Model

  • Multi-phase project (architecture, migration, graph layer, cloud ETL)
  • Single senior architect/developer leading, with A-Team support on specific streams when needed
  • Focus on incremental value delivery with minimal business disruption
  • Total effort: multi-quarter engagement (exact hours in project records)

Status

Core modernization completed and in production. Processing time reduced from weeks to hours. Graph database approach adopted for chemical knowledge bases. ETL pipelines running reliably in AWS. Two Excellence Awards received from Philip Morris International R&D for this body of work. Multiple systems still in daily use across Chemoinformatics and Bioinformatics teams.

Why It Matters

This is pure "Legacy Rescue" gold. We took a critical, unmaintainable monolithic system that was actively slowing down R&D and transformed it into a modern, distributed, cloud-native platform with graph-based knowledge management and robust ETL. The measurable impact (weeks → hours) and the two Excellence Awards show that deep architectural modernization, done right, delivers substantial business value.

It also demonstrates the A-Team's core strength: senior architects who can lead complex distributed systems work end-to-end — from monolith decomposition and graph modeling all the way to production ETL on AWS — while mentoring teams and keeping the lights on during the transition.

Stuck with a legacy monolith that's killing your velocity? Let's rescue it →