
Leaving a legacy: build the right data foundations to scale AI

Jenna Goldstein

For many large organisations, the ambition to implement artificial intelligence (AI) runs headfirst into the reality of data in legacy estates: fragmented storage, inconsistent semantic models, and information that is hard to find, trust, or move. Leaders who want to scale beyond their AI pilots may need to go back to basics and consider whether their existing data foundations are fit for purpose.  

While often challenging, creating a strong data foundation is achievable with the right approach. Ensure you are taking the key steps to make critical data accessible, build the right infrastructure and governance, and pursue iterative improvements.

The critical role of data quality and accessibility

AI is only as good as the data it is trained on. Poor data quality, inaccessible data and data silos can all undermine your AI initiatives. Organisations must invest in data governance, data integration, and data quality management to ensure that their AI systems have access to the quality information they need to deliver business value. 

A strong foundation begins with accessibility. As a senior digital leader explained at Berkeley’s AI: beyond the pilot panel discussion, data must be reachable across legacy and cloud platforms alike. Fortunately, the tools available to clean and integrate that data have matured dramatically in just a few years. 

“Data is important and integral. The capabilities to clean it, move it between different systems and integrate it across different hierarchies has moved forward. … Having the right tools now to ensure that your data is accessible data, whether it's in a cloud infrastructure or a legacy system, is key,” she said.  

Driving value from unstructured data

A significant step towards realising bottom-line impact from AI is systematically converting unstructured data (e.g. documents, emails, reports, call transcripts) into structured data that AI models can ingest to drive business decisions.  

One of our panellists, a senior AI leader at a global transportation business, described how this move is already paying off in a highly complex, legacy-heavy environment. 

“We've got a bunch of use cases where we pointed LLMs and trained them on various types of unstructured data. It's used all over the business now to do various different things, leading to bottom-line benefit. Use the models that turn your unstructured data into structured data and drive value from that... It's been literally a money maker.”

A senior AI leader, speaking at Berkeley's event, AI: beyond the pilot
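
To make the pattern concrete, the sketch below shows one minimal way to turn a call transcript into a structured record that downstream systems can query. It is an illustration rather than the panellist's actual implementation: the call_llm function is a hypothetical stand-in for whichever model API an organisation uses, and the extracted fields are invented for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class CallRecord:
    """Structured fields extracted from an unstructured call transcript."""
    customer_issue: str
    product: str
    resolution_status: str


EXTRACTION_PROMPT = """Read the call transcript below and return a JSON object with the
keys: customer_issue, product, resolution_status.

Transcript:
{transcript}
"""


def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for the organisation's chosen LLM API.
    # A canned response is returned here so the sketch runs end to end.
    return ('{"customer_issue": "late delivery", "product": "freight booking", '
            '"resolution_status": "refund issued"}')


def transcript_to_record(transcript: str) -> CallRecord:
    """Turn one unstructured transcript into a structured, queryable record."""
    raw = call_llm(EXTRACTION_PROMPT.format(transcript=transcript))
    fields = json.loads(raw)
    return CallRecord(**fields)


if __name__ == "__main__":
    record = transcript_to_record("Customer rang about a delayed freight booking ...")
    print(record)
```

Once transcripts, emails or reports are reduced to records like this, they can be loaded into the same tables, dashboards and models as any other operational data.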

His fellow panellists agreed that AI value isn’t only found in petabyte-scale data lakes. The senior digital leader shared that her organisation is focusing on leveraging ‘small data’ as a valuable source of insight.  

“One of the things we're teaching people is if you have SharePoints for your team – if you have teams, channels, or individual documents that are sitting on your computer – you need to make those findable, accessible and readable for AI to enable great decision-making,” she said. 
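
The ‘findable and accessible’ idea can start very simply. The sketch below, which assumes a folder of team documents exported as plain text, builds a basic inverted index so documents can be located by keyword; in practice an organisation would more likely rely on its platform's built-in search or an embedding-based index, but the principle is the same.

```python
from collections import defaultdict
from pathlib import Path


def build_index(doc_dir: str) -> dict[str, set[Path]]:
    """Build a simple inverted index: word -> set of documents containing it."""
    index: dict[str, set[Path]] = defaultdict(set)
    for path in Path(doc_dir).glob("**/*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        for word in text.split():
            index[word].add(path)
    return index


def find(index: dict[str, set[Path]], query: str) -> set[Path]:
    """Return documents containing every word in the query."""
    results = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*results) if results else set()


if __name__ == "__main__":
    index = build_index("team_documents")  # assumed export folder, illustrative only
    print(find(index, "quarterly forecast"))
```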

Defining the 'minimum viable data foundation' to serve AI

Organisations frequently ask what ‘minimum’ quality is required to unlock AI benefits, especially when pressure mounts to deliver value quickly. Legacy systems are a fact of life for many organisations and replacing them wholesale is unlikely to meet that time pressure. Instead, it is often more practical to integrate them with modern data platforms and tools. This allows organisations to leverage their existing investments while enabling new capabilities. 

As raised during the Q&A portion of our panel event, some organisations may need to “build the plane while flying it”, i.e. make incremental data improvements while deploying high-impact AI use cases. 

Pragmatically, this could mean prioritising a few datasets that can power your first AI products. Pick high-value domains and deliver tangible outcomes while you lift quality iteratively. 

Then treat every AI deployment as a feedback loop and use that insight to focus data remediation where it matters most. 

A practical playbook for legacy environments

An example sequence that organisations could apply:

1. Catalogue and connect priority data sources at a scale that is pragmatic for your first AI products.

2. Stand up a modern data engineering stack; automate ingestion, quality checks, and schema harmonisation (a minimal sketch follows this list).

3. Pilot unstructured-to-structured pipelines to unlock value from the typically vast stores of inaccessible information in documents, email and media.

4. Index 'small data' (SharePoint/Teams) to improve information retrieval and knowledge generation for staff.

5. Simplify applications in waves, with continuity gates and agreed data models.

6. Embed risk controls: contestability checks, audit trails, fallback playbooks.

7. Iterate relentlessly: treat every AI deployment as a way to surface data issues, and fix what matters most first.
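
As a minimal illustration of step 2, the sketch below shows automated quality checks and schema harmonisation using pandas. The column names and the canonical mapping are assumptions for the example; a production pipeline would typically run such checks inside an orchestration and monitoring tool.

```python
import pandas as pd

# Illustrative mapping from source-specific column names to the agreed data model.
CANONICAL_COLUMNS = {
    "cust_id": "customer_id",
    "CustomerID": "customer_id",
    "ord_dt": "order_date",
    "OrderDate": "order_date",
}


def harmonise(frame: pd.DataFrame) -> pd.DataFrame:
    """Rename source-specific columns onto the agreed data model."""
    return frame.rename(columns=CANONICAL_COLUMNS)


def quality_report(frame: pd.DataFrame) -> pd.Series:
    """Share of missing values per column: a simple automated quality check."""
    return frame.isna().mean().sort_values(ascending=False)


if __name__ == "__main__":
    legacy = pd.DataFrame({
        "cust_id": [1, 2, None],
        "ord_dt": ["2024-01-03", None, "2024-02-11"],
    })
    clean = harmonise(legacy)
    print(quality_report(clean))
```

The same report, run on every ingestion, gives the feedback loop described above: each AI deployment highlights which columns and sources need remediation first.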

Data: an asset investment for AI scale

A legacy environment doesn’t have to be a barrier to scaling AI. For many organisations, it’s a reality to be navigated.  

A successful AI strategy starts with recognising data as a valuable asset. By investing in data quality, integrating legacy systems, and leveraging both big and small data, organisations can unlock the full potential of AI and drive meaningful business outcomes.