
Data Warehouse and Artificial Intelligence: How to Connect Structured Data and Intelligent Models?

Source: Äri-IT Spring 2025

Author: Tanno Pärli, Business Analysis Consultant at BCS Itera

 

Artificial intelligence has made significant strides in recent years, with a notable portion of this development linked to Large Language Models (LLMs) such as ChatGPT and Copilot.

 


We often encounter them in online conversations where AI responds in a human-like manner. However, when it comes to more serious business needs or technical applications, simply chatting isn’t enough. Valuable applications require AI to have access to the right context and accurate information – and this is where the data warehouse comes into play. Below, we will explore why correct prompting (i.e., how a task or question is formulated for AI) is crucial, what the RAG (Retrieval-Augmented Generation) methodology entails, and how a data warehouse provides the foundation for this process.

 

Artificial Intelligence and LLMs: A Convenient Chat Partner or a Serious Tool?

Anyone who has heard of ChatGPT or Copilot knows that these models allow us to ask questions and receive detailed answers within seconds. The so-called intelligence of AI is particularly evident in its natural language use and sometimes surprisingly accurate responses. Nevertheless, we may notice that when asking something very specific, narrow, or internal to a company, the answer can become inaccurate or be completely absent. The reason is simple: large language models have been trained on a vast amount of general information, but they lack detailed knowledge of a specific organization’s data or a very specialized field.

This rightfully raises the question: how can we get ChatGPT or another language model to solve our actual business or research tasks? Of course, one can try their luck by asking a question directly, hoping that the model knows the answer. However, it’s important to remember that public LLMs are primarily trained on publicly available texts (web texts, books, articles). If the model lacks access to the correct data, the accuracy of the response suffers.

The Importance of Correct Prompting: Asking is (More Than) Half the Answer

It might seem that using artificial intelligence is incredibly simple: you ask a question, and the model generates an answer. Yet, the reality is a bit more complex because the question needs to be presented correctly to the LLM. This is where the term “prompt engineering” comes into play, encompassing the methods and practices of formulating tasks for AI in a way that results in the best possible answer.

  • Context: The wording we use to describe the task signals to the model what and how to search. For example, it can be very important to mention timeframes, industry conditions, and consider the language of the target audience.
  • Structure: If we want to obtain specific data points, it might be useful to ask the model to present the answer in a table format or with references to specific documents.
  • Expected Output: By specifying whether we want an argued overview, conclusions, a checklist, a summary, or a forecast, we can guide the model’s reasoning.

LLMs are extremely sensitive to the form and style of the input (the prompt). Simply asking “tell me about this or that” can yield a general and inaccurate answer. But when we add context, provide guiding keywords, and structure the question thoughtfully, the result is often a much higher quality response.
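The three elements above – context, structure, and expected output – can be sketched as a simple prompt-assembly step. The function and its parameters below are illustrative examples, not part of any specific LLM library:

```python
# A minimal sketch of structured prompt assembly. The function name
# build_prompt and the sample data are illustrative assumptions.

def build_prompt(question: str, context: str, output_format: str) -> str:
    """Combine retrieved context, the question, and the expected
    output format into a single prompt string for an LLM."""
    return (
        "Context (internal data):\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        f"Answer format: {output_format}"
    )

prompt = build_prompt(
    question="How did sales of product X develop by region over the last six months?",
    context="Region A: 10,000 units; Region B: 8,000 units",
    output_format="a short table followed by two bullet-point conclusions",
)
print(prompt)
```

Even this small amount of explicit structure tends to produce a more focused answer than a bare "tell me about product X sales."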

RAG (Retrieval-Augmented Generation) – The Key to Valuable Answers

The most exciting development in the field is retrieval-augmented generation, or RAG for short. What does this mean? It’s a methodology where the natural language model of artificial intelligence is supplemented (augmented) with specific data, which usually comes from an organization’s internal documents, databases, or other sources. In other words: before we give a task to an LLM, we search our data repositories for the information needed for context and add it as part of the prompt.

Why is this necessary? Imagine we want a sales forecast for a specific product category for the next quarter. A general LLM might provide world-class examples and theories, but it cannot give an accurate answer within the context of a specific company – at least not without the data that the company itself has collected. With RAG, before querying the model, the system searches the data warehouse or document management system for specific historical sales figures, target market conditions, customer portfolio data, etc. This information is added to the prompt so that the LLM can work with real and accurate numbers.

Is it confidential and secure? When RAG is implemented correctly, the organization doesn’t need to upload its data to a public language model at all. Instead, an internal service layer is created that supplies the LLM with only the context needed to formulate an answer. The data stays within the company, while the model is still enriched with the relevant information.

The key to this process is the retrieval step: the artificial intelligence application must be able to effectively identify the relevant information in the sources. This is where data warehouses come in.

Data Warehouse as a Central Component of RAG

For RAG to work, data must be accessible, in a unified format, and reliable. Although it might seem that “a database is a database,” a data warehouse is actually more than just random data storage. A data warehouse is designed to provide:

  • A single view of data: All data important to the organization is consolidated into a structured format, taking into account various dimensions such as time, products, customers, geographical location, etc.
  • Quality control: Often, there are many sources of data – ERP systems, CRMs, marketing platforms, etc. Data warehouse developers and administrators must ensure that the information entering it is validated, consistently understood, and standardized where necessary (e.g., names, currencies, units).
  • A persistent historical perspective: In many operational databases, records are overwritten or change over time, but a data warehouse is designed to maintain a chronological historical archive, which is particularly important for analyzing business processes and identifying trends.

For artificial intelligence operating on a RAG basis, a data warehouse is a goldmine. A correctly built warehouse helps to quickly find relevant records, figures, and context that the model needs to generate an accurate answer.

Why Isn’t a Regular Data Collection Sufficient?

Many companies believe they already have enough data, scattered across various systems – isn’t that enough? Theoretically, an AI application can work to some extent with scattered data, but the result is not as reliable as with a structured approach. The main reasons are:

  • Lack of context: Random CSV files, Excel spreadsheets, and log files may not fit together, let alone have a clear connection to the time dimension or other dimensions.
  • Different semantics: In one table, a field might be “CustomerID,” in another “Client#,” and in a third “Contact.” Essentially, they refer to the same thing but are not clearly and consistently labeled.
  • Lack of quality control: Manually managed data sources tend to contain inconsistent formatting, typos, missing records, etc.

A data warehouse, on the other hand, offers a standardized and time-tested way to integrate information from different sources, link it into a unified whole, and maintain it as a high-quality data collection.
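The semantics problem above ("CustomerID" vs. "Client#" vs. "Contact") is typically solved in the warehouse loading layer by mapping source-specific field names to one standard name. A minimal sketch, in which all field and system names are hypothetical examples taken from the text:

```python
# Illustrative sketch: harmonising the customer-identifier field from
# three source systems into one warehouse-standard column name.
# FIELD_MAP, the field names, and the sample record are assumptions.

FIELD_MAP = {
    "CustomerID": "customer_id",  # e.g. from the ERP system
    "Client#": "customer_id",     # e.g. from the CRM
    "Contact": "customer_id",     # e.g. from a marketing platform
}

def standardize(record: dict) -> dict:
    """Rename known source fields to their warehouse-standard names;
    unknown fields pass through unchanged."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}

print(standardize({"Client#": "C-1042", "Amount": 250}))
# → {'customer_id': 'C-1042', 'Amount': 250}
```

In practice this mapping lives in the ETL/ELT pipeline, together with the quality checks and unit conversions mentioned above, so that every downstream consumer – including a RAG application – sees one consistent vocabulary.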

A Prompt That Grabs Data from the Data Warehouse: How Does It Work?

If we want to implement RAG, we need to create a process that links the data warehouse with the AI prompt. In general terms, it might look like this:

  1. User question or problem: For example, “What is our region-based sales trend for product X over the last six months, and what recommendations would AI give to sales representatives?”
  2. Identification of relevant information: The application queries the data warehouse and retrieves data rows that include the sales of product X for the last six months in all regions.
  3. Data structuring: The system formats the data as a short but informative text or table (e.g., “Sales results for region A: 10,000 units, region B: 8,000 units…”).
  4. Context insertion into the prompt: Before submitting the question to the LLM, the structured information (along with necessary explanatory wording) is added to the prompt.
  5. Artificial intelligence responds: The LLM analyzes the received context, applies its language and pattern recognition skills, and generates recommendations, forecasts, or other expected output.
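The five steps above can be sketched end to end. Here the data warehouse query is simulated with an in-memory dictionary, and the LLM call is a stub; the function names (retrieve_sales, format_context, build_prompt, ask_llm) and the figures are illustrative assumptions, not a real system:

```python
# A minimal sketch of the five RAG steps. The warehouse is simulated
# with a dict and the LLM call is stubbed out; all names and numbers
# are illustrative assumptions.

SALES = {  # stand-in for a data-warehouse query result (step 2)
    "Region A": 10_000,
    "Region B": 8_000,
}

def retrieve_sales(product: str) -> dict:
    """Step 2: identify relevant rows (here: a hard-coded lookup)."""
    return SALES

def format_context(rows: dict) -> str:
    """Step 3: structure the data as short, informative text."""
    return "; ".join(f"{region}: {units:,} units" for region, units in rows.items())

def build_prompt(question: str, context: str) -> str:
    """Step 4: insert the retrieved context into the prompt."""
    return f"Context: {context}\n\nQuestion: {question}"

def ask_llm(prompt: str) -> str:
    """Step 5: in a real system, this would call an LLM API."""
    return f"[LLM answer based on: {prompt}]"

question = "What is the region-based sales trend for product X?"  # step 1
context = format_context(retrieve_sales("product X"))             # steps 2-3
answer = ask_llm(build_prompt(question, context))                 # steps 4-5
print(answer)
```

Replacing the dictionary with an actual warehouse query and the stub with a model API call turns this skeleton into the process described above.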

As a result, we get a much more accurate and specific answer because the AI essentially worked with our own real data, rather than deriving general knowledge from anonymous web sources.

Data Warehouse and Artificial Intelligence – Synergy Creates a Competitive Advantage

Organizations that understand the importance of integrating data warehouses and artificial intelligence are a step ahead of their competitors. Why?

  • High-quality decision-making information: RAG combined with structured data helps leaders make better business decisions because the results are based on real data, not general opinions.
  • More efficient data usage: The data warehouse is no longer just a place for generating historical reports; it also supports real-time artificial intelligence processes.
  • Automation and scaling: When the RAG process is well-established, it can be automated so that questions posed to AI regularly use the latest information arriving in the data warehouse.

A high-quality data warehouse is not just a dream for IT professionals; its practical benefits are also reflected on the business side: faster report generation, more accurate forecasts, and fewer errors.

Does RAG Replace Traditional Data Analysts?

The short answer is no. While RAG is a powerful tool, it doesn’t function without a constantly updated, structured, and high-quality data warehouse. Someone needs to ensure:

  • Data model maintenance: New dimensions, new data sources, data version control.
  • Business logic updates: If the organization changes its sales or financial strategy, this must also be reflected in the interpretation of the data.
  • Quality control: Machine learning and artificial intelligence are only as smart as the data they are fed. Errors or incomplete data will lead to errors in the model’s responses.

Therefore, the role of data analysts, data scientists, and data warehouse architects remains crucial in creating and managing artificial intelligence-based solutions.

How to Get Started?

  1. Map your data: Review what data sources the organization already has, in what format they are, and whether they already share some common language (e.g., IDs, times, industry dimensions).
  2. Build (or enhance) a data warehouse: If a data warehouse structure already exists, it’s worth focusing on quality control, scaling, and adding new data flows. If you’re just starting, create a plan to bring data into a unified data repository. Determine the necessary tools and frequency.
  3. Choose a suitable artificial intelligence platform: RAG can be technically implemented in various ways – there are both ready-made solutions (e.g., cloud services from Microsoft, Google, Amazon) and open-source solutions. It’s important to find a secure and scalable option that suits the company’s business needs.
  4. Create a systematic approach to prompting: Many companies are adopting a new role called “prompt engineer” or assigning this role to data scientists who know how to systematically design input for artificial intelligence.
  5. Address data security and privacy: Since RAG means that some of your internal information ends up in the prompt (and from there in the AI processing pipeline), it’s crucial to establish security policies, access controls, and data anonymization principles if personal data is involved.

Summary and a Look into the Future

Artificial intelligence is no longer just a buzzword but has evolved into a tool that solves real business problems. However, for it to truly offer value, several important steps must be taken: learning to prompt skillfully, using the RAG methodology, and ensuring a proper data warehouse infrastructure.

  • Prompting is an art in itself: It’s worth dedicating resources to it because you get what you ask for.
  • RAG is like a bridge between general knowledge and specific business context: In addition to universal knowledge, AI can use your company’s data.
  • The data warehouse is what makes RAG successful: If the data warehouse is well-designed, reliable, and of high quality, the accuracy of AI responses increases many times over.

In the future, even closer integration can be expected. Data warehouses are becoming increasingly flexible (so-called data lakehouse solutions combine the functions of traditional warehouses and data lakes), and LLMs are rapidly evolving. It’s not impossible that in a few years, decision support based on artificial intelligence, drawing knowledge from a real-time updated data warehouse, will be an integral part of every major organization.

Companies that are the first to adopt these developments will gain a clear competitive advantage. Therefore, companies and organizations, regardless of their field, should already be exploring how they can leverage the synergy of RAG and data warehouses to their benefit. This requires some investment, the right people, and openness to the new, but the return can be significant – in terms of time savings, cost optimization, and better decisions.

This is how the synergy of “Data Warehouse and Artificial Intelligence” opens up possibilities for smart business applications, more efficient data analysis, and user-friendly yet powerful AI solutions.
