AI in Practice: How to Use Large Language Models Wisely and Safely

At the moment you can’t go anywhere without hearing about new tools and breakthrough AI models; like a broken record, everyone repeats that artificial intelligence will soon change everything. But we developers know very well that there is a huge gap between a flashy demo and a solution that actually works in production.

The aim of this article is therefore not to tell stories about the “magic of AI”. I want to take you behind the scenes and show what these technologies look like in real life: high-profile failures, the root causes of those problems and, most importantly, practical advice on how to use AI wisely, safely and effectively.

The generative AI divide 

Every day we live inside an information bubble. From all sides we hear about an AI revolution and assistants that will do half our work for us. The reality, however, is more brutal.

MIT’s report The GenAI Divide¹ shows that as many as 95% of corporate AI deployments fail. “Failure” means there is no measurable return on investment—zero impact on financial results. Billions of dollars are poured down the drain and only 5% of companies realise real benefits.

The report defines a “generative AI divide”: an elite 5% of firms successfully deploy AI, earn millions and gain competitive advantage, while the remaining 95% are stuck. 

What’s the paradox here? Over 80% of companies have trialled tools such as ChatGPT or Copilot, and many of us use them daily to draft emails or meeting summaries. So, if AI tools are so widespread, why do corporate deployments fail so often? 

The learning gap 

Most corporate AI systems are static—they do not learn from user interactions, they do not remember context or working style. Everything has to be explained from scratch each time. 

Imagine a lawyer at a large firm: AI can draft a simple letter—great! But would you entrust it with preparing a key, million-pound contract? No, because the model doesn’t remember previous edits or the client’s preferences, so it repeats the same mistakes.

The MIT report highlights three main failure patterns: 

  • The pilot-to-production gap: of the 60% of firms that start exploring AI, only 5% reach full production deployment. Projects die because they don’t work in the real business world. 
  • Shadow AI: employees use private tools even when official deployments fail. Demand for AI is huge but corporate systems often fall short. 
  • The “build vs buy” trap: off-the-shelf solutions bought from partners have a 67% chance of success, whereas projects built in-house succeed only 33% of the time. 

AI reports—how to read them 

The MIT figure of 95% failures can be striking, but it’s worth looking more broadly: 

  • BCG reports that 80% of AI deployments meet or exceed expectations. 
  • Google Cloud claims 74% of companies see a return on investment within the first year. 
  • Boston Consulting Group indicates that 26% of firms generate real business value as AI leaders. 

The differences mainly come down to how “success” is defined. MIT set the bar very high: success = full production deployment of AI with direct impact on the profit and loss account. Other studies use more pragmatic criteria such as productivity improvements, cost reduction or process optimisation. 

When AI fails: the Air Canada example 

The story of Jake Moffatt made headlines and became a textbook example of what can go wrong when companies place too much trust in artificial intelligence. In 2022 Mr Moffatt was travelling after the death of his grandmother and wanted to use a bereavement fare—a discount for people travelling to a close relative’s funeral. He couldn’t find clear information on Air Canada’s website, so he asked the chatbot.

The AI assistant replied with confidence: buy the full-price ticket and within 90 days you can apply for a partial refund. The chatbot even supplied a link to the relevant form. It sounded credible—Mr Moffatt trusted the system and bought a ticket for over US$1,600. 

After the trip he filed for a refund and… was denied. Under Air Canada’s policy, bereavement discounts were not applicable after travel. A months-long exchange of emails followed and eventually the case went to court. 

Air Canada tried to defend itself with two, to put it mildly, controversial arguments: 

  • The chatbot is an independent entity—the company argued it was not responsible for content generated by the AI. The court rejected this, ruling that the chatbot is an integral part of the website and the company is responsible for its communications. 
  • The customer should have checked the terms and conditions—since the correct information was available elsewhere on the site, the customer was at fault. The court also rejected this, asking why a user should be expected to assume one part of the site is more reliable than another. 

In the end Air Canada lost the case and had to refund the money. The sum was symbolic, but the case set a precedent—it showed that companies are accountable for AI errors just as they are for any other content published on their site. 

This story demonstrates two fundamental truths: 

  • Responsibility cannot be delegated to technology: no matter how “intelligent” a tool appears, the company remains liable for its operation and consequences. 
  • Hallucinations are an intrinsic feature of language models: AI can generate plausible-sounding but false information. It is the role of engineers and managers to design guardrails that minimise the risk of mistakes. 

In practice the balance is difficult to strike. Safeguards that are too weak lead to embarrassing errors; safeguards that are too strict make the tool useless, as with a specialist Microsoft chatbot for Minecraft that answered most questions with “I don’t know”.

How language models work and what that means in practice 

A large language model (LLM) is essentially an advanced text autocompletion system. Its basic task is simple—predict which word should come next in a given sequence. Everything else that appears as “intelligence”—writing code, creating summaries, generating presentations—is a side effect of that single function operating at an unimaginable scale. 
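
To see this quite literally, here is a minimal sketch that asks a small open model for its next-token probabilities. It assumes the Hugging Face transformers and torch packages are installed; “gpt2” is used only because it is small and freely available, not because it is state of the art.

```python
# Minimal sketch: an LLM is, at its core, a next-token predictor.
# Assumes the `transformers` and `torch` packages are installed;
# "gpt2" is just a small, freely available example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]          # scores for the *next* token only
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.2%}")
# Everything the model "does" is built from this one operation, repeated token by token.
```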

The breakthrough: transformers 

In 2017 the seminal paper Attention Is All You Need introduced the transformer architecture. Unlike previous models that read text sequentially and easily “forgot” earlier words, transformers analyse all tokens simultaneously. Thanks to the self-attention mechanism they can link relevant information across the entire sentence. Example? In the sentence “Michael, sitting on the chair, was eating pepperoni pizza. It was his favourite flavour” the word “it” will be linked to “pizza”, not to “chair”, because the context relates to eating. This is how models begin to “understand” language—at least in a statistical sense. 
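
To give a feel for the mechanics, below is a toy scaled dot-product attention calculation in plain NumPy. The “embeddings” are invented numbers chosen so that “it” ends up close to “pizza”; real models learn such vectors during training.

```python
# Toy illustration of scaled dot-product attention, the core of a transformer.
# The vectors below are invented for illustration; real models learn them.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how relevant each token is to every other token
    weights = softmax(scores)         # each row sums to 1
    return weights @ V, weights

tokens = ["Michael", "chair", "pizza", "it"]
E = np.array([
    [1.0, 0.2, 0.1],   # Michael
    [0.1, 1.0, 0.0],   # chair
    [0.2, 0.1, 1.0],   # pizza
    [0.1, 0.2, 0.9],   # "it" - deliberately close to "pizza"
])

_, weights = attention(E, E, E)        # self-attention: Q = K = V
for tok, w in zip(tokens, weights[-1]):
    print(f'"it" attends to {tok:<8} with weight {w:.2f}')
```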

How models learn 

Training an LLM consists of two main stages: 

  • Pre-training: the model spends months churning through vast datasets covering much of the internet, books, articles and Wikipedia. It learns grammar, facts and writing styles, but it also absorbs errors, biases and toxic content (a minimal sketch of this training objective follows the list). 
  • Fine-tuning: the “school of good manners.” The model is adapted for useful and safe operation using RLHF (Reinforcement Learning from Human Feedback). Trainers rate generated responses, indicating the best and worst ones, and the model learns human preferences. The result? The model can answer helpfully, but it does not expand its knowledge — it merely filters behaviours. 
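
Under the hood, a single pre-training step is just next-token prediction penalised with cross-entropy loss. The PyTorch sketch below uses a toy vocabulary and an embedding plus linear layer as a stand-in for the full transformer stack; all the numbers are made up for illustration.

```python
# Sketch of one pre-training step: predict every next token, penalise mistakes.
# Vocabulary size, dimensions and the "dataset" are toy values for illustration only.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)          # stand-in for a full transformer stack
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(
    list(embed.parameters()) + list(lm_head.parameters()), lr=1e-3
)

# A "document" encoded as token ids; real training uses trillions of tokens.
tokens = torch.randint(0, vocab_size, (1, 16))

inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one: predict the next token
logits = lm_head(embed(inputs))                   # (batch, seq_len - 1, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))

loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"next-token loss: {loss.item():.3f}")
```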

The key lesson: LLMs predict next words. Everything else is a side effect, not “magic” or genuine understanding. 

A revolution in application development 

The traditional application development process can be long and costly: idea → meetings → specifications → mockups → development → first release. AI radically shortens that cycle. Tools such as Uizard or Visily can turn a text description into an interactive prototype. 

What does this mean for developers? No AI tool will ever fully replace programmers, but it will certainly change how they work. Thanks to AI, business teams will be able to create simple tools themselves and produce higher-quality prototypes much more easily. Developers can then focus on what is truly hard and valuable: business logic, architecture, performance and security. 

At the same time the nature of errors changes—fewer typos and small mistakes, but more fundamental architectural faults and security vulnerabilities. There is also a psychological trap known as automation bias: we trust the machine when we should be critically verifying generated code. 

Safe uses of AI tools 

Three golden rules for using AI 

  • You are the pilot, not the passenger: critically verify everything AI suggests. 
  • Don’t trust—test: treat code as if it were written by an unknown intern and apply TDD (see the test sketch after this list). 
  • Share, but not secrets: use enterprise or self-hosted secure versions; do not paste confidential data into public tools. 
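
In practice the second rule can be as simple as pinning down the expected behaviour in tests before pasting the suggestion in. The sketch below assumes pytest; slugify is a hypothetical AI-suggested helper used purely as an example.

```python
# test_slugify.py - treat AI-generated code like an unknown intern's pull request:
# write the tests first, then paste in the suggestion and see whether it survives.
# `slugify` is a hypothetical AI-suggested helper, shown here only as an example.
import re

def slugify(title: str) -> str:
    """AI-suggested implementation, pasted in for review."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def test_basic_title():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation_and_whitespace():
    assert slugify("  AI in Practice: Part 1! ") == "ai-in-practice-part-1"

def test_empty_input():
    assert slugify("") == ""
```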

AI in everyday work—research mode 

Increasingly, AI aids not only in code generation but also in faster information discovery. ChatGPT’s research mode allows you to ask questions in natural language and the model searches hundreds of sources, analyses them and returns a summary with links. This can be far more effective than classic keyword search.

RAG: Retrieval-Augmented Generation 

RAG is an example of a safe approach to using AI with documentation. The model does not guess from memory but uses only selected fragments of our documentation, which gives three benefits:

  • Control: answers come exclusively from our data. 
  • Credibility: each answer cites a source. 
  • Currency: changes in documentation are immediately reflected. 

In practice this means faster answers, shorter onboarding and documentation that becomes a living, interactive tool rather than a graveyard of PDFs.
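
To make the idea concrete, here is a minimal sketch of the retrieval step, the “R” in RAG: score documentation chunks against a question and pass only the best matches, together with their sources, into the prompt. It assumes scikit-learn is installed, uses TF-IDF as a stand-in for a real embedding model, and the documentation snippets are invented for the example.

```python
# Minimal sketch of the retrieval step in RAG.
# TF-IDF stands in for a real embedding model; production systems typically use
# a dedicated embedding model and a vector database instead.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = {
    "refunds.md": "Bereavement fares must be requested before travel, not after.",
    "booking.md": "Tickets can be changed online up to 2 hours before departure.",
    "baggage.md": "Each passenger may check one bag up to 23 kg free of charge.",
}

question = "Can I apply for a bereavement refund after my flight?"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(list(docs.values()))
scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]

# Keep only the best-matching chunks and cite their sources in the prompt.
top = sorted(zip(docs.keys(), docs.values(), scores), key=lambda x: -x[2])[:2]
context = "\n".join(f"[{name}] {text}" for name, text, _ in top)

prompt = (
    "Answer using ONLY the sources below and cite them.\n"
    f"{context}\n\nQuestion: {question}"
)
print(prompt)  # this prompt, not the model's memory, is what the LLM answers from
```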

Summary—three things to remember 

Remember that AI is a powerful tool, not a magic wand. Three things are worth keeping in mind:

  • Most failures stem from how organisations deploy AI, not from the technology itself. 
  • AI predicts words, it does not think, so our critical oversight is always required. 
  • The biggest winners in this revolution start with real problems, asking “Which problem can I solve?” rather than looking for yet another area to “shoehorn” AI into. 

Treat AI as a clever partner: learn, test, experiment and focus on what actually delivers value.

If you are considering implementing an AI-based solution within your organisation and want to ensure it delivers real, measurable results, fill in the form below. Our specialists will be happy to support you in the next steps.

Contact us

  1. ai_report_2025.pdf ↩︎
