
LLM — Large Language Models

LLM is an AI product that gives the organization foundation language model capabilities and provides centralized access to LLMs for all teams and products.


Purpose

As a product, LLM provides unified, managed access to language models by:

  • standardizing the use of LLMs across the organization
  • controlling costs and security
  • providing other AI products (Knowledge Assistant, Code Agent, Automation AI) with baseline generation capabilities

Deployment Options

The organization chooses a deployment model depending on its requirements:

  • Cloud APIs — OpenAI, Anthropic, Google; minimal infrastructure costs, quick start
  • On-premise / Private Cloud — deploying open-source models (Llama, Mistral, Qwen); full data control
  • Hybrid — cloud APIs for non-critical tasks, on-premise for sensitive data
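A hybrid deployment needs a rule that decides, per request, whether it may leave the organization. Below is a minimal sketch, assuming each request is already tagged with a sensitivity level upstream. The level names, backend names, and default threshold are hypothetical and not part of any specific product.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical classification levels; real ones come from the data policy.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class Backend:
    name: str
    on_premise: bool


# Hypothetical backends; actual endpoints depend on the chosen providers.
CLOUD = Backend(name="cloud-api", on_premise=False)
ON_PREM = Backend(name="on-prem-llama", on_premise=True)


def route(sensitivity: Sensitivity,
          max_cloud_level: Sensitivity = Sensitivity.PUBLIC) -> Backend:
    """Use the cloud API only if the data is at or below the allowed sensitivity level."""
    if sensitivity.value <= max_cloud_level.value:
        return CLOUD
    return ON_PREM
```

For example, `route(Sensitivity.CONFIDENTIAL)` returns the on-premise backend, while raising `max_cloud_level` to `INTERNAL` lets internal (but not confidential) data use the cloud API. Keeping the threshold as a parameter lets each team's policy be configured rather than hard-coded.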

Use Cases

The LLM is used for a wide range of tasks:

  • text generation (reports, emails, documentation)
  • document summarization
  • classification and categorization
  • extracting information from unstructured data
  • translation and localization
  • code generation
  • sentiment analysis
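Tasks such as classification are usually solved by constraining the prompt and then validating the model's reply before using it. A minimal sketch, assuming the model is asked to answer in JSON; the prompt wording, the `{"label": ...}` format, and the example labels are illustrative assumptions, and the actual model call is omitted.

```python
import json
from typing import List, Optional


def build_classification_prompt(text: str, labels: List[str]) -> str:
    """Build a prompt asking the model to pick exactly one label and reply in JSON."""
    label_list = ", ".join(labels)
    return (
        f"Classify the text into exactly one of these categories: {label_list}.\n"
        'Respond only with JSON of the form {"label": "<category>"}.\n\n'
        f"Text:\n{text}"
    )


def parse_label(raw_output: str, labels: List[str]) -> Optional[str]:
    """Return the label if the output is valid JSON with an allowed label, otherwise None."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    label = data.get("label")
    return label if label in labels else None
```

A `None` result signals that the reply was malformed or off-list, so the caller can retry or route the item to human review instead of trusting it.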

Key Decisions

When designing the LLM product, the following decisions must be made:

  • Cloud vs. On-premise — the balance between cost, data confidentiality, and latency
  • Model selection — which models to use for which tasks (cost vs. quality)
  • Fine-tuning vs. Prompting — adapting a model to a task through fine-tuning, or relying on prompt engineering alone
  • Cost management — token economics, budgeting, per-team limits
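The model selection trade-off (cost vs. quality) can be made explicit: score each candidate on a task-specific evaluation set, then pick the cheapest model that clears the task's quality bar. A minimal sketch; the catalog entries, scores, and prices below are hypothetical placeholders, not real model data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: float             # score on an internal eval set, 0..1 (assumed)
    cost_per_1k_tokens: float  # blended price in USD (assumed)


# Hypothetical catalog; names, scores, and prices are illustrative only.
CATALOG = [
    ModelOption("small", quality=0.72, cost_per_1k_tokens=0.0005),
    ModelOption("medium", quality=0.84, cost_per_1k_tokens=0.003),
    ModelOption("large", quality=0.91, cost_per_1k_tokens=0.015),
]


def select_model(min_quality: float,
                 catalog: List[ModelOption] = CATALOG) -> Optional[ModelOption]:
    """Return the cheapest model whose eval score meets the quality bar, or None."""
    eligible = [m for m in catalog if m.quality >= min_quality]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)
```

With this catalog, a routine task with a bar of 0.7 gets `small`, while a task requiring 0.9 gets `large`. A `None` result indicates that no available model is good enough, which is one signal that fine-tuning may be worth considering.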

Infrastructure

The LLM product infrastructure includes:

  • GPU clusters — for on-premise deployment and fine-tuning
  • API Gateway — request routing, rate limiting, authentication
  • Prompt Management — managing and versioning prompts
  • Guardrails — filtering inputs and outputs (safety, compliance)
  • Observability — request logging, quality metrics, cost monitoring
  • Caching — reducing the cost of repeated requests
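The simplest form of caching is an exact-match cache: identical requests (same model, prompt, and generation parameters) reuse a stored response instead of calling the model again. A minimal in-memory sketch; the class and method names are hypothetical, and `call` stands in for whatever client function actually invokes the model.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache keyed by a hash of model name, prompt, and parameters."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, prompt: str, params: dict) -> str:
        # sort_keys makes the key independent of parameter order.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, params: dict,
                    call: Callable[[str, str, dict], str]) -> str:
        k = self.key(model, prompt, params)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = call(model, prompt, params)
        self._store[k] = result
        return result
```

Exact-match caching suits deterministic settings (for example, temperature 0) and routine repeated queries. Caching sampled outputs freezes one answer, and matching semantically similar prompts requires a different technique (embedding-based lookup) that is not shown here. A production cache would also need eviction and expiry.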

Cost Management

LLM costs can grow quickly, so the organization needs to:

  • track consumption by teams and products
  • set limits and quotas
  • choose a model of the appropriate size for the task
  • use caching for routine requests
  • optimize prompts to reduce token counts
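Tracking consumption and enforcing quotas both rest on the same per-request calculation: cost = (input tokens / 1000) × input price + (output tokens / 1000) × output price. A minimal sketch of a per-team ledger, assuming per-1,000-token pricing and a fixed budget per team; all prices and quotas are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Price:
    input_per_1k: float   # USD per 1,000 input tokens (assumed pricing model)
    output_per_1k: float  # USD per 1,000 output tokens


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * price.input_per_1k + (output_tokens / 1000) * price.output_per_1k


class QuotaExceeded(Exception):
    pass


class TeamBudget:
    """Tracks spend per team and rejects charges that would exceed the team's quota."""

    def __init__(self, quotas_usd: Dict[str, float]) -> None:
        self.quotas = quotas_usd
        self.spent: Dict[str, float] = defaultdict(float)

    def charge(self, team: str, price: Price, input_tokens: int, output_tokens: int) -> float:
        cost = request_cost(price, input_tokens, output_tokens)
        # Teams without an explicit quota get a budget of zero.
        if self.spent[team] + cost > self.quotas.get(team, 0.0):
            raise QuotaExceeded(f"{team} would exceed its quota")
        self.spent[team] += cost
        return cost
```

Output token counts are only known after generation, so in practice a gateway might reserve an estimate (for example, based on the request's maximum output length) before the call and reconcile the actual cost afterwards.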

Risks

  • Data confidentiality — transmitting sensitive information to external APIs
  • Hallucinations — generating inaccurate information
  • Vendor lock-in — dependence on a single provider
  • Regulatory requirements — compliance with data protection and data processing regulations
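One common mitigation for the data confidentiality risk is an input guardrail that redacts personal data before a prompt is sent to an external API. A minimal regex-based sketch; the two patterns are deliberately simple assumptions and will produce both false positives (for example, some dates match the phone pattern) and false negatives, so production guardrails typically rely on dedicated PII-detection tooling.

```python
import re

# Illustrative patterns only; real guardrails need broader, locale-aware detection.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, `redact("Contact jane.doe@example.com or +1 (555) 123-4567.")` yields `"Contact [EMAIL] or [PHONE]."`, so the external provider never receives the original values.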
