LLM — Large Language Models

The LLM product gives the organization foundation language model capabilities and provides a single, centralized point of access to LLMs for all teams and products.

Purpose

The LLM product unifies how access to language models is managed by:

standardizing the use of LLMs across the organization
controlling costs and security
providing other AI products (Knowledge Assistant, Code Agent, Automation AI) with baseline generation capabilities

Deployment Options

The organization chooses a deployment model depending on its requirements:

Cloud APIs — OpenAI, Anthropic, Google; minimal infrastructure costs, quick start
On-premise / Private Cloud — deploying open-source models (Llama, Mistral, Qwen); full data control
Hybrid — cloud APIs for non-critical tasks, on-premise for sensitive data
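The hybrid option can be illustrated with a small routing function. This is a minimal sketch, not a prescribed design: the names `choose_route`, `Route`, and `SENSITIVE_PATTERNS` are hypothetical, and the two regular expressions only stand in for whatever data-classification rules the organization actually uses.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for data that must stay on-premise. A real
# deployment would rely on a proper data-classification service.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email address
]


@dataclass(frozen=True)
class Route:
    target: str  # "cloud" or "on_premise"
    reason: str


def choose_route(prompt: str, force_private: bool = False) -> Route:
    """Send sensitive prompts on-premise and everything else to a cloud API."""
    if force_private:
        return Route("on_premise", "caller requested private processing")
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(prompt):
            return Route("on_premise", "sensitive pattern detected")
    return Route("cloud", "no sensitive data detected")
```

Pattern matching alone will miss many kinds of sensitive data, so the `force_private` flag lets a calling product opt in to on-premise processing regardless of what the prompt contains.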

Use Cases

The LLM is used for a wide range of tasks:

text generation (reports, emails, documentation)
document summarization
classification and categorization
extracting information from unstructured data
translation and localization
code generation
sentiment analysis

Key Decisions

Setting up the LLM product requires decisions in the following areas:

Cloud vs. On-premise — the balance between cost, data confidentiality, and latency
Model selection — which models to use for which tasks (cost vs. quality)
Fine-tuning vs. Prompting — fine-tuning a model for a task or working through prompt engineering
Cost management — token economics, budgeting, per-team limits
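The model selection decision (cost vs. quality) can be expressed as a simple rule: pick the cheapest model that meets the quality bar for the task. The sketch below assumes a hypothetical catalog; the model names, quality scores, and prices are placeholders, not real benchmarks or vendor pricing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: float            # 0..1 score from an internal evaluation (assumed)
    usd_per_1k_tokens: float  # blended price (assumed)


# Hypothetical catalog and per-task quality thresholds.
CATALOG: List[ModelOption] = [
    ModelOption("small-model", 0.70, 0.001),
    ModelOption("medium-model", 0.82, 0.005),
    ModelOption("large-model", 0.93, 0.030),
]

MIN_QUALITY: Dict[str, float] = {
    "classification": 0.65,
    "summarization": 0.80,
    "code_generation": 0.90,
}


def select_model(task: str, catalog: List[ModelOption] = CATALOG) -> ModelOption:
    """Return the cheapest model whose quality meets the task's threshold."""
    required = MIN_QUALITY[task]
    eligible = [m for m in catalog if m.quality >= required]
    if not eligible:
        raise ValueError(f"no model meets quality {required} for task {task!r}")
    return min(eligible, key=lambda m: m.usd_per_1k_tokens)
```

With these placeholder values, classification resolves to the small model and code generation to the large one, which mirrors the idea of matching model size to the task.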

Infrastructure

The LLM product infrastructure includes:

GPU clusters — for on-premise deployment and fine-tuning
API Gateway — request routing, rate limiting, authentication
Prompt Management — managing and versioning prompts
Guardrails — filtering inputs and outputs (safety, compliance)
Observability — request logging, quality metrics, cost monitoring
Caching — reducing the cost of repeated requests
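As an illustration of the caching component, the sketch below stores responses in memory, keyed by a hash of the model name and the whitespace-normalized prompt. The class name `ResponseCache` and the `generate` callback are assumptions made for this example; a production setup would more likely use a shared store with expiry.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """In-memory exact-match cache placed in front of a generation backend."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        self._generate = generate          # backend call: (model, prompt) -> text
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Collapse runs of whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._generate(model, prompt)
        self._store[key] = response
        return response
```

Exact-match caching only helps with literally repeated requests, and it assumes that returning an identical answer for an identical prompt is acceptable for the use case.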

Cost Management

LLM costs can grow quickly as usage increases. To keep them under control:

track consumption by teams and products
set limits and quotas
choose a model of the appropriate size for the task
use caching for routine requests
optimize prompts to reduce token counts
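Tracking consumption and enforcing quotas can be sketched as a ledger that converts token counts into cost and rejects requests that would exceed a team's limit. The prices, team names, and the `UsageLedger` class are hypothetical; real prices vary by provider and model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical prices in USD per 1,000 tokens, not real vendor pricing.
PRICE_PER_1K_TOKENS = {
    "small-model": {"input": 0.0005, "output": 0.0015},
    "large-model": {"input": 0.0100, "output": 0.0300},
}


class QuotaExceeded(Exception):
    """Raised when a request would push a team over its monthly limit."""


@dataclass
class UsageLedger:
    monthly_limit_usd: Dict[str, float]
    spent_usd: Dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def record(self, team: str, model: str, input_tokens: int, output_tokens: int) -> float:
        """Add the cost of one request to the team's total and return that cost."""
        price = PRICE_PER_1K_TOKENS[model]
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1000
        limit = self.monthly_limit_usd.get(team, 0.0)  # unknown teams get no budget
        if self.spent_usd[team] + cost > limit:
            raise QuotaExceeded(f"{team} would exceed its limit of {limit:.2f} USD")
        self.spent_usd[team] += cost
        return cost
```

For example, with a limit of 1.00 USD for a team, a request with 2,000 input and 1,000 output tokens on the small model costs (2000 × 0.0005 + 1000 × 0.0015) / 1000 = 0.0025 USD under these placeholder prices and is accepted.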

Risks

Data confidentiality — transmitting sensitive information to external APIs
Hallucinations — generating inaccurate information
Vendor lock-in — dependence on a single provider
Regulatory requirements — compliance with data processing requirements
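One common way to reduce the vendor lock-in risk is to keep calling code behind a provider-neutral interface so that providers can be swapped or used as fallbacks. The sketch below is one possible shape for that idea, offered as an assumption rather than a description of the product; `Provider`, `StaticProvider`, and `complete_with_fallback` are hypothetical names, and `StaticProvider` is a stand-in for real client code.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Raised by a provider that cannot serve the request."""


class Provider(Protocol):
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class StaticProvider:
    """Stub provider used for illustration instead of a real API client."""
    name: str
    fail: bool = False

    def complete(self, prompt: str) -> str:
        if self.fail:
            raise ProviderUnavailable("simulated outage")
        return f"{self.name}:{prompt}"


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order and return (provider name, response) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Because callers depend only on the `Provider` interface, replacing a cloud API with an on-premise model changes the provider list, not the calling products.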
