TiDB User Day 2025 | Friday, October 3, 10:00–17:00. Register now!

AI is Evolving. Your Database Should, too.

Modern AI workloads need more than search. They demand reasoning, memory, and real-time responses. TiDB combines vector, full-text, and graph-based retrieval with transactional consistency and analytical power, all in one unified platform.

Unified Search

Retrieve the right context across structured, unstructured, and vector data. TiDB supports vector search, full-text search, and SQL-native hybrid queries so your LLMs always get the most relevant, grounded information.

Graph-Based Reasoning

Move beyond keyword matches. Use Graph RAG and knowledge graph capabilities to connect ideas, trace relationships, and enable multi-hop reasoning across data sources.
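To make "multi-hop reasoning" concrete, here is a minimal, hypothetical sketch in plain Python (not a TiDB or PyTiDB API): connecting ideas across a knowledge graph reduces to following chains of edges between entities, here with a breadth-first walk over toy relationship triples.

```python
from collections import deque

# Toy knowledge graph: (source, relation, target) triples.
# The entities and relations below are illustrative only.
edges = [
    ("TiDB", "supports", "vector search"),
    ("vector search", "enables", "semantic retrieval"),
    ("semantic retrieval", "grounds", "LLM answers"),
]

def multi_hop(start, max_hops=3):
    """Breadth-first traversal collecting every entity reachable from `start`
    within `max_hops` edge traversals."""
    adjacency = {}
    for src, _rel, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue, reached = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append((nxt, depth + 1))
    return reached

print(multi_hop("TiDB"))  # ['vector search', 'semantic retrieval', 'LLM answers']
```

In a real Graph RAG setup the triples would live in TiDB tables (see the knowledge-graph example below on this page) rather than a Python list.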

Agent Memory & State

Give your applications memory. TiDB lets you store and retrieve agent memory and user state using PyTiDB, enabling dynamic, multi-turn interactions and persistent personalization.

Real-Time, Distributed Engine

No batch pipelines. No lag. TiDB’s distributed SQL architecture supports high-throughput writes and low-latency reads — all with strong consistency and scale built-in.

Real-World AI Use Cases, Built on TiDB

Build intelligent AI applications — from chat agents and search assistants to knowledge graphs and real-time decision engines.

Say goodbye to data synchronization, duplication, and maintaining multiple data stores.

Search vector embeddings

Python
# Define schema
class Doc(Base):
    __tablename__ = "doc"
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    embedding = Column(VectorType(dim=3))

# Create table and index
Base.metadata.create_all(engine)
VectorAdaptor(engine).create_vector_index(
    Doc.embedding, tidb_vector.DistanceMetric.L2
)

# Insert
with Session(engine) as session:
    session.add(Doc(content="dog",  embedding=[1, 2, 1]))
    session.add(Doc(content="fish", embedding=[1, 2, 4]))
    session.commit()

# Search nearest 1 embedding using L2 distance
with Session(engine) as session:
    results = session.execute(
        select(Doc.content)
        .order_by(Doc.embedding.l2_distance([1, 2, 3]))
        .limit(1)
    ).all()
    print(results)

See Full Documentation

SQL

CREATE TABLE doc(
    content TEXT,
    embedding VECTOR(3),
    VECTOR INDEX ((VEC_L2_DISTANCE(embedding)))
);

INSERT INTO doc VALUES
    ("dog",  "[1, 2, 1]"),
    ("fish", "[1, 2, 4]");

-- Search nearest 1 embedding using L2 distance
SELECT content
    FROM doc
    ORDER BY VEC_L2_DISTANCE(embedding, "[1, 2, 3]")
    LIMIT 1;

See Full Documentation
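As a plain-Python sanity check of what `VEC_L2_DISTANCE` with `ORDER BY ... LIMIT 1` computes (illustrative only, no database required): the query boils down to computing Euclidean distances and taking the minimum. With the toy vectors above, "fish" is the nearest neighbor of [1, 2, 3].

```python
import math

docs = [("dog", [1, 2, 1]), ("fish", [1, 2, 4])]
query = [1, 2, 3]

def l2_distance(a, b):
    # Same metric as TiDB's VEC_L2_DISTANCE: Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# ORDER BY distance ASC, LIMIT 1: the nearest neighbor wins.
nearest = min(docs, key=lambda d: l2_distance(d[1], query))
print(nearest[0])  # "fish" (distance 1.0, vs. 2.0 for "dog")
```

The vector index in the SQL above lets TiDB find this nearest neighbor without scanning every row.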

Filter by complex conditions

Python
# Define schema
class Doc(Base):
    __tablename__ = "doc"
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    kind = Column(Text)
    embedding = Column(VectorType(dim=3))

# Create table and index
Base.metadata.create_all(engine)
VectorAdaptor(engine).create_vector_index(
    Doc.embedding, tidb_vector.DistanceMetric.L2
)

# Insert
with Session(engine) as session:
    session.add(Doc(content="dog",  kind="animal", embedding=[1, 2, 1]))
    session.add(Doc(content="fish", kind="animal", embedding=[1, 2, 4]))
    session.add(Doc(content="tree", kind="plant",  embedding=[1, 2, 3]))
    session.commit()

# Search with any conditions via WHERE clause
with Session(engine) as session:
    results = session.execute(
        select(Doc.content)
        .where(Doc.kind == "animal")
        .order_by(Doc.embedding.l2_distance([1, 2, 3]))
        .limit(1)
    ).all()
    print(results)

See Full Documentation

SQL

CREATE TABLE doc(
    content TEXT,
    kind TEXT,
    embedding VECTOR(3),
    VECTOR INDEX ((VEC_L2_DISTANCE(embedding)))
);

INSERT INTO doc VALUES
    ("dog",  "animal", "[1, 2, 1]"),
    ("fish", "animal", "[1, 2, 4]"),
    ("tree", "plant",  "[1, 2, 3]");

-- Search with any conditions via WHERE clause
SELECT content
    FROM doc
    WHERE kind = "animal"
    ORDER BY VEC_L2_DISTANCE(embedding, "[1, 2, 3]")
    LIMIT 1;

See Full Documentation
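A plain-Python sketch of what the filtered query computes (illustrative only): the metadata filter is applied together with the similarity search, so "tree", an exact vector match for the query, never appears in the result because its kind is "plant".

```python
import math

docs = [
    # (content, kind, embedding)
    ("dog",  "animal", [1, 2, 1]),
    ("fish", "animal", [1, 2, 4]),
    ("tree", "plant",  [1, 2, 3]),
]
query = [1, 2, 3]

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# WHERE kind = 'animal' first, then ORDER BY distance LIMIT 1.
animals = [d for d in docs if d[1] == "animal"]
best = min(animals, key=lambda d: l2(d[2], query))
print(best[0])  # "fish": "tree" is closer but filtered out
```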

Search knowledge graph semantically

Python
# Schema
class Vertex(Base):
    __tablename__ = "vertices"
    id = Column(Integer, primary_key=True)
    name = Column(Text)

class Edge(Base):
    __tablename__ = "edges"
    id = Column(Integer, primary_key=True)
    source_id = Column(Integer)
    target_id = Column(Integer)
    description = Column(Text)
    description_vec = Column(VectorType(dim=3))

# Insert data here...

# Semantically search edges, then join in their source and target vertices
with Session(engine) as session:
    query_edges = (
        select(Edge)
        .with_only_columns(Edge.source_id, Edge.target_id, Edge.description, Edge.id)
        .order_by(Edge.description_vec.cosine_distance([1, 2, 3]))
        .limit(100)
    ).subquery()

    # Retrieve each edge together with both of its vertices in one query
    MVertexSource = aliased(Vertex)
    MVertexTarget = aliased(Vertex)
    MEdge = aliased(Edge, query_edges)
    results = session.execute(
        select(MEdge, MVertexSource, MVertexTarget)
        .join(MVertexSource, MVertexSource.id == MEdge.source_id)
        .join(MVertexTarget, MVertexTarget.id == MEdge.target_id)
    )
    for edge, vertexSource, vertexTarget in results:
        print(edge.__dict__, vertexSource.__dict__, vertexTarget.__dict__)

See Full Documentation

SQL

-- Schema
CREATE TABLE vertices (
    id INT PRIMARY KEY,
    name TEXT
);

CREATE TABLE edges (
    id INT PRIMARY KEY,
    source_id INT,
    target_id INT,
    description TEXT,
    description_vec VECTOR(3),
    VECTOR INDEX ((VEC_COSINE_DISTANCE(description_vec)))
);

-- Insert data here...

-- Semantically search edges, then join in their source and target vertices
SELECT *
FROM (
    SELECT source_id, target_id, description
    FROM edges
    ORDER BY VEC_COSINE_DISTANCE(description_vec, '[1, 2, 3]')
    LIMIT 100
) AS e
INNER JOIN vertices AS source_vertex ON source_vertex.id = e.source_id
INNER JOIN vertices AS target_vertex ON target_vertex.id = e.target_id;

See Full Documentation
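A plain-Python sketch of the subquery-plus-join pattern above (illustrative only): rank edges by cosine distance to the query vector, then attach both endpoint vertices, which is exactly what the `INNER JOIN`s do after the `LIMIT 100` subquery.

```python
import math

# Toy graph; names and vectors are illustrative only.
vertices = {1: "TiDB", 2: "vector search", 3: "HTAP"}
edges = [
    # (source_id, target_id, description, description_vec)
    (1, 2, "TiDB supports vector search", [1, 2, 3]),
    (1, 3, "TiDB is an HTAP database",    [9, 0, 1]),
]
query = [1, 2, 3]

def cosine_distance(a, b):
    # Same metric as TiDB's VEC_COSINE_DISTANCE: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

# Rank edges semantically, then join in both endpoint vertices.
ranked = sorted(edges, key=lambda e: cosine_distance(e[3], query))[:100]
results = [(vertices[e[0]], e[2], vertices[e[1]]) for e in ranked]
print(results[0])  # ('TiDB', 'TiDB supports vector search', 'vector search')
```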

Return search results with text chunks

Python
# Define schema
class Doc(Base):
    __tablename__ = "doc"
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    embedding = Column(VectorType(dim=3))

# Create table and index
Base.metadata.create_all(engine)
VectorAdaptor(engine).create_vector_index(
    Doc.embedding, tidb_vector.DistanceMetric.L2
)

# Insert
with Session(engine) as session:
    session.add(Doc(content="dog",  embedding=[1, 2, 1]))
    session.add(Doc(content="fish", embedding=[1, 2, 4]))
    session.commit()

# Search
with Session(engine) as session:
    results = session.execute(
        # Retrieve ID and content in the search result,
        # or any other metadata you like
        select(Doc.id, Doc.content)
        .order_by(Doc.embedding.l2_distance([1, 2, 3]))
        .limit(1)
    ).all()
    print(results)

See Full Documentation

SQL

CREATE TABLE doc(
    content TEXT,
    embedding VECTOR(3),
    VECTOR INDEX ((VEC_L2_DISTANCE(embedding)))
);

INSERT INTO doc VALUES
    ("dog", "[1, 2, 1]"),
    ("fish", "[1, 2, 4]");

-- Retrieve ID and content in the search result,
-- or any other metadata you like
SELECT id, content
    FROM doc
    ORDER BY VEC_L2_DISTANCE(embedding, "[1, 2, 3]")
    LIMIT 1;

See Full Documentation

Store and analyze RAG feedback

Python
# Schema
class Feedback(Base):
    __tablename__ = "feedback"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer)
    # 0 = like, 1 = dislike. Indexed for faster queries.
    is_dislike = Column(Integer, index=True)
    chat_context = Column(LONGTEXT)
    timestamp = Column(Integer)

# Insert data here...

# Inspect the 100 most recent dislike entries in detail
with Session(engine) as session:
    results = session.execute(
        select(Feedback)
        .where(Feedback.is_dislike == 1)
        .order_by(Feedback.timestamp.desc())
        .limit(100)
    )
    for obj in results.scalars():
        print(obj.id, obj.user_id, obj.chat_context)

# Plot feedback trends in the last hour
with Session(engine) as session:
    results = session.execute(text("""
        SELECT COUNT(*), FROM_UNIXTIME(FLOOR(timestamp / 60) * 60) AS window_start
        FROM feedback
        WHERE timestamp >= UNIX_TIMESTAMP() - 60 * 60
        AND is_dislike = 1
        GROUP BY window_start
    """)).all()
    print(results)

See Full Documentation

SQL

CREATE TABLE feedback (
    user_id INT,
    is_dislike INT, -- 0=Like, 1=Dislike
    chat_context LONGTEXT,
    timestamp INT,
    INDEX (is_dislike)
);

-- Insert data here...

-- Inspect the 100 most recent dislike entries in detail
SELECT *
    FROM feedback
    WHERE is_dislike = 1
    ORDER BY timestamp DESC
    LIMIT 100;

-- Plot feedback trends in the last hour
SELECT COUNT(*), FROM_UNIXTIME(FLOOR(timestamp / 60) * 60) AS window_start
    FROM feedback
    WHERE
        timestamp >= UNIX_TIMESTAMP() - 60 * 60
        AND is_dislike = 1
    GROUP BY window_start;

See Full Documentation
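The `FROM_UNIXTIME(FLOOR(timestamp / 60) * 60)` expression above buckets feedback into one-minute windows. A plain-Python equivalent of that bucketing (illustrative only, toy timestamps):

```python
# Toy feedback rows: (unix_timestamp, is_dislike).
feedback = [
    (1700000005, 1),
    (1700000050, 1),
    (1700000065, 0),  # a like; excluded by the is_dislike filter
    (1700000070, 1),
]

counts = {}
for ts, is_dislike in feedback:
    if is_dislike:
        window_start = ts // 60 * 60  # FLOOR(timestamp / 60) * 60
        counts[window_start] = counts.get(window_start, 0) + 1
print(counts)  # {1699999980: 1, 1700000040: 2}
```

In TiDB, the GROUP BY version of this runs directly on live feedback rows, with no batch ETL step in between.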

Secure RAG with built-in access control

Python
# Define schema
class Doc(Base):
    __tablename__ = "doc"
    id = Column(Integer, primary_key=True)
    owner_id = Column(Integer)
    embedding = Column(VectorType(dim=3))
    content = Column(Text)

# Create table and index
Base.metadata.create_all(engine)
VectorAdaptor(engine).create_vector_index(
    Doc.embedding, tidb_vector.DistanceMetric.L2
)

# Insert
with Session(engine) as session:
    session.add(Doc(owner_id=10, content="dog",  embedding=[1, 2, 1]))
    session.add(Doc(owner_id=10, content="fish", embedding=[1, 2, 4]))
    session.add(Doc(owner_id=17, content="tree", embedding=[1, 0, 0]))
    session.commit()

# Search top 10 similar docs owned by user 17
with Session(engine) as session:
    results = session.execute(
        select(Doc.owner_id, Doc.content)
        .where(Doc.owner_id == 17)
        .order_by(Doc.embedding.l2_distance([1, 2, 3]))
        .limit(10)
    ).all()
    print(results)

See Full Documentation

SQL

CREATE TABLE doc(
    owner_id INT,
    content TEXT,
    embedding VECTOR(3),
    VECTOR INDEX ((VEC_L2_DISTANCE(embedding)))
);

INSERT INTO doc VALUES
    (10, "dog", "[1, 2, 1]"),
    (10, "fish", "[1, 2, 4]"),
    (17, "tree", "[1, 0, 0]");

-- Search top 10 similar docs owned by user 17
SELECT owner_id, content
    FROM doc
    WHERE owner_id = 17
    ORDER BY VEC_L2_DISTANCE(embedding, "[1, 2, 3]")
    LIMIT 10;

See Full Documentation
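A plain-Python sketch of the same scoping (illustrative; the `search` helper is hypothetical): because the `WHERE owner_id` predicate is evaluated as part of the query itself, one user's documents can never leak into another user's results, no matter how similar the embeddings are.

```python
import math

docs = [
    # (owner_id, content, embedding)
    (10, "dog",  [1, 2, 1]),
    (10, "fish", [1, 2, 4]),
    (17, "tree", [1, 0, 0]),
]
query = [1, 2, 3]

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(owner_id, query, k=10):
    # Ownership filter is applied inside the search, not after it,
    # so other users' documents are invisible to this caller.
    mine = [d for d in docs if d[0] == owner_id]
    return [d[1] for d in sorted(mine, key=lambda d: l2(d[2], query))[:k]]

print(search(17, query))  # ['tree']: user 10's closer docs are never returned
```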

Run it on Cloud

  • Free for up to 25 GiB of storage and 250 million RUs per month
  • Scale instantly in response to your workload
  • Zero cost when idle

Start for Free

OR

Run it Locally

  • Run TiDB locally with full features
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
~/.tiup/bin/tiup playground nightly --tag my_vector_db

Read Docs

* Supported platforms: Linux (x64 / ARM64), macOS (x64 / ARM64)

Integrations

Production-ready integrations for every stage of your AI pipeline.

Quick start: copy and paste your connection string into any of the listed integrations.

Explore All Integrations

Luyu Zhang
Founder & CEO of Dify.AI

With TiDB, our users can concentrate on building their GenAI apps rather than worrying about setup. Our engineers have automated all database management tasks using TiDB’s API, significantly reducing our time and effort. The scale-to-zero capability of TiDB lets us provide dedicated databases to our customers without the burden of idle resource costs.

Enterprise-grade Security

Compliant

We comply with the latest industry standards: SOC 2 Type II and SOC 3 certified; EU CoC, PCI DSS, and HIPAA compliant.
Get Compliance Reports

Secure

TiDB Cloud supports Role-based Access Control, Encryption at rest, Private Link, and more.
Security Features Documentation

Highly Available

TiDB Cloud offers up to a 99.99% uptime SLA with zonal high availability, cross-AZ failover, and automatic backups.
SLA Policy

FAQ

How is TiDB different from a dedicated vector database?

TiDB goes beyond vector-only retrieval. It combines semantic search, full-text search, and structured filtering, all in a single distributed SQL engine. You can perform hybrid queries, enforce consistency, and reason over data relationships using Graph RAG and knowledge graphs.

Why use one unified database instead of multiple specialized systems?

TiDB is designed as a unified platform that handles operational, analytical, and AI workloads—all in one engine. Instead of maintaining separate systems for transactions, analytics, vector search, text, and graph traversal, TiDB gives you a single, horizontally scalable database with native support for all of these capabilities. This means less complexity, faster development, and more reliable, real-time applications.

Can TiDB power RAG applications and AI agents?

Yes. TiDB supports real-time ingestion, low-latency reads, and persistent memory, making it ideal for Retrieval-Augmented Generation (RAG), intelligent agents, and multi-turn conversational apps. With PyTiDB, you can store and manage agent memory directly in your database.

What is Graph RAG, and how does TiDB support it?

Graph RAG allows LLMs to retrieve context based on relationships, not just keywords or vectors. TiDB enables this by supporting graph-structured knowledge, multi-hop traversal, and visualization tools that help you build and query semantic graphs natively.

Does TiDB work with my existing tools and frameworks?

Absolutely. TiDB supports SQL and Python, integrates easily with LangChain, LlamaIndex, and embedding models, and is available via TiDB Cloud for serverless or dedicated deployments. You can start using it with the tools you already know.

How do I get started?

You can launch a free TiDB Cloud Starter cluster in minutes and start exploring vector, full-text, and hybrid search with your own data. Use our developer tools and example notebooks to build intelligent apps from day one.