Data Model Overview (Document / Graph / Time-series / Vector)

Core

Why not cram all data into MySQL tables? When the data is a social graph, a sensor stream of millions of records per second, or semantic vectors that let an AI system compare meaning, relational tables struggle. Each data shape needs its own modeling approach.


1. Beyond relational: why do we need other models?

Relational databases (MySQL, PostgreSQL) use "table + row + column", which suits structured data with clear relationships. But real-world data is more diverse:

| Data shape | Pain point in a relational DB | Better-suited model |
|---|---|---|
| User profile (flexible, nested fields) | Constant ALTER TABLE, many NULL columns | Document |
| Social network (friend of friend of friend) | Multi-level JOINs slow down exponentially | Graph |
| Monitoring (millions of rows written per second) | Write bottleneck, history bloat | Time-series |
| AI semantic search ("content with a similar meaning") | Cannot express semantic similarity | Vector |

Insight

The point is not to "replace" relational databases but to complement them. Most systems still keep their core business data in MySQL/PostgreSQL and introduce a specialized model for specific scenarios, where it can improve performance by orders of magnitude.


2. Document Model

2.1 What is the document model?

Data is stored as JSON/BSON documents. Each record is self-contained and may have different fields from the others.

json
{
  "_id": "user_1001",
  "name": "Hoàng",
  "tags": ["VIP", "Active"],
  "address": { "city": "HCM", "district": "Q1" },
  "orders": [
    { "id": "o1", "amount": 299 },
    { "id": "o2", "amount": 599 }
  ]
}

Characteristics:

  • No schema constraint: no need to pre-define tables; fields can be added or removed freely
  • Nested: address and orders are embedded in the document, so one read returns the full data
  • Horizontal scaling: native sharding handles large data volumes
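
To make the first two characteristics concrete, here is a minimal in-memory sketch in Python. It is not MongoDB's API: the `collection` dict and the `insert` / `find_by_id` helpers are hypothetical stand-ins, and the second user is invented sample data. It only illustrates that documents in one collection can carry different fields and that a single lookup returns the nested data.

```python
# A "collection" modeled as a dict keyed by _id (hypothetical stand-in for a document store).
collection = {}

def insert(doc):
    """Store a document as-is; no schema is declared up front."""
    collection[doc["_id"]] = doc

def find_by_id(doc_id):
    """One lookup returns the whole document, nested fields included."""
    return collection.get(doc_id)

insert({
    "_id": "user_1001",
    "name": "Hoàng",
    "tags": ["VIP", "Active"],
    "address": {"city": "HCM", "district": "Q1"},
    "orders": [{"id": "o1", "amount": 299}, {"id": "o2", "amount": 599}],
})
# A second document with a different set of fields lives in the same collection.
insert({"_id": "user_1002", "name": "Linh", "newsletter": True})

user = find_by_id("user_1001")
total_spent = sum(order["amount"] for order in user["orders"])
print(user["address"]["city"], total_spent)  # HCM 898
```

In a relational schema the same read would typically touch a users table, an addresses table, and an orders table.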

2.2 Document vs Relational

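| Dimension | Relational (MySQL) | Document (MongoDB) |
|---|---|---|
| Structure | Fixed schema, changed via ALTER TABLE | Flexible, add fields at any time |
| Nested data | Multi-table JOIN | Embedded in the document |
| Cross-record queries | Strong (JOIN) | Weak |
| Best for | Stable business data | Variable content |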

2.3 Use cases

  • CMS: articles, comments, and tags each have a different structure
  • User profile: different users carry different attributes
  • Product catalog: a phone has "screen size", food has "expiry date"; the fields differ completely
  • Config center: service configs are not uniform

Misconception

"MongoDB needs no data structure design" is wrong. The document model still needs design: avoid nesting too deeply, and move frequently updated sub-documents into their own collection.


3. Graph Model

3.1 What is a graph?

Nodes and edges express entities and relationships. Each node is an entity, each edge is a relationship, and both can carry attributes.

(Hoàng) --[Follow]--> (Linh) --[Follow]--> (An)
   |                                        |
   +---------[Buy]----> (iPhone) <--[Buy]---+

3.2 Killer capability: multi-hop query

Scenario: in a social network, find "friends of friends of friends".

Relational (3-layer JOIN):

sql
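-- Assumes friends(user_id, friend_id) and users(id, name); the users table is an assumption.
SELECT DISTINCT u.name
FROM friends f1
JOIN friends f2 ON f1.friend_id = f2.user_id
JOIN friends f3 ON f2.friend_id = f3.user_id
JOIN users u ON u.id = f3.friend_id   -- the third-hop friend
WHERE f1.user_id = 1001;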

Graph DB (Cypher query language):

cypher
MATCH (me)-[:FOLLOWS*1..3]->(target)
WHERE me.name = 'Hoàng'
RETURN DISTINCT target.name

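In a relational database every extra hop adds another JOIN, and the intermediate result set can grow exponentially with depth. A native graph database follows direct pointers from node to node, so the cost of a traversal depends mainly on how many neighbors are visited rather than on the total size of the table. Note that the Cypher pattern `*1..3` matches everything within one to three hops; use `*3` for exactly three hops, as in the SQL query.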
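
The pointer-following idea can be sketched in plain Python as a breadth-first search over an adjacency list. This is an illustration only, not how Neo4j is implemented; the `follows` data and the `within_hops` helper are hypothetical names chosen for this sketch.

```python
from collections import deque

# Adjacency list: each node maps directly to the nodes it follows (hypothetical sample data).
follows = {
    "Hoàng": ["Linh"],
    "Linh": ["An"],
    "An": ["Mai"],
    "Mai": ["Khoa"],
    "Khoa": [],
}

def within_hops(graph, start, max_hops):
    """Breadth-first search returning every node reachable in 1..max_hops hops."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found

print(within_hops(follows, "Hoàng", 3))  # ['Linh', 'An', 'Mai']
```

Each step only looks at the neighbors of the current node, so raising the hop limit does not require restructuring the query the way an extra JOIN does.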

3.3 Use cases

  • Social network: friend recommendation, mutual follow, influence
  • Knowledge graph: entity relationship reasoning
  • Fraud detection: detect money loops and networks of related accounts
  • Recommendation: a user-product-tag graph
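
As a small illustration of the fraud-detection case, the sketch below looks for a "money loop": a chain of transfers that leaves an account and eventually returns to it. The account names, the `transfers` data, and the `find_cycle` helper are all hypothetical; a production system would more likely run this as a graph query.

```python
# Directed "transfer" edges between accounts (hypothetical sample data).
transfers = {
    "acc_A": ["acc_B"],
    "acc_B": ["acc_C"],
    "acc_C": ["acc_A", "acc_D"],  # acc_C sends money back to acc_A
    "acc_D": [],
}

def find_cycle(graph, start):
    """Depth-first search for a path that leaves `start` and returns to it."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for neighbor in graph.get(node, []):
            if neighbor == start:
                return path + [start]  # loop closed
            if neighbor not in path:   # avoid revisiting nodes on this path
                stack.append((neighbor, path + [neighbor]))
    return None

print(find_cycle(transfers, "acc_A"))  # ['acc_A', 'acc_B', 'acc_C', 'acc_A']
print(find_cycle(transfers, "acc_D"))  # None
```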

4. Time-Series Model

4.1 What is time-series?

The timestamp is the primary axis. The model is optimized for writes in time order and queries over time ranges.

timestamp            device      cpu_usage   memory
2024-01-15 10:00:01  server-01   45%         12.3GB
2024-01-15 10:00:02  server-01   67%         12.5GB
2024-01-15 10:00:03  server-01   92%         14.1GB
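
Because points arrive in time order, a time-range query can be answered with binary search instead of a full scan. The sketch below shows the idea on the sample rows above using Python's standard `bisect` module; the `points` list and `query_range` helper are hypothetical names, and real time-series engines use more elaborate on-disk structures.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Points are appended in time order, so the list stays sorted by timestamp.
points = [
    (datetime(2024, 1, 15, 10, 0, 1), "server-01", 45),
    (datetime(2024, 1, 15, 10, 0, 2), "server-01", 67),
    (datetime(2024, 1, 15, 10, 0, 3), "server-01", 92),
]
timestamps = [p[0] for p in points]

def query_range(start, end):
    """Return points with start <= timestamp <= end using binary search."""
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return points[lo:hi]

result = query_range(datetime(2024, 1, 15, 10, 0, 2), datetime(2024, 1, 15, 10, 0, 3))
print([cpu for _, _, cpu in result])  # [67, 92]
```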

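4.2 Why not use MySQL?

| Issue | MySQL | Time-series DB (InfluxDB) |
|---|---|---|
| Write throughput | ~10k rows/s | ~1M points/s |
| Historical data | Manual cleanup, table bloat | Automatic expiry via TTL |
| Aggregate queries | GROUP BY is slow on large ranges | Built-in downsampling |
| Storage efficiency | General-purpose storage, wasteful | Columnar compression, can save around 90% |

The figures are rough orders of magnitude and depend on hardware, schema, and workload.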
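
Two of the rows above, downsampling and TTL, are simple to sketch. The Python below is an illustration under assumed data: the `samples` list and the `downsample` / `apply_ttl` helpers are hypothetical and do not reflect InfluxDB's API.

```python
from collections import defaultdict

# (unix_seconds, cpu_percent) samples (hypothetical data).
samples = [(0, 40), (1, 50), (2, 60), (60, 80), (61, 90), (125, 30)]

def downsample(points, bucket_seconds):
    """Average the values that fall into each fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

def apply_ttl(points, now, ttl_seconds):
    """Drop points older than the retention window."""
    return [(ts, v) for ts, v in points if now - ts <= ttl_seconds]

print(downsample(samples, 60))                      # {0: 50.0, 60: 85.0, 120: 30.0}
print(apply_ttl(samples, now=130, ttl_seconds=70))  # [(60, 80), (61, 90), (125, 30)]
```

Storing one averaged point per minute instead of sixty raw points is what keeps long-range dashboards cheap to query.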

4.3 Use cases

  • Server monitoring: CPU, memory, disk per second
  • IoT sensor: temperature, humidity, GPS
  • Financial feed: stock price, trading volume
  • Log analysis: app log timeline aggregation

5. Vector Model

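5.1 What is a vector?

Text, images, and audio are converted into high-dimensional vectors by an embedding model. The distance between two vectors then serves as a measure of semantic similarity.

Example query: "Nhà hàng Nhật ngon" (Vietnamese for "good Japanese restaurant")

"Nhà hàng Nhật ngon" → Embedding → [0.82, 0.15, 0.91, 0.33, ...]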
                                       ↓ Cosine similarity
"Sushi master Tokyo"  → [0.80, 0.18, 0.89, ...] → 96% similar
"Italian pizza"       → [0.12, 0.85, 0.20, ...] → 31% similar
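
5.2 Keyword search vs vector search

| Dimension | Keyword (LIKE / full-text) | Vector |
|---|---|---|
| Method | Exact string match | Semantic similarity |
| Query "Nhà hàng Nhật ngon" ("good Japanese restaurant") | Only matches text containing "Nhật" ("Japanese") | Also finds "sushi", "sashimi", "izakaya" |
| Multi-language | Each language handled separately | Cross-language semantics |
| Multi-modal | Text only | Text, image, and audio in one space |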
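
The similarity score itself is a short formula: cos(θ) = (a · b) / (|a| |b|). The sketch below computes it in plain Python on the first three components of the vectors shown in section 5.1. Because those vectors are truncated, the results will not reproduce the illustrative 96% and 31% figures; only the ordering carries over. The variable names are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); values near 1.0 mean a similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# First three components only; real embeddings have hundreds of dimensions.
query = [0.82, 0.15, 0.91]   # "Nhà hàng Nhật ngon"
sushi = [0.80, 0.18, 0.89]   # "Sushi master Tokyo"
pizza = [0.12, 0.85, 0.20]   # "Italian pizza"

sim_sushi = cosine_similarity(query, sushi)
sim_pizza = cosine_similarity(query, pizza)
print(sim_sushi > sim_pizza)  # True
```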

5.3 Use cases

  • RAG: supply relevant knowledge chunks to an LLM
  • Semantic search: understand intent instead of matching keywords
  • Image-to-image: upload a picture and find visually similar ones
  • Recommendation: content-based semantic recommendation

Vector DB selection

  • Standalone (typically the best performance at scale): Pinecone, Milvus, Weaviate, Qdrant
  • DB extension (less operational complexity): pgvector (PostgreSQL), Atlas Vector Search (MongoDB)
  • In-memory library (small scale, low latency): FAISS, Annoy
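
Whichever option is chosen, the core operation is the same: given a query vector, return the k most similar stored vectors. A brute-force version fits in a few lines of Python and is a reasonable baseline for small data sets. The `index` contents and the `top_k` helper are hypothetical; the products listed above replace the full scan with approximate nearest-neighbor indexes.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Tiny in-memory "index": id -> embedding (hypothetical 3-dimensional vectors).
index = {
    "sushi_master_tokyo": [0.80, 0.18, 0.89],
    "italian_pizza": [0.12, 0.85, 0.20],
    "ramen_shop": [0.75, 0.25, 0.80],
}

def top_k(query, k):
    """Score every stored vector against the query and keep the k best."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

print(top_k([0.82, 0.15, 0.91], k=2))  # ['sushi_master_tokyo', 'ramen_shop']
```

The scan is linear in the number of stored vectors, which is why dedicated indexes matter once the collection grows.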

6. Decision: which model to choose?

| What does your data look like? | Recommended model | Representative systems |
|---|---|---|
| Fixed structure, clear relationships (orders, users) | Relational | MySQL, PostgreSQL |
| Flexible structure, heavy nesting (content, config) | Document | MongoDB, DynamoDB |
| Complex entity relationships, multi-hop queries | Graph | Neo4j, Amazon Neptune |
| Time-ordered writes, time-range queries | Time-series | InfluxDB, TimescaleDB |
| Unstructured data that needs semantic similarity | Vector | Pinecone, Milvus, pgvector |
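
The table can be read as a simple rule lookup. The sketch below encodes it that way; the trait labels, the rule order, and the `recommend_model` helper are assumptions made for illustration, and a real decision would also weigh team experience and operational cost.

```python
# Traits are hypothetical labels chosen for this sketch; they mirror the table rows.
RULES = [
    ("needs_semantic_similarity", "Vector"),
    ("time_ordered_writes", "Time-series"),
    ("multi_hop_relationships", "Graph"),
    ("flexible_nested_structure", "Document"),
]

def recommend_model(traits):
    """Return the first matching specialized model, defaulting to relational."""
    for trait, model in RULES:
        if trait in traits:
            return model
    return "Relational"

print(recommend_model({"time_ordered_writes"}))  # Time-series
print(recommend_model(set()))                    # Relational
```

Defaulting to relational mirrors the earlier point that specialized models complement the core database rather than replace it.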

Practical

Modern systems are often multi-model:

  • Core business → PostgreSQL (relational)
  • User behavior log → InfluxDB (time-series)
  • AI knowledge base → Milvus / pgvector (vector)
  • Recommendation engine → Neo4j (graph)

Do not chase "one database that solves everything"; let each kind of data find the home that fits it.

2026 notes for Vietnamese developers

  • PostgreSQL plus extensions is the dominant choice: pgvector, TimescaleDB, PostGIS (geospatial), pg_trgm (fuzzy text matching)
  • Vietnam context: e-commerce mostly runs on PostgreSQL + Redis; for AI workloads, try pgvector before moving to Milvus
  • Standard AI stack in 2026: PostgreSQL (relational + vector) + Redis (cache + queue) + ClickHouse (analytics)
  • Document databases: MongoDB is still mainstream; Firestore for serverless