2026-06-07 08:58:22 +08:00

6.8 KiB


name: table-query
description: Query structured spreadsheet/table data (Excel/CSV) to answer questions about values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, aggregations, lists, or any numeric/tabular lookup. Use this skill whenever the answer likely comes from uploaded tables. You locate tables, read their schema, author SQLite SQL yourself, and run it; the backend does no LLM work, so it is fast.
category: Data & Retrieval

Table Query

Answer table/spreadsheet questions by authoring and running SQLite SQL against the bot's uploaded Excel data. The backend is a thin, fast SQL executor — you do the thinking (rewrite the question, pick tables, write SQL). Row-level citations (__src) are produced for you.

When to use

Use table-query for: values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, aggregations (sum/avg/count), lists, person / project / product lookups, monthly/period totals, or any question whose answer comes from structured tables. For pure concept / definition / policy / explanation questions, use the rag_retrieve document tool instead.

Workflow (do this in order, once)

1. search-tables: rewrite the user's question into a retrieval query (core entity + attributes + synonyms), then locate candidate tables. Call this once.
2. get-schemas: for the relevant subset of returned tables, fetch their CREATE TABLE schema and sample rows. Never write SQL without seeing the schema.
3. author SQL: write a SQLite query plan as JSON (see below).
4. run-sql: execute the plan. It returns CSV with an __src column and a file_ref_table mapping plus citation instructions.
5. answer + cite: write the answer and add <CITATION ... /> tags built from __src + file_ref_table. Never print the __src column to the user.

Anti-waste rules

Call search-tables at most once per question. Do not re-locate tables you already have schemas for.
If run-sql returns an error, fix the SQL and call run-sql again (at most ~2 tries). Do NOT restart from search-tables.
If search-tables finds nothing, fall back to the rag_retrieve document tool.
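
The workflow and the anti-waste rules can be read as one small control loop. The sketch below is only an illustration of that loop: every callable passed in (search_tables, get_schemas, author_plan, run_sql, rag_retrieve) is a hypothetical stand-in rather than a real API, the cap of two run-sql attempts is one reading of "at most ~2 tries", and signalling a SQL error with RuntimeError is an assumption.

```python
from typing import Callable, Optional

MAX_SQL_TRIES = 2  # assumed reading of the "at most ~2 tries" rule


def answer_table_question(
    question: str,
    search_tables: Callable[[str], list],
    get_schemas: Callable[[list], str],
    author_plan: Callable[[str, str, Optional[str]], dict],
    run_sql: Callable[[dict], str],
    rag_retrieve: Callable[[str], str],
) -> str:
    """Hypothetical driver; all callables are injected stand-ins."""
    tables = search_tables(question)      # called exactly once per question
    if not tables:
        return rag_retrieve(question)     # nothing found: fall back to documents
    schemas = get_schemas(tables)         # schemas are fetched once and reused
    last_error = None
    for _ in range(MAX_SQL_TRIES):
        plan = author_plan(question, schemas, last_error)
        try:
            return run_sql(plan)
        except RuntimeError as exc:       # assumed error signal from run-sql
            last_error = str(exc)         # fix the SQL; do not search again
    return f"run-sql failed after {MAX_SQL_TRIES} tries: {last_error}"
```

The point of the shape is that a failed query only loops back to authoring, never to table search.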

Commands

# 1. locate tables
python {SKILL_DIR}/scripts/table_query.py search-tables --query "2025 April May June sales total" --top-k 20

# 2. read schema + sample rows for the tables you picked
python {SKILL_DIR}/scripts/table_query.py get-schemas --tables "sales_2025,customers"

# 3. run your authored plan — pipe the JSON plan via stdin (no temp file needed)
python {SKILL_DIR}/scripts/table_query.py run-sql <<'PLAN'
{"queries":[{"step":1,"sql":"CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"month\", SUM(\"amount\") AS \"total\" FROM \"sales_2025\" GROUP BY \"month\"","source_table_names":["sales_2025"],"destine_table_name":"final_table_step1","destine_table_type":"final","destine_table_description":"Monthly totals"}]}
PLAN

Authoring the SQL plan

The plan is a JSON object { "queries": [ ... ] } that you pass to run-sql on stdin via a quoted heredoc (<<'PLAN' ... PLAN). The quoted delimiter keeps all the double quotes, single quotes and $ in your SQL intact — no shell escaping. (You may instead write it to a file and use --plan-file path.json if a plan is very large, but stdin is the default and needs no extra step.)
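
If the plan is assembled programmatically rather than typed into a heredoc, the same stdin channel can be used from Python. This is a minimal sketch, assuming the script path shown in the commands above; SKILL_DIR here is a placeholder you would set yourself, and run_sql is a hypothetical helper name. Letting json.dumps serialize the plan avoids hand-escaping the double quotes inside the SQL.

```python
import json
import subprocess

SKILL_DIR = "."  # placeholder for {SKILL_DIR}; point it at the real skill directory


def build_plan_json() -> str:
    """Serialize a one-step plan; json.dumps handles the quote escaping."""
    sql = (
        'CREATE TEMP TABLE "final_table_step1" AS '
        'SELECT "month", SUM("amount") AS "total" '
        'FROM "sales_2025" GROUP BY "month"'
    )
    plan = {"queries": [{
        "step": 1,
        "sql": sql,
        "source_table_names": ["sales_2025"],
        "destine_table_name": "final_table_step1",
        "destine_table_type": "final",
        "destine_table_description": "Monthly totals",
    }]}
    return json.dumps(plan)


def run_sql(plan_json: str) -> str:
    """Pipe the plan to run-sql on stdin, the channel the heredoc also uses."""
    cmd = ["python", f"{SKILL_DIR}/scripts/table_query.py", "run-sql"]
    done = subprocess.run(cmd, input=plan_json, capture_output=True, text=True)
    return done.stdout
```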

Each query is one SQL step:

{
  "queries": [
    {
      "step": 1,
      "sql": "CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"month\", SUM(\"amount\") AS \"total\" FROM \"sales_2025\" WHERE \"month\" IN ('2025-04','2025-05','2025-06') GROUP BY \"month\"",
      "source_table_names": ["sales_2025"],
      "destine_table_name": "final_table_step1",
      "destine_table_type": "final",
      "destine_table_description": "Monthly sales totals for Apr-Jun 2025"
    }
  ]
}

Field meaning:

step: 1-based execution order.
sql: a SQLite statement, normally CREATE TEMP TABLE "..." AS SELECT ....
source_table_names: tables this step reads (original tables, or earlier steps' destine_table_name for multi-step plans).
destine_table_name: the temp table this step creates. Convention: intermediate_table_stepN or final_table_stepN.
destine_table_type: "final" for results the user should see, "intermediate" for helper steps. At least one final is required.
destine_table_description: short human description of the result.
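
Before sending a plan, it can help to check it against the field rules above. The following is a local pre-flight sketch, not the backend's own validation: check_plan is a hypothetical helper, and the ordering check only covers the rule that a step may read an earlier step's destine_table_name.

```python
VALID_TYPES = {"final", "intermediate"}
REQUIRED = ("step", "sql", "source_table_names", "destine_table_name",
            "destine_table_type", "destine_table_description")


def check_plan(plan: dict) -> list:
    """Return a list of problems; an empty list means the plan looks well formed."""
    problems = []
    queries = plan.get("queries") or []
    if not queries:
        return ["plan has no queries"]
    valid = []
    for q in queries:
        missing = [f for f in REQUIRED if f not in q]
        if missing:
            problems.append(f"step {q.get('step', '?')}: missing {', '.join(missing)}")
        elif not isinstance(q["step"], int) or q["step"] < 1:
            problems.append(f"step {q['step']!r}: step must be a 1-based integer")
        else:
            valid.append(q)
    # Map each temp table to the step that creates it.
    created_at = {q["destine_table_name"]: q["step"] for q in valid}
    for q in valid:
        if q["destine_table_type"] not in VALID_TYPES:
            problems.append(f"step {q['step']}: unknown type {q['destine_table_type']!r}")
        for src in q["source_table_names"]:
            # A temp table may only be read by a later step than the one creating it.
            if src in created_at and created_at[src] >= q["step"]:
                problems.append(f"step {q['step']}: reads {src} before it is created")
    if not any(q["destine_table_type"] == "final" for q in valid):
        problems.append("no final table: at least one is required")
    return problems
```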

SQL rules (important)

Quote every identifier with double quotes: "column name", "table name".
String literals use single quotes; escape ' as ''.
Prefer one logical result per final table. For multiple separate results, emit multiple final tables (e.g. step1, step2) — do NOT UNION unrelated results.
For row-level citations to be precise, keep final steps as simple single-table SELECTs (no JOIN / GROUP BY / aggregation). Aggregations still work but the citation degrades to file+sheet level (F1S2) instead of an exact row (F1S2R5).
Multi-step plans run in step order: build intermediate_table_stepN first, then read it in a later step. Don't reference a temp table before it is created.
Sample rows are a format hint only — never assume they represent the full data or the row count. Your SQL must scan the whole table. Use LIKE '%value%' for free text and = for enums/codes.
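
The quoting and multi-step rules can be tried locally with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table "sales 2025" and its rows are invented stand-ins for uploaded data, and quote_ident and quote_literal are hypothetical helper names. It shows an intermediate step that filters rows, followed by a final step that reads the intermediate table.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an identifier; an embedded double quote is doubled."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping ' as ''."""
    return "'" + value.replace("'", "''") + "'"


# Invented stand-in for an uploaded table; real data lives in the backend.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "sales 2025" ("month" TEXT, "customer" TEXT, "amount" REAL)')
conn.executemany(
    'INSERT INTO "sales 2025" VALUES (?, ?, ?)',
    [("2025-04", "O'Brien Ltd", 120.0),
     ("2025-05", "O'Brien Ltd", 80.0),
     ("2025-05", "Acme", 50.0)],
)

src = quote_ident("sales 2025")
tmp = quote_ident("intermediate_table_step1")
final = quote_ident("final_table_step2")
customer = quote_ident("customer")
who = quote_literal("O'Brien Ltd")

# Step 1 (intermediate): exact match with = on a known value.
step1 = f"CREATE TEMP TABLE {tmp} AS SELECT * FROM {src} WHERE {customer} = {who}"
# Step 2 (final): reads the temp table created by step 1.
step2 = f'CREATE TEMP TABLE {final} AS SELECT SUM("amount") AS "total" FROM {tmp}'

for sql in (step1, step2):  # run in step order
    conn.execute(sql)
total = conn.execute(f'SELECT "total" FROM {final}').fetchone()[0]
```

Swapping the order of the two steps makes SQLite fail with a "no such table" error, which is the situation the ordering rule guards against.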

Result handling & citations

run-sql output begins with citation instructions, then file_ref_table, then the result CSV (with __src).
Parse __src (F1S2R5 = file_ref F1, sheet 2, row 5) and file_ref_table to build <CITATION file="..." filename="..." sheet=N rows=[...] />.
Put citations on their own line after the list/table that uses the data; combine same-(file,sheet) rows into one citation.
If the result hint says rows were truncated (Only the first N rows ...; the remaining M ...), tell the user the total (N+M), shown (N), and omitted (M).
Never expose the __src column itself to the user.
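
The citation steps above can be sketched as a small parser. Several details here are assumptions, not documented behavior: each __src cell is taken to hold a single token such as F1S2R5 or F1S2, file_ref_table is modeled as a dict keyed by file_ref with "file" and "filename" entries, the rows attribute is omitted when only file+sheet precision is available, and parse_src and build_citations are hypothetical names. The citation instructions printed by run-sql remain the authority on the exact tag format.

```python
import re
from collections import defaultdict

SRC_RE = re.compile(r"^(F\d+)S(\d+)(?:R(\d+))?$")


def parse_src(src: str):
    """Split 'F1S2R5' into ('F1', 2, 5); the row is None for 'F1S2'."""
    m = SRC_RE.match(src.strip())
    if m is None:
        raise ValueError(f"unrecognized __src value: {src!r}")
    file_ref, sheet, row = m.groups()
    return file_ref, int(sheet), int(row) if row is not None else None


def build_citations(src_values, file_ref_table):
    """Merge rows sharing (file, sheet) into one <CITATION ... /> tag each."""
    rows_by_sheet = defaultdict(set)
    for src in src_values:
        file_ref, sheet, row = parse_src(src)
        bucket = rows_by_sheet[(file_ref, sheet)]
        if row is not None:
            bucket.add(row)
    tags = []
    for key in sorted(rows_by_sheet):
        file_ref, sheet = key
        info = file_ref_table[file_ref]  # assumed shape: {"file": ..., "filename": ...}
        rows = sorted(rows_by_sheet[key])
        # Assumed: leave out rows when the citation is only file+sheet level.
        rows_attr = " rows=[" + ",".join(map(str, rows)) + "]" if rows else ""
        tags.append(
            f'<CITATION file="{info["file"]}" filename="{info["filename"]}" '
            f"sheet={sheet}{rows_attr} />"
        )
    return tags
```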

Controlling truncation

run-sql truncates results by default (total rows and per-cell characters) to keep the context manageable. If a result comes back truncated and you genuinely need more, re-run with higher limits — do not re-run search-tables:

python {SKILL_DIR}/scripts/table_query.py run-sql --max-rows 500 --cell-max 4000 <<'PLAN'
{"queries":[ ... ]}
PLAN

--max-rows: max total rows across all final tables (default from backend config, hard ceiling 2000). Prefer writing an aggregate query (SUM/COUNT/GROUP BY) over pulling thousands of detail rows.
--cell-max: max characters per cell before it is truncated with .. (default from backend config, hard ceiling 10000). Raise this when a long-text column (e.g. a description/spec field) is getting cut off.
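
As a rough sketch of how the two limits might be applied from a wrapper, the helper below builds the run-sql argument list and caps each value at the hard ceiling stated above. Capping on the client side is an assumption (the document does not say whether the backend clamps or rejects larger values), and run_sql_argv and truncation_note are hypothetical names.

```python
MAX_ROWS_CEILING = 2000    # hard ceiling for --max-rows stated above
CELL_MAX_CEILING = 10000   # hard ceiling for --cell-max stated above


def run_sql_argv(skill_dir, max_rows=None, cell_max=None):
    """Build the run-sql command line; omitted limits use the backend defaults."""
    argv = ["python", f"{skill_dir}/scripts/table_query.py", "run-sql"]
    if max_rows is not None:
        argv += ["--max-rows", str(min(max_rows, MAX_ROWS_CEILING))]
    if cell_max is not None:
        argv += ["--cell-max", str(min(cell_max, CELL_MAX_CEILING))]
    return argv


def truncation_note(shown, omitted):
    """Report total (N+M), shown (N) and omitted (M) rows for the user."""
    return f"{shown + omitted} rows in total; showing {shown}, {omitted} omitted."
```

For example, run_sql_argv("skill", max_rows=5000) yields a command ending in --max-rows 2000, and truncation_note(100, 250) reports 350 rows in total.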
