6.8 KiB
| name | description | category |
|---|---|---|
| table-query | Query structured spreadsheet/table data (Excel/CSV) to answer questions about values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, aggregations, lists, or any numeric/tabular lookup. Use this skill whenever the answer likely comes from uploaded tables. You locate tables, read their schema, author SQLite SQL yourself, and run it — the backend does no LLM work, so it is fast. | Data & Retrieval |
Table Query
Answer table/spreadsheet questions by authoring and running SQLite SQL against the
bot's uploaded Excel data. The backend is a thin, fast SQL executor — you do the
thinking (rewrite the question, pick tables, write SQL). Row-level citations
(__src) are produced for you.
When to use
Use table-query for: values, prices, quantities, inventory, specifications,
rankings, comparisons, summaries, aggregations (sum/avg/count), lists, person /
project / product lookups, monthly/period totals, or any question whose answer
comes from structured tables. For pure concept / definition / policy / explanation
questions, use the rag_retrieve document tool instead.
Workflow (do this in order, once)
- search-tables — rewrite the user's question into a retrieval query (core entity + attributes + synonyms), then locate candidate tables. Call this once.
- get-schemas — for the relevant subset of returned tables, fetch their
CREATE TABLEschema and sample rows. Never write SQL without seeing the schema. - author SQL — write a SQLite query plan as JSON (see below).
- run-sql — execute the plan. It returns CSV with an
__srccolumn and afile_ref_tablemapping plus citation instructions. - answer + cite — write the answer and add
<CITATION ... />tags built from__src+file_ref_table. Never print the__srccolumn to the user.
Anti-waste rules
- Call search-tables at most once per question. Do not re-locate tables you already have schemas for.
- If
run-sqlreturns an error, fix the SQL and call run-sql again (at most ~2 tries). Do NOT restart from search-tables. - If
search-tablesfinds nothing, fall back to therag_retrievedocument tool.
Commands
# 1. locate tables
python {SKILL_DIR}/scripts/table_query.py search-tables --query "2025 April May June sales total" --top-k 20
# 2. read schema + sample rows for the tables you picked
python {SKILL_DIR}/scripts/table_query.py get-schemas --tables "sales_2025,customers"
# 3. run your authored plan — pipe the JSON plan via stdin (no temp file needed)
python {SKILL_DIR}/scripts/table_query.py run-sql <<'PLAN'
{"queries":[{"step":1,"sql":"CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"month\", SUM(\"amount\") AS \"total\" FROM \"sales_2025\" GROUP BY \"month\"","source_table_names":["sales_2025"],"destine_table_name":"final_table_step1","destine_table_type":"final","destine_table_description":"Monthly totals"}]}
PLAN
Authoring the SQL plan
The plan is a JSON object { "queries": [ ... ] } that you pass to run-sql on
stdin via a quoted heredoc (<<'PLAN' ... PLAN). The quoted delimiter keeps all
the double quotes, single quotes and $ in your SQL intact — no shell escaping.
(You may instead write it to a file and use --plan-file path.json if a plan is very
large, but stdin is the default and needs no extra step.)
Each query is one SQL step:
{
"queries": [
{
"step": 1,
"sql": "CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"month\", SUM(\"amount\") AS \"total\" FROM \"sales_2025\" WHERE \"month\" IN ('2025-04','2025-05','2025-06') GROUP BY \"month\"",
"source_table_names": ["sales_2025"],
"destine_table_name": "final_table_step1",
"destine_table_type": "final",
"destine_table_description": "Monthly sales totals for Apr-Jun 2025"
}
]
}
Field meaning:
step: 1-based execution order.sql: a SQLite statement, normallyCREATE TEMP TABLE "..." AS SELECT ....source_table_names: tables this step reads (original tables, or earlier steps'destine_table_namefor multi-step plans).destine_table_name: the temp table this step creates. Convention:intermediate_table_stepNorfinal_table_stepN.destine_table_type:"final"for results the user should see,"intermediate"for helper steps. At least onefinalis required.destine_table_description: short human description of the result.
SQL rules (important)
- Quote every identifier with double quotes:
"column name","table name". - String literals use single quotes; escape
'as''. - Prefer one logical result per
finaltable. For multiple separate results, emit multiplefinaltables (e.g. step1, step2) — do NOTUNIONunrelated results. - For row-level citations to be precise, keep
finalsteps as simple single-tableSELECTs (noJOIN/GROUP BY/ aggregation). Aggregations still work but the citation degrades to file+sheet level (F1S2) instead of an exact row (F1S2R5). - Multi-step plans run in
steporder: buildintermediate_table_stepNfirst, then read it in a later step. Don't reference a temp table before it is created. - Sample rows are a format hint only — never assume they represent the full data
or the row count. Your SQL must scan the whole table. Use
LIKE '%value%'for free text and=for enums/codes.
Result handling & citations
run-sqloutput begins with citation instructions, thenfile_ref_table, then the result CSV (with__src).- Parse
__src(F1S2R5= file_ref F1, sheet 2, row 5) andfile_ref_tableto build<CITATION file="..." filename="..." sheet=N rows=[...] />. - Put citations on their own line after the list/table that uses the data; combine same-(file,sheet) rows into one citation.
- If the result hint says rows were truncated (
Only the first N rows ...; the remaining M ...), tell the user the total (N+M), shown (N), and omitted (M). - Never expose the
__srccolumn itself to the user.
Controlling truncation
run-sql truncates results by default (total rows and per-cell characters) to keep
the context manageable. If a result comes back truncated and you genuinely need more,
re-run with higher limits — do not re-run search-tables:
python {SKILL_DIR}/scripts/table_query.py run-sql --max-rows 500 --cell-max 4000 <<'PLAN'
{"queries":[ ... ]}
PLAN
--max-rows: max total rows across allfinaltables (default from backend config, hard ceiling 2000). Prefer writing an aggregate query (SUM/COUNT/GROUP BY) over pulling thousands of detail rows.--cell-max: max characters per cell before it is truncated with..(default from backend config, hard ceiling 10000). Raise this when a long-text column (e.g. a description/spec field) is getting cut off.