---
name: table-query
description: Query structured spreadsheet/table data (Excel/CSV) to answer questions about values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, aggregations, lists, or any numeric/tabular lookup. Use this skill whenever the answer likely comes from uploaded tables. You locate tables, read their schema, author SQLite SQL yourself, and run it — the backend does no LLM work, so it is fast.
category: Data & Retrieval
---

# Table Query

Answer table/spreadsheet questions by authoring and running SQLite SQL against the
bot's uploaded spreadsheet data (Excel/CSV). The backend is a thin, fast SQL executor:
**you** do the thinking (rewrite the question, pick tables, write SQL). Row-level
citations (`__src`) are produced for you.

## When to use

Use `table-query` for: values, prices, quantities, inventory, specifications,
rankings, comparisons, summaries, aggregations (sum/avg/count), lists, person /
project / product lookups, monthly/period totals, or any question whose answer
comes from structured tables. For pure concept / definition / policy / explanation
questions, use the `rag_retrieve` document tool instead.

## Workflow (do this in order, once)

1. **search-tables** — rewrite the user's question into a retrieval query (core
   entity + attributes + synonyms), then locate candidate tables. Call this **once**.
2. **get-schemas** — for the relevant subset of returned tables, fetch their
   `CREATE TABLE` schema and sample rows. Never write SQL without seeing the schema.
3. **author SQL** — write a SQLite query plan as JSON (see below).
4. **run-sql** — execute the plan. It returns CSV with an `__src` column and a
   `file_ref_table` mapping plus citation instructions.
5. **answer + cite** — write the answer and add `<CITATION ... />` tags built from
   `__src` + `file_ref_table`. Never print the `__src` column to the user.

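In SQLite terms, step 2 boils down to two lookups: the stored `CREATE TABLE` text lives in the `sqlite_master` table, and the sample rows are a `SELECT` with a `LIMIT`. The sketch below is a local analogue under that assumption, not the code behind `get-schemas`; the helper `describe_table` and the toy `sales_2025` rows are hypothetical.

```python
import sqlite3


def describe_table(conn: sqlite3.Connection, table: str, n_samples: int = 3):
    """Return (CREATE TABLE text, first n_samples rows) for one table.

    Hypothetical local stand-in for what get-schemas reports; the real
    command's output format may differ.
    """
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no such table: {table}")
    # Identifiers cannot be bound as ? parameters, so quote the name by hand.
    quoted = '"' + table.replace('"', '""') + '"'
    samples = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (n_samples,)).fetchall()
    return row[0], samples


# Hypothetical toy data standing in for an uploaded sheet.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "sales_2025" ("month" TEXT, "amount" REAL)')
conn.executemany(
    'INSERT INTO "sales_2025" VALUES (?, ?)',
    [("2025-04", 100.0), ("2025-04", 50.0), ("2025-05", 80.0), ("2025-06", 30.0)],
)
schema, samples = describe_table(conn, "sales_2025", n_samples=2)
print(schema)   # CREATE TABLE "sales_2025" ("month" TEXT, "amount" REAL)
print(samples)  # 2 sample rows out of 4: never infer totals or row counts from these
```

The sample is deliberately smaller than the table, which is why the schema, not the sample, should drive the SQL.
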
### Anti-waste rules

- Call **search-tables at most once** per question. Do not re-locate tables you
  already have schemas for.
- If `run-sql` returns an error, fix the SQL and call **run-sql** again (at most ~2
  tries). Do **NOT** restart from search-tables.
- If `search-tables` finds nothing, fall back to the `rag_retrieve` document tool.

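The retry rule is a bounded loop that never returns to search-tables. A minimal sketch, assuming "about two tries" means up to two corrected re-runs after the first attempt; `run_with_retries`, `run_sql` and `fix_plan` are hypothetical callables, not functions exported by `table_query.py`.

```python
from typing import Callable, Tuple

# Hypothetical callables: run_sql(plan) -> (ok, output_or_error_text),
# fix_plan(plan, error_text) -> corrected plan.
RunSql = Callable[[str], Tuple[bool, str]]
FixPlan = Callable[[str, str], str]


def run_with_retries(plan: str, run_sql: RunSql, fix_plan: FixPlan,
                     max_retries: int = 2) -> Tuple[bool, str]:
    """Run a plan, repairing and re-running it up to max_retries times.

    search-tables is deliberately absent: a SQL error means the SQL needs
    fixing, not that the tables must be located again.
    """
    ok, out = run_sql(plan)
    retries = 0
    while not ok and retries < max_retries:
        plan = fix_plan(plan, out)  # e.g. correct a misspelled column name
        ok, out = run_sql(plan)
        retries += 1
    return ok, out


# Usage with stand-in functions: the first plan fails, the fixed one succeeds.
attempts = []


def fake_run_sql(plan: str) -> Tuple[bool, str]:
    attempts.append(plan)
    if plan == "fixed":
        return True, "month,total\n..."
    return False, "no such column: amout"


ok, out = run_with_retries("broken", fake_run_sql, lambda plan, err: "fixed")
print(ok, attempts)  # True ['broken', 'fixed']
```
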
## Commands

```bash
# 1. locate tables
python {SKILL_DIR}/scripts/table_query.py search-tables --query "2025 April May June sales total" --top-k 20

# 2. read schema + sample rows for the tables you picked
python {SKILL_DIR}/scripts/table_query.py get-schemas --tables "sales_2025,customers"

# 3. run your authored plan — pipe the JSON plan via stdin (no temp file needed)
python {SKILL_DIR}/scripts/table_query.py run-sql <<'PLAN'
{"queries":[{"step":1,"sql":"CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"month\", SUM(\"amount\") AS \"total\" FROM \"sales_2025\" GROUP BY \"month\"","source_table_names":["sales_2025"],"destine_table_name":"final_table_step1","destine_table_type":"final","destine_table_description":"Monthly totals"}]}
PLAN
```

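When these commands are driven from a program instead of typed into a shell, the heredoc is unnecessary: the plan can be handed to the child process as stdin directly, with no shell in between. A sketch using Python's `subprocess`; `build_run_sql_call`, `run_sql` and the `/path/to/skill` directory are hypothetical, while the script path and the `--max-rows` / `--cell-max` flags are the ones this skill documents.

```python
import json
import subprocess
from typing import List, Optional, Tuple


def build_run_sql_call(skill_dir: str, plan: dict,
                       max_rows: Optional[int] = None,
                       cell_max: Optional[int] = None) -> Tuple[List[str], str]:
    """Return (argv, stdin_text) mirroring the run-sql command line."""
    argv = ["python", f"{skill_dir}/scripts/table_query.py", "run-sql"]
    if max_rows is not None:
        argv += ["--max-rows", str(max_rows)]
    if cell_max is not None:
        argv += ["--cell-max", str(cell_max)]
    # ensure_ascii=False keeps non-ASCII table/column names readable.
    return argv, json.dumps(plan, ensure_ascii=False)


def run_sql(skill_dir: str, plan: dict,
            max_rows: Optional[int] = None,
            cell_max: Optional[int] = None) -> subprocess.CompletedProcess:
    """Run run-sql with the plan on stdin. No shell is involved, so no quoting."""
    argv, stdin_text = build_run_sql_call(skill_dir, plan, max_rows, cell_max)
    # How run-sql reports SQL errors (exit code vs. text in stdout) is not
    # specified here, so the caller inspects returncode, stdout and stderr.
    return subprocess.run(argv, input=stdin_text, text=True, capture_output=True)


argv, stdin_text = build_run_sql_call("/path/to/skill", {"queries": []}, max_rows=500)
print(argv)  # ['python', '/path/to/skill/scripts/table_query.py', 'run-sql', '--max-rows', '500']
```
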
|
## Authoring the SQL plan
|
|
|
|
The plan is a JSON object `{ "queries": [ ... ] }` that you pass to `run-sql` **on
|
|
stdin via a quoted heredoc** (`<<'PLAN' ... PLAN`). The quoted delimiter keeps all
|
|
the double quotes, single quotes and `$` in your SQL intact — no shell escaping.
|
|
(You may instead write it to a file and use `--plan-file path.json` if a plan is very
|
|
large, but stdin is the default and needs no extra step.)
|
|
|
|
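Hand-writing the JSON still means escaping every double quote inside the SQL as `\"`. If the plan is generated by a program, a JSON serializer does that escaping, and the resulting text can go between `<<'PLAN'` and `PLAN` or into a `--plan-file`. A minimal sketch with Python's standard `json` module; the field values repeat the commands example.

```python
import json

step = {
    "step": 1,
    "sql": ('CREATE TEMP TABLE "final_table_step1" AS SELECT "month", '
            'SUM("amount") AS "total" FROM "sales_2025" GROUP BY "month"'),
    "source_table_names": ["sales_2025"],
    "destine_table_name": "final_table_step1",
    "destine_table_type": "final",
    "destine_table_description": "Monthly totals",
}

# json.dumps turns every " inside the SQL into \" for you; compact separators
# give the same one-line layout used in the commands example.
plan_text = json.dumps({"queries": [step]}, separators=(",", ":"), ensure_ascii=False)
print(plan_text)
```
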
Each query is one SQL step:

```json
{
  "queries": [
    {
      "step": 1,
      "sql": "CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"month\", SUM(\"amount\") AS \"total\" FROM \"sales_2025\" WHERE \"month\" IN ('2025-04','2025-05','2025-06') GROUP BY \"month\"",
      "source_table_names": ["sales_2025"],
      "destine_table_name": "final_table_step1",
      "destine_table_type": "final",
      "destine_table_description": "Monthly sales totals for Apr-Jun 2025"
    }
  ]
}
```

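To see what step 1 produces, its `sql` string (with the JSON escaping removed) can be run unchanged in any SQLite, for example Python's built-in `sqlite3`. The rows below are hypothetical toy data standing in for an uploaded `sales_2025` sheet, and this local sketch does not reproduce the `__src` column that run-sql adds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data standing in for the uploaded "sales_2025" sheet.
conn.execute('CREATE TABLE "sales_2025" ("month" TEXT, "amount" REAL)')
conn.executemany(
    'INSERT INTO "sales_2025" VALUES (?, ?)',
    [("2025-03", 10.0), ("2025-04", 100.0), ("2025-04", 50.0),
     ("2025-05", 80.0), ("2025-06", 30.0)],
)
# The "sql" field of step 1, verbatim after JSON unescaping.
conn.execute(
    'CREATE TEMP TABLE "final_table_step1" AS SELECT "month", SUM("amount") AS "total" '
    'FROM "sales_2025" WHERE "month" IN (\'2025-04\',\'2025-05\',\'2025-06\') GROUP BY "month"'
)
rows = conn.execute('SELECT * FROM "final_table_step1" ORDER BY "month"').fetchall()
print(rows)  # [('2025-04', 150.0), ('2025-05', 80.0), ('2025-06', 30.0)]
```

March is filtered out by the `IN` list, and the two April rows collapse into one total, which is exactly the one-row-per-month shape a `final` table is meant to hand back.
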
Field meaning:

- `step`: 1-based execution order.
- `sql`: a SQLite statement, normally `CREATE TEMP TABLE "..." AS SELECT ...`.
- `source_table_names`: tables this step reads (original tables, or earlier steps'
  `destine_table_name` for multi-step plans).
- `destine_table_name`: the temp table this step creates. Convention:
  `intermediate_table_stepN` or `final_table_stepN`.
- `destine_table_type`: `"final"` for results the user should see, `"intermediate"`
  for helper steps. **At least one `final` is required.**
- `destine_table_description`: short human description of the result.

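The field rules above are mechanical enough to check before calling run-sql. A hypothetical pre-flight validator, assuming the `queries` list is written in `step` order; `validate_plan` and its messages are illustrative, and run-sql's own validation may be stricter or looser.

```python
from typing import Any, Dict, List, Set

REQUIRED_KEYS = ("step", "sql", "source_table_names", "destine_table_name",
                 "destine_table_type", "destine_table_description")
VALID_TYPES = {"final", "intermediate"}


def validate_plan(plan: Dict[str, Any], original_tables: Set[str]) -> List[str]:
    """Return human-readable problems with a plan; an empty list means it looks OK."""
    problems: List[str] = []
    queries = plan.get("queries") or []
    created: Set[str] = set()  # destine_table_name of the steps seen so far
    for position, q in enumerate(queries, start=1):
        missing = [k for k in REQUIRED_KEYS if k not in q]
        if missing:
            problems.append(f"query {position}: missing {', '.join(missing)}")
            continue
        if q["step"] != position:
            problems.append(f"query {position}: step is {q['step']}, expected {position}")
        if q["destine_table_type"] not in VALID_TYPES:
            problems.append(f"step {q['step']}: unknown type {q['destine_table_type']!r}")
        for src in q["source_table_names"]:
            if src not in original_tables and src not in created:
                problems.append(f"step {q['step']}: reads {src!r} before it is created")
        created.add(q["destine_table_name"])
    if not any(q.get("destine_table_type") == "final" for q in queries):
        problems.append("no step has destine_table_type 'final'")
    return problems


# SQL omitted ("..."); only the plan metadata is checked here.
two_step = {"queries": [
    {"step": 1, "sql": "...", "source_table_names": ["sales_2025"],
     "destine_table_name": "intermediate_table_step1",
     "destine_table_type": "intermediate", "destine_table_description": "Q2 rows"},
    {"step": 2, "sql": "...", "source_table_names": ["intermediate_table_step1"],
     "destine_table_name": "final_table_step2",
     "destine_table_type": "final", "destine_table_description": "Q2 totals"},
]}
print(validate_plan(two_step, {"sales_2025"}))  # []
```
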
### SQL rules (important)

- **Quote every identifier** with double quotes: `"column name"`, `"table name"`.
- String literals use single quotes; escape `'` as `''`.
- Prefer **one logical result per `final` table**. For multiple separate results,
  emit multiple `final` tables (e.g. step1, step2) — do **NOT** `UNION` unrelated results.
- For row-level citations to be precise, keep `final` steps as simple single-table
  `SELECT`s (no `JOIN` / `GROUP BY` / aggregation). Aggregations still work but the
  citation degrades to file+sheet level (`F1S2`) instead of an exact row (`F1S2R5`).
- Multi-step plans run in `step` order: build `intermediate_table_stepN` first, then
  read it in a later step. Don't reference a temp table before it is created.
- **Sample rows are a format hint only** — never assume they represent the full data
  or the row count. Your SQL must scan the whole table. Use `LIKE '%value%'` for free
  text and `=` for enums/codes.

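The plan format described above has no field for bound parameters, so identifiers and literals must be escaped inside the SQL text itself. Two small helpers make the quoting rules mechanical; `quote_ident`, `quote_literal` and the `products` table are hypothetical, and the demo runs in Python's `sqlite3` locally rather than through run-sql.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping ' as ''."""
    return "'" + value.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")
# Hypothetical sheet whose headers contain spaces, as spreadsheet headers often do.
table = quote_ident("products")
name_col = quote_ident("product name")
code_col = quote_ident("status code")
conn.execute(f"CREATE TABLE {table} ({name_col} TEXT, {code_col} TEXT)")
conn.executemany(f"INSERT INTO {table} VALUES (?, ?)",
                 [("Kid's Bike", "A1"), ("Adult Bike", "B2"), ("Helmet", "A1")])

# Free text: LIKE with % wildcards (case-insensitive for ASCII by default in SQLite).
pattern = "%kid's%"
like_sql = f"SELECT {name_col} FROM {table} WHERE {name_col} LIKE {quote_literal(pattern)}"
# Enum/code: exact match with =.
eq_sql = f"SELECT {name_col} FROM {table} WHERE {code_col} = {quote_literal('A1')} ORDER BY 1"

print(like_sql)                           # ... LIKE '%kid''s%'
print(conn.execute(like_sql).fetchall())  # [("Kid's Bike",)]
print(conn.execute(eq_sql).fetchall())    # [('Helmet',), ("Kid's Bike",)]
```
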
## Result handling & citations

- `run-sql` output begins with citation instructions, then `file_ref_table`, then the
  result CSV (with `__src`).
- Parse `__src` (`F1S2R5` = file_ref F1, sheet 2, row 5) and `file_ref_table` to build
  `<CITATION file="..." filename="..." sheet=N rows=[...] />`.
- Put citations on their own line **after** the list/table that uses the data; combine
  same-(file,sheet) rows into one citation.
- If the result hint says rows were truncated (`Only the first N rows ...; the
  remaining M ...`), tell the user the total (`N+M`), shown (`N`), and omitted (`M`).
- Never expose the `__src` column itself to the user.

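Parsing `__src` and merging rows per (file, sheet) can be sketched with a regular expression and an insertion-ordered dict. Two assumptions here are not specified by this skill: that `file_ref_table` has already been parsed into a mapping `"F1" -> (file, filename)`, and that a sheet-level code such as `F2S1` yields a tag without `rows`. run-sql's own citation instructions take precedence; `build_citations` and the sample references are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# F<file>S<sheet>, optionally followed by R<row>: e.g. F1S2R5 or F1S2.
SRC_RE = re.compile(r"^F(\d+)S(\d+)(?:R(\d+))?$")


def build_citations(src_values: Iterable[str],
                    file_ref_table: Dict[str, Tuple[str, str]]) -> List[str]:
    """Combine __src codes into one <CITATION .../> tag per (file, sheet)."""
    rows: Dict[Tuple[str, int], Set[int]] = {}  # dicts keep first-seen order
    for src in src_values:
        m = SRC_RE.match(src.strip())
        if m is None:
            continue  # ignore blanks or unexpected codes
        key = (f"F{m.group(1)}", int(m.group(2)))
        rows.setdefault(key, set())
        if m.group(3) is not None:
            rows[key].add(int(m.group(3)))
    tags = []
    for (ref, sheet), row_set in rows.items():
        file_id, filename = file_ref_table[ref]
        tag = f'<CITATION file="{file_id}" filename="{filename}" sheet={sheet}'
        if row_set:  # aggregated (sheet-level) results carry no row numbers
            tag += " rows=[" + ",".join(str(r) for r in sorted(row_set)) + "]"
        tags.append(tag + " />")
    return tags


refs = {"F1": ("file-a", "sales.xlsx"), "F2": ("file-b", "stock.csv")}  # hypothetical
for tag in build_citations(["F1S2R5", "F1S2R7", "F1S2R5", "F2S1"], refs):
    print(tag)
# <CITATION file="file-a" filename="sales.xlsx" sheet=2 rows=[5,7] />
# <CITATION file="file-b" filename="stock.csv" sheet=1 />
```

Duplicate codes (the repeated `F1S2R5`) collapse into a single row number, which is the "combine same-(file,sheet) rows" rule in code form.
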
### Controlling truncation

`run-sql` truncates results by default (total rows and per-cell characters) to keep
the context manageable. If a result comes back truncated and you genuinely need more,
re-run with higher limits — do **not** re-run search-tables:

```bash
python {SKILL_DIR}/scripts/table_query.py run-sql --max-rows 500 --cell-max 4000 <<'PLAN'
{"queries":[ ... ]}
PLAN
```

- `--max-rows`: max total rows across all `final` tables (default from backend config,
  hard ceiling 2000). Prefer writing an aggregate query (SUM/COUNT/GROUP BY) over
  pulling thousands of detail rows.
- `--cell-max`: max characters per cell before it is truncated with `..` (default from
  backend config, hard ceiling 10000). Raise this when a long-text column (e.g. a
  description/spec field) is getting cut off.
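
Since 2000 and 10000 are hard ceilings, requesting more cannot return more, so a caller that computes limits may as well clamp them first. A small sketch of that clamping plus an approximation of the `..` cell marker; the constant and function names are hypothetical, and only the two ceilings come from the list above.

```python
from typing import Tuple

MAX_ROWS_CEILING = 2000   # hard ceiling documented for --max-rows
CELL_MAX_CEILING = 10000  # hard ceiling documented for --cell-max


def clamp_limits(max_rows: int, cell_max: int) -> Tuple[int, int]:
    """Keep requested limits between 1 and the documented hard ceilings."""
    return (max(1, min(max_rows, MAX_ROWS_CEILING)),
            max(1, min(cell_max, CELL_MAX_CEILING)))


def truncate_cell(text: str, cell_max: int) -> str:
    """Approximate per-cell truncation: keep cell_max chars, then append '..'.

    Assumption: whether the backend counts the '..' marker inside cell_max
    is not documented.
    """
    return text if len(text) <= cell_max else text[:cell_max] + ".."


print(clamp_limits(5000, 4000))                        # (2000, 4000)
print(truncate_cell("stainless steel, 304 grade", 9))  # stainless..
```

A cell ending in `..` in the CSV is the signal that `--cell-max` was too small for that column.
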