Assessment guidelines
A set of assessment guidelines designed to make the evaluation process smoother and more effective.
Last updated: 9/6/2025
Objectives
- Signal over style: Prioritize requirements that reveal understanding (correctness, boundaries, trade‑offs) rather than superficial polish.
- Machine‑checkable first: Phrase core requirements so they can be verified via the repo tree, file contents, simple runtime probes, or CI logs (see the runtime-probe sketch after this list).
- Deterministic scoring: Combine objective checks with a small number of LLM judgments (docs clarity, rationale quality).
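As an illustration of a runtime probe, a check can be a short script that hits a health endpoint on the running submission. The base URL, port, and `/health` path below are placeholder assumptions, not part of any specific assessment.

```python
# Runtime-probe sketch: assumes a hypothetical /health endpoint on localhost:8080.
import urllib.request

def probe_health(base_url: str = "http://localhost:8080") -> bool:
    """Return True if the running app answers the health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, or HTTP error
        return False

if __name__ == "__main__":
    print("PASS" if probe_health() else "FAIL")
```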
Candidate Deliverables
- Public repo: hosted on GitHub with commit history (no ZIP uploads).
- Entry point (e.g., index.html or src/App.*) and a top‑level README.md.
- Run profile: one of Makefile, taskfile.yml, or package.json scripts, or .sln + dotnet test.
- Dev container / compose (optional but favored): .devcontainer/ or docker-compose.yml.
- Design note (DESIGN.md, 0.5–2 pages): goals, choices, trade‑offs, known gaps.
- CI file (if relevant): .github/workflows/ci.yml or equivalent.
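A minimal sketch of how the deliverables above can be checked mechanically from a cloned repo; the entry-point and run-profile patterns are illustrative assumptions drawn from the examples in this list.

```python
# Deliverables check sketch: verifies the files listed above exist in a cloned repo.
from pathlib import Path

RUN_PROFILES = ["Makefile", "taskfile.yml", "package.json"]

def check_deliverables(repo: Path) -> dict:
    return {
        "readme": (repo / "README.md").exists(),
        "entry_point": any(repo.glob("index.html")) or any(repo.glob("src/App.*")),
        "run_profile": any((repo / name).exists() for name in RUN_PROFILES)
                       or any(repo.glob("*.sln")),
        "design_note": (repo / "DESIGN.md").exists(),  # optional; report, don't block
    }

if __name__ == "__main__":
    for name, ok in check_deliverables(Path(".")).items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```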
Allowed / Disallowed (explicit)
- Allowed: languages, libraries, and versions (list them explicitly).
- Disallowed: frameworks, generators, or cloud services that would trivialize the task (list exact names and how detection works; see the detection sketch after this list).
- AI usage policy: AI is allowed for boilerplate and syntax help; candidates must include AI_USAGE.md with the tool names and prompts used; the candidate remains responsible for correctness.
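As one example of how detection can work for a JavaScript submission, a scan of `package.json` dependencies against the disallowed list is usually enough; the package names below are placeholders.

```python
# Disallowed-dependency scan sketch for a JavaScript repo (package names are placeholders).
import json
from pathlib import Path

DISALLOWED = {"some-disallowed-framework", "some-site-generator"}  # fill in per assessment

def find_disallowed(repo: Path) -> set:
    pkg = repo / "package.json"
    if not pkg.exists():
        return set()
    data = json.loads(pkg.read_text())
    declared = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    return DISALLOWED & declared.keys()

if __name__ == "__main__":
    hits = find_disallowed(Path("."))
    print("Disallowed packages found:", ", ".join(sorted(hits)) or "none")
```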
Requirement Taxonomy (for consistent scoring)
- Levels: must (blocking), should (quality), nice (bonus).
- Categories: functional, structure, testing, security, performance, devops, docs, accessibility, data/api, ui/ux (as needed).
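The taxonomy can be encoded directly so requirements stay machine-readable; the schema below is one possible shape, not a prescribed format.

```python
# One possible encoding of the requirement taxonomy (illustrative, not prescriptive).
from dataclasses import dataclass
from typing import Literal

Level = Literal["must", "should", "nice"]
Category = Literal["functional", "structure", "testing", "security", "performance",
                   "devops", "docs", "accessibility", "data/api", "ui/ux"]

@dataclass
class Requirement:
    id: str
    level: Level        # must = blocking, should = quality, nice = bonus
    category: Category
    description: str
    check: str          # how it is verified: file glob, probe command, or "llm-judge"

REQUIREMENTS = [
    Requirement("readme", "must", "docs",
                "Top-level README.md with run + test instructions", "file:README.md"),
    Requirement("ci", "should", "devops",
                "CI runs tests on every push", "file:.github/workflows/ci.yml"),
]
```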
Authoring Checklist
- State the core outcome (1–2 lines): what counts as “it works”.
- List hard constraints (must): tech limits, interfaces, endpoints, structure.
- Pick 2–4 quality signals (should): tests, CI, security headers, non‑root container image.
- Add 1–3 bonus items (nice): docs polish, metrics, accessibility.
- Provide a minimal scaffold (optional): file names, sample command.
- Define acceptance probes (for any executable tests): curl scripts, make test, simple load checks (see the acceptance-probe sketch after this checklist).
- Name disallowed shortcuts and how they will be detected.
- Timebox: how long candidates should spend; typically a 1–3 hour build within a 24–48 hour window.
- Disclosure policy: AI usage + help sources.
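A minimal acceptance-probe sketch, assuming the submission's run profile exposes `make test`; adapt the command for package.json scripts or `dotnet test`.

```python
# Acceptance-probe sketch: run the declared test target and record the result.
import subprocess
import sys

def run_acceptance(repo_dir: str = ".") -> bool:
    """Run the repo's test target and return True on a zero exit code."""
    try:
        result = subprocess.run(
            ["make", "test"],    # assumed run profile; swap for npm test, dotnet test, etc.
            cwd=repo_dir,
            capture_output=True,
            text=True,
            timeout=300,         # keep probes bounded
        )
    except subprocess.TimeoutExpired:
        print("Probe timed out after 300s")
        return False
    print(result.stdout[-2000:])  # tail of the output for the evaluation log
    return result.returncode == 0

if __name__ == "__main__":
    sys.exit(0 if run_acceptance() else 1)
```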
Example: Assessment Template
# Assessment: <TITLE>
## Objective
<One-line outcome that defines success.>
## Deliverables
- Repo hosted on GitHub (with history).
- Entry point: <file(s)>
- README.md with run + test instructions
- (Optional) DESIGN.md with design notes
## Constraints
- Must use: <languages, libs, versions>
- Must NOT use: <disallowed frameworks/shortcuts>
- Deadline: <timebox>
## Requirements (author view)
### Must
- Core functionality (endpoints, CLI commands, UI elements)
- Structural rules (files, folders, config)
- Testing & CI expectations
- Security basics
### Should
- Quality (coverage, error handling, logging, docs)
- Performance or efficiency
- DevOps/automation
### Nice
- Polish (accessibility, style, metrics, analytics)
- Bonus features
## AI Usage Policy
- AI tools may be used for boilerplate.
- Candidate must disclose prompts + tools in `AI_USAGE.md`.
- Candidate is responsible for correctness.
- A live defense may be scheduled if AI-generated work is suspected.
## Timebox
Expected effort: <X> hours within <Y> day window.
## Scoring
- Must: required
- Should: quality (weighted)
- Nice: bonus
## Anti-Cheat
- Repo history will be analyzed (commits, messages).
- AI detection is probabilistic; it triggers a live defense, not an automatic fail.
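To make the Scoring section of the template concrete, here is a minimal sketch of deterministic scoring in which musts gate the result, shoulds contribute a weighted score, and nices add a flat bonus; the weights and the 0–100 scale are illustrative assumptions.

```python
# Deterministic scoring sketch: musts gate, shoulds are weighted, nices add a flat bonus.
def score(results: dict) -> dict:
    """results maps requirement id -> (level, passed, weight)."""
    musts   = [p for level, p, _ in results.values() if level == "must"]
    shoulds = [(p, w) for level, p, w in results.values() if level == "should"]
    nices   = [p for level, p, _ in results.values() if level == "nice"]

    if not all(musts):              # any failed must blocks the submission
        return {"passed": False, "score": 0.0}

    should_total = sum(w for _, w in shoulds) or 1.0
    should_score = 100.0 * sum(w for p, w in shoulds if p) / should_total
    bonus = 5.0 * sum(nices)        # illustrative: flat 5-point bonus per nice item

    return {"passed": True, "score": round(min(should_score + bonus, 100.0), 1)}

# Example: score({"readme": ("must", True, 0), "ci": ("should", True, 2.0),
#                 "a11y": ("nice", True, 0)}) -> {"passed": True, "score": 100.0}
```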
Next Steps
Now that you've completed the basics, here are some recommended next steps:
- Create custom requirements for specialized roles and specific evaluation criteria
- Learn about AI-generated requirements to accelerate assessment creation
- Discover different ways to run evaluations from various screens and interfaces
- Understand the scoring system to interpret evaluation results effectively
Need Help?
If you run into any issues or have questions, we're here to help.