Knowledge Base Assessment Templates

Assessment guidelines

A set of assessment guidelines designed to make authoring assessments and evaluating candidate submissions more consistent and effective.

Last updated: 9/6/2025

Objectives

  • Signal over style: Prioritize requirements that reveal understanding (correctness, boundaries, trade‑offs) rather than superficial polish.
  • Machine‑checkable first: Phrase core requirements so they can be verified via repo tree, file contents, simple runtime probes, or CI logs (a structure-probe sketch follows this list).
  • Deterministic scoring: Combine objective checks with a small number of LLM judgments (docs clarity, rationale quality).
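
To make "machine‑checkable first" concrete, here is a minimal sketch of a structure probe, assuming the candidate repo has already been cloned locally; the checked file names mirror the deliverables listed below, while the run-profile shortlist and CLI shape are assumptions to adapt per assessment.

```python
# check_deliverables.py: minimal sketch of a machine-checkable structure probe.
# Assumes the candidate repo is already cloned locally; RUN_PROFILES is an
# assumed shortlist, not a prescribed one.
from pathlib import Path
import sys

RUN_PROFILES = ["Makefile", "taskfile.yml", "package.json"]  # any one satisfies the check

def check_repo(repo: Path) -> list[str]:
    """Return human-readable failures; an empty list means all checks passed."""
    failures = []
    if not (repo / "README.md").is_file():
        failures.append("missing top-level README.md")
    if not any((repo / name).is_file() for name in RUN_PROFILES):
        failures.append("no run profile (Makefile / taskfile.yml / package.json)")
    if not (repo / ".git").is_dir():
        failures.append("no git history (looks like a ZIP upload, not a clone)")
    return failures

if __name__ == "__main__":
    repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    failures = check_repo(repo)
    for failure in failures:
        print("FAIL:", failure)
    print("PASS" if not failures else f"{len(failures)} check(s) failed")
    sys.exit(1 if failures else 0)
```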

Candidate Deliverables

  • Public repo: hosted on GitHub with commit history (no ZIP uploads).
  • Entry point (e.g., index.html or src/App.*) and a top‑level README.md.
  • Run profile: one of Makefile, taskfile.yml, or package.json scripts, or .sln + dotnet test.
  • Dev container / compose (optional but favored): .devcontainer/ or docker-compose.yml.
  • Design note (DESIGN.md, 0.5–2 pages): goals, choices, trade‑offs, known gaps.
  • CI file (if relevant): .github/workflows/ci.yml or equivalent.

Allowed / Disallowed (explicit)

  • Allowed: languages, libraries, and versions (list).
  • Disallowed: frameworks, generators, or cloud services that would trivialize the task (list exact names and how detection works; a detection sketch follows this list).
  • AI usage policy: allowed for boilerplate and syntax help; must include AI_USAGE.md with tool names + prompts used; candidate remains responsible for correctness.
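
As one illustration of "how detection works", the sketch below scans common dependency manifests for exact disallowed names; the disallowed set and the manifest list are placeholder assumptions that each assessment would replace with its own.

```python
# scan_disallowed.py: sketch of a disallowed-dependency scan over common manifests.
# DISALLOWED holds hypothetical placeholder names; MANIFESTS is an assumed set of
# files worth scanning. Substitute per assessment.
from pathlib import Path
import sys

DISALLOWED = {"create-react-app", "some-site-generator"}
MANIFESTS = ["package.json", "requirements.txt", "pyproject.toml", "go.mod"]

def find_hits(repo: Path) -> list[str]:
    """Return "manifest: dependency" strings for every disallowed name found."""
    hits = []
    for name in MANIFESTS:
        manifest = repo / name
        if manifest.is_file():
            text = manifest.read_text(errors="ignore").lower()
            hits += [f"{name}: {dep}" for dep in DISALLOWED if dep in text]
    return hits

if __name__ == "__main__":
    repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    hits = find_hits(repo)
    for hit in hits:
        print("DISALLOWED:", hit)
    sys.exit(1 if hits else 0)
```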

Requirement Taxonomy (for consistent scoring)

  • Levels: must (blocking), should (quality), nice (bonus).
  • Categories: functional, structure, testing, security, performance, devops, docs, accessibility, data/api, ui/ux (as needed).
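
A sketch of how this taxonomy can drive deterministic scoring: must items gate acceptance, should items produce a weighted quality score, and nice items add bonus points. The field names, weights, and threshold behavior below are illustrative assumptions, not a prescribed rubric.

```python
# score.py: sketch of deterministic scoring over the must/should/nice taxonomy.
# Levels, categories, and weights are illustrative assumptions, not fixed policy.
from dataclasses import dataclass

@dataclass
class Requirement:
    id: str
    level: str      # "must" (blocking) | "should" (quality) | "nice" (bonus)
    category: str   # e.g. "functional", "testing", "security", "docs"
    passed: bool
    weight: float = 1.0

def score(requirements: list[Requirement]) -> dict:
    musts = [r for r in requirements if r.level == "must"]
    shoulds = [r for r in requirements if r.level == "should"]
    nices = [r for r in requirements if r.level == "nice"]

    gate = all(r.passed for r in musts)            # one failed must blocks the submission
    should_total = sum(r.weight for r in shoulds) or 1.0
    quality = sum(r.weight for r in shoulds if r.passed) / should_total
    bonus = sum(r.weight for r in nices if r.passed)
    return {"gate": gate, "quality": round(quality, 2), "bonus": bonus}

# Example: a passing must check and a failed, double-weighted should check.
print(score([
    Requirement("readme-present", "must", "docs", passed=True),
    Requirement("ci-green", "should", "devops", passed=False, weight=2.0),
]))
```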

Authoring Checklist

  • State the core outcome (1–2 lines): what counts as “it works”.
  • List hard constraints (must): tech limits, interfaces, endpoints, structure.
  • Pick 2–4 quality signals (should): tests, CI, security headers, non‑root container image.
  • Add 1–3 bonus items (nice): docs polish, metrics, accessibility.
  • Provide a minimal scaffold (optional): file names, sample command.
  • Define acceptance probes (if there are executable tests): curl scripts, make test, a simple load check (a runtime-probe sketch follows this checklist).
  • Name disallowed shortcuts and how they will be detected.
  • Timebox: how long candidates should spend; typically a 1–3 h build within a 24–48 h window.
  • Disclosure policy: AI usage + help sources.
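
For the acceptance-probes item above, a runtime probe can be as small as the sketch below; the base URL, probed paths, and expected status codes are assumptions that each assessment spec would replace.

```python
# probe.py: sketch of a runtime acceptance probe using only the standard library.
# BASE_URL and the probed paths are placeholders for an assessment-specific spec.
import sys
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical local run of the candidate app

def probe(path: str, expect_status: int = 200) -> bool:
    """GET a path and report whether the response status matches expectations."""
    try:
        with urllib.request.urlopen(BASE_URL + path, timeout=5) as resp:
            ok = resp.status == expect_status
            print(f"{'PASS' if ok else 'FAIL'} GET {path} -> {resp.status}")
            return ok
    except (urllib.error.URLError, OSError) as exc:
        print(f"FAIL GET {path} -> {exc}")
        return False

if __name__ == "__main__":
    results = [probe("/health"), probe("/api/items")]
    sys.exit(0 if all(results) else 1)
```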

Example

The template below can be copied as a starting point for a new assessment.

# Assessment: <TITLE>

## Objective
<One-line outcome that defines success.>

## Deliverables
- Repo hosted on GitHub (with history).
- Entry point: <file(s)>
- README.md with run + test instructions
- (Optional) DESIGN.md with design notes

## Constraints
- Must use: <languages, libs, versions>
- Must NOT use: <disallowed frameworks/shortcuts>
- Deadline: <timebox>

## Requirements (author view)
### Must
- Core functionality (endpoints, CLI commands, UI elements)
- Structural rules (files, folders, config)
- Testing & CI expectations
- Security basics

### Should
- Quality (coverage, error handling, logging, docs)
- Performance or efficiency
- DevOps/automation

### Nice
- Polish (accessibility, style, metrics, analytics)
- Bonus features

## AI Usage Policy
- AI tools may be used for boilerplate.
- Candidate must disclose prompts + tools in `AI_USAGE.md`.
- Candidate is responsible for correctness.
- A live defense may be scheduled if AI misuse is suspected.

## Timebox
Expected effort: <X> hours within <Y> day window.

## Scoring
- Must: all must pass (blocking)
- Should: weighted quality score
- Nice: bonus points

## Anti-Cheat
- Repo history will be analyzed (commits, messages).
- AI detection is probabilistic; a positive signal triggers a live defense, not an automatic fail.
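
Outside the template itself, the repo-history analysis mentioned under Anti-Cheat can start as small as the sketch below, assuming git is available on PATH; the commit-count threshold is an assumption, not a policy.

```python
# history_probe.py: sketch of a minimal commit-history check (requires git on PATH).
# The flagging threshold below is an assumption; tune it per assessment.
import subprocess
import sys

def commit_times(repo: str) -> list[int]:
    """Return committer timestamps (unix seconds) for every commit in the repo."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=%ct"],
        capture_output=True, text=True, check=True,
    )
    return [int(line) for line in out.stdout.split()]

if __name__ == "__main__":
    times = commit_times(sys.argv[1] if len(sys.argv) > 1 else ".")
    span_hours = (max(times) - min(times)) / 3600 if len(times) > 1 else 0.0
    print(f"{len(times)} commits over {span_hours:.1f} h")
    if len(times) < 3:  # single-dump histories warrant a closer manual look
        print("FLAG: very few commits; review the history manually")
```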
