ScrapeIQ

Hybrid RAG and document generation for legal and regulatory work.

ScrapeIQ is a production-deployed AI document assistant that scrapes and indexes legal sources, runs hybrid (vector + keyword) search across them, and generates draft documents from your templates. Cyrillic and Latin scripts are supported out of the box.

  • Hybrid RAG search

    Vector similarity combined with keyword retrieval, so both semantic matches and exact-term matches surface in dense legal text.

  • Document generation

    Draft contracts, opinions, and reports from indexed sources via the /api/generator endpoint.

  • Compliance audit trail

    /api/audit logs every document access and operation; structured for SOC 2 and ISO 27001 audits.

  • Scheduled crawling

    Async job queue with cron-driven refresh, diff detection, and alerting.

  • GPU acceleration

    Optional CUDA inference path; tested in production on RTX 6000 Ada.

  • Cyrillic + Latin

    Multi-script normalisation indexes Cyrillic documents (Serbian, Russian) and Latin-script documents as a single corpus.
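
To make the hybrid search idea concrete, here is a minimal sketch of one common way to merge a vector ranking with a keyword ranking: reciprocal rank fusion. This is an illustration only, not a description of how ScrapeIQ combines results internally. The function name, the document IDs, and the constant k=60 are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each list contributes 1 / (k + rank) per document, so a document
    that ranks well in both lists rises above one that ranks well in
    only one of them. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
vector_hits = ["doc_b", "doc_a", "doc_c"]
keyword_hits = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

In this sketch, documents found by both retrievers (doc_a, doc_b) end up ahead of documents found by only one, which is the behaviour a hybrid search is after.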

Tech stack

  • Python
  • FastAPI
  • ChromaDB
  • Ollama
  • LangChain
  • Playwright
  • PostgreSQL
  • Docker

Who it's for

Built for document-heavy teams.

Where finding the right passage — and proving who accessed it — actually matters.

  • Legal & compliance teams

    Search statutes, filings, and precedent across scripts, and draft from your own sources.

  • Regulatory affairs

    Track regulatory change with scheduled crawling, diff detection, and alerts.

  • Knowledge & operations

    Turn a growing document backlog into answers your team can cite.

FAQ

Frequently asked questions

  • Keyword search already works for us — why change?

    Hybrid retrieval adds semantic recall on top of exact-term precision, so you miss fewer relevant passages in dense documents while still matching the exact terms keyword search catches.

  • Our documents are in Cyrillic — can it handle that?

    Yes. Multi-script normalisation indexes Cyrillic and Latin as a single corpus, so nothing fragments into separate silos.

  • Can we trust the generated drafts?

    Drafts are built from your own indexed sources and templates rather than from a model's open-ended guessing. Every operation is logged for review, and the log is structured as evidence for SOC 2 and ISO 27001 audits.
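
As a rough illustration of what indexing Cyrillic and Latin as a single corpus can mean, the sketch below maps Serbian Cyrillic text to its Latin equivalent before building a search key, so the same term matches in either script. It is a simplified assumption, not ScrapeIQ's actual normalisation: the table covers Serbian only (Russian would need its own transliteration rules), and the function name normalise_key is hypothetical.

```python
import unicodedata

# Serbian Cyrillic to Latin, lowercase letters only (30 letters).
SR_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def normalise_key(text):
    """Return a script-independent, lowercase search key for text."""
    # NFC composes accented letters so "č" is always one code point.
    text = unicodedata.normalize("NFC", text).lower()
    # Characters outside the table (Latin letters, digits, spaces)
    # pass through unchanged.
    return "".join(SR_CYR_TO_LAT.get(ch, ch) for ch in text)


# The same phrase in both scripts yields one shared key.
cyrillic_key = normalise_key("Закон о раду")
latin_key = normalise_key("Zakon o radu")
```

Indexing on a key like this, while storing the original text for display, is one way to keep both scripts in one index without altering the source documents.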

Have a project in mind?

Tell us what you want to build. We respond within one business day.