TorSearch Project
PROJECT SHOWCASE
TorSearch
A privacy-first, decoupled Tor (.onion) search engine platform for professional dark-web indexing operations.
Decoupled Two-Node Architecture
TorSearch is architected for operational security. The platform is split into a Private Local Node that runs the crawlers and an Encapsulated VPS Node that serves public search, so operators can crawl dark-web content without exposing their crawling infrastructure to the public internet.
Technical Stack
Core: Python 3.11+ / FastAPI
Search: OpenSearch (read/write split)
Database: PostgreSQL & Redis
Network: Tor SOCKS5 & Selenium
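The Network entry of the stack routes fetches through Tor's SOCKS5 proxy. As a minimal sketch, assuming a local Tor daemon on its default SocksPort (127.0.0.1:9050), the hypothetical helper below builds the proxies mapping that the `requests` library accepts. The `socks5h` scheme matters because .onion names can only be resolved inside Tor.

```python
# Hypothetical helper, not TorSearch's code. Assumes a Tor daemon is listening
# on its default SocksPort, 127.0.0.1:9050.
TOR_SOCKS_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9050


def tor_proxies(host: str = TOR_SOCKS_HOST, port: int = TOR_SOCKS_PORT) -> dict[str, str]:
    """Return a proxies mapping in the shape the `requests` library accepts.

    The "socks5h" scheme makes the proxy (Tor) resolve hostnames. Plain
    "socks5" resolves them locally, which fails for .onion addresses.
    """
    proxy_url = f"socks5h://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    print(tor_proxies())
```

With `requests[socks]` installed, a fetch would look like `requests.get(onion_url, proxies=tor_proxies(), timeout=60)`, where `onion_url` is a placeholder.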
Operational Intelligence
Advanced Crawling
- ✓ Normalized URL discovery & depth limiting
- ✓ Exponential backoff & jitter scheduling
- ✓ Auth-Crawler for "Login-Only" hidden sites
- ✓ Per-domain concurrency & rate limiting
Signal Enrichment
- ✓ Near-duplicate detection and grouping into duplicate families
- ✓ PageRank & Domain Graph scoring
- ✓ Backfilled semantic embeddings from transformer models
- ✓ Host authority & uptime metrics
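PageRank over a domain graph can be sketched with plain power iteration. Assumptions for this sketch: the graph is given as a hypothetical mapping from each .onion domain to the set of domains it links to, self-links are ignored, and the damping factor (0.85) and iteration limits are illustrative defaults rather than TorSearch's settings.

```python
def pagerank(links: dict[str, set[str]], damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank over a hypothetical domain-level link graph.

    `links` maps each source domain to the domains it links to. Self-links
    are dropped, and domains with no outlinks ("dangling" nodes) spread their
    rank evenly over all domains, so the scores always sum to 1.
    """
    graph = {src: set(dsts) - {src} for src, dsts in links.items()}
    nodes = set(graph).union(*graph.values())
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by dangling domains is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each domain splits its damped rank evenly across its outlinks.
        for src, dsts in graph.items():
            if dsts:
                share = damping * rank[src] / len(dsts)
                for dst in dsts:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```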
Trust & Safety
- ✓ ML-powered NSFW classification
- ✓ Phishing heuristics & risk scoring
- ✓ Cloaking detection & fetch comparison
- ✓ Anti-bot & CAPTCHA telemetry dashboards
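Cloaking detection by fetch comparison can be approximated by fetching the same URL twice (for example, once with the plain HTTP fetcher and once through Selenium) and measuring how much text the two responses share. The sketch below uses Jaccard similarity over word shingles. The shingle size and the 0.5 threshold are hypothetical, and the source does not say which comparison TorSearch actually uses.

```python
import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a page's visible text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short pages collapse to a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_cloaked(crawler_text: str, browser_text: str,
                  threshold: float = 0.5, k: int = 4) -> bool:
    """Flag a URL whose two fetches share too little text (hypothetical threshold)."""
    return jaccard(shingles(crawler_text, k), shingles(browser_text, k)) < threshold
```

Pages with rotating content (timestamps, ads, CAPTCHA challenges) will score below 1.0 even without cloaking, so a real threshold would need tuning against observed fetch pairs.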
Delta Replication
- ✓ Gzip NDJSON incremental export bundles
- ✓ External versioning for safe imports
- ✓ Resumable HTTPS sync (API-key protected)
- ✓ VPS node stays hardened, with zero database bloat
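A delta bundle and its import can be sketched with the standard library alone. In this sketch, the document fields `id` and `version` are hypothetical, and a plain dict stands in for the VPS search index. The import rule mirrors OpenSearch's `external` version type, which accepts a write only when the incoming version is strictly greater than the stored one.

```python
import gzip
import json
from typing import Iterable


def write_bundle(docs: Iterable[dict]) -> bytes:
    """Serialize changed documents as gzip-compressed NDJSON (one JSON object per line)."""
    # json.dumps escapes newlines inside strings, so "\n" safely separates records.
    body = "".join(json.dumps(doc, separators=(",", ":")) + "\n" for doc in docs)
    return gzip.compress(body.encode("utf-8"))


def read_bundle(data: bytes) -> list[dict]:
    """Decode a bundle produced by write_bundle, skipping blank lines."""
    text = gzip.decompress(data).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def apply_bundle(index: dict[str, dict], data: bytes) -> int:
    """Apply a delta bundle to `index`, a dict standing in for the search index.

    Hypothetical fields: each document carries an `id` and an integer `version`.
    A document is written only if its version is strictly greater than the
    stored one, so replaying a bundle (e.g. after a resumed sync) changes nothing.
    """
    applied = 0
    for doc in read_bundle(data):
        stored = index.get(doc["id"])
        if stored is None or doc["version"] > stored["version"]:
            index[doc["id"]] = doc
            applied += 1
    return applied
```

Because stale or repeated documents are skipped, re-sending a bundle after an interrupted transfer is harmless, which is one way a resumable sync can stay safe.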
Operator-Focused Admin UI
Manage queues, bulk actions, ads, and homepage layouts from a unified, private dashboard.
Queue Browser
Domain Blocking
Auth Credentials
Ads Manager
Deploy the TorSearch Stack
Ready-to-operate infrastructure for private or public search projects.