Micro Web Crawler in PHP & Manticore

Yo!

Micro Web Crawler in PHP & Manticore

Yo! is the next generation of the YGGo! project, with the goal of reducing server requirements and simplifying deployment.

The engine is written in PHP and uses the Manticore search engine as its backend.

The default build is adapted for the Yggdrasil ecosystem, but it can also be used to build your own search project.

The project contains:

  • CLI tools for index operations
  • JS-less frontend for building a search web portal
  • API tools for distributing the search index

Features:

  • MIME-based crawler with flexible filter settings
  • Page snap history with support for local and remote mirrors

CLI

Index

Init

Create the initial index

php src/cli/index/init.php [reset]
  • reset - optional, resets the existing index
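
As a usage sketch, the two forms of the init command could be wrapped in a small shell helper that rejects anything other than the documented `reset` argument. The function name `yo_init` and the `YO_PHP` variable are hypothetical and not part of the project; only the script path and the `reset` argument come from the text above. The helper assumes it is run from the repository root.

```shell
#!/bin/sh
# Hypothetical wrapper around the init script (not shipped with the project).
# YO_PHP is an assumed override for the PHP binary; YO_PHP=echo gives a dry run
# that prints the arguments instead of executing anything.
yo_init() {
    mode="${1:-}"
    case "$mode" in
        "")    ${YO_PHP:-php} src/cli/index/init.php ;;        # create the index
        reset) ${YO_PHP:-php} src/cli/index/init.php reset ;;  # reset the existing index
        *)     echo "usage: yo_init [reset]" >&2; return 2 ;;  # unknown argument
    esac
}
```

Calling `yo_init` creates the index, `yo_init reset` resets it, and any other argument prints a usage line and returns a non-zero status without touching the index.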

Document

Add

php src/cli/document/add.php URL
  • URL - the URL to add to the crawl queue
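
Since the add script takes one URL per call, seeding the queue from a list takes a loop. The sketch below is one possible way to do that, assuming a plain-text file with one URL per line. The names `yo_add_file` and `YO_PHP` are hypothetical; only the script path is taken from the command above.

```shell
#!/bin/sh
# Queue every URL listed in a text file (one URL per line).
# Blank lines and lines starting with '#' are skipped.
# yo_add_file and YO_PHP are hypothetical names; YO_PHP=echo gives a dry run.
yo_add_file() {
    # The "|| [ -n "$url" ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            ""|"#"*) continue ;;
        esac
        # Redirect stdin so the called script cannot consume the rest of the list.
        ${YO_PHP:-php} src/cli/document/add.php "$url" </dev/null
    done < "$1"
}
```

For example, `yo_add_file seeds.txt` would call the add script once per listed URL, where `seeds.txt` is a placeholder file name.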

Crawl

php src/cli/document/crawl.php
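
The text does not say how much of the queue a single crawl run processes, so a deployment may want to call the script repeatedly. A minimal sketch of such a loop follows, assuming a fixed number of runs with a pause between them. The `yo_crawl_loop` function, its two arguments, and `YO_PHP` are hypothetical and not part of the project.

```shell
#!/bin/sh
# Run the crawl script a fixed number of times with a pause between runs.
# Usage: yo_crawl_loop [runs] [pause_seconds]   (defaults: 1 run, 0 seconds)
# yo_crawl_loop and YO_PHP are hypothetical names; YO_PHP=echo gives a dry run.
yo_crawl_loop() {
    runs="${1:-1}"
    pause="${2:-0}"
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Stop early if a crawl run fails.
        ${YO_PHP:-php} src/cli/document/crawl.php || return 1
        i=$((i + 1))
        # Sleep only between runs, not after the last one.
        [ "$i" -lt "$runs" ] && sleep "$pause"
    done
    return 0
}
```

For example, `yo_crawl_loop 10 60` would start ten crawl runs one minute apart. A cron job or systemd timer calling the script directly would serve the same purpose.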

Search

php src/cli/document/search.php '@title "*"' [limit]
  • query - required, the search query ('@title "*"' in the command above)
  • limit - optional, maximum number of search results
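
The query in the search command uses Manticore's `@field` operator to restrict matching to the `title` field. As an illustration, a small helper could build such a query from a phrase and pass the optional limit through. The names `yo_search_title` and `YO_PHP` are hypothetical; the sketch also assumes the phrase itself contains no double quotes.

```shell
#!/bin/sh
# Search page titles for a phrase, with an optional result limit.
# Usage: yo_search_title PHRASE [LIMIT]
# yo_search_title and YO_PHP are hypothetical names; YO_PHP=echo gives a dry run.
yo_search_title() {
    phrase="$1"
    limit="${2:-}"
    # Build a query of the form: @title "phrase"
    query="@title \"$phrase\""
    if [ -n "$limit" ]; then
        ${YO_PHP:-php} src/cli/document/search.php "$query" "$limit"
    else
        ${YO_PHP:-php} src/cli/document/search.php "$query"
    fi
}
```

For example, `yo_search_title yggdrasil 5` would pass `@title "yggdrasil"` as the query and `5` as the limit.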