
Yo!

Micro Web Crawler in PHP & Manticore

Yo! is the next generation of the YGGo! project, with the goal of reducing server requirements and simplifying deployment.

The engine is written in PHP and uses the Manticore search engine as its backend.

The default build is adapted for the Yggdrasil ecosystem, but it can also be used to build your own search project.

The project contains:

  • CLI tools for index operations
  • JS-less frontend for building a search web portal
  • API tools for distributing the search index

Features:

  • MIME-based crawler with flexible filter settings
  • Page snap history with support for local and remote mirrors

CLI

Index

Init

Create the initial index

php src/cli/index/init.php [reset]
  • reset - optional, reset existing index

Document

Add

php src/cli/document/add.php URL
  • URL - add new URL to the crawl queue
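
To queue several URLs at once, the add command can be wrapped in a small shell loop. The following is a minimal sketch, not part of Yo! itself: the `queue_urls` function, the `urls.txt` file name, and the `YO_PHP` variable are all hypothetical names introduced for illustration. Setting `YO_PHP=echo` prints the commands instead of running them, which is useful for a dry run.

```shell
#!/bin/sh
# Queue every URL listed in a text file (one URL per line).
# YO_PHP is a hypothetical override for the PHP binary (assumption, not a Yo! option);
# with YO_PHP=echo the commands are printed instead of executed.
queue_urls() {
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and lines starting with '#'
    case "$url" in
      ''|'#'*) continue ;;
    esac
    # Detach stdin so the command cannot consume the rest of the URL list
    "${YO_PHP:-php}" src/cli/document/add.php "$url" < /dev/null
  done < "$1"
}

# Usage (run from the project root):
#   queue_urls urls.txt
```

Each line is passed as a single argument, so URLs containing shell metacharacters such as `&` or `?` do not need extra quoting inside the file.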

Crawl

php src/cli/document/crawl.php

Search

php src/cli/document/search.php '@title "*"' [limit]
  • query - required, the search query ('@title "*"' in the example above)
  • limit - optional search results limit
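
The commands above can be chained into one end-to-end run. This is a hedged sketch that uses only the documented scripts. The `run_pipeline` function, the `YO_PHP` dry-run variable, the example start URL, and the default limit of 10 are assumptions made for illustration. How much of the queue a single `crawl.php` run processes is not stated here, so the crawl step may need to be repeated.

```shell
#!/bin/sh
# End-to-end sketch: init -> add -> crawl -> search.
# YO_PHP is a hypothetical override for the PHP binary (assumption, not a Yo! option);
# with YO_PHP=echo the commands are printed instead of executed.
run_pipeline() {
  start_url=$1        # URL to put into the crawl queue
  query=$2            # search query, passed as one argument
  limit=${3:-10}      # results limit; 10 is an arbitrary default for this sketch
  php_bin=${YO_PHP:-php}

  # Stop at the first step that fails
  "$php_bin" src/cli/index/init.php &&
  "$php_bin" src/cli/document/add.php "$start_url" &&
  "$php_bin" src/cli/document/crawl.php &&
  "$php_bin" src/cli/document/search.php "$query" "$limit"
}

# Usage (run from the project root; the URL is a placeholder):
#   run_pipeline 'http://example.test/' '@title "*"' 5
```

Keeping the query in single quotes at the call site stops the shell from interpreting the `*` and the inner double quotes before they reach the search script.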