Mirror of https://github.com/YGGverse/Yo.git, synced 2025-01-13 08:18:12 +00:00
Yo! Micro Web Crawler in PHP & Manticore
The next generation of the YGGo! project, with the goal of reducing server requirements and simplifying deployment:
- The index model is now a distributed cluster, designed to aggregate search results from different instances through the API
- The data exchange model is refactored to drop all primary key dependencies
- Snaps now use tar.gz compression to reduce storage requirements and still support remote mirrors, including FTP
- The codebase follows minimalist principles throughout
Implementation
The engine is written in PHP and uses Manticore as the backend.
The default build is inspired by and adapted for the Yggdrasil ecosystem, but it can be used to build your own search project.
Components
- CLI tools for index operations
- JS-less frontend for building a search web portal
- API tools for distributing the search index
Features
- MIME-based crawler with flexible filter settings
- Page snap history with support for local and remote mirrors
Install
- Install composer, php and manticore
- Grab latest version: git clone https://github.com/YGGverse/Yo.git
- Run composer update inside the project directory
- Check src/config.json for any customizations
- Run indexes init script: php src/cli/index/init.php
- Start crawling!
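The install steps above could be scripted roughly as follows. This is a minimal sketch and not part of the project: the `run` helper, the `yo_fetch` and `yo_init` functions and the `DRY_RUN` variable are hypothetical names introduced here. It assumes git, composer, php and Manticore are already installed and that the clone lands in the default `Yo` directory. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# otherwise execute it in the current shell.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: fetch the sources and install PHP dependencies.
yo_fetch() {
    # git clones into ./Yo when no target directory is given.
    run git clone https://github.com/YGGverse/Yo.git || return 1
    run cd Yo || return 1
    run composer update
}

# Hypothetical wrapper: create the initial index.
# Pass "reset" to reset an existing index, as described in the README.
yo_init() {
    run php src/cli/index/init.php "$@"
}
```

A possible order of use: call `yo_fetch`, review src/config.json by hand, then call `yo_init` from the project directory. The split into two functions is a design choice of this sketch so that the configuration check can happen between them.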
Documentation
CLI
Index
Init
Create initial index
php src/cli/index/init.php [reset]
- reset: optional, reset existing index
Document
Add
php src/cli/document/add.php URL
- URL: new URL to add to the crawl queue
Crawl
php src/cli/document/crawl.php
Search
php src/cli/document/search.php '@title "*"' [limit]
- query: required
- limit: optional, search results limit
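Taken together, the document commands above form a simple add, crawl, search cycle. The sketch below is illustrative only: `run`, `yo_cycle` and `DRY_RUN` are hypothetical names, `https://example.com/` in the usage note is a placeholder URL, and the default limit of 10 is an arbitrary choice. It assumes it is run from the project directory with the index already initialized. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# otherwise execute it. The printed form joins arguments with spaces,
# so it does not show the shell quoting around the query.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: queue one URL, run the crawler once, then search.
# Usage: yo_cycle URL [LIMIT]
yo_cycle() {
    url="$1"
    limit="${2:-10}"          # arbitrary default, not a project setting
    query='@title "*"'        # query example taken from the README

    run php src/cli/document/add.php "$url" || return 1
    run php src/cli/document/crawl.php || return 1
    run php src/cli/document/search.php "$query" "$limit"
}
```

For example, `yo_cycle https://example.com/ 5` prints the three commands in order. The sketch does not verify that the crawl run actually indexed the queued URL before searching.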