mirror of https://github.com/YGGverse/Yo.git
Yo!
Micro Web Crawler in PHP & Manticore
Yo! is a super thin layer for the Manticore search server that extends the official manticoresearch-php client with CLI tools and a simple JS-less web UI.
Features
- MIME-based crawler with flexible filter settings by regular expressions, selectors, external links, etc.
- Page snap history with support for local and remote mirrors (including the FTP protocol)
- CLI tools for index administration and crontab tasks
- JS-less frontend to run a local or public search web portal
- API tools to make the search index distributed
Components
- Manticore Server
- PHP library for Manticore
- Symfony DOM crawler
- Symfony CSS selector
- FTP client for snap mirrors
- Hostname ident icons
- Bootstrap icons
Install
- Install manticore, composer and php
- Grab the latest Yo version
git clone https://github.com/YGGverse/Yo.git
- Run composer update inside the project directory
- Copy and customize the config file
cp example/config.json config.json
- Make sure the storage folder is writable
- Run the index initiation script
php src/cli/index/init.php
- Announce a new URL
php src/cli/document/add.php URL
- Run the crawler to grab the data
php src/cli/document/crawl.php
- Test search results
php src/cli/document/search.php '*'
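Before running the steps above, it can help to confirm that the required tools are on PATH and that the storage folder is writable. The following is a minimal sketch, not part of Yo itself: the function names check_commands and check_writable are hypothetical, and the storage path is assumed to be relative to the project directory.

```shell
# Preflight sketch: verify required tools and a writable storage folder.
# Function names are illustrative and not provided by Yo.

# Return 0 only if every named command is found on PATH.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 only if the directory exists and the current user can write to it.
check_writable() {
  if [ -d "$1" ] && [ -w "$1" ]; then
    return 0
  fi
  echo "not writable: $1" >&2
  return 1
}

# Example (run from the project directory; "storage" location is assumed):
#   check_commands git composer php && check_writable storage
```

The check for the Manticore server is left out on purpose, since the name of its binary depends on how it was installed.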
Web UI
cd src/webui
php -S 127.0.0.1:8080
- Open 127.0.0.1:8080 in the browser
Documentation
CLI
Index
Init
Create initial index
php src/cli/index/init.php [reset]
- reset - optional, reset existing index
Document
Add
php src/cli/document/add.php URL
- URL - add new URL to the crawl queue
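Since add.php takes one URL per call, seeding the queue from a list needs a small loop. This is a sketch under assumptions: the function name add_urls and the YO_PHP override are hypothetical, and the script is expected to run from the project directory.

```shell
# Queue every URL listed in a text file, one per line.
# Blank lines and lines starting with '#' are skipped.
# YO_PHP is a hypothetical override for the PHP binary (defaults to "php").
add_urls() {
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;
    esac
    # </dev/null keeps the child process from consuming the rest of the list
    "${YO_PHP:-php}" src/cli/document/add.php "$url" </dev/null
  done < "$1"
}

# Example:
#   add_urls seeds.txt
```

Setting YO_PHP=echo prints the commands instead of running them, which is handy for checking a seed file first.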
Crawl
php src/cli/document/crawl.php
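When the crawler runs from crontab, a slow crawl can overlap with the next scheduled run. One common way to avoid that is a lock directory. The sketch below is not part of Yo: the run_locked name, the lock path, the install path and the schedule are all assumptions.

```shell
# Run a command only if no previous run still holds the lock directory.
# mkdir is atomic, so two overlapping runs cannot both acquire the lock.
run_locked() {
  lock="$1"
  shift
  if ! mkdir "$lock" 2>/dev/null; then
    echo "skipped: $lock is held by another run" >&2
    return 1
  fi
  "$@"
  rc=$?
  rmdir "$lock"
  return "$rc"
}

# Example crontab entry (hypothetical path and schedule), assuming this
# function is saved in /opt/Yo/cron.sh and followed by the call below:
#   */10 * * * * cd /opt/Yo && sh cron.sh
# Last line of cron.sh:
#   run_locked /tmp/yo-crawl.lock php src/cli/document/crawl.php
```

A run that is killed before rmdir leaves the lock behind, so the directory has to be removed by hand in that case.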
Clean
php src/cli/document/clean.php
- remove url duplicates
- optimize the index
Search
php src/cli/document/search.php '@title "*"' [limit]
- query - required
- limit - optional search results limit
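The query contains spaces and double quotes, so it has to reach search.php as a single shell argument. A small wrapper makes the quoting explicit. This is a sketch only: yo_search and the YO_PHP override are hypothetical names, and the example search terms are placeholders.

```shell
# Search wrapper sketch: pass the query as ONE argument so the shell does not
# split it, and append the limit only when one is given.
yo_search() {
  if [ -z "${1:-}" ]; then
    echo "usage: yo_search QUERY [LIMIT]" >&2
    return 2
  fi
  if [ -n "${2:-}" ]; then
    "${YO_PHP:-php}" src/cli/document/search.php "$1" "$2"
  else
    "${YO_PHP:-php}" src/cli/document/search.php "$1"
  fi
}

# Examples (run from the project directory):
#   yo_search '@title "yggdrasil"' 5   # phrase restricted to the title field
#   yo_search 'web crawler'
```

Single quotes around the whole query keep the inner double quotes intact, matching the form used in the command above.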
Migration
YGGo
Import index from YGGo database
php src/cli/yggo/import.php 'host' 'port' 'user' 'password' 'database' [unique=off] [start=0] [limit=100]
Source DB fields required:
- host
- port
- user
- password
- database
- unique - optional, check for unique URL (takes more time)
- start - optional, offset to start queue
- limit - optional, limit queue
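The start and limit arguments allow a large YGGo database to be imported in slices. The sketch below only prints the commands for each slice so they can be reviewed before running. It rests on two assumptions that the text above does not confirm: that the optional arguments are passed positionally as bare values in the order unique, start, limit, and that the total number of source rows is known. The plan_import name is hypothetical.

```shell
# Print one import command per batch (dry run, nothing is executed).
# ASSUMPTIONS: optional arguments are positional (unique, start, limit);
# TOTAL is the number of rows in the source database, supplied by the operator.
plan_import() {
  total="$1"
  batch="$2"
  # Refuse a non-positive batch size, which would loop forever.
  [ "$batch" -gt 0 ] || return 2
  start=0
  while [ "$start" -lt "$total" ]; do
    echo "php src/cli/yggo/import.php 'host' 'port' 'user' 'password' 'database' off $start $batch"
    start=$((start + batch))
  done
}

# Example: plan 250 rows in batches of 100 (prints three commands)
#   plan_import 250 100
```

The quoted host, port, user, password and database values are the same placeholders as in the command above and need to be replaced with real connection settings.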
Instances
Yggdrasil
http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/