# Yo!
Micro Web Crawler in PHP & Manticore
Yo! is a super-thin layer for the Manticore search server that extends the official [manticoresearch-php](https://github.com/manticoresoftware/manticoresearch-php) client with CLI tools and a UI for the [Gemini Protocol](https://geminiprotocol.net).
To use the `HTTP` version, please check out the [main branch](https://github.com/YGGverse/Yo)!
## Features
* MIME-based crawler with flexible filtering by regular expressions, selectors, external links, etc.
* Page snap history with local and remote mirrors support (including FTP protocol)
* CLI tools for index administration and crontab tasks
* Gemini Protocol UI (coming soon)
## Components
* [Manticore Server](https://github.com/manticoresoftware/manticoresearch)
* [PHP library for Manticore](https://github.com/manticoresoftware/manticoresearch-php)
* [FTP client for snap mirrors](https://github.com/YGGverse/ftp-php)
### Install
#### Environment
##### Debian
* `wget https://repo.manticoresearch.com/manticore-repo.noarch.deb`
* `dpkg -i manticore-repo.noarch.deb`
* `apt update`
* `apt install git composer manticore manticore-extra php-fpm php-mbstring`
The Yo search engine uses Manticore as its primary database. If your server is prone to unexpected power loss,
change the default [binlog flush strategy](https://manual.manticoresearch.com/Logging/Binary_logging#Binary-flushing-strategies) to `binlog_flush = 1`.
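
To confirm that the setting is active, the config file can be inspected from the shell. The following is a minimal sketch, not part of Yo: the function name `binlog_flush_value` is hypothetical, and the config path in the usage comment is an assumption about a default Debian install. In Manticore, `binlog_flush` belongs to the `searchd` section of the config.

```shell
#!/bin/sh
# Print the binlog_flush value found in a Manticore config file,
# or "unset" when no uncommented binlog_flush line exists
# (Manticore then falls back to its built-in default).
# NOTE: binlog_flush_value is a hypothetical helper, not a Yo or Manticore tool.
binlog_flush_value() {
    conf="$1"
    # Match lines such as "    binlog_flush = 1"; commented lines do not match.
    # If the setting appears more than once, report the last occurrence.
    value=$(sed -n 's/^[[:space:]]*binlog_flush[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | tail -n 1)
    echo "${value:-unset}"
}

# Usage (path is an assumption, adjust to your install):
# binlog_flush_value /etc/manticoresearch/manticore.conf
```

If the function prints `unset`, add `binlog_flush = 1` inside the `searchd { ... }` block and restart the Manticore service.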
#### Deployment
* `git clone https://github.com/YGGverse/Yo.git`
* `cd Yo`
* `git checkout gemini`
* `composer update`
#### Development
* `git clone https://github.com/YGGverse/Yo.git`
* `cd Yo`
* `git checkout gemini`
* `git checkout -b pr-branch`
* `git commit -m 'new fix'`
* `git push`
#### Update
* `cd Yo`
* `git pull`
* `composer update`
#### Init
* `cp example/config.json config.json`
* `php src/cli/index/init.php`
#### Usage
* `php src/cli/document/add.php URL`
* `php src/cli/document/crawl.php`
* `php src/cli/document/search.php '*'`
#### Gemini UI
Coming soon.
## Documentation
### CLI
#### Index
##### Init
Create initial index
```
php src/cli/index/init.php [reset]
```
* `reset` - optional, reset existing index
##### Alter
Change existing index
```
php src/cli/index/alter.php {operation} {column} {type}
```
* `operation` - operation name, supported values: `add` | `drop`
* `column` - target column name
* `type` - target column type, supported values: `text` | `integer`
#### Document
##### Add
```
php src/cli/document/add.php URL
```
* `URL` - add new URL to the crawl queue
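
To queue several URLs at once, the command above can be wrapped in a loop. This is a minimal sketch under stated assumptions: the function `yo_add_urls` and the `YO_ADD_CMD` override are hypothetical helpers introduced for illustration, and the list is assumed to be a plain text file with one URL per line.

```shell
#!/bin/sh
# Queue every URL listed in a text file (one URL per line).
# Blank lines and lines starting with '#' are skipped.
# YO_ADD_CMD is a hypothetical override for dry runs; by default the
# documented add command is used and must be run from the Yo directory.
yo_add_urls() {
    list="$1"
    # "|| [ -n "$url" ]" also handles a last line without a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so the command cannot consume the URL list.
        ${YO_ADD_CMD:-php src/cli/document/add.php} "$url" < /dev/null
    done < "$list"
}

# Usage (file name is an assumption):
# yo_add_urls seeds.txt
```

Setting `YO_ADD_CMD=echo` before the call prints the URLs that would be queued without touching the index.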
##### Crawl
```
php src/cli/document/crawl.php
```
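
When the crawler is scheduled from crontab, a slow run may still be active when the next one starts. The sketch below shows one way to skip overlapping runs with a lock directory. The function name `run_exclusive` and the lock path in the usage comment are assumptions, not part of Yo.

```shell
#!/bin/sh
# Run a command only if no other run currently holds the lock.
# mkdir is atomic, so two overlapping runs cannot both create the directory.
# NOTE: run_exclusive is a hypothetical helper introduced for illustration.
run_exclusive() {
    lock="$1"; shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "skipped: $lock is held by another run" >&2
        return 75   # conventional "temporary failure, try again later" code
    fi
    "$@"
    status=$?
    # Release the lock and pass the command's exit status through.
    rmdir "$lock"
    return "$status"
}

# Usage (lock path is an assumption; run from the Yo directory):
# run_exclusive /tmp/yo-crawl.lock php src/cli/document/crawl.php
```

If a run is killed before it releases the lock, the directory stays behind and has to be removed by hand; on Linux, `flock` from util-linux is an alternative that avoids stale locks.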
##### Clean
Optimize the index and apply new configuration rules
```
php src/cli/document/clean.php [limit]
```
* `limit` - optional, integer, number of documents per queue
##### Search
```
php src/cli/document/search.php '@title "*"' [limit]
```
* `query` - required, search query (`'@title "*"'` in the example above)
* `limit` - optional, search results limit
### Backup
#### Logical
SQL text dumps can be useful for public index distribution, but require more computing resources.
[Read more](https://manual.manticoresearch.com/Securing_and_compacting_a_table/Backup_and_restore#Backup-and-restore-with-mysqldump)
#### Physical
Better suited to infrastructure administration; includes the original data binaries.
[Read more](https://manual.manticoresearch.com/Securing_and_compacting_a_table/Backup_and_restore#Using-manticore-backup-command-line-tool)
## Instances
Coming soon.