# Yo! Micro Web Crawler in PHP & Manticore

Yo! is the next generation of the [YGGo!](https://github.com/YGGverse/YGGo) project, with the goal of reducing server requirements and simplifying deployment.

The index model has changed to a distributed clustering model, designed to aggregate search results from different instances through an API.
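
Aggregating results from several instances implies a merge step on the querying side. The sketch below assumes that each instance returns a list of results with hypothetical `url`, `title` and `weight` fields. This is not the documented Yo! API format. `mergeResults()` is a helper name introduced here for illustration. It keeps the highest-weighted copy of each URL and sorts the combined list by weight.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: merge result lists returned by several Yo! instances.
// The result shape ['url' => ..., 'title' => ..., 'weight' => ...] is an
// assumption for illustration, not the project's actual API format.

/**
 * @param array<string, array<int, array{url: string, title: string, weight: int}>> $resultsByInstance
 * @return array<int, array{url: string, title: string, weight: int, instance: string}>
 */
function mergeResults(array $resultsByInstance, int $limit): array
{
    $merged = [];

    foreach ($resultsByInstance as $instance => $results) {
        foreach ($results as $result) {
            $url = $result['url'];

            // Keep only the highest-weighted copy of each URL across instances
            if (!isset($merged[$url]) || $result['weight'] > $merged[$url]['weight']) {
                $merged[$url] = $result + ['instance' => $instance];
            }
        }
    }

    // Sort by weight, highest first (usort also re-indexes the array)
    usort($merged, fn(array $a, array $b): int => $b['weight'] <=> $a['weight']);

    // Return at most $limit results
    return array_slice($merged, 0, $limit);
}
```

Comparing raw weights across instances assumes they are on the same scale. Indexes of very different sizes may need their scores normalized before merging.
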
## Implementation

The engine is written in PHP and uses the [Manticore](https://github.com/manticoresoftware) search engine as its backend.

The default build is inspired by and adapted for the [Yggdrasil](https://github.com/yggdrasil-network) ecosystem, but it can also be used to build your own search project.

## Components

* CLI tools for index operations
* JS-less frontend for building a search web portal
* API tools for distributing the search index

### Features

* MIME-based crawler with flexible filter settings
* Page snapshot history with support for local and remote mirrors
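
A MIME-based filter uses the `Content-Type` response header to decide whether a fetched document should be indexed. The following is a minimal PHP 8 sketch. It uses a hypothetical `isMimeAllowed()` function and an example allow-list that supports `type/*` wildcards. Yo!'s actual filter settings are not described here and may look different.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of a MIME-based crawl filter. The pattern list and the
// function name are assumptions for illustration, not Yo!'s actual settings.

const ALLOWED_MIME_PATTERNS = ['text/html', 'text/plain', 'image/*'];

function isMimeAllowed(string $contentType, array $patterns = ALLOWED_MIME_PATTERNS): bool
{
    // Drop parameters such as "; charset=utf-8" and normalize case
    $mime = strtolower(trim(explode(';', $contentType, 2)[0]));

    foreach ($patterns as $pattern) {
        $pattern = strtolower($pattern);

        if (str_ends_with($pattern, '/*')) {
            // "type/*" matches any subtype of that type
            if (str_starts_with($mime, substr($pattern, 0, -1))) {
                return true;
            }
        } elseif ($mime === $pattern) {
            // Exact "type/subtype" match
            return true;
        }
    }

    return false;
}
```
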
### Documentation
#### CLI

##### Index

###### Init

Create the initial index:
```
php src/cli/index/init.php [reset]
```
* `reset` - optional; resets the existing index

##### Document

###### Add

```
php src/cli/document/add.php URL
```
* `URL` - required; the URL to add to the crawl queue
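
The same page can be written in several ways, for example `HTTP://Example.COM` and `http://example.com/`. Normalizing a URL before queueing it helps avoid duplicate entries. This sketch uses a hypothetical `normalizeUrl()` helper built on PHP's `parse_url()`. It is an illustration only and does not reproduce the validation that `add.php` performs.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: normalize a URL before it is added to the crawl queue,
// so the same page is not queued twice under different spellings.
// Yo!'s real add.php may validate URLs differently.

function normalizeUrl(string $url): ?string
{
    $parts = parse_url(trim($url));

    // Reject unparsable input and anything without a scheme and host
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }

    // Scheme and host are case-insensitive, so lowercase them
    $scheme = strtolower($parts['scheme']);

    // Only HTTP(S) URLs are accepted
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }

    // Credentials (user:pass@) are not carried over
    $normalized = $scheme . '://' . strtolower($parts['host']);

    if (isset($parts['port'])) {
        $normalized .= ':' . $parts['port'];
    }

    // The path is case-sensitive and kept as-is; an empty path becomes "/"
    $normalized .= $parts['path'] ?? '/';

    // Keep the query string; the #fragment is dropped
    if (isset($parts['query'])) {
        $normalized .= '?' . $parts['query'];
    }

    return $normalized;
}
```
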

###### Crawl

```
php src/cli/document/crawl.php
```
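
A crawl pass typically fetches a queued page and adds newly found links back to the queue. The sketch below covers only the link-extraction step. It uses PHP's `DOMDocument`, so it needs PHP 8 and the `dom` extension. `extractLinks()` is a hypothetical helper, and relative-URL resolution is intentionally limited to absolute and root-relative links.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of one crawl step: collect links from a fetched HTML page
// so they can be pushed back into the crawl queue. Relative-URL handling is
// deliberately simplified: only absolute and root-relative links are kept.

function extractLinks(string $html, string $baseUrl): array
{
    $base = parse_url($baseUrl);

    if ($html === '' || !isset($base['scheme'], $base['host'])) {
        return [];
    }

    // scheme://host[:port] of the page, used to resolve root-relative links
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');

    // Collect libxml warnings internally instead of printing them for malformed HTML
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));

        if (preg_match('~^https?://~i', $href) === 1) {
            $links[] = $href; // absolute HTTP(S) link
        } elseif (str_starts_with($href, '/') && !str_starts_with($href, '//')) {
            $links[] = $origin . $href; // root-relative link
        }
        // Anything else (relative paths, "//host" links, mailto:, etc.) is skipped
    }

    // Drop duplicates while keeping first-seen order
    return array_values(array_unique($links));
}
```
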

###### Search

```
php src/cli/document/search.php '@title "*"' [limit]
```
* `query` - required; a Manticore full-text query, such as `'@title "*"'` in the example above
* `limit` - optional; maximum number of results to return
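
Queries use Manticore's extended syntax. In this syntax, `@title` restricts matching to the `title` field and double quotes delimit a phrase. If the phrase comes from user input, any embedded quotes and backslashes should be escaped. The helper below, `phraseQuery()`, is hypothetical. It assumes that Manticore's backslash escaping applies inside phrases.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: build a field-restricted Manticore phrase query such as
// '@title "..."' from raw user input. Assumes backslash escaping inside phrases.

function phraseQuery(string $field, string $phrase): string
{
    // Escape backslashes first, then double quotes, so the added backslashes
    // are not escaped a second time
    $escaped = str_replace(['\\', '"'], ['\\\\', '\\"'], $phrase);

    return '@' . $field . ' "' . $escaped . '"';
}
```

You can then pass the resulting string as the `query` argument, quoted for the shell, for example `php src/cli/document/search.php '@title "yggdrasil"' 10`.
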