Yo! Micro Web Crawler in PHP & Manticore

Next generation of the YGGo! project, with the goal of reducing server requirements and simplifying the deployment process

  • Index model changed to a distributed cluster model, oriented toward aggregating search results from different instances through the API
  • Refactored data exchange model, dropping all primary-key dependencies
  • Snaps now use tar.gz compression to reduce storage requirements, while still supporting remote mirrors, including FTP
  • Minimalism everywhere

Implementation

The engine is written in PHP and uses Manticore as its search backend.

The default build is inspired by and adapted for Yggdrasil, but it could also be used to build an internet search portal.

Components

  • CLI tools for index operations
  • JS-less frontend for building a search web portal
  • API tools for making the search index distributed

Features

  • MIME-based crawler with flexible filter settings
  • Page snap history with local and remote mirrors support
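As an illustration of the MIME-based filtering and snap features listed above, a crawler configuration for this kind of engine might look like the sketch below. Note that the key names (`crawler`, `mime`, `whitelist`, `snap`) are hypothetical examples and are not taken from the project's actual src/config.json:

```json
{
  "crawler": {
    "mime": {
      "whitelist": ["text/html", "text/plain", "application/pdf"]
    },
    "snap": {
      "enabled": true,
      "compression": "tar.gz",
      "mirrors": ["ftp://mirror.example/snaps"]
    }
  }
}
```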

Install

  1. Install composer, PHP and Manticore
  2. Grab the latest Yo version: git clone https://github.com/YGGverse/Yo.git
  3. Run composer update inside the project directory
  4. Check src/config.json for any customizations
  5. Make sure the storage folder is writable
  6. Run the index init script: php src/cli/index/init.php
  7. Add a new URL: php src/cli/document/add.php URL
  8. Run the crawler: php src/cli/document/crawl.php
  9. Get search results: php src/cli/document/search.php '*'
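Step 5 can be verified from the shell before starting the crawler. A small sketch, assuming the storage folder lives at the repository root (as suggested by the project's .gitignore):

```shell
# Create the storage folder if it does not exist yet,
# then make sure the current user can write to it.
mkdir -p storage
chmod u+rwX storage

# Confirm the folder is writable before running the crawler
if [ -w storage ]; then
    echo "storage is writable"
fi
```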

Web UI

  1. cd src/webui
  2. php -S 127.0.0.1:8080
  3. Now open http://127.0.0.1:8080 in your browser!

Documentation

CLI

Index

Init

Create the initial index

php src/cli/index/init.php [reset]
  • reset - optional; resets the existing index

Document

Add
php src/cli/document/add.php URL
  • URL - the new URL to add to the crawl queue
Crawl
php src/cli/document/crawl.php

Search
php src/cli/document/search.php '@title "*"' [limit]
  • query - required search query
  • limit - optional search results limit
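To keep the index fresh, the crawl command above can be run on a schedule. A minimal crontab sketch, where /opt/Yo is an assumed install path to be adjusted for your deployment:

```
# Hypothetical crontab entries; /opt/Yo is an assumed install path
# Re-crawl the document queue every 15 minutes
*/15 * * * * cd /opt/Yo && php src/cli/document/crawl.php
```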