YGGo! - Distributed & Open Source Web Search Engine
The project is dedicated to the defenders of the city of Bakhmut
Written out of a desire to explore the Yggdrasil ecosystem, after the last YaCy node there was discontinued. The engine can also be useful for crawling regular websites, small business resources, and local networks.
Project goals: a simple interface, a clear architecture, and lightweight server requirements.
Overview
https://github.com/YGGverse/YGGo/tree/main/media
Online instances
- http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yggo
- http://94.140.114.241/yggo/
Requirements
php8^
php-dom
php-pdo
php-curl
php-gd
php-mysql
sphinxsearch
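Before installing, a quick preflight check can confirm the PHP side of these requirements. The sketch below is a hypothetical helper, not part of YGGo. It assumes the extensions show up in `php -m` output as `dom`, `PDO`, `curl`, `gd` and `pdo_mysql` (mapping the `php-mysql` package to `pdo_mysql` is an assumption), and it does not check Sphinx.

```python
import re
import shutil
import subprocess

# Hypothetical mapping from the package names above to `php -m` module names.
# php-mysql -> pdo_mysql is an assumption; adjust for your distribution.
REQUIRED_MODULES = ["dom", "PDO", "curl", "gd", "pdo_mysql"]


def php_major(version_output: str) -> int:
    """Extract the major version from `php -v` output, e.g. 'PHP 8.2.7 (cli) ...'."""
    match = re.match(r"PHP (\d+)\.", version_output)
    if match is None:
        raise ValueError("unrecognised `php -v` output")
    return int(match.group(1))


def missing_modules(php_m_output: str, required=REQUIRED_MODULES) -> list:
    """Return the required modules absent from `php -m` output (case-insensitive)."""
    loaded = {
        line.strip().lower()
        for line in php_m_output.splitlines()
        # skip section headers such as "[PHP Modules]" and blank lines
        if line.strip() and not line.startswith("[")
    }
    return [name for name in required if name.lower() not in loaded]


def run_php(*args: str) -> str:
    """Run the local php binary with the given arguments and return its stdout."""
    if shutil.which("php") is None:
        raise RuntimeError("php binary not found in PATH")
    return subprocess.run(["php", *args], capture_output=True,
                          text=True, check=True).stdout
```

On the target server, `php_major(run_php("-v")) >= 8` and `missing_modules(run_php("-m")) == []` together cover the PHP entries of the list.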
Installation
- The web root dir is /public
- Deploy the database using the MySQL Workbench project in the /database folder
- Install Sphinx Search Server
- Configuration examples are placed in the /config folder
- Make sure the /storage folder is writable
- Set up the /crontab scripts by following the example
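The "/storage writable" step is easiest to verify by actually creating a file there. A minimal sketch, assuming the project root is passed in and the check runs as the same user as the web server and crontab scripts (otherwise the result says nothing about their permissions). The function name is hypothetical.

```python
import tempfile
from pathlib import Path


def storage_is_writable(project_root: str) -> bool:
    """Return True if <project_root>/storage exists and accepts a new file."""
    storage = Path(project_root) / "storage"
    if not storage.is_dir():
        return False
    try:
        # Create and immediately delete a temporary file: a direct test,
        # rather than relying on permission bits alone.
        with tempfile.NamedTemporaryFile(dir=storage):
            pass
    except OSError:
        return False
    return True
```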
JSON API
Use it to build third-party applications and for index distribution.
Can be enabled or disabled with the API_ENABLED option.
Address
/api.php
Search
Returns search results.
Can be enabled or disabled with the API_SEARCH_ENABLED option.
Request attributes
GET action=search - required
GET query={string} - optional, search request, empty if not provided
GET type={string} - optional, search type, image|default or empty
GET page={int} - optional, search results page, 1 if not provided
GET mode=SphinxQL - optional, enable extended SphinxQL syntax
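A sketch of how a client might assemble a search request from the attributes above. The endpoint URL and the function name are hypothetical; the parameter names and allowed values come from the list above. Only the URL is built, because the response format is not documented here. The argument is called `type_` to avoid shadowing Python's built-in `type`.

```python
from urllib.parse import urlencode


def search_url(api: str, query: str = "", type_: str = "", page: int = 1,
               sphinxql: bool = False) -> str:
    """Build a GET URL for action=search on a node's /api.php endpoint."""
    if type_ not in ("", "default", "image"):
        raise ValueError("type must be 'image', 'default' or empty")
    if page < 1:
        raise ValueError("pages start at 1")
    params = {"action": "search", "query": query, "page": page}
    if type_:
        params["type"] = type_
    if sphinxql:
        # enables the extended SphinxQL syntax described below
        params["mode"] = "SphinxQL"
    # urlencode percent-encodes operators such as "|" and turns spaces into "+"
    return api + "?" + urlencode(params)
```

For example, `search_url("http://example.org/yggo/api.php", "hello world", "image", 2)` yields `http://example.org/yggo/api.php?action=search&query=hello+world&page=2&type=image`.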
Hosts distribution
Returns the hosts collected by the node, with the fields listed in the API_HOSTS_FIELDS option.
Can be enabled or disabled with the API_HOSTS_ENABLED option.
Request attributes
GET action=hosts - required
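Fetching the host list needs only the action attribute. The sketch below assumes the endpoint returns a JSON body, as the API name suggests. Since the returned fields depend on API_HOSTS_FIELDS, it decodes the body generically instead of assuming a schema. Function names and the example URL are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def hosts_url(api: str) -> str:
    """Build the URL for action=hosts on a node's /api.php endpoint."""
    return api + "?" + urlencode({"action": "hosts"})


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode its body as JSON; no response schema is assumed."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `fetch_json(hosts_url("http://example.org/yggo/api.php"))`.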
Application manifest
Returns node information.
Can be enabled or disabled with the API_MANIFEST_ENABLED option.
Request attributes
GET action=manifest - required
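For index distribution, a client may want the manifests of several nodes, such as the online instances listed above. A hypothetical sketch, assuming /api.php sits directly under each instance's web root. The fetch function is passed in, so any HTTP client can be used, and unreachable nodes are recorded instead of aborting the loop.

```python
from urllib.parse import urlencode, urljoin


def manifest_url(instance: str) -> str:
    """Assumption: api.php lives directly under the instance root."""
    # urljoin drops the last path segment unless the base ends with "/"
    base = instance if instance.endswith("/") else instance + "/"
    return urljoin(base, "api.php") + "?" + urlencode({"action": "manifest"})


def collect_manifests(instances, fetch):
    """Map each instance URL to its decoded manifest, or to an error entry."""
    results = {}
    for instance in instances:
        try:
            results[instance] = fetch(manifest_url(instance))
        except (OSError, ValueError) as err:  # network errors, bad JSON
            results[instance] = {"error": str(err)}
    return results
```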
Search textual filtering
Default constructions
operator OR:
hello | world
operator MAYBE:
hello MAYBE world
operator NOT:
hello -world
strict order operator (aka operator "before"):
aaa << bbb << ccc
exact form modifier:
raining =cats and =dogs
field-start and field-end modifier:
^hello world$
keyword IDF boost modifier:
boosted^1.234 boostedfieldend$^1.234
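When queries are generated by a program (for example a third-party client of the JSON API), small helpers keep the operator syntax in one place. This hypothetical sketch only reproduces the constructions listed above. It does not escape characters that Sphinx treats as operators, so raw user input should be sanitised separately.

```python
def any_of(*words: str) -> str:
    """Operator OR: hello | world"""
    return " | ".join(words)


def maybe(required: str, optional: str) -> str:
    """Operator MAYBE: hello MAYBE world"""
    return f"{required} MAYBE {optional}"


def excluding(word: str, *excluded: str) -> str:
    """Operator NOT: hello -world"""
    return " ".join([word, *("-" + w for w in excluded)])


def in_order(*words: str) -> str:
    """Strict order operator: aaa << bbb << ccc"""
    return " << ".join(words)


def exact(word: str) -> str:
    """Exact form modifier: =cats"""
    return "=" + word


def field_bounds(phrase: str) -> str:
    """Field-start and field-end modifiers: ^hello world$"""
    return f"^{phrase}$"


def boost(word: str, idf: float) -> str:
    """Keyword IDF boost modifier: boosted^1.234"""
    return f"{word}^{idf}"
```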
Extended syntax
https://sphinxsearch.com/docs/current.html#extended-syntax
Can be enabled with the following attribute:
GET m=SphinxQL
Roadmap / ideas
- Full-text ranking search for web pages
- Add search results pagination
- Add robots.txt support (Issue #2)
- Improve Yggdrasil link detection, add .ygg domain zone support
- Make the page description visible: build it from the cached content dump when the website description tag is not available, and add condition highlights
- Image search (basically implemented, but requires testing and some performance optimization)
- Index cleaner
- Crawl queue balancer that depends on available CPU
- Implement a smart queue algorithm that indexes the home pages of new sites with higher priority
- Implement automatic database backup when the crawl process completes
- Add transactions to prevent data loss on DB crashes
- JSON API
- Distributed index data sharing between nodes through the service API
- An idea: make unique gravatars for sites without favicons, because they are simpler to identify than IPv6 addresses
- An idea: add some visitor counters, like in the good old times?
Contributions
Please create a new branch from the master|sqliteway tree in your fork for each patch before creating a PR
git checkout master
git checkout -b my-pr-branch-name
See also: SQLite tree
Donate to contributors
License
- Engine sources MIT License
- Home page animation by alvarotrigo
Feedback
Please feel free to share your ideas and bug reports here, or use the sources for your own implementations.
Have a good time.