
init gemini protocol implementation

gemini
yggverse 9 months ago
parent
commit
1f96ca8a2c
  1. README.md (63)
  2. composer.json (7)
  3. example/config.json (206)
  4. src/cli/document/clean.php (2)
  5. src/cli/document/crawl.php (467)
  6. src/cli/index/init.php (8)
  7. src/cli/yggo/import.php (177)
  8. src/webui/api.php (271)
  9. src/webui/explore.php (563)
  10. src/webui/index.php (336)
  11. src/webui/search.php (514)

63
README.md

@@ -2,26 +2,24 @@
Micro Web Crawler in PHP & Manticore
Yo! is the super thin layer for Manticore search server that extends official [manticoresearch-php](https://github.com/manticoresoftware/manticoresearch-php) client with CLI tools and simple JS-less WebUI.
Yo! Gemini is the super thin layer for Manticore search server that extends official [manticoresearch-php](https://github.com/manticoresoftware/manticoresearch-php) client with CLI tools and Gemini protocol UI.
This branch contains the implementation for the [Gemini Protocol](https://geminiprotocol.net).
To use the `HTTP` version, please check out the [main branch](https://github.com/YGGverse/Yo)!
## Features
* MIME-based crawler with flexible filter settings by regular expressions, selectors, external links etc
* Page snap history with local and remote mirrors support (including FTP protocol)
* CLI tools for index administration and crontab tasks
* JS-less frontend to run local or public search web portal
* API tools to make search index distributed
* Gemini Protocol UI (coming soon)
## Components
* [Manticore Server](https://github.com/manticoresoftware/manticoresearch)
* [PHP library for Manticore](https://github.com/manticoresoftware/manticoresearch-php)
* [Symfony DOM crawler](https://github.com/symfony/dom-crawler)
* [Symfony CSS selector](https://github.com/symfony/css-selector)
* [FTP client for snap mirrors](https://github.com/YGGverse/ftp-php)
* [Hostname ident icons](https://github.com/dmester/jdenticon-php)
* [Captcha](https://github.com/Gregwar/Captcha)
* [Bootstrap icons](https://icons.getbootstrap.com/)
### Install
@@ -32,22 +30,23 @@ Yo! is the super thin layer for Manticore search server that extends official [m
* `wget https://repo.manticoresearch.com/manticore-repo.noarch.deb`
* `dpkg -i manticore-repo.noarch.deb`
* `apt update`
* `apt install git composer manticore manticore-extra php-fpm php-curl php-mbstring php-gd`
* `apt install git composer manticore manticore-extra php-fpm php-mbstring`
The Yo search engine uses Manticore as its primary database. If your server is sensitive to power loss,
change default [binlog flush strategy](https://manual.manticoresearch.com/Logging/Binary_logging#Binary-flushing-strategies) to `binlog_flush = 1`
#### Deployment
Project in development, to create new search project, use `dev-main` branch:
* `composer create-project yggverse/yo:dev-main`
* `git clone https://github.com/YGGverse/Yo.git`
* `cd Yo`
* `git checkout gemini`
* `composer update`
#### Development
* `git clone https://github.com/YGGverse/Yo.git`
* `cd Yo`
* `composer update`
* `git checkout gemini`
* `git checkout -b pr-branch`
* `git commit -m 'new fix'`
* `git push`
@@ -69,11 +68,9 @@ Project in development, to create new search project, use `dev-main` branch:
* `php src/cli/document/crawl.php`
* `php src/cli/document/search.php '*'`
#### Web UI
#### Gemini UI
1. `cd src/webui`
2. `php -S 127.0.0.1:8080`
3. open `http://127.0.0.1:8080` in browser
Coming soon..
## Documentation
@@ -134,27 +131,6 @@ php src/cli/document/search.php '@title "*"' [limit]
* `query` - required
* `limit` - optional search results limit
##### Migration
###### YGGo
Import index from YGGo database
```
php src/cli/yggo/import.php 'host' 'port' 'user' 'password' 'database' [unique=off] [start=0] [limit=100]
```
Source DB fields required:
* `host`
* `port`
* `user`
* `password`
* `database`
* `unique` - optional, check for unique URL (takes more time)
* `start` - optional, offset to start queue
* `limit` - optional, limit queue
### Backup
#### Logical
@@ -171,13 +147,4 @@ Better for infrastructure administration and includes original data binaries.
## Instances
### [Yggdrasil](https://github.com/yggdrasil-network)
* `http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/` - IPv6 `0200::/7` addresses only | [index](http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/index.sql)
### [Alfis DNS](https://github.com/Revertron/Alfis)
* `http://yo.ygg` - `.ygg` domain zone search only | [index](http://yo.ygg/index.sql)
* `http://ygg.yo.index` - alias of `http://yo.ygg` | [index](http://ygg.yo.index/index.sql)
_*`*.yo.index` reserved for domain-oriented instances e.g. `.btn`, `.conf`, `.mirror` - feel free to request the address_
Coming soon..

7
composer.json

@@ -15,11 +15,8 @@
],
"require": {
"manticoresoftware/manticoresearch-php": "^3.1",
"symfony/css-selector": "^6.3",
"symfony/dom-crawler": "^6.3",
"jdenticon/jdenticon": "^1.0",
"yggverse/ftp": "^1.0",
"gregwar/captcha": "^1.2",
"yggverse/net": "^1.2"
"yggverse/net": "^1.2",
"yggverse/gemini": "^0.4.0"
}
}

206
example/config.json

@@ -21,7 +21,7 @@
}
}
},
"webui":
"gui":
{
"pagination":
{
@@ -35,7 +35,7 @@
{
"url":{
"enabled":false,
"regex":"/.*/ui"
"regex":"/^gemini:\\/\\/.*/ui"
}
}
},
@@ -59,9 +59,9 @@
"fields":
[
"url",
"title",
"description",
"keywords",
"h1",
"h2",
"h3",
"body"
],
"options":
@@ -71,57 +71,6 @@
}
}
},
"footer":
{
"links":
[
{
"text":"0200::/7",
"attributes":
{
"title":"Search in 0200::/7 IPv6",
"href":"http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/"
},
"index":
[
"http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/index.sql"
]
},
{
"text":"yo.ygg",
"attributes":
{
"title":"Search in .ygg zone",
"href":"http://yo.ygg"
},
"index":
[
"http://yo.ygg/index.sql"
]
},
{
"text":"ygg.yo.index",
"attributes":
{
"title":"Search in .ygg zone",
"href":"http://ygg.yo.index"
},
"index":
[
"http://ygg.yo.index/index.sql"
]
},
{
"text":"GitHub",
"attributes":
{
"title":"Source code",
"href":"https://github.com/YGGverse/Yo"
},
"index":[]
}
]
},
"index":
{
"enabled":true
@@ -161,119 +110,30 @@
"timeout":5,
"socket":
{
"201:5eb5:f061:678e:7565:6338:c02c:5251":80
"8.8.8.8":80
}
}
},
"curl":
"connection":
{
"connection":
{
"timeout":3
},
"download":
{
"size":
{
"max":10000024
}
}
"timeout":3,
"length":1048576,
"chunk":1
},
"queue":
{
"limit":1,
"delay":1
},
"selector":
{
"a:not([rel=nofollow])":
{
"attribute":"href",
"external":false,
"regex":"/.*/ui"
},
"image":
{
"attribute":"src",
"external":false,
"regex":"/.*/ui"
},
"audio":
{
"attribute":"src",
"external":false,
"regex":"/.*/ui"
},
"video":
{
"attribute":"src",
"external":false,
"regex":"/.*/ui"
},
"script":
{
"attribute":"href",
"external":false,
"regex":"/.*/ui"
}
},
"skip":
"url":
{
"stripos":
"external":true,
"regex":"/^gemini:\\/\\/.*/ui",
"skip":
{
"url":
"stripos":
[
"#",
"?",
"javascript:",
"mailto:",
"magnet:",
"xmpp:",
"/commit",
"/diff",
"/print",
"/raw",
"/cache",
"/download",
"/share",
"/explore",
"/register",
"/login",
"/password",
"/forgot",
"/restore",
"/account",
"/reply",
"/read",
"/compose",
"/comment",
"/add",
"/edit",
"/delete",
"/quote",
"/report",
"/export",
"/import",
"/mobile",
"/mwiki",
"/branch",
"/block",
"/transaction",
"/search",
"/tag",
"/page",
"/sort",
"/order",
"/pdf",
"/fb2",
"/mobi",
"/epub",
"/djvu",
"/_detail",
"/_media",
"/t/",
"/q/",
"/s/"
"?"
]
}
},
@@ -297,28 +157,21 @@
"directory":"storage/snap",
"size":
{
"max":10000024
"max":1048576
},
"mime":
"meta":
{
"stripos":
[
"application/xhtml+xml",
"application/javascript",
"text/html",
"text/plain",
"text/css",
"image/webp",
"image/png",
"image/gif",
"image/ico"
"text/gemini",
"image/"
]
},
"url":
{
"stripos":
[
"http"
"gemini://"
]
}
},
@@ -345,28 +198,21 @@
},
"size":
{
"max":10000024
"max":1048576
},
"mime":
"meta":
{
"stripos":
[
"application/xhtml+xml",
"application/javascript",
"text/html",
"text/plain",
"text/css",
"image/webp",
"image/png",
"image/gif",
"image/ico"
"text/gemini",
"image/"
]
},
"url":
{
"stripos":
[
"http"
"gemini://"
]
}
}
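
The new `url` block in the crawl config groups three link rules: `regex`, `external` and `skip.stripos`. As a rough sketch of how `crawl.php` appears to apply them to one link, assuming a hypothetical helper `isCrawlable()` that is not part of this commit and a `$rules` object shaped like `$config->cli->document->crawl->url`:

```php
<?php

// Hypothetical helper (not in the commit): applies the "url" rules to one link.
// $rules mirrors $config->cli->document->crawl->url after json_decode().
function isCrawlable(object $rules, string $url, string $sourceHost): bool
{
    // Regex rule: the link must match, e.g. only gemini:// addresses
    if (!preg_match($rules->regex, $url)) {
        return false;
    }

    // External host rule: stay on the source host unless "external" is true
    if (!$rules->external && parse_url($url, PHP_URL_HOST) != $sourceHost) {
        return false;
    }

    // Skip rule: reject links that contain any listed substring
    foreach ($rules->skip->stripos as $condition) {
        if (false !== stripos($url, $condition)) {
            return false;
        }
    }

    return true;
}

// Example rules; slashes inside the pattern stay escaped for the "/" delimiter
$rules = json_decode(json_encode([
    'external' => true,
    'regex'    => '/^gemini:\/\/.*/ui',
    'skip'     => ['stripos' => ['?']],
]));

var_dump(isCrawlable($rules, 'gemini://example.org/page.gmi', 'example.org'));
```

The order of the checks follows the diff: the regex and external host rules run while links are collected, and the `stripos` skip list runs later, before the URL is added to the index.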

2
src/cli/document/clean.php

@@ -39,7 +39,7 @@ $index = $client->index(
// Apply new configuration rules
echo _('apply new configuration rules...') . PHP_EOL;
foreach ($config->cli->document->crawl->skip->stripos->url as $condition)
foreach ($config->cli->document->crawl->url->skip->stripos as $condition)
{
echo sprintf(
_('cleanup documents with url that contain substring "%s"...') . PHP_EOL,

467
src/cli/document/crawl.php

@@ -6,7 +6,7 @@ $microtime = microtime(true);
// Load dependencies
require_once __DIR__ . '/../../../vendor/autoload.php';
// Define helpers
// Define helpers @TODO move to separated library (yo-php)
function getLastSnapTime(array $files): int
{
$time = [];
@@ -37,6 +37,40 @@ function getLastSnapTime(array $files): int
return 0;
}
function relative2absolute(
string $source, // current document url to grab the base
string $target, // relative or absolute link
?string &$scheme = null,
?string &$host = null,
?int &$port = null
) {
if (!parse_url($target, PHP_URL_HOST))
{
$scheme = parse_url($source, PHP_URL_SCHEME);
$host = parse_url($source, PHP_URL_HOST);
$port = parse_url($source, PHP_URL_PORT);
return $scheme . '://' . $host . ($port ? ':' . $port : null) .
'/' .
trim(
ltrim(
str_replace(
[
'./',
'../'
],
'',
$target
),
'/'
),
'.'
);
}
return $target;
}
// Init config
$config = json_decode(
file_get_contents(
@@ -182,16 +216,16 @@ foreach($index->search('')
$data =
[
'url' => $document->get('url'),
'title' => $document->get('title'),
'description' => $document->get('description'),
'keywords' => $document->get('keywords'),
'code' => $document->get('code'),
'size' => $document->get('size'),
'mime' => $document->get('mime'),
'rank' => $document->get('rank'),
'time' => $time,
'index' => 0
'url' => $document->get('url'),
'h1' => $document->get('h1'),
'h2' => $document->get('h2'),
'h3' => $document->get('h3'),
'code' => $document->get('code'),
'size' => $document->get('size'),
'meta' => $document->get('meta'),
'rank' => $document->get('rank'),
'time' => $time,
'index' => 0
];
// Debug target
@@ -205,114 +239,50 @@ foreach($index->search('')
);
}
// Update index time anyway and set reset code to 404
// Update index time anyway and set reset code to 51
$index->updateDocument(
[
'time' => time(),
'code' => 200,
'code' => 20,
'index' => 0
],
$document->getId()
);
// Request remote URL
$request = curl_init(
$request = new \Yggverse\Gemini\Client\Request(
$document->get('url')
);
// Drop URL with long response
curl_setopt(
$request,
CURLOPT_CONNECTTIMEOUT,
$config->cli->document->crawl->curl->connection->timeout
);
curl_setopt(
$request,
CURLOPT_TIMEOUT,
$config->cli->document->crawl->curl->connection->timeout
);
// Prevent huge content download e.g. media streams URL
curl_setopt(
$request,
CURLOPT_RETURNTRANSFER,
true
);
curl_setopt(
$request,
CURLOPT_NOPROGRESS,
false
);
curl_setopt(
$request,
CURLOPT_PROGRESSFUNCTION,
function(
$download,
$downloaded,
$upload,
$uploaded
) {
global $config;
global $index;
global $document;
$index->updateDocument(
[
'time' => time(),
'code' => 200,
'index' => 0
],
$document->getId()
);
return $downloaded > $config->cli->document->crawl->curl->download->size->max ? 1 : 0;
}
$response = new \Yggverse\Gemini\Client\Response(
$request->getResponse(
$config->cli->document->crawl->connection->timeout,
$config->cli->document->crawl->connection->length,
$config->cli->document->crawl->connection->chunk,
$length
)
);
// Begin request
if ($response = curl_exec($request))
if ($code = $request->getCode()) // @TODO process redirects
{
// Update HTTP code or skip on empty
if ($code = curl_getinfo($request, CURLINFO_HTTP_CODE))
{
// Delete deprecated document from index as HTTP code still not 200
/*
if ($code != 200 && !empty($data['code']) && $data['code'] != 200)
{
$index->deleteDocument(
$document->getId()
);
continue;
}
*/
$data['code'] = $code;
} else continue;
// Update status code
$data['code'] = $code;
// Update size or skip on empty
if ($size = curl_getinfo($request, CURLINFO_SIZE_DOWNLOAD))
if ($length)
{
$size = round( // float
$size
);
$data['size'] = $size;
$data['size'] = $length;
} else continue;
// Update MIME type or skip on empty
if ($type = curl_getinfo($request, CURLINFO_CONTENT_TYPE))
// Update meta or skip on empty
if ($meta = $response->getMeta())
{
$data['mime'] = $type;
$data['meta'] = $meta;
// On document charset specified
if (preg_match('/charset=([^\s;]+)/i', $type, $charset))
if (preg_match('/charset=([^\s;]+)/i', $meta, $charset))
{
if (!empty($charset[1]))
{
@@ -322,10 +292,12 @@ foreach($index->search('')
if (strtolower($charset[1]) == strtolower($encoding))
{
// Convert response to UTF-8
$response = mb_convert_encoding(
$response,
'UTF-8',
$charset[1]
$response->setBody(
mb_convert_encoding(
$response->getBody(),
'UTF-8',
$charset[1]
)
);
break;
@@ -336,241 +308,102 @@ foreach($index->search('')
} else continue;
// DOM crawler
if (
false !== stripos($type, 'text/html')
||
false !== stripos($type, 'text/xhtml')
||
false !== stripos($type, 'application/xhtml')
) {
$crawler = new Symfony\Component\DomCrawler\Crawler();
$crawler->addHtmlContent(
$response
// Gemtext parser
if (false !== stripos($response->getMeta(), 'text/gemini'))
{
$body = new \Yggverse\Gemini\Client\Gemtext\Body(
$response->getBody()
);
// Get title
foreach ($crawler->filter('head > title')->each(function($node) {
return $node->text();
}) as $value)
// Get H1
$h1 = [];
foreach ($body->getH1() as $value)
{
if (!empty($value))
{
$data['title'] = trim(
strip_tags(
html_entity_decode(
$value
)
)
);
}
$h1[] = $value;
}
// Get description
foreach ($crawler->filter('head > meta[name="description"]')->each(function($node) {
return $node->attr('content');
$data['h1'] = implode(
',',
array_unique(
$h1
)
);
}) as $value)
// Get H2
$h2 = [];
foreach ($body->getH2() as $value)
{
if (!empty($value))
{
$data['description'] = trim(
strip_tags(
html_entity_decode(
$value
)
)
);
}
$h2[] = $value;
}
// Get keywords
$keywords = [];
// Extract from meta tag
foreach ($crawler->filter('head > meta[name="keywords"]')->each(function($node) {
return $node->attr('content');
}) as $value)
{
if (!empty($value))
{
foreach ((array) explode(
',',
mb_strtolower(
strip_tags(
html_entity_decode(
$value
)
)
)
) as $keyword)
{
// Remove extra spaces
$keyword = trim(
$keyword
);
// Skip short words
if (mb_strlen($keyword) > 2)
{
$keywords[] = $keyword;
}
}
}
}
// Get keywords from headers
/* Disable keywords collection from headers as body index enabled
foreach ($crawler->filter('h1,h2,h3,h4,h5,h6')->each(function($node) {
return $node->text();
$data['h2'] = implode(
',',
array_unique(
$h2
)
);
}) as $value)
// Get H3
$h3 = [];
foreach ($body->getH3() as $value)
{
if (!empty($value))
{
foreach ((array) explode(
',',
mb_strtolower(
strip_tags(
html_entity_decode(
$value
)
)
)
) as $keyword)
{
// Remove extra spaces
$keyword = trim(
$keyword
);
// Skip short words
if (mb_strlen($keyword) > 2)
{
$keywords[] = $keyword;
}
}
}
$h3[] = $value;
}
*/
// Keep keywords unique
$keywords = array_unique(
$keywords
$data['h3'] = implode(
',',
array_unique(
$h3
)
);
// Update previous keywords when new value exists
if ($keywords)
{
$data['keywords'] = implode(',', $keywords);
}
// Save document body text to index
foreach ($crawler->filter('html > body')->each(function($node) {
return $node->html();
}) as $value)
{
if (!empty($value))
{
$data['body'] = trim(
preg_replace(
'/[\s]{2,}/', // strip extra separators
' ',
strip_tags(
str_replace( // make text separators before strip any closing tag, new line, etc
[
'<',
'>',
PHP_EOL,
],
[
' <',
'> ',
PHP_EOL . ' ',
],
preg_replace(
[
'/<script([^>]*)>([\s\S]*?)<\/script>/i', // strip js content
'/<style([^>]*)>([\s\S]*?)<\/style>/i', // strip css content
'/<pre([^>]*)>([\s\S]*?)<\/pre>/i', // strip code content
'/<code([^>]*)>([\s\S]*?)<\/code>/i',
],
'',
html_entity_decode(
$value
)
)
)
)
)
);
}
}
$data['body'] = trim(
preg_replace(
'/[\s]{2,}/', // strip extra separators
' ',
$response->getBody()
)
);
// Crawl documents
// Crawl links
$documents = [];
$scheme = parse_url($document->get('url'), PHP_URL_SCHEME);
$host = parse_url($document->get('url'), PHP_URL_HOST);
$port = parse_url($document->get('url'), PHP_URL_PORT);
foreach ($config->cli->document->crawl->selector as $selector => $settings)
foreach ($body->getLinks() as $line)
{
foreach ($crawler->filter($selector)->each(function($node) {
return $node;
$link = new \Yggverse\Gemini\Gemtext\Link(
$line
);
}) as $value) {
if ($url = $link->getAddress())
{
//Make relative links absolute
$url = relative2absolute(
$document->get('url'),
$url,
$scheme,
$host,
$port,
);
if ($url = $value->attr($settings->attribute))
// Regex rules
if (!preg_match($config->cli->document->crawl->url->regex, $url))
{
//Make relative links absolute
if (!parse_url($url, PHP_URL_HOST))
{
$url = $scheme . '://' . $host . ($port ? ':' . $port : null) .
'/' .
trim(
ltrim(
str_replace(
[
'./',
'../'
],
'',
$url
),
'/'
),
'.'
);
}
// Regex rules
if (!preg_match($settings->regex, $url))
{
continue;
}
// External host rules
if (!$settings->external && parse_url($url, PHP_URL_HOST) != $host)
{
continue;
}
continue;
}
$documents[] = $url;
// External host rules
if (!$config->cli->document->crawl->url->external && parse_url($url, PHP_URL_HOST) != $host)
{
continue;
}
$documents[] = $url;
}
}
// @TODO find document links by protocol ($body->findLinks('gemini'))
if ($documents)
{
foreach (array_unique($documents) as $url)
@@ -578,7 +411,7 @@ foreach($index->search('')
// Apply stripos condition
$skip = false;
foreach ($config->cli->document->crawl->skip->stripos->url as $condition)
foreach ($config->cli->document->crawl->url->skip->stripos as $condition)
{
if (false !== stripos($url, $condition)) {
@@ -597,7 +430,7 @@ foreach($index->search('')
date('c'),
$url,
print_r(
$config->cli->document->crawl->skip->stripos->url,
$config->cli->document->crawl->url->skip->stripos,
true
)
);
@@ -701,7 +534,7 @@ foreach($index->search('')
}
// Create snap
if ($config->cli->document->crawl->snap->enabled && $code === 200)
if ($config->cli->document->crawl->snap->enabled && $request->getCode() === 20)
{
try
{
@@ -734,12 +567,12 @@ foreach($index->search('')
$snap->addFromString(
'DATA',
$response
$response->getBody()
);
$snap->addFromString(
'MIME',
$type
'META',
$response->getMeta()
);
$snap->addFromString(
@@ -767,12 +600,12 @@ foreach($index->search('')
// Copy to local storage on enabled
if ($config->snap->storage->local->enabled)
{
// Check for mime allowed
// Check for meta allowed
$allowed = false;
foreach ($config->snap->storage->local->mime->stripos as $whitelist)
foreach ($config->snap->storage->local->meta->stripos as $whitelist)
{
if (false !== stripos($type, $whitelist))
if (false !== stripos($response->getMeta(), $whitelist))
{
$allowed = true;
break;
@@ -904,12 +737,12 @@ foreach($index->search('')
continue;
}
// Check for mime allowed
// Check for meta allowed
$allowed = false;
foreach ($ftp->mime->stripos as $whitelist)
foreach ($ftp->meta->stripos as $whitelist)
{
if (false !== stripos($type, $whitelist))
if (false !== stripos($response->getMeta(), $whitelist))
{
$allowed = true;
break;
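
The `relative2absolute()` helper added in this file strips `./` and `../` from the link and appends the rest to the host root, so a link such as `../a.gmi` found on `gemini://host/dir/sub/page.gmi` ends up as `gemini://host/a.gmi`. If directory-aware resolution is wanted, one possible approach is sketched below. The function name `resolveUrl()` is hypothetical and not part of this commit, and the sketch does not treat query strings or fragments specially.

```php
<?php

// Hypothetical alternative to relative2absolute(): resolves $target against
// the directory of $base and collapses "." and ".." path segments.
function resolveUrl(string $base, string $target): string
{
    // Absolute links (with a scheme) are returned unchanged
    if (parse_url($target, PHP_URL_SCHEME)) {
        return $target;
    }

    $scheme = parse_url($base, PHP_URL_SCHEME);
    $host   = parse_url($base, PHP_URL_HOST);
    $port   = parse_url($base, PHP_URL_PORT);
    $path   = parse_url($base, PHP_URL_PATH) ?: '/';

    // Scheme-relative links ("//host/path") only inherit the scheme
    if (str_starts_with($target, '//')) {
        return $scheme . ':' . $target;
    }

    // Root-relative links replace the whole path, others the last segment
    if (str_starts_with($target, '/')) {
        $path = $target;
    } else {
        $path = substr($path, 0, strrpos($path, '/') + 1) . $target;
    }

    // Collapse dot segments
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.' && $segment !== '') {
            $segments[] = $segment;
        }
    }

    // Keep a trailing slash when the path pointed at a directory
    $trailing = ($segments && preg_match('#/\.{0,2}$#', $path)) ? '/' : '';

    return $scheme . '://' . $host . ($port ? ':' . $port : '') .
           '/' . implode('/', $segments) . $trailing;
}

echo resolveUrl('gemini://host/dir/sub/page.gmi', '../a.gmi') . PHP_EOL;
```

Unlike the helper in the diff, this version does not return the scheme, host and port by reference; the caller would still read those from the document URL as before.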

8
src/cli/index/init.php

@@ -52,15 +52,15 @@ $result = $index->create(
[
'type' => 'text'
],
'title' =>
'h1' =>
[
'type' => 'text'
],
'description' =>
'h2' =>
[
'type' => 'text'
],
'keywords' =>
'h3' =>
[
'type' => 'text'
],
@@ -68,7 +68,7 @@ $result = $index->create(
[
'type' => 'text'
],
'mime' =>
'meta' =>
[
'type' => 'text'
],
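
The index now stores `h1`, `h2` and `h3` instead of `title`, `description` and `keywords`, because gemtext has heading lines but no metadata tags. The commit fills these fields through the `yggverse/gemini` parser. To illustrate what ends up in them, here is a plain-PHP sketch that does not use that library. The function name `extractHeadings()` is hypothetical, and the line rules are my reading of the gemtext format: heading lines start with one to three `#` characters, and lines starting with three backticks toggle preformatted mode.

```php
<?php

// Hypothetical plain-PHP sketch, independent of the yggverse/gemini parser:
// collects gemtext headings grouped as h1, h2 and h3.
function extractHeadings(string $gemtext): array
{
    $headings = ['h1' => [], 'h2' => [], 'h3' => []];
    $fence = str_repeat('`', 3); // preformatted toggle marker
    $preformatted = false;

    foreach (preg_split('/\r?\n/', $gemtext) as $line) {
        // Toggle lines switch preformatted mode on and off
        if (str_starts_with($line, $fence)) {
            $preformatted = !$preformatted;
            continue;
        }

        // Nothing inside a preformatted block counts as a heading
        if ($preformatted) {
            continue;
        }

        // One to three "#" characters, optional whitespace, then the text
        if (preg_match('/^(#{1,3})\s*(.+)$/', $line, $match)) {
            $headings['h' . strlen($match[1])][] = trim($match[2]);
        }
    }

    return $headings;
}

$headings = extractHeadings("# Title\n## Section\ntext\n### Detail\n## Section");

// crawl.php stores each level as a comma-separated list of unique values
echo implode(',', array_unique($headings['h2'])) . PHP_EOL;
```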

177
src/cli/yggo/import.php

@@ -1,177 +0,0 @@
<?php
// Load dependencies
require_once __DIR__ . '/../../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../../config.json'
)
);
// Init manticore
$client = new \Manticoresearch\Client(
[
'host' => $config->manticore->server->host,
'port' => $config->manticore->server->port,
]
);
// Init index
$index = $client->index(
$config->manticore->index->document->name
);
// Connect Yggo DB
try
{
$yggo = new PDO(
'mysql:dbname=' . $argv[5] . ';host=' . $argv[1] . ';port=' . $argv[2] . ';charset=utf8',
$argv[3],
$argv[4],
[
PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8'
]
);
$yggo->setAttribute(
PDO::ATTR_ERRMODE,
PDO::ERRMODE_EXCEPTION
);
$yggo->setAttribute(
PDO::ATTR_DEFAULT_FETCH_MODE,
PDO::FETCH_OBJ
);
$yggo->setAttribute(
PDO::ATTR_TIMEOUT,
600
);
}
catch (Exception $error)
{
var_dump(
$error
);
exit;
}
$start = isset($argv[7]) ? (int) $argv[7] : 0;
$limit = isset($argv[8]) ? (int) $argv[8] : 100;
$total = $yggo->query('SELECT COUNT(*) AS `total` FROM `hostPage`
WHERE `hostPage`.`httpCode` = 200
AND `hostPage`.`timeUpdated` IS NOT NULL
AND `hostPage`.`mime` IS NOT NULL
AND `hostPage`.`size` IS NOT NULL')->fetch()->total;
$processed = $start;
for ($i = 0; $i <= $total; $i++)
{
$query = $yggo->query('SELECT `hostPage`.`hostPageId`,
`hostPage`.`httpCode`,
`hostPage`.`mime`,
`hostPage`.`size`,
`hostPage`.`timeUpdated`,
`hostPage`.`uri`,
`host`.`scheme`,
`host`.`name`,
`host`.`port`,
(
SELECT `hostPageDescription`.`title` FROM `hostPageDescription`
WHERE `hostPageDescription`.`hostPageId` = `hostPage`.`hostPageId`
ORDER BY `hostPageDescription`.`timeAdded` DESC
LIMIT 1
) AS `title`,
(
SELECT `hostPageDescription`.`description` FROM `hostPageDescription`
WHERE `hostPageDescription`.`hostPageId` = `hostPage`.`hostPageId`
ORDER BY `hostPageDescription`.`timeAdded` DESC
LIMIT 1
) AS `description`,
(
SELECT `hostPageDescription`.`keywords` FROM `hostPageDescription`
WHERE `hostPageDescription`.`hostPageId` = `hostPage`.`hostPageId`
ORDER BY `hostPageDescription`.`timeAdded` DESC
LIMIT 1
) AS `keywords`
FROM `hostPage`
JOIN `host` ON (`host`.`hostId` = `hostPage`.`hostId`)
WHERE `hostPage`.`httpCode` = 200
AND `hostPage`.`timeUpdated` IS NOT NULL
AND `hostPage`.`mime` IS NOT NULL
AND `hostPage`.`size` IS NOT NULL
GROUP BY `hostPage`.`hostPageId`
LIMIT ' . $start . ',' . $limit);
foreach ($query->fetchAll() as $remote)
{
$url = $remote->scheme . '://' . $remote->name . ($remote->port ? ':' . $remote->port : false) . $remote->uri;
$crc32url = crc32($url);
// Check for unique URL requested
if (isset($argv[6]))
{
$local = $index->search('')
->filter('id', $crc32url)
->limit(1)
->get();
if ($local->getTotal())
{
// Result
echo sprintf(
_('[%s/%s] [skip duplicate] %s') . PHP_EOL,
$processed++,
$total,
$url
);
continue;
}
}
$index->addDocument(
[
'url' => $url,
'time' => (int) $remote->timeUpdated,
'code' => (int) $remote->httpCode,
'size' => (int) $remote->size,
'mime' => (string) $remote->mime,
'title' => (string) $remote->title,
'description' => (string) $remote->description,
'keywords' => (string) $remote->keywords
],
(int) $crc32url
);
// Result
echo sprintf(
_('[%s/%s] [add] %s') . PHP_EOL,
$processed++,
$total,
$url
);
}
// Update queue offset
$start = $start + $limit;
}
// Done
echo _('import completed!') . PHP_EOL;

271
src/webui/api.php

@@ -1,271 +0,0 @@
<?php
// Debug
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
// Load dependencies
require_once __DIR__ . '/../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../config.json'
)
);
// Set headers
header('Content-Type: application/json; charset=utf-8');
// Action
switch (!empty($_GET['action']) ? $_GET['action'] : false) {
// Snap methods
case 'snap':
switch (!empty($_GET['method']) ? $_GET['method'] : false) {
case 'download':
// Validate required attributes
switch (false)
{
case isset($_GET['source']):
echo json_encode(
[
'status' => false,
'message' => _('valid source required')
]
);
exit;
case isset($_GET['id']) && preg_match('/^[\d]+$/', $_GET['id']):
echo json_encode(
[
'status' => false,
'message' => _('valid document identifier required')
]
);
exit;
case isset($_GET['time']) && preg_match('/^[\d]+$/', $_GET['time']):
echo json_encode(
[
'status' => false,
'message' => _('valid time required')
]
);
exit;
}
// Detect remote snap source
if (preg_match('/^[\d]+$/', $_GET['source']))
{
if (!isset($config->snap->storage->remote->ftp[$_GET['source']]) || !$config->snap->storage->remote->ftp[$_GET['source']]->enabled)
{
echo json_encode(
[
'status' => false,
'message' => _('requested source not found')
]
);
exit;
}
// Connect remote
$remote = new \Yggverse\Ftp\Client();
$connection = $remote->connect(
$config->snap->storage->remote->ftp[$_GET['source']]->connection->host,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->port,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->username,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->password,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->directory,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->timeout,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->passive
);
// Remote host connected
if ($connection) {
// Prepare snap path
$filename = sprintf(
'%s/%s.tar.gz',
implode(
'/',
str_split(
$_GET['id']
)
),
$_GET['time']
);
// Check snap exist
if (!$size = $remote->size($filename))
{
echo json_encode(
[
'status' => false,
'message' => _('requested snap not found')
]
);
exit;
}
// Set headers
header(
'Content-Type: application/tar+gzip'
);
header(
sprintf(
'Content-Length: %s',
$size
)
);
header(
sprintf(
'Content-Disposition: filename="snap.%s.%s"',
$_GET['id'],
basename(
$filename
)
)
);
// Return file
$remote->get(
$filename,
'php://output'
);
$remote->close();
}
}
// Local
else if ($config->snap->storage->local->enabled)
{
// Prefix absolute
if ('/' === substr($config->snap->storage->local->directory, 0, 1))
{
$prefix = $config->snap->storage->local->directory;
}
// Prefix relative
else
{
$prefix = __DIR__ . '/../../' . $config->snap->storage->local->directory;
}
// Prepare snap path
$filename = sprintf(
'%s/%s/%s.tar.gz',
$prefix,
implode(
'/',
str_split(
$_GET['id']
)
),
$_GET['time']
);
// Check snap exist
if (!file_exists($filename) || !is_readable($filename))
{
echo json_encode(
[
'status' => false,
'message' => _('requested snap not found')
]
);
exit;
}
// Check snap has valid size
if (!$size = filesize($filename))
{
echo json_encode(
[
'status' => false,
'message' => _('requested snap has invalid size')
]
);
exit;
}
// Set headers
header(
'Content-Type: application/tar+gzip'
);
header(
sprintf(
'Content-Length: %s',
$size
)
);
header(
sprintf(
'Content-Disposition: filename="snap.%s.%s"',
$_GET['id'],
basename(
$filename
)
)
);
readfile(
$filename
);
exit;
}
else
{
echo json_encode(
[
'status' => false,
'message' => _('requested source not found')
]
);
}
break;
default:
echo json_encode(
[
'status' => false,
'message' => _('Undefined API method')
]
);
}
break;
default:
echo json_encode(
[
'status' => false,
'message' => _('Undefined API action')
]
);
}

563
src/webui/explore.php

@@ -1,563 +0,0 @@
<?php
// Debug
# ini_set('display_errors', '1');
# ini_set('display_startup_errors', '1');
# error_reporting(E_ALL);
// Load dependencies
require_once __DIR__ . '/../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../config.json'
)
);
// Show totals in placeholder
// Init
$client = new \Manticoresearch\Client(
[
'host' => $config->manticore->server->host,
'port' => $config->manticore->server->port,
]
);
// Init index
$index = $client->index(
$config->manticore->index->document->name
);
// Get totals
$total = $index->search('')
->option('cutoff', 0)
->limit(0)
->get()
->getTotal();
$placeholder = sprintf(
_('Search in %s documents %s'),
number_format(
$total
),
$config->webui->search->index->request->url->enabled ? _('or enter new address to crawl...') : false
);
// Get document data
$document = $index->getDocumentById(
isset($_GET['i']) ? $_GET['i'] : 0
);
// Get icon
$hostname = parse_url(
$document->url,
PHP_URL_HOST
);
$identicon = new \Jdenticon\Identicon();
$identicon->setValue(
$hostname
);
$identicon->setSize(36);
$identicon->setStyle(
[
'backgroundColor' => 'rgba(255, 255, 255, 0)',
'padding' => 0
]
);
$icon = $identicon->getImageDataUri('webp');
// Get snaps info
$snaps = [];
/// Prepare location
$filepath = implode(
'/',
str_split(
$document->getId()
)
);
/// Local snaps
if ($config->snap->storage->local->enabled)
{
/// absolute
if ('/' === substr($config->snap->storage->local->directory, 0, 1))
{
$prefix = $config->snap->storage->local->directory;
}
/// relative
else
{
$prefix = __DIR__ . '/../../' . $config->snap->storage->local->directory;
}
$directory = sprintf('%s/%s', $prefix, $filepath);
if (is_dir($directory))
{
foreach ((array) scandir($directory) as $filename)
{
if (!str_ends_with($filename, '.tar.gz'))
{
continue;
}
$basename = basename(
$filename
);
$time = preg_replace(
'/^([\d]+)\.tar\.gz$/',
'$1',
$basename
);
$snaps[_('Local')][] = (object)
[
'source' => 'local',
'id' => $document->getId(),
'name' => $basename,
'time' => $time,
'size' => filesize(
sprintf(
'%s/%s',
$directory,
$filename
)
),
];
}
}
}
/// Remote snaps
foreach ($config->snap->storage->remote->ftp as $i => $ftp)
{
// Resource enabled
if (!$ftp->enabled)
{
continue;
}
$remote = new \Yggverse\Ftp\Client();
$connection = $remote->connect(
$ftp->connection->host,
$ftp->connection->port,
$ftp->connection->username,
$ftp->connection->password,
$ftp->connection->directory,
$ftp->connection->timeout,
$ftp->connection->passive
);
// Remote host connected
if ($connection) {
foreach ((array) $remote->nlist($filepath) as $filename)
{
if (!str_ends_with($filename, '.tar.gz'))
{
continue;
}
$basename = basename(
$filename
);
$time = preg_replace(
'/^([\d]+)\.tar\.gz$/',
'$1',
$basename
);
$snaps[sprintf(_('Server #%s'), $i + 1)][] = (object)
[
'source' => $i,
'id' => $document->getId(),
'name' => $basename,
'time' => $time,
'size' => $remote->size($filename),
];
}
$remote->close();
}
}
// Process index request
if ($config->webui->index->enabled)
{
session_start();
if (isset($_POST['captcha']) && $_POST['captcha'] == $_SESSION['captcha'])
{
$index->updateDocument(
[
'index' => time()
],
$document->getId()
);
header(
sprintf(
'Location: explore.php?i=%d',
$document->getId()
)
);
}
$captcha = new \Gregwar\Captcha\CaptchaBuilder(
null,
new \Gregwar\Captcha\PhraseBuilder(
$config->webui->captcha->length,
$config->webui->captcha->phrase
)
);
$captcha->setBackgroundColor(
$config->webui->captcha->background->r,
$config->webui->captcha->background->g,
$config->webui->captcha->background->b
);
$captcha->build();
$_SESSION['captcha'] = $captcha->getPhrase();
}
?>
<!DOCTYPE html>
<html lang="<?php echo _('en-US'); ?>">
<head>
<title><?php echo _('Yo! explore') ?></title>
<meta charset="utf-8" />
<style>
* {
border: 0;
margin: 0;
padding: 0;
font-family: Sans-serif;
color: #ccc;
}
body {
background-color: #2e3436;
word-break: break-word;
}
header {
background-color: #34393b;
position: fixed;
top: 0;
left: 0;
right: 0;
z-index: 2;
}
main {
margin-top: 80px;
margin-bottom: 76px;
padding: 0 20px;
}
main > div {
max-width: 640px;
margin: 0 auto;
padding: 8px 0;
border-top: 1px #000 dashed;
font-size: 14px;
}
main > div > div {
margin: 8px 0;
font-size: 13px;
}
h1 {
position: fixed;
top: 2px;
left: 24px;
}
h1 > a,
h1 > a:visited,
h1 > a:active,
h1 > a:hover {
color: #fff;
font-weight: normal;
font-size: 22px;
text-decoration: none;
}
h2 {
display: block;
font-size: 15px;
font-weight: normal;
margin: 4px 0;
color: #fff;
}
h3 {
display: block;
font-size: 13px;
margin: 4px 0;
}
pre {
border-radius: 4px;
border: 1px #000 dashed;
font-size: 13px;
margin: 8px 0;
max-height: 180px;
overflow: auto;
padding: 8px;
position: relative;
white-space: pre-wrap;
}
form {
display: block;
max-width: 678px;
margin: 0 auto;
text-align: center;
}
fieldset {
width: 150px;
}
input[type="text"],
input[type="text"]:-webkit-autofill,
input[type="text"]:-webkit-autofill:focus {
transition: background-color 0s 600000s, color 0s 600000s; /* chrome */
width: 100%;
margin: 12px 0;
padding: 6px 0;
border-radius: 32px;
background-color: #000;
color: #fff;
font-size: 15px;
text-align: center;
}
input[type="text"]:hover {
background-color: #111
}
input[type="text"]:focus {
outline: none;
background-color: #111
}
input[type="text"]:focus::placeholder {
color: #090808
}
label {
font-size: 14px;
position: absolute;
right: 80px;
top: 18px;
}
label > input {
width: auto;
margin: 0 4px;
}
button {
padding: 6px 12px;
border-radius: 4px;
cursor: pointer;
background-color: #3394fb;
color: #fff;
font-size: 14px;
}
button {
background-color: #4b9df4;
height: 32px;
vertical-align: top;
}
header button {
position: fixed;
top: 12px;
right: 24px;
}
a, a:visited, a:active {
color: #9ba2ac;
font-size: 12px;
}
a:hover {
color: #54a3f7;
}
ul {
margin: 0;
padding: 0;
}
ul > li {
margin-left: 16px;
font-size: 13px;
padding: 4px 0;
}
.text-warning {
color: #db6161;
}
</style>
</head>
<body>
<header>
<form name="search" method="GET" action="search.php">
<h1><a href="./"><?php echo _('Yo!') ?></a></h1>
<input type="text" name="q" placeholder="<?php echo $placeholder ?>" value="" />
<?php if ($config->webui->search->extended->enabled) { ?>
<label for="e">
<input type="checkbox" name="e" id="e" value="true" />
<?php echo _('Extended') ?>
</label>
<?php } ?>
<button type="submit">
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="white" class="bi bi-search" viewBox="0 0 16 16">
<path d="M11.742 10.344a6.5 6.5 0 1 0-1.397 1.398h-.001c.03.04.062.078.098.115l3.85 3.85a1 1 0 0 0 1.415-1.414l-3.85-3.85a1.007 1.007 0 0 0-.115-.1zM12 6.5a5.5 5.5 0 1 1-11 0 5.5 5.5 0 0 1 11 0"/>
</svg>
</sub>
</button>
</form>
</header>
<main>
<?php if ($document) { ?>
<div>
<?php if (empty($document->time)) { ?>
<div>
<?php echo _('Document pending for crawler in queue') ?>
</div>
<?php } else { ?>
<?php if (!empty($document->title)) { ?>
<h2>
<?php echo htmlentities($document->title) ?>
</h2>
<?php } ?>
<?php if (!empty($document->description)) { ?>
<div>
<?php echo htmlentities($document->description) ?>
</div>
<?php } ?>
<?php if (!empty($document->keywords)) { ?>
<div>
<?php echo htmlentities($document->keywords) ?>
</div>
<?php } ?>
<?php } ?>
<div>
<a href="<?php echo htmlentities($document->url) ?>"><?php echo htmlentities(urldecode($document->url)) ?></a>
</div>
</div>
<div>
<div>
<img src="<?php echo $icon ?>" title="<?php echo $hostname ?>" alt="identicon" />
</div>
<?php if (!empty($document->code)) { ?>
<h3><?php echo _('HTTP') ?></h3>
<?php if ($document->code == 200) { ?>
<div>
<?php echo $document->code ?>
</div>
<?php } else { ?>
<div class="text-warning">
<?php echo $document->code ?>
</div>
<?php } ?>
<?php } ?>
<?php if (!empty($document->mime)) { ?>
<h3><?php echo _('MIME') ?></h3>
<div><?php echo $document->mime ?></div>
<?php } ?>
<?php if (!empty($document->size)) { ?>
<h3><?php echo _('Size') ?></h3>
<div><?php echo sprintf('%s bytes', number_format($document->size)) ?></div>
<?php } ?>
<?php if (!empty($document->time)) { ?>
<h3><?php echo _('Time') ?></h3>
<div><?php echo date('c', $document->time) ?></div>
<?php } ?>
<?php if ($snaps) { ?>
<h3><?php echo _('Snaps') ?></h3>
<ul>
<?php foreach ($snaps as $source => $snap) { ?>
<li>
<?php echo $source ?>
<ul>
<?php foreach ($snap as $file) { ?>
<li>
<a rel="nofollow" href="api.php?action=snap&method=download&source=<?php echo $file->source ?>&id=<?php echo $file->id ?>&time=<?php echo $file->time ?>">
<?php echo sprintf('%s (tar.gz / %s bytes)', date('c', $file->time), number_format($file->size)) ?>
</a>
</li>
<?php } ?>
</ul>
</li>
<?php } ?>
</ul>
<?php } ?>
<?php if (!empty($document->body)) { ?>
<h3><?php echo _('Cache') ?></h3>
<pre><?php echo htmlentities($document->body) ?></pre>
<?php } ?>
<?php if ($config->webui->index->enabled) { ?>
<h3><?php echo _('Index') ?></h3>
<div>
<?php if ($document->get('index')) { ?>
<?php echo sprintf(_('Request sent at %s'), date('c', $document->get('index'))) ?>
<?php } else { ?>
<img src="<?php echo $captcha->inline(100) ?>" alt="captcha" />
<form name="index" method="POST" action="explore.php?i=<?php echo $document->getId() ?>">
<fieldset>
<input type="text"
name="captcha"
value=""
placeholder="<?php echo _('Code on picture'); ?>"
autocomplete="off" />
<button type="submit">
<?php echo _('Request') ?>
</button>
<button type="submit">
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="white" viewBox="0 0 16 16">
<path d="M11.534 7h3.932a.25.25 0 0 1 .192.41l-1.966 2.36a.25.25 0 0 1-.384 0l-1.966-2.36a.25.25 0 0 1 .192-.41m-11 2h3.932a.25.25 0 0 0 .192-.41L2.692 6.23a.25.25 0 0 0-.384 0L.342 8.59A.25.25 0 0 0 .534 9"/>
<path fill-rule="evenodd" d="M8 3c-1.552 0-2.94.707-3.857 1.818a.5.5 0 1 1-.771-.636A6.002 6.002 0 0 1 13.917 7H12.9A5 5 0 0 0 8 3M3.1 9a5.002 5.002 0 0 0 8.757 2.182.5.5 0 1 1 .771.636A6.002 6.002 0 0 1 2.083 9z"/>
</svg>
</sub>
</button>
</fieldset>
</form>
<?php } ?>
</div>
<?php } ?>
</div>
<?php } else { ?>
<div>
<?php echo _('Index not found') ?>
</div>
<?php } ?>
</main>
</body>
</html>

336
src/webui/index.php

@@ -1,336 +0,0 @@
<?php
// Debug
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
// Load dependencies
require_once __DIR__ . '/../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../config.json'
)
);
// Show totals in placeholder
// Init
$client = new \Manticoresearch\Client(
[
'host' => $config->manticore->server->host,
'port' => $config->manticore->server->port,
]
);
// Init index
$index = $client->index(
$config->manticore->index->document->name
);
// Get totals
$total = $index->search('')
->option('cutoff', 0)
->limit(0)
->get()
->getTotal();
$placeholder = sprintf(
_('Search in %s documents %s'),
number_format(
$total
),
$config->webui->search->index->request->url->enabled ? _('or enter new address to crawl...') : ''
);
?>
<!DOCTYPE html>
<html lang="<?php echo _('en-US') ?>">
<head>
<title><?php echo _('Yo! Web Search Engine') ?></title>
<meta charset="utf-8" />
<meta name="description" content="<?php echo _('Yo! Micro Web Crawler in PHP & Manticore') ?>" />
<meta name="keywords" content="<?php echo _('web, search, engine, crawler, manticore, yggdrasil, js-less, open source') ?>" />
<style>
* {
border: 0;
margin: 0;
padding: 0;
font-family: Sans-serif;
color: #ccc;
}
body {
background-color: #2e3436;
}
h1 {
color: #fff;
font-weight: normal;
font-size: 36px;
margin: 16px 0
}
form {
display: block;
max-width: 640px;
margin: 16% auto;
text-align: center;
}
input,
input:-webkit-autofill,
input:-webkit-autofill:focus {
transition: background-color 0s 600000s, color 0s 600000s; /* chrome */
width: 100%;
margin: 8px 0;
padding: 12px 0;
border-radius: 32px;
background-color: #000;
color: #fff;
font-size: 16px;
text-align: center;
}
input:hover {
background-color: #111
}
input:focus {
outline: none;
background-color: #111
}
input:focus::placeholder {
color: #090808;
}
button {
margin: 22px 0;
padding: 6px 12px;
border-radius: 4px;
cursor: pointer;
background-color: #3394fb;
color: #fff;
font-size: 14px;
}
button:hover {
background-color: #4b9df4;
}
footer {
position: fixed;
bottom: 0;
left:0;
right: 0;
text-align: center;
padding: 24px;
color: #9ba2ac;
font-size: 12px;
}
footer > a,
footer > a:visited,
footer > a:active {
color: #9ba2ac;
font-size: 12px;
}
footer > a > svg,
footer > a:visited > svg,
footer > a:active > svg {
fill: #9ba2ac;
}
footer > a:hover {
color: #54a3f7;
}
footer > a:hover svg {
fill: #54a3f7;
}
footer > a,
footer > a:visited,
footer > a:active {
text-decoration: none;
}
/*
* CSS animation
* by https://codepen.io/alvarotrigo/pen/GRvYNax
*/
main {
background: #2e3436;
background: -webkit-linear-gradient(to left, #8f94fb, #4e54c8);
width: 100%;
}
ul {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
overflow: hidden;
z-index:-1
}
li {
position: absolute;
display: block;
list-style: none;
width: 20px;
height: 20px;
background: rgba(255, 255, 255, 0.2);
animation: animate 25s linear infinite;
bottom: -150px;
}
li:nth-child(1) {
left: 25%;
width: 80px;
height: 80px;
animation-delay: 0s;
}
li:nth-child(2) {
left: 10%;
width: 20px;
height: 20px;
animation-delay: 2s;
animation-duration: 12s;
}
li:nth-child(3) {
left: 70%;
width: 20px;
height: 20px;
animation-delay: 4s;
}
li:nth-child(4) {
left: 40%;
width: 60px;
height: 60px;
animation-delay: 0s;
animation-duration: 18s;
}
li:nth-child(5) {
left: 65%;
width: 20px;
height: 20px;
animation-delay: 0s;
}
li:nth-child(6) {
left: 75%;
width: 110px;
height: 110px;
animation-delay: 3s;
}
li:nth-child(7) {
left: 35%;
width: 150px;
height: 150px;
animation-delay: 7s;
}
li:nth-child(8) {
left: 50%;
width: 25px;
height: 25px;
animation-delay: 15s;
animation-duration: 45s;
}
li:nth-child(9) {
left: 20%;
width: 15px;
height: 15px;
animation-delay: 2s;
animation-duration: 35s;
}
li:nth-child(10) {
left: 85%;
width: 150px;
height: 150px;
animation-delay: 0s;
animation-duration: 11s;
}
@keyframes animate {
0%{
transform: translateY(0) rotate(0deg);
opacity: 1;
border-radius: 0;
}
100%{
transform: translateY(-1000px) rotate(720deg);
opacity: 0;
border-radius: 50%;
}
}
</style>
</head>
<body>
<header>
<form name="search" method="GET" action="search.php">
<h1><?php echo _('Yo!') ?></h1>
<input type="text" name="q" placeholder="<?php echo $placeholder ?>" value="" />
<button type="submit">
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="white" class="bi bi-search" viewBox="0 0 16 16">
<path d="M11.742 10.344a6.5 6.5 0 1 0-1.397 1.398h-.001c.03.04.062.078.098.115l3.85 3.85a1 1 0 0 0 1.415-1.414l-3.85-3.85a1.007 1.007 0 0 0-.115-.1zM12 6.5a5.5 5.5 0 1 1-11 0 5.5 5.5 0 0 1 11 0"/>
</svg>
</sub>
&nbsp;
<?php echo _('Search'); ?>
</button>
</form>
</header>
<!-- css animation : begin -->
<main>
<ul>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
</ul>
</main>
<!-- css animation : end -->
<footer>
<?php foreach ($config->webui->footer->links as $i => $link) { ?>
<?php if ($i) echo '|' ?>
<a <?php foreach ($link->attributes as $name => $value) { echo sprintf(' %s="%s"', $name, $value); } ?>>
<?php echo _($link->text) ?>
</a>
<?php foreach ($link->index as $indexUrl) { // separate name: $index already holds the Manticore index ?>
<a rel="nofollow" href="<?php echo $indexUrl ?>" title="<?php echo sprintf(_('Download %s database'), $link->text) ?>">
<svg xmlns="http://www.w3.org/2000/svg" width="11" height="11" viewBox="0 0 16 16">
<path d="M12.5 9a3.5 3.5 0 1 1 0 7 3.5 3.5 0 0 1 0-7m.354 5.854 1.5-1.5a.5.5 0 0 0-.708-.708l-.646.647V10.5a.5.5 0 0 0-1 0v2.793l-.646-.647a.5.5 0 0 0-.708.708l1.5 1.5a.5.5 0 0 0 .708 0ZM8 1c-1.573 0-3.022.289-4.096.777C2.875 2.245 2 2.993 2 4s.875 1.755 1.904 2.223C4.978 6.711 6.427 7 8 7s3.022-.289 4.096-.777C13.125 5.755 14 5.007 14 4s-.875-1.755-1.904-2.223C11.022 1.289 9.573 1 8 1"/>
<path d="M2 7v-.839c.457.432 1.004.751 1.49.972C4.722 7.693 6.318 8 8 8s3.278-.307 4.51-.867c.486-.22 1.033-.54 1.49-.972V7c0 .424-.155.802-.411 1.133a4.51 4.51 0 0 0-4.815 1.843A12.31 12.31 0 0 1 8 10c-1.573 0-3.022-.289-4.096-.777C2.875 8.755 2 8.007 2 7m6.257 3.998L8 11c-1.682 0-3.278-.307-4.51-.867-.486-.22-1.033-.54-1.49-.972V10c0 1.007.875 1.755 1.904 2.223C4.978 12.711 6.427 13 8 13h.027a4.552 4.552 0 0 1 .23-2.002m-.002 3L8 14c-1.682 0-3.278-.307-4.51-.867-.486-.22-1.033-.54-1.49-.972V13c0 1.007.875 1.755 1.904 2.223C4.978 15.711 6.427 16 8 16c.536 0 1.058-.034 1.555-.097a4.507 4.507 0 0 1-1.3-1.905"/>
</svg>
</a>
<?php } ?>
<?php } ?>
</footer>
</body>
</html>

514
src/webui/search.php

@@ -1,514 +0,0 @@
<?php
// Debug
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
// Load dependencies
require_once __DIR__ . '/../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../config.json'
)
);
// Init
$client = new \Manticoresearch\Client(
[
'host' => $config->manticore->server->host,
'port' => $config->manticore->server->port,
]
);
// Init index
$index = $client->index(
$config->manticore->index->document->name
);
// Get totals
$total = $index->search('')
->option('cutoff', 0)
->limit(0)
->get()
->getTotal();
$placeholder = sprintf(
_('Search in %s documents %s'),
number_format(
$total
),
$config->webui->search->index->request->url->enabled ? _('or enter new address to crawl...') : ''
);
$response = false;
// Request
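// Accept only string queries (an array value such as ?q[]=x would break trim)
$q = isset($_GET['q']) && is_string($_GET['q']) ? trim($_GET['q']) : '';
// Page numbers start at 1; clamp so the offset can never go negative
$p = !empty($_GET['p']) ? max(1, (int) $_GET['p']) : 1;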
// Register new URL by request on enabled
if ($config->webui->search->index->request->url->enabled && filter_var($q, FILTER_VALIDATE_URL))
{
if (preg_match($config->webui->search->index->request->url->regex, $q))
{
// Prepare URL
$url = $q;
$crc32url = crc32($url);
// Check URL for exist
$exist = $index->search('')
->filter('id', $crc32url)
->limit(1)
->get()
->getTotal();
if ($exist)
{
/* disable as regular search request possible
$response = sprintf(
_('URL "%s" exists in search index'),
htmlentities($q)
);
*/
}
// Add URL
else
{
// @TODO check http code
$index->addDocument(
[
'url' => $url,
'rank' => (int) mb_strlen(
(string)
urldecode(
(string)
parse_url(
$url,
PHP_URL_PATH
)
)
)
],
$crc32url
);
$response = sprintf(
_('URL "%s" added to the crawl queue!'),
htmlentities($q)
);
}
}
else {
$response = sprintf(
_('URL "%s" does not match node settings!'),
htmlentities($q)
);
}
}
// Extended corrections
switch (true)
{
// Empty query
case empty($q):
$query = $index->search('')->sort('RAND()');
break;
// URL request
case filter_var($q, FILTER_VALIDATE_URL):
$query = $index->search('')->filter('id', crc32($q));
break;
default:
// Allow raw requests on extended syntax mode requested
// http://sphinxsearch.com/docs/current/extended-syntax.html
if (isset($_GET['e']) && $config->webui->search->extended->enabled)
{
$query = $index->search($q);
}
// Regular request
else
{
$query = $index->search(
@\Manticoresearch\Utils::escape(
$q
)
);
}
}
// Apply search options (e.g. field_weights)
foreach ($config->webui->search->options as $key => $value)
{
if (is_int($value) || is_string($value))
{
$query->option(
$key,
$value
);
}
else
{
$query->option(
$key,
(array) $value
);
}
}
// Apply highlight options
if ($config->webui->search->highlight->fields)
{
$query->highlight(
(array) $config->webui->search->highlight->fields,
(array) $config->webui->search->highlight->options
);
}
// Get found
$found = empty($q) ? $total : $query->get()->getTotal();
// Search request begin
$results = $query->offset($p * $config->webui->pagination->limit - $config->webui->pagination->limit)
->limit($config->webui->pagination->limit)
->get();
?>
<!DOCTYPE html>
<html lang="<?php echo _('en-US'); ?>">
<head>
<title><?php echo sprintf(_('Yo! %s'), htmlentities($q)) ?></title>
<meta charset="utf-8" />
<meta name="keywords" content="<?php echo htmlentities($q) ?>" />
<style>
* {
border: 0;
margin: 0;
padding: 0;
font-family: Sans-serif;
color: #ccc;
}
body {
background-color: #2e3436;
word-break: break-word;
}
header {
background-color: #34393b;
position: fixed;
top: 0;
left: 0;
right: 0;
z-index: 2;
}
main {
margin-top: 80px;
margin-bottom: 76px;
padding: 0 32px;
}
main > div {
border-top: 1px #000 dashed;
font-size: 14px;
margin: 0 auto;
max-width: 620px;
padding: 8px 0;
position: relative;
}
main > div > img {
left: -24px;
position: absolute;
top: 18px;
}
main > div > div {
padding: 8px 0;
line-height: 16px;
}
main > div > div > a {
font-size: 12px;
}
h1 {
position: fixed;
top: 2px;
left: 24px;
}
h1 > a,
h1 > a:visited,
h1 > a:active,
h1 > a:hover {
color: #fff;
font-weight: normal;
font-size: 22px;
margin: 0;
text-decoration: none;
}
h2 {
display: block;
font-size: 15px;
font-weight: normal;
color: #fff;
}
form {
display: block;
max-width: 678px;
margin: 0 auto;
text-align: center;
}
input[type="checkbox"] {
accent-color: #3394fb;
}
input[type="text"],
input[type="text"]:-webkit-autofill,
input[type="text"]:-webkit-autofill:focus {
transition: background-color 0s 600000s, color 0s 600000s; /* chrome */
width: 100%;
margin: 12px 0;
padding: 6px 0;
border-radius: 32px;
background-color: #000;
color: #fff;
font-size: 15px;
text-align: center;
}
input[type="text"]:hover {
background-color: #111
}
input[type="text"]:focus {
outline: none;
background-color: #111
}
input[type="text"]:focus::placeholder {
color: #090808
}
label {
font-size: 14px;
position: absolute;
right: 80px;
top: 18px;
}
label > input {
width: auto;
margin: 0 4px;
}
button {
padding: 6px 12px;
border-radius: 4px;
cursor: pointer;
background-color: #3394fb;
color: #fff;
font-size: 14px;
position: fixed;
top: 12px;
right: 24px;
}
button:hover {
background-color: #4b9df4;
}
a, a:visited, a:active {
color: #9ba2ac;
}
a:hover {
color: #54a3f7;
}
span {
display: block;
margin: 8px 0;
}
.text-warning {
color: #db6161;
fill: #db6161;
}
</style>
</head>
<body>
<header>
<form name="search" method="GET" action="search.php">
<h1><a href="./"><?php echo _('Yo!') ?></a></h1>
<input type="text" name="q" placeholder="<?php echo $placeholder ?>" value="<?php echo htmlentities($q) ?>" />
<?php if ($config->webui->search->extended->enabled) { ?>
<label for="e">
<input type="checkbox" name="e" id="e" value="true" <?php echo isset($_GET['e']) ? 'checked="checked"': false ?>/>
<?php echo _('Extended') ?>
</label>
<?php } ?>
<button type="submit">
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="white" class="bi bi-search" viewBox="0 0 16 16">
<path d="M11.742 10.344a6.5 6.5 0 1 0-1.397 1.398h-.001c.03.04.062.078.098.115l3.85 3.85a1 1 0 0 0 1.415-1.414l-3.85-3.85a1.007 1.007 0 0 0-.115-.1zM12 6.5a5.5 5.5 0 1 1-11 0 5.5 5.5 0 0 1 11 0"/>
</svg>
</sub>
</button>
</form>
</header>
<main>
<?php if (isset($_GET['e']) && $config->webui->search->extended->enabled) { ?>
<div>
<p>
<?php echo _('Extended syntax enabled, follow') ?>
<a href="http://sphinxsearch.com/docs/current/extended-syntax.html" rel="nofollow" target="_blank"><?php echo _('Documentation') ?></a>
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="11" height="11" fill="currentColor" viewBox="0 0 16 16">
<path fill-rule="evenodd" d="M8.636 3.5a.5.5 0 0 0-.5-.5H1.5A1.5 1.5 0 0 0 0 4.5v10A1.5 1.5 0 0 0 1.5 16h10a1.5 1.5 0 0 0 1.5-1.5V7.864a.5.5 0 0 0-1 0V14.5a.5.5 0 0 1-.5.5h-10a.5.5 0 0 1-.5-.5v-10a.5.5 0 0 1 .5-.5h6.636a.5.5 0 0 0 .5-.5"/>
<path fill-rule="evenodd" d="M16 .5a.5.5 0 0 0-.5-.5h-5a.5.5 0 0 0 0 1h3.793L6.146 9.146a.5.5 0 1 0 .708.708L15 1.707V5.5a.5.5 0 0 0 1 0z"/>
</svg>
</sub>
</p>
<p>
<?php echo _('Available fields:') ?>
<i>@title</i>
<i>@description</i>
<i>@keywords</i>
<i>@mime</i>
<i>@url</i>
</p>
</div>
<?php } ?>
<?php if ($response) { ?>
<div>
<?php echo $response ?>
</div>
<?php } ?>
<div>
<?php echo sprintf(_('Found: %s'), number_format($found)) ?>
</div>
<?php foreach ($results as $result) { ?>
<div>
<?php
$hostname = parse_url(
$result->url,
PHP_URL_HOST
);
$identicon = new \Jdenticon\Identicon();
$identicon->setValue(
$hostname
);
$identicon->setSize(14);
$identicon->setStyle(
[
'backgroundColor' => 'rgba(255, 255, 255, 0)',
'padding' => 0
]
);
$icon = $identicon->getImageDataUri('webp');
?>
<img src="<?php echo $icon ?>" title="<?php echo $hostname ?>" alt="identicon" />
<?php if (!empty($result->getHighlight()['title'])) { ?>
<div>
<h2>
<?php foreach ($result->getHighlight()['title'] as $title) { ?>
<p><?php echo $title ?></p>
<?php } ?>
</h2>
</div>
<?php } else if (!empty($result->title)) { ?>
<div>
<h2><?php echo htmlentities($result->title) ?></h2>
</div>
<?php } ?>
<?php if (!empty($result->getHighlight()['description'])) { ?>
<div>
<?php foreach ($result->getHighlight()['description'] as $description) { ?>
<p><?php echo $description ?></p>
<?php } ?>
</div>
<?php } else if (!empty($result->description)) { ?>
<div>
<?php echo htmlentities($result->description) ?>
</div>
<?php } ?>
<?php if (!empty($result->getHighlight()['keywords'])) { ?>
<div>
<?php foreach ($result->getHighlight()['keywords'] as $keywords) { ?>
<p><?php echo $keywords ?></p>
<?php } ?>
</div>
<?php } else if (!empty($result->keywords)) { ?>
<div>
<?php echo htmlentities($result->keywords) ?>
</div>
<?php } ?>
<?php if (!empty($result->getHighlight()['body'])) { ?>
<div>
<?php foreach ($result->getHighlight()['body'] as $body) { ?>
<p><?php echo $body ?></p>
<?php } ?>
</div>
<?php } ?>
<div>
<?php if (!empty($result->getHighlight()['url'])) { ?>
<?php foreach ($result->getHighlight()['url'] as $url) { ?>
<a href="<?php echo $result->url ?>"><?php echo urldecode($url) ?></a>
<?php } ?>
<?php } else { ?>
<a href="<?php echo $result->url ?>"><?php echo htmlentities(urldecode($result->url)) ?></a>
<?php } ?>
<?php if (!in_array($result->get('code'), [0, 200])) { ?>
<small>&bull;</small>
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10" class="text-warning" viewBox="0 0 16 16">
<path d="m9.97 4.88.953 3.811C10.159 8.878 9.14 9 8 9c-1.14 0-2.158-.122-2.923-.309L6.03 4.88C6.635 4.957 7.3 5 8 5s1.365-.043 1.97-.12m-.245-.978L8.97.88C8.718-.13 7.282-.13 7.03.88L6.275 3.9C6.8 3.965 7.382 4 8 4c.618 0 1.2-.036 1.725-.098zm4.396 8.613a.5.5 0 0 1 .037.96l-6 2a.5.5 0 0 1-.316 0l-6-2a.5.5 0 0 1 .037-.96l2.391-.598.565-2.257c.862.212 1.964.339 3.165.339s2.303-.127 3.165-.339l.565 2.257 2.391.598"/>
</svg>
<small><?php echo $result->get('code') ?></small>
<?php } ?>
<small>&bull;</small>
<a rel="nofollow" href="explore.php?i=<?php echo $result->getId() ?>"><?php echo _('explore') ?></a>
</div>
</div>
<?php } ?>
<?php if ($p * $config->webui->pagination->limit < $results->getTotal()) { ?>
<div>
<div>
<a href="search.php?q=<?php echo urlencode($q) ?>&amp;p=<?php echo $p + 1 ?>">
<?php echo _('More') ?>
</a>
</div>
</div>
<?php } ?>
</main>
</body>
</html>