
init gemini protocol implementation

gemini
yggverse 8 months ago
parent
commit
1f96ca8a2c
  1. 63  README.md
  2. 7  composer.json
  3. 200  example/config.json
  4. 2  src/cli/document/clean.php
  5. 401  src/cli/document/crawl.php
  6. 8  src/cli/index/init.php
  7. 177  src/cli/yggo/import.php
  8. 271  src/webui/api.php
  9. 563  src/webui/explore.php
  10. 336  src/webui/index.php
  11. 514  src/webui/search.php

63
README.md

@ -2,26 +2,24 @@
Micro Web Crawler in PHP & Manticore Micro Web Crawler in PHP & Manticore
Yo! is the super thin layer for Manticore search server that extends official [manticoresearch-php](https://github.com/manticoresoftware/manticoresearch-php) client with CLI tools and simple JS-less WebUI. Yo! Gemini is the super thin layer for Manticore search server that extends official [manticoresearch-php](https://github.com/manticoresoftware/manticoresearch-php) client with CLI tools and Gemini protocol UI.
This branch contains the implementation for the [Gemini Protocol](https://geminiprotocol.net).
To use the `HTTP` version, please check out the [main branch](https://github.com/YGGverse/Yo)!
## Features ## Features
* MIME-based crawler with flexible filter settings by regular expressions, selectors, external links etc * MIME-based crawler with flexible filter settings by regular expressions, selectors, external links etc
* Page snap history with local and remote mirrors support (including FTP protocol) * Page snap history with local and remote mirrors support (including FTP protocol)
* CLI tools for index administration and crontab tasks * CLI tools for index administration and crontab tasks
* JS-less frontend to run local or public search web portal * Gemini Protocol UI (coming soon)
* API tools to make search index distributed
## Components ## Components
* [Manticore Server](https://github.com/manticoresoftware/manticoresearch) * [Manticore Server](https://github.com/manticoresoftware/manticoresearch)
* [PHP library for Manticore](https://github.com/manticoresoftware/manticoresearch-php) * [PHP library for Manticore](https://github.com/manticoresoftware/manticoresearch-php)
* [Symfony DOM crawler](https://github.com/symfony/dom-crawler)
* [Symfony CSS selector](https://github.com/symfony/css-selector)
* [FTP client for snap mirrors](https://github.com/YGGverse/ftp-php) * [FTP client for snap mirrors](https://github.com/YGGverse/ftp-php)
* [Hostname ident icons](https://github.com/dmester/jdenticon-php)
* [Captcha](https://github.com/Gregwar/Captcha)
* [Bootstrap icons](https://icons.getbootstrap.com/)
### Install ### Install
@ -32,22 +30,23 @@ Yo! is the super thin layer for Manticore search server that extends official [m
* `wget https://repo.manticoresearch.com/manticore-repo.noarch.deb` * `wget https://repo.manticoresearch.com/manticore-repo.noarch.deb`
* `dpkg -i manticore-repo.noarch.deb` * `dpkg -i manticore-repo.noarch.deb`
* `apt update` * `apt update`
* `apt install git composer manticore manticore-extra php-fpm php-curl php-mbstring php-gd` * `apt install git composer manticore manticore-extra php-fpm php-mbstring`
Yo search engine uses Manticore as the primary database. If your server is sensitive to power loss, Yo search engine uses Manticore as the primary database. If your server is sensitive to power loss,
change the default [binlog flush strategy](https://manual.manticoresearch.com/Logging/Binary_logging#Binary-flushing-strategies) to `binlog_flush = 1` change the default [binlog flush strategy](https://manual.manticoresearch.com/Logging/Binary_logging#Binary-flushing-strategies) to `binlog_flush = 1`
#### Deployment #### Deployment
Project in development, to create new search project, use `dev-main` branch: * `git clone https://github.com/YGGverse/Yo.git`
* `cd Yo`
* `composer create-project yggverse/yo:dev-main` * `git checkout gemini`
* `composer update`
#### Development #### Development
* `git clone https://github.com/YGGverse/Yo.git` * `git clone https://github.com/YGGverse/Yo.git`
* `cd Yo` * `cd Yo`
* `composer update` * `git checkout gemini`
* `git checkout -b pr-branch` * `git checkout -b pr-branch`
* `git commit -m 'new fix'` * `git commit -m 'new fix'`
* `git push` * `git push`
@ -69,11 +68,9 @@ Project in development, to create new search project, use `dev-main` branch:
* `php src/cli/document/crawl.php` * `php src/cli/document/crawl.php`
* `php src/cli/document/search.php '*'` * `php src/cli/document/search.php '*'`
#### Web UI #### Gemini UI
1. `cd src/webui` Coming soon..
2. `php -S 127.0.0.1:8080`
3. open `http://127.0.0.1:8080` in browser
## Documentation ## Documentation
@ -134,27 +131,6 @@ php src/cli/document/search.php '@title "*"' [limit]
* `query` - required * `query` - required
* `limit` - optional search results limit * `limit` - optional search results limit
##### Migration
###### YGGo
Import index from YGGo database
```
php src/cli/yggo/import.php 'host' 'port' 'user' 'password' 'database' [unique=off] [start=0] [limit=100]
```
Source DB fields required:
* `host`
* `port`
* `user`
* `password`
* `database`
* `unique` - optional, check for unique URL (takes more time)
* `start` - optional, offset to start queue
* `limit` - optional, limit queue
### Backup ### Backup
#### Logical #### Logical
@ -171,13 +147,4 @@ Better for infrastructure administration and includes original data binaries.
## Instances ## Instances
### [Yggdrasil](https://github.com/yggdrasil-network) Coming soon..
* `http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/` - IPv6 `0200::/7` addresses only | [index](http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/index.sql)
### [Alfis DNS](https://github.com/Revertron/Alfis)
* `http://yo.ygg` - `.ygg` domain zone search only | [index](http://yo.ygg/index.sql)
* `http://ygg.yo.index` - alias of `http://yo.ygg` | [index](http://ygg.yo.index/index.sql)
_*`*.yo.index` reserved for domain-oriented instances e.g. `.btn`, `.conf`, `.mirror` - feel free to request the address_

7
composer.json

@ -15,11 +15,8 @@
], ],
"require": { "require": {
"manticoresoftware/manticoresearch-php": "^3.1", "manticoresoftware/manticoresearch-php": "^3.1",
"symfony/css-selector": "^6.3",
"symfony/dom-crawler": "^6.3",
"jdenticon/jdenticon": "^1.0",
"yggverse/ftp": "^1.0", "yggverse/ftp": "^1.0",
"gregwar/captcha": "^1.2", "yggverse/net": "^1.2",
"yggverse/net": "^1.2" "yggverse/gemini": "^0.4.0"
} }
} }

200
example/config.json

@ -21,7 +21,7 @@
} }
} }
}, },
"webui": "gui":
{ {
"pagination": "pagination":
{ {
@ -35,7 +35,7 @@
{ {
"url":{ "url":{
"enabled":false, "enabled":false,
"regex":"/.*/ui" "regex":"/^gemini:\\/\\/.*/ui"
} }
} }
}, },
@ -59,9 +59,9 @@
"fields": "fields":
[ [
"url", "url",
"title", "h1",
"description", "h2",
"keywords", "h3",
"body" "body"
], ],
"options": "options":
@ -71,57 +71,6 @@
} }
} }
}, },
"footer":
{
"links":
[
{
"text":"0200::/7",
"attributes":
{
"title":"Search in 0200::/7 IPv6",
"href":"http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/"
},
"index":
[
"http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/index.sql"
]
},
{
"text":"yo.ygg",
"attributes":
{
"title":"Search in .ygg zone",
"href":"http://yo.ygg"
},
"index":
[
"http://yo.ygg/index.sql"
]
},
{
"text":"ygg.yo.index",
"attributes":
{
"title":"Search in .ygg zone",
"href":"http://ygg.yo.index"
},
"index":
[
"http://ygg.yo.index/index.sql"
]
},
{
"text":"GitHub",
"attributes":
{
"title":"Source code",
"href":"https://github.com/YGGverse/Yo"
},
"index":[]
}
]
},
"index": "index":
{ {
"enabled":true "enabled":true
@ -161,119 +110,30 @@
"timeout":5, "timeout":5,
"socket": "socket":
{ {
"201:5eb5:f061:678e:7565:6338:c02c:5251":80 "8.8.8.8":80
} }
} }
}, },
"curl":
{
"connection": "connection":
{ {
"timeout":3 "timeout":3,
}, "length":1048576,
"download": "chunk":1
{
"size":
{
"max":10000024
}
}
}, },
"queue": "queue":
{ {
"limit":1, "limit":1,
"delay":1 "delay":1
}, },
"selector": "url":
{
"a:not([rel=nofollow])":
{
"attribute":"href",
"external":false,
"regex":"/.*/ui"
},
"image":
{
"attribute":"src",
"external":false,
"regex":"/.*/ui"
},
"audio":
{
"attribute":"src",
"external":false,
"regex":"/.*/ui"
},
"video":
{
"attribute":"src",
"external":false,
"regex":"/.*/ui"
},
"script":
{ {
"attribute":"href", "external":true,
"external":false, "regex":"/^gemini:\\/\\/.*/ui",
"regex":"/.*/ui"
}
},
"skip": "skip":
{ {
"stripos": "stripos":
{
"url":
[ [
"#", "?"
"?",
"javascript:",
"mailto:",
"magnet:",
"xmpp:",
"/commit",
"/diff",
"/print",
"/raw",
"/cache",
"/download",
"/share",
"/explore",
"/register",
"/login",
"/password",
"/forgot",
"/restore",
"/account",
"/reply",
"/read",
"/compose",
"/comment",
"/add",
"/edit",
"/delete",
"/quote",
"/report",
"/export",
"/import",
"/mobile",
"/mwiki",
"/branch",
"/block",
"/transaction",
"/search",
"/tag",
"/page",
"/sort",
"/order",
"/pdf",
"/fb2",
"/mobi",
"/epub",
"/djvu",
"/_detail",
"/_media",
"/t/",
"/q/",
"/s/"
] ]
} }
}, },
@ -297,28 +157,21 @@
"directory":"storage/snap", "directory":"storage/snap",
"size": "size":
{ {
"max":10000024 "max":1048576
}, },
"mime": "meta":
{ {
"stripos": "stripos":
[ [
"application/xhtml+xml", "text/gemini",
"application/javascript", "image/"
"text/html",
"text/plain",
"text/css",
"image/webp",
"image/png",
"image/gif",
"image/ico"
] ]
}, },
"url": "url":
{ {
"stripos": "stripos":
[ [
"http" "gemini://"
] ]
} }
}, },
@ -345,28 +198,21 @@
}, },
"size": "size":
{ {
"max":10000024 "max":1048576
}, },
"mime": "meta":
{ {
"stripos": "stripos":
[ [
"application/xhtml+xml", "text/gemini",
"application/javascript", "image/"
"text/html",
"text/plain",
"text/css",
"image/webp",
"image/png",
"image/gif",
"image/ico"
] ]
}, },
"url": "url":
{ {
"stripos": "stripos":
[ [
"http" "gemini://"
] ]
} }
} }
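
The reworked `url` section of `example/config.json` groups three link filters that `crawl.php` applies to every discovered address: a regex, an external-host switch, and a list of substrings to skip. The following is a minimal sketch of that decision, not code from the commit. The `isCrawlable()` helper and the `example.org` addresses are hypothetical, and the rules are written as a PHP literal rather than loaded from the JSON file.

```php
<?php
// Hypothetical helper (not part of the commit): mirrors the three checks
// crawl.php runs against the "url" config section for each discovered link.
function isCrawlable(string $url, string $sourceHost, object $rules): bool
{
    // Regex rule: the address must match the configured pattern
    if (!preg_match($rules->regex, $url)) {
        return false;
    }

    // External host rule: when disabled, only links on the source host pass
    if (!$rules->external && parse_url($url, PHP_URL_HOST) != $sourceHost) {
        return false;
    }

    // Skip rule: reject addresses containing any listed substring
    foreach ($rules->skip->stripos as $condition) {
        if (false !== stripos($url, $condition)) {
            return false;
        }
    }

    return true;
}

// Assumed rule set, equivalent in intent to the example config
$rules = (object) [
    'external' => true,
    'regex'    => '/^gemini:\/\/.*/ui',
    'skip'     => (object) ['stripos' => ['?']],
];

var_dump(isCrawlable('gemini://example.org/docs', 'example.org', $rules));       // bool(true)
var_dump(isCrawlable('gemini://example.org/search?q=1', 'example.org', $rules)); // bool(false)
var_dump(isCrawlable('https://example.org/', 'example.org', $rules));            // bool(false)
```

Keeping the three checks in one predicate makes it easy to see that a link is queued only when every rule passes.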

2
src/cli/document/clean.php

@ -39,7 +39,7 @@ $index = $client->index(
// Apply new configuration rules // Apply new configuration rules
echo _('apply new configuration rules...') . PHP_EOL; echo _('apply new configuration rules...') . PHP_EOL;
foreach ($config->cli->document->crawl->skip->stripos->url as $condition) foreach ($config->cli->document->crawl->url->skip->stripos as $condition)
{ {
echo sprintf( echo sprintf(
_('cleanup documents with url that contain substring "%s"...') . PHP_EOL, _('cleanup documents with url that contain substring "%s"...') . PHP_EOL,

401
src/cli/document/crawl.php

@ -6,7 +6,7 @@ $microtime = microtime(true);
// Load dependencies // Load dependencies
require_once __DIR__ . '/../../../vendor/autoload.php'; require_once __DIR__ . '/../../../vendor/autoload.php';
// Define helpers // Define helpers @TODO move to separate library (yo-php)
function getLastSnapTime(array $files): int function getLastSnapTime(array $files): int
{ {
$time = []; $time = [];
@ -37,6 +37,40 @@ function getLastSnapTime(array $files): int
return 0; return 0;
} }
function relative2absolute(
string $source, // current document url to grab the base
string $target, // relative or absolute link
?string &$scheme = null,
?string &$host = null,
?int &$port = null
) {
if (!parse_url($target, PHP_URL_HOST))
{
$scheme = parse_url($source, PHP_URL_SCHEME);
$host = parse_url($source, PHP_URL_HOST);
$port = parse_url($source, PHP_URL_PORT);
return $scheme . '://' . $host . ($port ? ':' . $port : null) .
'/' .
trim(
ltrim(
str_replace(
[
'./',
'../'
],
'',
$target
),
'/'
),
'.'
);
}
return $target;
}
// Init config // Init config
$config = json_decode( $config = json_decode(
file_get_contents( file_get_contents(
@ -183,12 +217,12 @@ foreach($index->search('')
$data = $data =
[ [
'url' => $document->get('url'), 'url' => $document->get('url'),
'title' => $document->get('title'), 'h1' => $document->get('h1'),
'description' => $document->get('description'), 'h2' => $document->get('h2'),
'keywords' => $document->get('keywords'), 'h3' => $document->get('h3'),
'code' => $document->get('code'), 'code' => $document->get('code'),
'size' => $document->get('size'), 'size' => $document->get('size'),
'mime' => $document->get('mime'), 'meta' => $document->get('meta'),
'rank' => $document->get('rank'), 'rank' => $document->get('rank'),
'time' => $time, 'time' => $time,
'index' => 0 'index' => 0
@ -205,114 +239,50 @@ foreach($index->search('')
); );
} }
// Update index time anyway and set reset code to 404 // Update index time anyway and set reset code to 51
$index->updateDocument( $index->updateDocument(
[ [
'time' => time(), 'time' => time(),
'code' => 200, 'code' => 20,
'index' => 0 'index' => 0
], ],
$document->getId() $document->getId()
); );
// Request remote URL // Request remote URL
$request = curl_init( $request = new \Yggverse\Gemini\Client\Request(
$document->get('url') $document->get('url')
); );
// Drop URL with long response $response = new \Yggverse\Gemini\Client\Response(
curl_setopt( $request->getResponse(
$request, $config->cli->document->crawl->connection->timeout,
CURLOPT_CONNECTTIMEOUT, $config->cli->document->crawl->connection->length,
$config->cli->document->crawl->curl->connection->timeout $config->cli->document->crawl->connection->chunk,
); $length
)
curl_setopt(
$request,
CURLOPT_TIMEOUT,
$config->cli->document->crawl->curl->connection->timeout
);
// Prevent huge content download e.g. media streams URL
curl_setopt(
$request,
CURLOPT_RETURNTRANSFER,
true
);
curl_setopt(
$request,
CURLOPT_NOPROGRESS,
false
);
curl_setopt(
$request,
CURLOPT_PROGRESSFUNCTION,
function(
$download,
$downloaded,
$upload,
$uploaded
) {
global $config;
global $index;
global $document;
$index->updateDocument(
[
'time' => time(),
'code' => 200,
'index' => 0
],
$document->getId()
);
return $downloaded > $config->cli->document->crawl->curl->download->size->max ? 1 : 0;
}
); );
// Begin request // Begin request
if ($response = curl_exec($request)) if ($code = $request->getCode()) // @TODO process redirects
{
// Update HTTP code or skip on empty
if ($code = curl_getinfo($request, CURLINFO_HTTP_CODE))
{
// Delete deprecated document from index as HTTP code still not 200
/*
if ($code != 200 && !empty($data['code']) && $data['code'] != 200)
{ {
$index->deleteDocument( // Update status code
$document->getId()
);
continue;
}
*/
$data['code'] = $code; $data['code'] = $code;
} else continue;
// Update size or skip on empty // Update size or skip on empty
if ($size = curl_getinfo($request, CURLINFO_SIZE_DOWNLOAD)) if ($length)
{ {
$size = round( // float $data['size'] = $length;
$size
);
$data['size'] = $size;
} else continue; } else continue;
// Update MIME type or skip on empty // Update meta or skip on empty
if ($type = curl_getinfo($request, CURLINFO_CONTENT_TYPE)) if ($meta = $response->getMeta())
{ {
$data['mime'] = $type; $data['meta'] = $meta;
// On document charset specified // On document charset specified
if (preg_match('/charset=([^\s;]+)/i', $type, $charset)) if (preg_match('/charset=([^\s;]+)/i', $meta, $charset))
{ {
if (!empty($charset[1])) if (!empty($charset[1]))
{ {
@ -322,10 +292,12 @@ foreach($index->search('')
if (strtolower($charset[1]) == strtolower($encoding)) if (strtolower($charset[1]) == strtolower($encoding))
{ {
// Convert response to UTF-8 // Convert response to UTF-8
$response = mb_convert_encoding( $response->setBody(
$response, mb_convert_encoding(
$response->getBody(),
'UTF-8', 'UTF-8',
$charset[1] $charset[1]
)
); );
break; break;
@ -336,232 +308,92 @@ foreach($index->search('')
} else continue; } else continue;
// DOM crawler // Gemtext parser
if ( if (false !== stripos($response->getMeta(), 'text/gemini'))
false !== stripos($type, 'text/html')
||
false !== stripos($type, 'text/xhtml')
||
false !== stripos($type, 'application/xhtml')
) {
$crawler = new Symfony\Component\DomCrawler\Crawler();
$crawler->addHtmlContent(
$response
);
// Get title
foreach ($crawler->filter('head > title')->each(function($node) {
return $node->text();
}) as $value)
{ {
if (!empty($value)) $body = new \Yggverse\Gemini\Client\Gemtext\Body(
{ $response->getBody()
$data['title'] = trim(
strip_tags(
html_entity_decode(
$value
)
)
); );
}
}
// Get description
foreach ($crawler->filter('head > meta[name="description"]')->each(function($node) {
return $node->attr('content');
}) as $value) // Get H1
$h1 = [];
foreach ($body->getH1() as $value)
{ {
if (!empty($value)) $h1[] = $value;
{
$data['description'] = trim(
strip_tags(
html_entity_decode(
$value
)
)
);
} }
}
// Get keywords
$keywords = [];
// Extract from meta tag
foreach ($crawler->filter('head > meta[name="keywords"]')->each(function($node) {
return $node->attr('content'); $data['h1'] = implode(
}) as $value)
{
if (!empty($value))
{
foreach ((array) explode(
',', ',',
mb_strtolower( array_unique(
strip_tags( $h1
html_entity_decode(
$value
)
) )
)
) as $keyword)
{
// Remove extra spaces
$keyword = trim(
$keyword
); );
// Skip short words // Get H2
if (mb_strlen($keyword) > 2) $h2 = [];
foreach ($body->getH2() as $value)
{ {
$keywords[] = $keyword; $h2[] = $value;
} }
}
}
}
// Get keywords from headers
/* Disable keywords collection from headers as body index enabled
foreach ($crawler->filter('h1,h2,h3,h4,h5,h6')->each(function($node) {
return $node->text(); $data['h2'] = implode(
}) as $value)
{
if (!empty($value))
{
foreach ((array) explode(
',', ',',
mb_strtolower( array_unique(
strip_tags( $h2
html_entity_decode(
$value
)
) )
)
) as $keyword)
{
// Remove extra spaces
$keyword = trim(
$keyword
); );
// Skip short words // Get H3
if (mb_strlen($keyword) > 2) $h3 = [];
foreach ($body->getH3() as $value)
{ {
$keywords[] = $keyword; $h3[] = $value;
}
}
}
} }
*/
// Keep keywords unique $data['h3'] = implode(
$keywords = array_unique( ',',
$keywords array_unique(
$h3
)
); );
// Update previous keywords when new value exists
if ($keywords)
{
$data['keywords'] = implode(',', $keywords);
}
// Save document body text to index // Save document body text to index
foreach ($crawler->filter('html > body')->each(function($node) {
return $node->html();
}) as $value)
{
if (!empty($value))
{
$data['body'] = trim( $data['body'] = trim(
preg_replace( preg_replace(
'/[\s]{2,}/', // strip extra separators '/[\s]{2,}/', // strip extra separators
' ', ' ',
strip_tags( $response->getBody()
str_replace( // make text separators before strip any closing tag, new line, etc
[
'<',
'>',
PHP_EOL,
],
[
' <',
'> ',
PHP_EOL . ' ',
],
preg_replace(
[
'/<script([^>]*)>([\s\S]*?)<\/script>/i', // strip js content
'/<style([^>]*)>([\s\S]*?)<\/style>/i', // strip css content
'/<pre([^>]*)>([\s\S]*?)<\/pre>/i', // strip code content
'/<code([^>]*)>([\s\S]*?)<\/code>/i',
],
'',
html_entity_decode(
$value
)
)
)
)
) )
); );
}
}
// Crawl documents // Crawl links
$documents = []; $documents = [];
$scheme = parse_url($document->get('url'), PHP_URL_SCHEME); foreach ($body->getLinks() as $line)
$host = parse_url($document->get('url'), PHP_URL_HOST);
$port = parse_url($document->get('url'), PHP_URL_PORT);
foreach ($config->cli->document->crawl->selector as $selector => $settings)
{ {
foreach ($crawler->filter($selector)->each(function($node) { $link = new \Yggverse\Gemini\Gemtext\Link(
$line
return $node; );
}) as $value) {
if ($url = $value->attr($settings->attribute)) if ($url = $link->getAddress())
{ {
//Make relative links absolute //Make relative links absolute
if (!parse_url($url, PHP_URL_HOST)) $url = relative2absolute(
{ $document->get('url'),
$url = $scheme . '://' . $host . ($port ? ':' . $port : null) . $url,
'/' . $scheme,
trim( $host,
ltrim( $port,
str_replace(
[
'./',
'../'
],
'',
$url
),
'/'
),
'.'
); );
}
// Regex rules // Regex rules
if (!preg_match($settings->regex, $url)) if (!preg_match($config->cli->document->crawl->url->regex, $url))
{ {
continue; continue;
} }
// External host rules // External host rules
if (!$settings->external && parse_url($url, PHP_URL_HOST) != $host) if (!$config->cli->document->crawl->url->external && parse_url($url, PHP_URL_HOST) != $host)
{ {
continue; continue;
} }
@ -569,7 +401,8 @@ foreach($index->search('')
$documents[] = $url; $documents[] = $url;
} }
} }
}
// @TODO find document links by protocol ($body->findLinks('gemini'))
if ($documents) if ($documents)
{ {
@ -578,7 +411,7 @@ foreach($index->search('')
// Apply stripos condition // Apply stripos condition
$skip = false; $skip = false;
foreach ($config->cli->document->crawl->skip->stripos->url as $condition) foreach ($config->cli->document->crawl->url->skip->stripos as $condition)
{ {
if (false !== stripos($url, $condition)) { if (false !== stripos($url, $condition)) {
@ -597,7 +430,7 @@ foreach($index->search('')
date('c'), date('c'),
$url, $url,
print_r( print_r(
$config->cli->document->crawl->skip->stripos->url, $config->cli->document->crawl->url->skip->stripos,
true true
) )
); );
@ -701,7 +534,7 @@ foreach($index->search('')
} }
// Create snap // Create snap
if ($config->cli->document->crawl->snap->enabled && $code === 200) if ($config->cli->document->crawl->snap->enabled && $request->getCode() === 20)
{ {
try try
{ {
@ -734,12 +567,12 @@ foreach($index->search('')
$snap->addFromString( $snap->addFromString(
'DATA', 'DATA',
$response $response->getBody()
); );
$snap->addFromString( $snap->addFromString(
'MIME', 'META',
$type $response->getMeta()
); );
$snap->addFromString( $snap->addFromString(
@ -767,12 +600,12 @@ foreach($index->search('')
// Copy to local storage on enabled // Copy to local storage on enabled
if ($config->snap->storage->local->enabled) if ($config->snap->storage->local->enabled)
{ {
// Check for mime allowed // Check for meta allowed
$allowed = false; $allowed = false;
foreach ($config->snap->storage->local->mime->stripos as $whitelist) foreach ($config->snap->storage->local->meta->stripos as $whitelist)
{ {
if (false !== stripos($type, $whitelist)) if (false !== stripos($response->getMeta(), $whitelist))
{ {
$allowed = true; $allowed = true;
break; break;
@ -904,12 +737,12 @@ foreach($index->search('')
continue; continue;
} }
// Check for mime allowed // Check for meta allowed
$allowed = false; $allowed = false;
foreach ($ftp->mime->stripos as $whitelist) foreach ($ftp->meta->stripos as $whitelist)
{ {
if (false !== stripos($type, $whitelist)) if (false !== stripos($response->getMeta(), $whitelist))
{ {
$allowed = true; $allowed = true;
break; break;
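
The crawler now records a Gemini status code and meta line where it used to store an HTTP code and MIME type. For orientation, a Gemini response begins with a header of the form `<STATUS> <META>` terminated by CRLF. Status `20` signals success, and the meta then carries the media type, optionally with a `charset` parameter. The sketch below is illustrative only. `parseGeminiHeader()` is a hypothetical function; the commit delegates this work to the `yggverse/gemini` library, whose internals are not reproduced here. The charset regex is the one used in `crawl.php`.

```php
<?php
// Hypothetical helper for illustration; not the yggverse/gemini API.
function parseGeminiHeader(string $raw): ?array
{
    // The header is everything before the first CRLF
    $line = strstr($raw, "\r\n", true);

    if ($line === false) {
        return null;
    }

    // Two-digit status, then an optional space-separated meta string
    if (!preg_match('/^(\d{2})(?: (.*))?$/', $line, $match)) {
        return null;
    }

    $meta = $match[2] ?? '';

    // Same charset pattern crawl.php applies to the meta value
    $charset = null;

    if (preg_match('/charset=([^\s;]+)/i', $meta, $found)) {
        $charset = $found[1];
    }

    return [
        'code'    => (int) $match[1],
        'meta'    => $meta,
        'charset' => $charset,
        'body'    => (string) substr($raw, strlen($line) + 2),
    ];
}

$parsed = parseGeminiHeader("20 text/gemini; charset=utf-8\r\n# Hello\n");

var_dump($parsed['code'] === 20);   // bool(true)
var_dump($parsed['charset']);       // string(5) "utf-8"
```

A strict comparison such as `$parsed['code'] === 20` corresponds to the snap guard in the diff, which only archives successful responses.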

8
src/cli/index/init.php

@ -52,15 +52,15 @@ $result = $index->create(
[ [
'type' => 'text' 'type' => 'text'
], ],
'title' => 'h1' =>
[ [
'type' => 'text' 'type' => 'text'
], ],
'description' => 'h2' =>
[ [
'type' => 'text' 'type' => 'text'
], ],
'keywords' => 'h3' =>
[ [
'type' => 'text' 'type' => 'text'
], ],
@ -68,7 +68,7 @@ $result = $index->create(
[ [
'type' => 'text' 'type' => 'text'
], ],
'mime' => 'meta' =>
[ [
'type' => 'text' 'type' => 'text'
], ],
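
The `h1`, `h2` and `h3` index fields replace `title`, `description` and `keywords` because gemtext has no metadata section, only heading lines prefixed with `#`, `##` or `###`. The sketch below shows one way such headings could be collected into comma-joined field values, as `crawl.php` does with the library's `getH1()`, `getH2()` and `getH3()` results. It is a plain-PHP approximation under stated assumptions: `extractHeadings()` is hypothetical and is not how `yggverse/gemini` is necessarily implemented.

```php
<?php
// Hypothetical plain-PHP extractor; the commit uses the yggverse/gemini
// Body class instead.
function extractHeadings(string $gemtext): array
{
    $headings = ['h1' => [], 'h2' => [], 'h3' => []];
    $fence = str_repeat('`', 3); // three backticks toggle preformatted mode
    $preformatted = false;

    foreach (preg_split('/\r?\n/', $gemtext) as $line) {
        if (strncmp($line, $fence, 3) === 0) {
            $preformatted = !$preformatted;
            continue;
        }

        // Text inside a preformatted block is never a heading
        if ($preformatted) {
            continue;
        }

        // One to three leading "#" characters select the heading level
        if (preg_match('/^(#{1,3})\s*(.+)$/', $line, $match)) {
            $headings['h' . strlen($match[1])][] = trim($match[2]);
        }
    }

    // Join unique values with commas, matching the index field format
    return array_map(
        fn(array $values): string => implode(',', array_unique($values)),
        $headings
    );
}

$fence = str_repeat('`', 3);
$sample = "# Title\n## Section\n## Section\n{$fence}\n# not a heading\n{$fence}\n### Sub\n";

print_r(extractHeadings($sample));
// Array ( [h1] => Title [h2] => Section [h3] => Sub )
```

Tracking the preformatted toggle matters here: without it, a `#` comment inside a code sample would be indexed as a heading.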

177
src/cli/yggo/import.php

@ -1,177 +0,0 @@
<?php
// Load dependencies
require_once __DIR__ . '/../../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../../config.json'
)
);
// Init manticore
$client = new \Manticoresearch\Client(
[
'host' => $config->manticore->server->host,
'port' => $config->manticore->server->port,
]
);
// Init index
$index = $client->index(
$config->manticore->index->document->name
);
// Connect Yggo DB
try
{
$yggo = new PDO(
'mysql:dbname=' . $argv[5] . ';host=' . $argv[1] . ';port=' . $argv[2] . ';charset=utf8',
$argv[3],
$argv[4],
[
PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8'
]
);
$yggo->setAttribute(
PDO::ATTR_ERRMODE,
PDO::ERRMODE_EXCEPTION
);
$yggo->setAttribute(
PDO::ATTR_DEFAULT_FETCH_MODE,
PDO::FETCH_OBJ
);
$yggo->setAttribute(
PDO::ATTR_TIMEOUT,
600
);
}
catch (Exception $error)
{
var_dump(
$error
);
exit;
}
$start = isset($argv[7]) ? (int) $argv[7] : 0;
$limit = isset($argv[8]) ? (int) $argv[8] : 100;
$total = $yggo->query('SELECT COUNT(*) AS `total` FROM `hostPage`
WHERE `hostPage`.`httpCode` = 200
AND `hostPage`.`timeUpdated` IS NOT NULL
AND `hostPage`.`mime` IS NOT NULL
AND `hostPage`.`size` IS NOT NULL')->fetch()->total;
$processed = $start;
for ($i = 0; $i <= $total; $i++)
{
$query = $yggo->query('SELECT `hostPage`.`hostPageId`,
`hostPage`.`httpCode`,
`hostPage`.`mime`,
`hostPage`.`size`,
`hostPage`.`timeUpdated`,
`hostPage`.`uri`,
`host`.`scheme`,
`host`.`name`,
`host`.`port`,
(
SELECT `hostPageDescription`.`title` FROM `hostPageDescription`
WHERE `hostPageDescription`.`hostPageId` = `hostPage`.`hostPageId`
ORDER BY `hostPageDescription`.`timeAdded` DESC
LIMIT 1
) AS `title`,
(
SELECT `hostPageDescription`.`description` FROM `hostPageDescription`
WHERE `hostPageDescription`.`hostPageId` = `hostPage`.`hostPageId`
ORDER BY `hostPageDescription`.`timeAdded` DESC
LIMIT 1
) AS `description`,
(
SELECT `hostPageDescription`.`keywords` FROM `hostPageDescription`
WHERE `hostPageDescription`.`hostPageId` = `hostPage`.`hostPageId`
ORDER BY `hostPageDescription`.`timeAdded` DESC
LIMIT 1
) AS `keywords`
FROM `hostPage`
JOIN `host` ON (`host`.`hostId` = `hostPage`.`hostId`)
WHERE `hostPage`.`httpCode` = 200
AND `hostPage`.`timeUpdated` IS NOT NULL
AND `hostPage`.`mime` IS NOT NULL
AND `hostPage`.`size` IS NOT NULL
GROUP BY `hostPage`.`hostPageId`
LIMIT ' . $start . ',' . $limit);
foreach ($query->fetchAll() as $remote)
{
$url = $remote->scheme . '://' . $remote->name . ($remote->port ? ':' . $remote->port : false) . $remote->uri;
$crc32url = crc32($url);
// Check for unique URL requested
if (isset($argv[6]))
{
$local = $index->search('')
->filter('id', $crc32url)
->limit(1)
->get();
if ($local->getTotal())
{
// Result
echo sprintf(
_('[%s/%s] [skip duplicate] %s') . PHP_EOL,
$processed++,
$total,
$url
);
continue;
}
}
$index->addDocument(
[
'url' => $url,
'time' => (int) $remote->timeUpdated,
'code' => (int) $remote->httpCode,
'size' => (int) $remote->size,
'mime' => (string) $remote->mime,
'title' => (string) $remote->title,
'description' => (string) $remote->description,
'keywords' => (string) $remote->keywords
],
(int) $crc32url
);
// Result
echo sprintf(
_('[%s/%s] [add] %s') . PHP_EOL,
$processed++,
$total,
$url
);
}
// Update queue offset
$start = $start + $limit;
}
// Done
echo _('import completed!') . PHP_EOL;

271
src/webui/api.php

@ -1,271 +0,0 @@
<?php
// Debug
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
// Load dependencies
require_once __DIR__ . '/../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../config.json'
)
);
// Set headers
header('Content-Type: application/json; charset=utf-8');
// Action
switch (!empty($_GET['action']) ? $_GET['action'] : false) {
// Snap methods
case 'snap':
switch (!empty($_GET['method']) ? $_GET['method'] : false) {
case 'download':
// Validate required attributes
switch (false)
{
case isset($_GET['source']):
echo json_encode(
[
'status' => false,
'message' => _('valid source required')
]
);
exit;
case isset($_GET['id']) && preg_match('/^[\d]+$/', $_GET['id']):
echo json_encode(
[
'status' => false,
'message' => _('valid document identifier required')
]
);
exit;
case isset($_GET['time']) && preg_match('/^[\d]+$/', $_GET['time']):
echo json_encode(
[
'status' => false,
'message' => _('valid time required')
]
);
exit;
}
// Detect remote snap source
if (preg_match('/^[\d]+$/', $_GET['source']))
{
if (!isset($config->snap->storage->remote->ftp[$_GET['source']]) || !$config->snap->storage->remote->ftp[$_GET['source']]->enabled)
{
echo json_encode(
[
'status' => false,
'message' => _('requested source not found')
]
);
exit;
}
// Connect remote
$remote = new \Yggverse\Ftp\Client();
$connection = $remote->connect(
$config->snap->storage->remote->ftp[$_GET['source']]->connection->host,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->port,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->username,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->password,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->directory,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->timeout,
$config->snap->storage->remote->ftp[$_GET['source']]->connection->passive
);
// Remote host connected
if ($connection) {
// Prepare snap path
$filename = sprintf(
'%s/%s.tar.gz',
implode(
'/',
str_split(
$_GET['id']
)
),
$_GET['time']
);
// Check snap exist
if (!$size = $remote->size($filename))
{
echo json_encode(
[
'status' => false,
'message' => _('requested snap not found')
]
);
exit;
}
// Set headers
header(
'Content-Type: application/tar+gzip'
);
header(
sprintf(
'Content-Length: %s',
$size
)
);
header(
sprintf(
'Content-Disposition: filename="snap.%s.%s"',
$_GET['id'],
basename(
$filename
)
)
);
// Return file
$remote->get(
$filename,
'php://output'
);
$remote->close();
}
}
// Local
else if ($config->snap->storage->local->enabled)
{
// Prefix absolute
if ('/' === substr($config->snap->storage->local->directory, 0, 1))
{
$prefix = $config->snap->storage->local->directory;
}
// Prefix relative
else
{
$prefix = __DIR__ . '/../../' . $config->snap->storage->local->directory;
}
// Prepare snap path
$filename = sprintf(
'%s/%s/%s.tar.gz',
$prefix,
implode(
'/',
str_split(
$_GET['id']
)
),
$_GET['time']
);
// Check snap exist
if (!file_exists($filename) || !is_readable($filename))
{
echo json_encode(
[
'status' => false,
'message' => _('requested snap not found')
]
);
exit;
}
// Check snap has valid size
if (!$size = filesize($filename))
{
echo json_encode(
[
'status' => false,
'message' => _('requested snap has invalid size')
]
);
exit;
}
// Set headers
header(
'Content-Type: application/tar+gzip'
);
header(
sprintf(
'Content-Length: %s',
$size
)
);
header(
sprintf(
'Content-Disposition: filename="snap.%s.%s"',
$_GET['id'],
basename(
$filename
)
)
);
readfile(
$filename
);
exit;
}
else
{
echo json_encode(
[
'status' => false,
'message' => _('requested source not found')
]
);
}
break;
default:
echo json_encode(
[
'status' => false,
'message' => _('Undefined API method')
]
);
}
break;
default:
echo json_encode(
[
'status' => false,
'message' => _('Undefined API action')
]
);
}

563
src/webui/explore.php

@ -1,563 +0,0 @@
<?php
// Debug
# ini_set('display_errors', '1');
# ini_set('display_startup_errors', '1');
# error_reporting(E_ALL);
// Load dependencies
require_once __DIR__ . '/../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../config.json'
)
);
// Show totals in placeholder
// Init
$client = new \Manticoresearch\Client(
[
'host' => $config->manticore->server->host,
'port' => $config->manticore->server->port,
]
);
// Init index
$index = $client->index(
$config->manticore->index->document->name
);
// Get totals
$total = $index->search('')
->option('cutoff', 0)
->limit(0)
->get()
->getTotal();
$placeholder = sprintf(
_('Search in %s documents %s'),
number_format(
$total
),
$config->webui->search->index->request->url->enabled ? _('or enter new address to crawl...') : false
);
// Get document data
$document = $index->getDocumentById(
isset($_GET['i']) ? $_GET['i'] : 0
);
// Get icon
$hostname = parse_url(
$document->url,
PHP_URL_HOST
);
$identicon = new \Jdenticon\Identicon();
$identicon->setValue(
$hostname
);
$identicon->setSize(36);
$identicon->setStyle(
[
'backgroundColor' => 'rgba(255, 255, 255, 0)',
'padding' => 0
]
);
$icon = $identicon->getImageDataUri('webp');
// Get snaps info
$snaps = [];
/// Prepare location
$filepath = implode(
'/',
str_split(
$document->getId()
)
);
/// Local snaps
if ($config->snap->storage->local->enabled)
{
/// absolute
if ('/' === substr($config->snap->storage->local->directory, 0, 1))
{
$prefix = $config->snap->storage->local->directory;
}
/// relative
else
{
$prefix = __DIR__ . '/../../' . $config->snap->storage->local->directory;
}
$directory = sprintf('%s/%s', $prefix, $filepath);
if (is_dir($directory))
{
foreach ((array) scandir($directory) as $filename)
{
if (!str_ends_with($filename, '.tar.gz'))
{
continue;
}
$basename = basename(
$filename
);
$time = preg_replace(
'/^([\d]+)\.tar\.gz$/',
'$1',
$basename
);
$snaps[_('Local')][] = (object)
[
'source' => 'local',
'id' => $document->getId(),
'name' => $basename,
'time' => $time,
'size' => filesize(
sprintf(
'%s/%s',
$directory,
$filename
)
),
];
}
}
}
/// Remote snaps
foreach ($config->snap->storage->remote->ftp as $i => $ftp)
{
// Resource enabled
if (!$ftp->enabled)
{
continue;
}
$remote = new \Yggverse\Ftp\Client();
$connection = $remote->connect(
$ftp->connection->host,
$ftp->connection->port,
$ftp->connection->username,
$ftp->connection->password,
$ftp->connection->directory,
$ftp->connection->timeout,
$ftp->connection->passive
);
// Remote host connected
if ($connection) {
foreach ((array) $remote->nlist($filepath) as $filename)
{
if (!str_ends_with($filename, '.tar.gz'))
{
continue;
}
$basename = basename(
$filename
);
$time = preg_replace(
'/^([\d]+)\.tar\.gz$/',
'$1',
$basename
);
$snaps[sprintf(_('Server #%s'), $i + 1)][] = (object)
[
'source' => $i,
'id' => $document->getId(),
'name' => $basename,
'time' => $time,
'size' => $remote->size($filename),
];
}
$remote->close();
}
}
// Process index request
if ($config->webui->index->enabled)
{
session_start();
if (isset($_POST['captcha'], $_SESSION['captcha']) && $_POST['captcha'] == $_SESSION['captcha'])
{
$index->updateDocument(
[
'index' => time()
],
$document->getId()
);
header(
sprintf(
'Location: explore.php?i=%d',
$document->getId()
)
);
// Stop rendering the page once the redirect header is sent
exit;
}
$captcha = new \Gregwar\Captcha\CaptchaBuilder(
null,
new \Gregwar\Captcha\PhraseBuilder(
$config->webui->captcha->length,
$config->webui->captcha->phrase
)
);
$captcha->setBackgroundColor(
$config->webui->captcha->background->r,
$config->webui->captcha->background->g,
$config->webui->captcha->background->b
);
$captcha->build();
$_SESSION['captcha'] = $captcha->getPhrase();
}
?>
<!DOCTYPE html>
<html lang="<?php echo _('en-US'); ?>">
<head>
<title><?php echo _('Yo! explore') ?></title>
<meta charset="utf-8" />
<style>
* {
border: 0;
margin: 0;
padding: 0;
font-family: Sans-serif;
color: #ccc;
}
body {
background-color: #2e3436;
word-break: break-word;
}
header {
background-color: #34393b;
position: fixed;
top: 0;
left: 0;
right: 0;
z-index: 2;
}
main {
margin-top: 80px;
margin-bottom: 76px;
padding: 0 20px;
}
main > div {
max-width: 640px;
margin: 0 auto;
padding: 8px 0;
border-top: 1px #000 dashed;
font-size: 14px;
}
main > div > div {
margin: 8px 0;
font-size: 13px;
}
h1 {
position: fixed;
top: 2px;
left: 24px;
}
h1 > a,
h1 > a:visited,
h1 > a:active,
h1 > a:hover {
color: #fff;
font-weight: normal;
font-size: 22px;
text-decoration: none;
}
h2 {
display: block;
font-size: 15px;
font-weight: normal;
margin: 4px 0;
color: #fff;
}
h3 {
display: block;
font-size: 13px;
margin: 4px 0;
}
pre {
border-radius: 4px;
border: 1px #000 dashed;
font-size: 13px;
margin: 8px 0;
max-height: 180px;
overflow: auto;
padding: 8px;
position: relative;
white-space: pre-wrap;
}
form {
display: block;
max-width: 678px;
margin: 0 auto;
text-align: center;
}
fieldset {
width: 150px;
}
input[type="text"],
input[type="text"]:-webkit-autofill,
input[type="text"]:-webkit-autofill:focus {
transition: background-color 0s 600000s, color 0s 600000s; /* chrome */
width: 100%;
margin: 12px 0;
padding: 6px 0;
border-radius: 32px;
background-color: #000;
color: #fff;
font-size: 15px;
text-align: center;
}
input[type="text"]:hover {
background-color: #111
}
input[type="text"]:focus {
outline: none;
background-color: #111
}
input[type="text"]:focus::placeholder {
color: #090808
}
label {
font-size: 14px;
position: absolute;
right: 80px;
top: 18px;
}
label > input {
width: auto;
margin: 0 4px;
}
button {
padding: 6px 12px;
border-radius: 4px;
cursor: pointer;
background-color: #3394fb;
color: #fff;
font-size: 14px;
}
button {
background-color: #4b9df4;
height: 32px;
vertical-align: top;
}
header button {
position: fixed;
top: 12px;
right: 24px;
}
a, a:visited, a:active {
color: #9ba2ac;
font-size: 12px;
}
a:hover {
color: #54a3f7;
}
ul {
margin: 0;
padding: 0;
}
ul > li {
margin-left: 16px;
font-size: 13px;
padding: 4px 0;
}
.text-warning {
color: #db6161;
}
</style>
</head>
<body>
<header>
<form name="search" method="GET" action="search.php">
<h1><a href="./"><?php echo _('Yo!') ?></a></h1>
<input type="text" name="q" placeholder="<?php echo $placeholder ?>" value="" />
<?php if ($config->webui->search->extended->enabled) { ?>
<label for="e">
<input type="checkbox" name="e" id="e" value="true" />
<?php echo _('Extended') ?>
</label>
<?php } ?>
<button type="submit">
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="white" class="bi bi-search" viewBox="0 0 16 16">
<path d="M11.742 10.344a6.5 6.5 0 1 0-1.397 1.398h-.001c.03.04.062.078.098.115l3.85 3.85a1 1 0 0 0 1.415-1.414l-3.85-3.85a1.007 1.007 0 0 0-.115-.1zM12 6.5a5.5 5.5 0 1 1-11 0 5.5 5.5 0 0 1 11 0"/>
</svg>
</sub>
</button>
</form>
</header>
<main>
<?php if ($document) { ?>
<div>
<?php if (empty($document->time)) { ?>
<div>
<?php echo _('Document pending for crawler in queue') ?>
</div>
<?php } else { ?>
<?php if (!empty($document->title)) { ?>
<h2>
<?php echo htmlentities($document->title) ?>
</h2>
<?php } ?>
<?php if (!empty($document->description)) { ?>
<div>
<?php echo htmlentities($document->description) ?>
</div>
<?php } ?>
<?php if (!empty($document->keywords)) { ?>
<div>
<?php echo htmlentities($document->keywords) ?>
</div>
<?php } ?>
<?php } ?>
<div>
<a href="<?php echo htmlentities($document->url) ?>"><?php echo htmlentities(urldecode($document->url)) ?></a>
</div>
</div>
<div>
<div>
<img src="<?php echo $icon ?>" title="<?php echo htmlentities((string) $hostname) ?>" alt="identicon" />
</div>
<?php if (!empty($document->code)) { ?>
<h3><?php echo _('HTTP') ?></h3>
<?php if ($document->code == 200) { ?>
<div>
<?php echo $document->code ?>
</div>
<?php } else { ?>
<div class="text-warning">
<?php echo $document->code ?>
</div>
<?php } ?>
<?php } ?>
<?php if (!empty($document->mime)) { ?>
<h3><?php echo _('MIME') ?></h3>
<div><?php echo htmlentities($document->mime) ?></div>
<?php } ?>
<?php if (!empty($document->size)) { ?>
<h3><?php echo _('Size') ?></h3>
<div><?php echo sprintf('%s bytes', number_format($document->size)) ?></div>
<?php } ?>
<?php if (!empty($document->time)) { ?>
<h3><?php echo _('Time') ?></h3>
<div><?php echo date('c', $document->time) ?></div>
<?php } ?>
<?php if ($snaps) { ?>
<h3><?php echo _('Snaps') ?></h3>
<ul>
<?php foreach ($snaps as $source => $snap) { ?>
<li>
<?php echo $source ?>
<ul>
<?php foreach ($snap as $file) { ?>
<li>
<a rel="nofollow" href="api.php?action=snap&method=download&source=<?php echo $file->source ?>&id=<?php echo $file->id ?>&time=<?php echo $file->time ?>">
<?php echo sprintf('%s (tar.gz / %s bytes)', date('c', $file->time), number_format($file->size)) ?>
</a>
</li>
<?php } ?>
</ul>
</li>
<?php } ?>
</ul>
<?php } ?>
<?php if (!empty($document->body)) { ?>
<h3><?php echo _('Cache') ?></h3>
<pre><?php echo htmlentities($document->body) ?></pre>
<?php } ?>
<?php if ($config->webui->index->enabled) { ?>
<h3><?php echo _('Index') ?></h3>
<div>
<?php if ($document->get('index')) { ?>
<?php echo sprintf(_('Request sent at %s'), date('c', $document->get('index'))) ?>
<?php } else { ?>
<img src="<?php echo $captcha->inline(100) ?>" alt="captcha" />
<form name="index" method="POST" action="explore.php?i=<?php echo $document->getId() ?>">
<fieldset>
<input type="text"
name="captcha"
value=""
placeholder="<?php echo _('Code on picture'); ?>"
autocomplete="off" />
<button type="submit">
<?php echo _('Request') ?>
</button>
<button type="submit">
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="white" viewBox="0 0 16 16">
<path d="M11.534 7h3.932a.25.25 0 0 1 .192.41l-1.966 2.36a.25.25 0 0 1-.384 0l-1.966-2.36a.25.25 0 0 1 .192-.41m-11 2h3.932a.25.25 0 0 0 .192-.41L2.692 6.23a.25.25 0 0 0-.384 0L.342 8.59A.25.25 0 0 0 .534 9"/>
<path fill-rule="evenodd" d="M8 3c-1.552 0-2.94.707-3.857 1.818a.5.5 0 1 1-.771-.636A6.002 6.002 0 0 1 13.917 7H12.9A5 5 0 0 0 8 3M3.1 9a5.002 5.002 0 0 0 8.757 2.182.5.5 0 1 1 .771.636A6.002 6.002 0 0 1 2.083 9z"/>
</svg>
</sub>
</button>
</fieldset>
</form>
<?php } ?>
</div>
<?php } ?>
</div>
<?php } else { ?>
<div>
<?php echo _('Index not found') ?>
</div>
<?php } ?>
</main>
</body>
</html>

336
src/webui/index.php

@ -1,336 +0,0 @@
<?php
// Debug
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
// Load dependencies
require_once __DIR__ . '/../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../config.json'
)
);
// Show totals in placeholder
// Init
$client = new \Manticoresearch\Client(
[
'host' => $config->manticore->server->host,
'port' => $config->manticore->server->port,
]
);
// Init index
$index = $client->index(
$config->manticore->index->document->name
);
// Get totals
$total = $index->search('')
->option('cutoff', 0)
->limit(0)
->get()
->getTotal();
$placeholder = sprintf(
_('Search in %s documents %s'),
number_format(
$total
),
$config->webui->search->index->request->url->enabled ? _('or enter new address to crawl...') : false
);
?>
<!DOCTYPE html>
<html lang="<?php echo _('en-US') ?>">
<head>
<title><?php echo _('Yo! Web Search Engine') ?></title>
<meta charset="utf-8" />
<meta name="description" content="<?php echo _('Yo! Micro Web Crawler in PHP & Manticore') ?>" />
<meta name="keywords" content="<?php echo _('web, search, engine, crawler, manticore, yggdrasil, js-less, open source') ?>" />
<style>
* {
border: 0;
margin: 0;
padding: 0;
font-family: Sans-serif;
color: #ccc;
}
body {
background-color: #2e3436;
}
h1 {
color: #fff;
font-weight: normal;
font-size: 36px;
margin: 16px 0
}
form {
display: block;
max-width: 640px;
margin: 16% auto;
text-align: center;
}
input,
input:-webkit-autofill,
input:-webkit-autofill:focus {
transition: background-color 0s 600000s, color 0s 600000s; /* chrome */
width: 100%;
margin: 8px 0;
padding: 12px 0;
border-radius: 32px;
background-color: #000;
color: #fff;
font-size: 16px;
text-align: center;
}
input:hover {
background-color: #111
}
input:focus {
outline: none;
background-color: #111
}
input:focus::placeholder {
color: #090808;
}
button {
margin: 22px 0;
padding: 6px 12px;
border-radius: 4px;
cursor: pointer;
background-color: #3394fb;
color: #fff;
font-size: 14px;
}
button:hover {
background-color: #4b9df4;
}
footer {
position: fixed;
bottom: 0;
left:0;
right: 0;
text-align: center;
padding: 24px;
color: #9ba2ac;
font-size: 12px;
}
footer > a,
footer > a:visited,
footer > a:active {
color: #9ba2ac;
font-size: 12px;
}
footer > a > svg,
footer > a:visited > svg,
footer > a:active > svg {
fill: #9ba2ac;
}
footer > a:hover {
color: #54a3f7;
}
footer > a:hover svg {
fill: #54a3f7;
}
footer > a,
footer > a:visited,
footer > a:active {
text-decoration: none;
}
/*
* CSS animation
* by https://codepen.io/alvarotrigo/pen/GRvYNax
*/
main {
background: #2e3436;
background: -webkit-linear-gradient(to left, #8f94fb, #4e54c8);
width: 100%;
}
ul {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
overflow: hidden;
z-index:-1
}
li {
position: absolute;
display: block;
list-style: none;
width: 20px;
height: 20px;
background: rgba(255, 255, 255, 0.2);
animation: animate 25s linear infinite;
bottom: -150px;
}
li:nth-child(1) {
left: 25%;
width: 80px;
height: 80px;
animation-delay: 0s;
}
li:nth-child(2) {
left: 10%;
width: 20px;
height: 20px;
animation-delay: 2s;
animation-duration: 12s;
}
li:nth-child(3) {
left: 70%;
width: 20px;
height: 20px;
animation-delay: 4s;
}
li:nth-child(4) {
left: 40%;
width: 60px;
height: 60px;
animation-delay: 0s;
animation-duration: 18s;
}
li:nth-child(5) {
left: 65%;
width: 20px;
height: 20px;
animation-delay: 0s;
}
li:nth-child(6) {
left: 75%;
width: 110px;
height: 110px;
animation-delay: 3s;
}
li:nth-child(7) {
left: 35%;
width: 150px;
height: 150px;
animation-delay: 7s;
}
li:nth-child(8) {
left: 50%;
width: 25px;
height: 25px;
animation-delay: 15s;
animation-duration: 45s;
}
li:nth-child(9) {
left: 20%;
width: 15px;
height: 15px;
animation-delay: 2s;
animation-duration: 35s;
}
li:nth-child(10) {
left: 85%;
width: 150px;
height: 150px;
animation-delay: 0s;
animation-duration: 11s;
}
@keyframes animate {
0%{
transform: translateY(0) rotate(0deg);
opacity: 1;
border-radius: 0;
}
100%{
transform: translateY(-1000px) rotate(720deg);
opacity: 0;
border-radius: 50%;
}
}
</style>
</head>
<body>
<header>
<form name="search" method="GET" action="search.php">
<h1><?php echo _('Yo!') ?></h1>
<input type="text" name="q" placeholder="<?php echo $placeholder ?>" value="" />
<button type="submit">
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="white" class="bi bi-search" viewBox="0 0 16 16">
<path d="M11.742 10.344a6.5 6.5 0 1 0-1.397 1.398h-.001c.03.04.062.078.098.115l3.85 3.85a1 1 0 0 0 1.415-1.414l-3.85-3.85a1.007 1.007 0 0 0-.115-.1zM12 6.5a5.5 5.5 0 1 1-11 0 5.5 5.5 0 0 1 11 0"/>
</svg>
</sub>
&nbsp;
<?php echo _('Search'); ?>
</button>
</form>
</header>
<!-- css animation : begin -->
<main>
<ul>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
</ul>
</main>
<!-- css animation : end -->
<footer>
<?php foreach ($config->webui->footer->links as $i => $link) { ?>
<?php if ($i) echo '|' ?>
<a <?php foreach ($link->attributes as $name => $value) { echo sprintf(' %s="%s"', $name, $value); } ?>>
<?php echo _($link->text) ?>
</a>
<?php foreach ($link->index as $index) { ?>
<a rel="nofollow" href="<?php echo $index ?>" title="<?php echo sprintf(_('Download %s database'), $link->text) ?>">
<svg xmlns="http://www.w3.org/2000/svg" width="11" height="11" viewBox="0 0 16 16">
<path d="M12.5 9a3.5 3.5 0 1 1 0 7 3.5 3.5 0 0 1 0-7m.354 5.854 1.5-1.5a.5.5 0 0 0-.708-.708l-.646.647V10.5a.5.5 0 0 0-1 0v2.793l-.646-.647a.5.5 0 0 0-.708.708l1.5 1.5a.5.5 0 0 0 .708 0ZM8 1c-1.573 0-3.022.289-4.096.777C2.875 2.245 2 2.993 2 4s.875 1.755 1.904 2.223C4.978 6.711 6.427 7 8 7s3.022-.289 4.096-.777C13.125 5.755 14 5.007 14 4s-.875-1.755-1.904-2.223C11.022 1.289 9.573 1 8 1"/>
<path d="M2 7v-.839c.457.432 1.004.751 1.49.972C4.722 7.693 6.318 8 8 8s3.278-.307 4.51-.867c.486-.22 1.033-.54 1.49-.972V7c0 .424-.155.802-.411 1.133a4.51 4.51 0 0 0-4.815 1.843A12.31 12.31 0 0 1 8 10c-1.573 0-3.022-.289-4.096-.777C2.875 8.755 2 8.007 2 7m6.257 3.998L8 11c-1.682 0-3.278-.307-4.51-.867-.486-.22-1.033-.54-1.49-.972V10c0 1.007.875 1.755 1.904 2.223C4.978 12.711 6.427 13 8 13h.027a4.552 4.552 0 0 1 .23-2.002m-.002 3L8 14c-1.682 0-3.278-.307-4.51-.867-.486-.22-1.033-.54-1.49-.972V13c0 1.007.875 1.755 1.904 2.223C4.978 15.711 6.427 16 8 16c.536 0 1.058-.034 1.555-.097a4.507 4.507 0 0 1-1.3-1.905"/>
</svg>
</a>
<?php } ?>
<?php } ?>
</footer>
</body>
</html>

514
src/webui/search.php

@ -1,514 +0,0 @@
<?php
// Debug
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
// Load dependencies
require_once __DIR__ . '/../../vendor/autoload.php';
// Init config
$config = json_decode(
file_get_contents(
__DIR__ . '/../../config.json'
)
);
// Init
$client = new \Manticoresearch\Client(
[
'host' => $config->manticore->server->host,
'port' => $config->manticore->server->port,
]
);
// Init index
$index = $client->index(
$config->manticore->index->document->name
);
// Get totals
$total = $index->search('')
->option('cutoff', 0)
->limit(0)
->get()
->getTotal();
$placeholder = sprintf(
_('Search in %s documents %s'),
number_format(
$total
),
$config->webui->search->index->request->url->enabled ? _('or enter new address to crawl...') : false
);
$response = false;
// Request
$q = !empty($_GET['q']) ? trim($_GET['q']) : '';
$p = !empty($_GET['p']) ? max(1, (int) $_GET['p']) : 1; // page number, never below 1
// Register new URL by request on enabled
if ($config->webui->search->index->request->url->enabled && filter_var($q, FILTER_VALIDATE_URL))
{
if (preg_match($config->webui->search->index->request->url->regex, $q))
{
// Prepare URL
$url = $q;
$crc32url = crc32($url);
// Check URL for exist
$exist = $index->search('')
->filter('id', $crc32url)
->limit(1)
->get()
->getTotal();
if ($exist)
{
/* disable as regular search request possible
$response = sprintf(
_('URL "%s" exists in search index'),
htmlentities($q)
);
*/
}
// Add URL
else
{
// @TODO check http code
$index->addDocument(
[
'url' => $url,
'rank' => (int) mb_strlen(
(string)
urldecode(
(string)
parse_url(
$url,
PHP_URL_PATH
)
)
)
],
$crc32url
);
$response = sprintf(
_('URL "%s" added to the crawl queue!'),
htmlentities($q)
);
}
}
else {
$response = sprintf(
_('URL "%s" does not match node settings!'),
htmlentities($q)
);
}
}
// Extended corrections
switch (true)
{
// Empty query
case empty($q):
$query = $index->search('')->sort('RAND()');
break;
// URL request
case filter_var($q, FILTER_VALIDATE_URL):
$query = $index->search('')->filter('id', crc32($q));
break;
default:
// Allow raw requests on extended syntax mode requested
// http://sphinxsearch.com/docs/current/extended-syntax.html
if (isset($_GET['e']) && $config->webui->search->extended->enabled)
{
$query = $index->search($q);
}
// Regular request
else
{
$query = $index->search(
@\Manticoresearch\Utils::escape(
$q
)
);
}
}
// Apply search options (e.g. field_weights)
foreach ($config->webui->search->options as $key => $value)
{
if (is_int($value) || is_string($value))
{
$query->option(
$key,
$value
);
}
else
{
$query->option(
$key,
(array) $value
);
}
}
// Apply highlight options
if ($config->webui->search->highlight->fields)
{
$query->highlight(
(array) $config->webui->search->highlight->fields,
(array) $config->webui->search->highlight->options
);
}
// Get found
$found = empty($q) ? $total : $query->get()->getTotal();
// Search request begin
$results = $query->offset($p * $config->webui->pagination->limit - $config->webui->pagination->limit)
->limit($config->webui->pagination->limit)
->get();
?>
<!DOCTYPE html>
<html lang="<?php echo _('en-US'); ?>">
<head>
<title><?php echo sprintf(_('Yo! %s'), htmlentities($q)) ?></title>
<meta charset="utf-8" />
<meta name="keywords" content="<?php echo htmlentities($q) ?>" />
<style>
* {
border: 0;
margin: 0;
padding: 0;
font-family: Sans-serif;
color: #ccc;
}
body {
background-color: #2e3436;
word-break: break-word;
}
header {
background-color: #34393b;
position: fixed;
top: 0;
left: 0;
right: 0;
z-index: 2;
}
main {
margin-top: 80px;
margin-bottom: 76px;
padding: 0 32px;
}
main > div {
border-top: 1px #000 dashed;
font-size: 14px;
margin: 0 auto;
max-width: 620px;
padding: 8px 0;
position: relative;
}
main > div > img {
left: -24px;
position: absolute;
top: 18px;
}
main > div > div {
padding: 8px 0;
line-height: 16px;
}
main > div > div > a {
font-size: 12px;
}
h1 {
position: fixed;
top: 2px;
left: 24px;
}
h1 > a,
h1 > a:visited,
h1 > a:active,
h1 > a:hover {
color: #fff;
font-weight: normal;
font-size: 22px;
margin: 0;
text-decoration: none;
}
h2 {
display: block;
font-size: 15px;
font-weight: normal;
color: #fff;
}
form {
display: block;
max-width: 678px;
margin: 0 auto;
text-align: center;
}
input[type="checkbox"] {
accent-color: #3394fb;
}
input[type="text"],
input[type="text"]:-webkit-autofill,
input[type="text"]:-webkit-autofill:focus {
transition: background-color 0s 600000s, color 0s 600000s; /* chrome */
width: 100%;
margin: 12px 0;
padding: 6px 0;
border-radius: 32px;
background-color: #000;
color: #fff;
font-size: 15px;
text-align: center;
}
input[type="text"]:hover {
background-color: #111
}
input[type="text"]:focus {
outline: none;
background-color: #111
}
input[type="text"]:focus::placeholder {
color: #090808
}
label {
font-size: 14px;
position: absolute;
right: 80px;
top: 18px;
}
label > input {
width: auto;
margin: 0 4px;
}
button {
padding: 6px 12px;
border-radius: 4px;
cursor: pointer;
background-color: #3394fb;
color: #fff;
font-size: 14px;
position: fixed;
top: 12px;
right: 24px;
}
button:hover {
background-color: #4b9df4;
}
a, a:visited, a:active {
color: #9ba2ac;
}
a:hover {
color: #54a3f7;
}
span {
display: block;
margin: 8px 0;
}
.text-warning {
color: #db6161;
fill: #db6161;
}
</style>
</head>
<body>
<header>
<form name="search" method="GET" action="search.php">
<h1><a href="./"><?php echo _('Yo!') ?></a></h1>
<input type="text" name="q" placeholder="<?php echo $placeholder ?>" value="<?php echo htmlentities($q) ?>" />
<?php if ($config->webui->search->extended->enabled) { ?>
<label for="e">
<input type="checkbox" name="e" id="e" value="true" <?php echo isset($_GET['e']) ? 'checked="checked"': false ?>/>
<?php echo _('Extended') ?>
</label>
<?php } ?>
<button type="submit">
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="white" class="bi bi-search" viewBox="0 0 16 16">
<path d="M11.742 10.344a6.5 6.5 0 1 0-1.397 1.398h-.001c.03.04.062.078.098.115l3.85 3.85a1 1 0 0 0 1.415-1.414l-3.85-3.85a1.007 1.007 0 0 0-.115-.1zM12 6.5a5.5 5.5 0 1 1-11 0 5.5 5.5 0 0 1 11 0"/>
</svg>
</sub>
</button>
</form>
</header>
<main>
<?php if (isset($_GET['e']) && $config->webui->search->extended->enabled) { ?>
<div>
<p>
<?php echo _('Extended syntax enabled, follow') ?>
<a href="http://sphinxsearch.com/docs/current/extended-syntax.html" rel="nofollow" target="_blank"><?php echo _('Documentation') ?></a>
<sub>
<svg xmlns="http://www.w3.org/2000/svg" width="11" height="11" fill="currentColor" viewBox="0 0 16 16">
<path fill-rule="evenodd" d="M8.636 3.5a.5.5 0 0 0-.5-.5H1.5A1.5 1.5 0 0 0 0 4.5v10A1.5 1.5 0 0 0 1.5 16h10a1.5 1.5 0 0 0 1.5-1.5V7.864a.5.5 0 0 0-1 0V14.5a.5.5 0 0 1-.5.5h-10a.5.5 0 0 1-.5-.5v-10a.5.5 0 0 1 .5-.5h6.636a.5.5 0 0 0 .5-.5"/>
<path fill-rule="evenodd" d="M16 .5a.5.5 0 0 0-.5-.5h-5a.5.5 0 0 0 0 1h3.793L6.146 9.146a.5.5 0 1 0 .708.708L15 1.707V5.5a.5.5 0 0 0 1 0z"/>
</svg>
</sub>
</p>
<p>
<?php echo _('Available fields:') ?>
<i>@title</i>
<i>@description</i>
<i>@keywords</i>
<i>@mime</i>
<i>@url</i>
</p>
</div>
<?php } ?>
<?php if ($response) { ?>
<div>
<?php echo $response ?>
</div>
<?php } ?>
<div>
<?php echo sprintf(_('Found: %s'), number_format($found)) ?>
</div>
<?php foreach ($results as $result) { ?>
<div>
<?php
$hostname = parse_url(
$result->url,
PHP_URL_HOST
);
$identicon = new \Jdenticon\Identicon();
$identicon->setValue(
$hostname
);
$identicon->setSize(14);
$identicon->setStyle(
[
'backgroundColor' => 'rgba(255, 255, 255, 0)',
'padding' => 0
]
);
$icon = $identicon->getImageDataUri('webp');
?>
<img src="<?php echo $icon ?>" title="<?php echo htmlentities((string) $hostname) ?>" alt="identicon" />
<?php if (!empty($result->getHighlight()['title'])) { ?>
<div>
<h2>
<?php foreach ($result->getHighlight()['title'] as $title) { ?>
<p><?php echo $title ?></p>
<?php } ?>
</h2>
</div>
<?php } else if (!empty($result->title)) { ?>
<div>
<h2><?php echo htmlentities($result->title) ?></h2>
</div>
<?php } ?>
<?php if (!empty($result->getHighlight()['description'])) { ?>
<div>
<?php foreach ($result->getHighlight()['description'] as $description) { ?>
<p><?php echo $description ?></p>
<?php } ?>
</div>
<?php } else if (!empty($result->description)) { ?>
<div>
<?php echo htmlentities($result->description) ?>
</div>
<?php } ?>
<?php if (!empty($result->getHighlight()['keywords'])) { ?>
<div>
<?php foreach ($result->getHighlight()['keywords'] as $keywords) { ?>
<p><?php echo $keywords ?></p>
<?php } ?>
</div>
<?php } else if (!empty($result->keywords)) { ?>
<div>
<?php echo htmlentities($result->keywords) ?>
</div>
<?php } ?>
<?php if (!empty($result->getHighlight()['body'])) { ?>
<div>
<?php foreach ($result->getHighlight()['body'] as $body) { ?>
<p><?php echo $body ?></p>
<?php } ?>
</div>
<?php } ?>
<div>
<?php if (!empty($result->getHighlight()['url'])) { ?>
<?php foreach ($result->getHighlight()['url'] as $url) { ?>
<a href="<?php echo htmlentities($result->url) ?>"><?php echo urldecode($url) ?></a>
<?php } ?>
<?php } else if (!empty($result->url)) { ?>
<a href="<?php echo htmlentities($result->url) ?>"><?php echo htmlentities(urldecode($result->url)) ?></a>
<?php } ?>
<?php if (!in_array($result->get('code'), [0, 200])) { ?>
<small>&bull;</small>
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10" class="text-warning" viewBox="0 0 16 16">
<path d="m9.97 4.88.953 3.811C10.159 8.878 9.14 9 8 9c-1.14 0-2.158-.122-2.923-.309L6.03 4.88C6.635 4.957 7.3 5 8 5s1.365-.043 1.97-.12m-.245-.978L8.97.88C8.718-.13 7.282-.13 7.03.88L6.275 3.9C6.8 3.965 7.382 4 8 4c.618 0 1.2-.036 1.725-.098zm4.396 8.613a.5.5 0 0 1 .037.96l-6 2a.5.5 0 0 1-.316 0l-6-2a.5.5 0 0 1 .037-.96l2.391-.598.565-2.257c.862.212 1.964.339 3.165.339s2.303-.127 3.165-.339l.565 2.257 2.391.598"/>
</svg>
<small><?php echo $result->get('code') ?></small>
<?php } ?>
<small>&bull;</small>
<a rel="nofollow" href="explore.php?i=<?php echo $result->getId() ?>"><?php echo _('explore') ?></a>
</div>
</div>
<?php } ?>
<?php if ($p * $config->webui->pagination->limit <= $results->getTotal()) { ?>
<div>
<div>
<a href="search.php?q=<?php echo urlencode($q) ?>&p=<?php echo $p + 1 ?>">
<?php echo _('More') ?>
</a>
</div>
</div>
<?php } ?>
</main>
</body>
</html>