Commit Graph

25 Commits

Author  SHA1  Message  Date
ghost  830e96b03d  increase minimum requirements  2023-06-13 03:16:29 +03:00
ghost  8726512cf0  change morphology from stem_enru to lemmatize_ru_all/lemmatize_en_all  2023-06-05 18:20:49 +03:00
ghost  8bc8a943e7  add lemmatize_de_all  2023-06-05 18:13:31 +03:00
ghost  17f69b9661  add min_word_len, min_prefix_len, html_strip, index_exact_words presets example  2023-06-05 13:36:15 +03:00
ghost  1e2736d67b  skip empty mime type index  2023-06-04 18:10:59 +03:00
ghost  d98b8f5c94  remove hostPageToHostPage.quantity field because of implements wrong duplicates counting on reindex  2023-05-13 06:30:40 +03:00
ghost  566d3b442e  make mime details grouped  2023-05-10 23:37:24 +03:00
ghost  746cc228a9  update page rank query  2023-05-10 15:42:48 +03:00
ghost  db0e66c846  refactor to mime-based content index #1  2023-05-10 12:47:36 +03:00
ghost  e7c5e2ca9d  GROUP_CONCAT host image descriptions  2023-05-09 16:27:31 +03:00
ghost  23ead4e12c  update page / image description models, implement history snap crawling  2023-05-09 08:19:49 +03:00
ghost  77bd25f587  add line separators  2023-05-09 01:39:56 +03:00
ghost  0e9d29675f  implement host page description history crawling  2023-05-09 01:29:32 +03:00
ghost  b6605b9132  implement not reachable resources ban feature with timeout to prevent extra http requests  2023-05-06 08:45:37 +03:00
ghost  63b51f71c6  fix space offset  2023-05-04 04:20:54 +03:00
ghost  f980b6318c  add page meta to the image index  2023-05-04 04:20:20 +03:00
ghost  baf78e2bf5  add hostImage examples to sphinx configuration  2023-05-04 01:34:12 +03:00
ghost  74dd15e544  add page rank sort order attribute  2023-04-25 17:07:57 +03:00
ghost  d20487acfd  add stem_enru, stem_cz, stem_ar morphology support  2023-04-25 16:10:44 +03:00
ghost  2a79671cf1  add missed option example  2023-04-25 16:09:38 +03:00
ghost  8d102ecdf7  index hosts with enabled status only  2023-04-08 18:23:48 +03:00
ghost  0b12e872a3  add host name to the search index  2023-04-08 18:22:53 +03:00
ghost  e98146b78b  index only 200 http code pages  2023-04-07 05:34:45 +03:00
ghost  0f2b772fa8  remove not indexed pages from the search index  2023-04-07 04:50:01 +03:00
ghost  2495a2bbc7  implement MySQL/Sphinx data model #3, add basical robots.txt support #2  2023-04-07 04:04:24 +03:00