Commit Graph

96 Commits

Author SHA1 Message Date
ghost
7c407e0d1f update crontab example 2023-08-03 08:34:51 +03:00
ghost
5df59661d8 add optional page rank update in the crawl queue 2023-08-02 21:21:23 +03:00
ghost
d119756a41 fix index size 2023-08-01 16:23:40 +03:00
ghost
662351cc46 make meta fields index separated, set search priority by document title 2023-08-01 14:15:14 +03:00
ghost
5791877a4e update Filter::searchQuery method, fix search by URL 2023-08-01 13:50:07 +03:00
ghost
3235133cd0 extract keywords from URI 2023-07-31 22:42:49 +03:00
ghost
2ef9948342 change default CRAWL_PAGE_HOME_SECONDS_OFFSET value to 1 month 2023-07-31 22:04:27 +03:00
ghost
9c0f361601 refactor snap storage 2023-07-31 13:33:30 +03:00
ghost
000b9ad8dd add FS cleaning features, lock execution on active crontab tasks, disable hostPageSnap/localhost untested constructions 2023-07-30 21:53:30 +03:00
ghost
3e3b7ee2ef optimize snaps, delete unused constructions 2023-07-30 19:09:41 +03:00
ghost
b13293988a add search index by host and host page URL 2023-07-30 12:39:41 +03:00
ghost
712d67f6bf implement unlimited snap storage mirrors, delete megaCMD integration 2023-07-29 14:37:01 +03:00
ghost
1dd0a8ee2c make page rank procedural, optimize performance 2023-07-28 12:49:43 +03:00
ghost
4a4394fb27 add memcached support 2023-07-27 17:53:36 +03:00
ghost
2e2501b437 implement sitemap support 2023-07-27 11:44:42 +03:00
ghost
3218add372 add custom home page reindex settings 2023-06-30 13:28:22 +03:00
ghost
5346b13602 implement custom hostPageDom elements index 2023-06-25 22:10:47 +03:00
ghost
c07d6af52f add new mime preset 2023-06-13 21:57:01 +03:00
ghost
830e96b03d increase minimum requirements 2023-06-13 03:16:29 +03:00
ghost
dd736c7923 crontab schedule optimization 2023-06-10 00:19:27 +03:00
ghost
8726512cf0 change morphology from stem_enru to lemmatize_ru_all/lemmatize_en_all 2023-06-05 18:20:49 +03:00
ghost
8bc8a943e7 add lemmatize_de_all 2023-06-05 18:13:31 +03:00
ghost
17f69b9661 add min_word_len, min_prefix_len, html_strip, index_exact_words presets example 2023-06-05 13:36:15 +03:00
ghost
1e2736d67b skip empty mime type index 2023-06-04 18:10:59 +03:00
ghost
4fa33afe40 prevent infinite connection when streaming resources are detected 2023-06-04 17:02:32 +03:00
ghost
982be2a949 add the description text source 2023-05-30 21:46:52 +03:00
ghost
cb60d52a0b update documentation 2023-05-29 22:36:13 +03:00
ghost
45c4f7b7b0 add database optimization settings 2023-05-29 22:13:41 +03:00
ghost
2853db6207 fix mimes separator 2023-05-15 17:18:33 +03:00
ghost
f827c37691 add MEGAcmd/FTP launch examples 2023-05-15 11:51:27 +03:00
ghost
81f7ea1e1e implement multi-storage snap downloads 2023-05-15 09:18:18 +03:00
ghost
1969707eeb integrate optional MEGA/cmd snap storage 2023-05-14 19:41:20 +03:00
ghost
0d19004e86 make local snap storage optimization 2023-05-14 01:45:55 +03:00
ghost
2f7d99079d implement local snaps 2023-05-13 10:15:07 +03:00
ghost
d98b8f5c94 remove hostPageToHostPage.quantity field because it counts duplicates incorrectly on reindex 2023-05-13 06:30:40 +03:00
ghost
28e8bcf8d7 add audio/video media crawl support 2023-05-13 01:23:09 +03:00
ghost
566d3b442e make mime details grouped 2023-05-10 23:37:24 +03:00
ghost
746cc228a9 update page rank query 2023-05-10 15:42:48 +03:00
ghost
db0e66c846 refactor to mime-based content index #1 2023-05-10 12:47:36 +03:00
ghost
e7c5e2ca9d GROUP_CONCAT host image descriptions 2023-05-09 16:27:31 +03:00
ghost
28bf526d53 add host nsfw settings 2023-05-09 13:26:19 +03:00
ghost
d186fff48f skip curl download on response data size reached 2023-05-09 10:21:37 +03:00
ghost
23ead4e12c update page / image description models, implement history snap crawling 2023-05-09 08:19:49 +03:00
ghost
77bd25f587 add line separators 2023-05-09 01:39:56 +03:00
ghost
0e9d29675f implement host page description history crawling 2023-05-09 01:29:32 +03:00
ghost
e9d5137dfe allow svg images mime content type 2023-05-08 13:00:37 +03:00
ghost
25b6bce2ec add crawler/cleaner logs 2023-05-08 11:04:59 +03:00
ghost
fdd18de373 remove abstraction 2023-05-06 14:03:43 +03:00
ghost
4801360a51 update api version 2023-05-06 13:55:05 +03:00
ghost
b6605b9132 implement ban feature with timeout for unreachable resources to prevent extra HTTP requests 2023-05-06 08:45:37 +03:00