104 Commits (ab6c0379c855be7630ff1e9c99eaa275d1efcf35)

Author SHA1 Message Date
ghost ab6c0379c8 implement hosts crawl queue, move robots, sitemaps, manifests to this task 1 year ago
ghost 71724ae33f refactor manifest crawling 1 year ago
ghost cb37c57bc4 rename example files 1 year ago
ghost 68d5820f30 reserve one hour for huge load operations 1 year ago
ghost 282a6d609d update manifest API 1 year ago
ghost b24d31f360 refactor cleaner, delegate tasks to crawler, init hostSetting table 1 year ago
ghost 02612d098b delete getFoundHostPage method, update API version 1 year ago
ghost 772975059c add mysql conf example 1 year ago
ghost 7c407e0d1f update crontab example 1 year ago
ghost 5df59661d8 add optional page rank update in the crawl queue 1 year ago
ghost d119756a41 fix index size 1 year ago
ghost 662351cc46 index meta fields separately, set search priority by document title 1 year ago
ghost 5791877a4e update Filter::searchQuery method, fix search by URL 1 year ago
ghost 3235133cd0 extract keywords from URI 1 year ago
ghost 2ef9948342 change default CRAWL_PAGE_HOME_SECONDS_OFFSET value to 1 month 1 year ago
ghost 9c0f361601 refactor snap storage 1 year ago
ghost 000b9ad8dd add FS cleaning features, lock execution on active crontab tasks, disable hostPageSnap/localhost untested constructions 1 year ago
ghost 3e3b7ee2ef optimize snaps, delete unused constructions 1 year ago
ghost b13293988a add search index by host and host page URL 1 year ago
ghost 712d67f6bf implement unlimited snap storage mirrors, delete megaCMD integration 1 year ago
ghost 1dd0a8ee2c make page rank procedural, optimize performance 1 year ago
ghost 4a4394fb27 add memcached support 1 year ago
ghost 2e2501b437 implement sitemap support 1 year ago
ghost 3218add372 add custom home page reindex settings 2 years ago
ghost 5346b13602 implement custom hostPageDom elements index 2 years ago
ghost c07d6af52f add new mime preset 2 years ago
ghost 830e96b03d increase minimum requirements 2 years ago
ghost dd736c7923 crontab schedule optimization 2 years ago
ghost 8726512cf0 change morphology from stem_enru to lemmatize_ru_all/lemmatize_en_all 2 years ago
ghost 8bc8a943e7 add lemmatize_de_all 2 years ago
ghost 17f69b9661 add min_word_len, min_prefix_len, html_strip, index_exact_words presets example 2 years ago
ghost 1e2736d67b skip empty mime type index 2 years ago
ghost 4fa33afe40 prevent infinite connection when streaming resources are detected 2 years ago
ghost 982be2a949 add the description text source 2 years ago
ghost cb60d52a0b update documentation 2 years ago
ghost 45c4f7b7b0 add database optimization settings 2 years ago
ghost 2853db6207 fix mimes separator 2 years ago
ghost f827c37691 add MEGAcmd/FTP launch examples 2 years ago
ghost 81f7ea1e1e implement multi-storage snap downloads 2 years ago
ghost 1969707eeb integrate optional MEGA/cmd snap storage 2 years ago
ghost 0d19004e86 optimize local snap storage 2 years ago
ghost 2f7d99079d implement local snaps 2 years ago
ghost d98b8f5c94 remove `hostPageToHostPage`.`quantity` field because it counts duplicates incorrectly on reindex 2 years ago
ghost 28e8bcf8d7 add audio/video media crawl support 2 years ago
ghost 566d3b442e group mime details 2 years ago
ghost 746cc228a9 update page rank query 2 years ago
ghost db0e66c846 refactor to mime-based content index #1 2 years ago
ghost e7c5e2ca9d GROUP_CONCAT host image descriptions 2 years ago
ghost 28bf526d53 add host nsfw settings 2 years ago
ghost d186fff48f skip curl download when the response data size limit is reached 2 years ago