28 Commits (0bda87fbe664db0e87c0349fd05a9b68ee546ba4)

Author SHA1 Message Date
ghost 3235133cd0 extract keywords from URI 1 year ago
ghost b13293988a add search index by host and host page URL 1 year ago
ghost 1dd0a8ee2c make page rank procedural, optimize performance 1 year ago
ghost 830e96b03d increase minimum requirements 2 years ago
ghost 8726512cf0 change morphology from stem_enru to lemmatize_ru_all/lemmatize_en_all 2 years ago
ghost 8bc8a943e7 add lemmatize_de_all 2 years ago
ghost 17f69b9661 add min_word_len, min_prefix_len, html_strip, index_exact_words presets example 2 years ago
ghost 1e2736d67b skip empty mime type index 2 years ago
ghost d98b8f5c94 remove `hostPageToHostPage`.`quantity` field because it counts duplicates incorrectly on reindex 2 years ago
ghost 566d3b442e make mime details grouped 2 years ago
ghost 746cc228a9 update page rank query 2 years ago
ghost db0e66c846 refactor to mime-based content index #1 2 years ago
ghost e7c5e2ca9d GROUP_CONCAT host image descriptions 2 years ago
ghost 23ead4e12c update page / image description models, implement history snap crawling 2 years ago
ghost 77bd25f587 add line separators 2 years ago
ghost 0e9d29675f implement host page description history crawling 2 years ago
ghost b6605b9132 implement ban feature with timeout for unreachable resources to prevent extra HTTP requests 2 years ago
ghost 63b51f71c6 fix space offset 2 years ago
ghost f980b6318c add page meta to the image index 2 years ago
ghost baf78e2bf5 add hostImage examples to sphinx configuration 2 years ago
ghost 74dd15e544 add page rank sort order attribute 2 years ago
ghost d20487acfd add stem_enru, stem_cz, stem_ar morphology support 2 years ago
ghost 2a79671cf1 add missing option example 2 years ago
ghost 8d102ecdf7 index hosts with enabled status only 2 years ago
ghost 0b12e872a3 add host name to the search index 2 years ago
ghost e98146b78b index only 200 http code pages 2 years ago
ghost 0f2b772fa8 remove not indexed pages from the search index 2 years ago
ghost 2495a2bbc7 implement MySQL/Sphinx data model #3, add basic robots.txt support #2 2 years ago