382 Commits (ebc1a573dc9b92c365a1fba08b84a215c9bdb038)
 

Author SHA1 Message Date
ghost ebc1a573dc initiate CLI tool 2 years ago
ghost 5346b13602 implement custom hostPageDom elements index 2 years ago
ghost 5df598a1d4 fix variable name 2 years ago
ghost 1c5346bc07 remove single char words 2 years ago
ghost e16a7b8171 fix HY000/1366 error processing 2 years ago
ghost dc2d971ba0 clean up banned pages extra data 2 years ago
ghost a657d31e1d fix enum data type 2 years ago
ghost d96abb8ea8 ban host page when encoding is not detected 2 years ago
ghost d2469e9adc fix meta variables overwrite 2 years ago
ghost 0949d7f871 set default encoding 2 years ago
ghost 1d5d5ead5d fix DomDocument initiation without encoding provided 2 years ago
ghost fcda6b9885 remove MIME filters from explorer search form 2 years ago
ghost f3475035c2 show page size in explorer view, hide not available data 2 years ago
ghost 8a747de341 fix HTML/multimedia content detection 2 years ago
ghost 93c6067fd9 fix host page mime detection 2 years ago
ghost c07d6af52f add new mime preset 2 years ago
ghost 052b08ea26 show results quantity in the mime filter titles 2 years ago
ghost 80d3912bc7 allow x-raw-image links 2 years ago
ghost b23f550a1b skip magnet links 2 years ago
ghost acba2816e2 remove transaction from tables optimization case 2 years ago
ghost be81299c84 update readme 2 years ago
ghost b2cf9fc6a5 do table optimization in a separate transaction 2 years ago
ghost ab78e17ca8 add hostPage.size collection 2 years ago
ghost 830e96b03d increase minimum requirements 2 years ago
ghost 7892784f5c add httpCode column to hostPageSnapDownload table 2 years ago
ghost 20726fca45 update readme 2 years ago
ghost dd736c7923 crontab schedule optimization 2 years ago
ghost a79993a94b add an mk 2 years ago
ghost edec590e09 fix MAYBE filter in the default search mode 2 years ago
ghost e1fb7f8c17 change query separators to the MAYBE operator in default search mode 2 years ago
ghost 9379809261 colorize meow 2 years ago
ghost 0af5d165d3 remove logCrawler column not in use 2 years ago
ghost 4b16b41440 make transaction for each item in crawl queue 2 years ago
ghost b585b16d31 fix datatype error detection 2 years ago
ghost 8726512cf0 change morphology from stem_enru to lemmatize_ru_all/lemmatize_en_all 2 years ago
ghost 8bc8a943e7 add lemmatize_de_all 2 years ago
ghost 3d3fcdda87 update readme 2 years ago
ghost a1249859a8 update readme 2 years ago
ghost eef23cc830 update readme 2 years ago
ghost 17f69b9661 add min_word_len, min_prefix_len, html_strip, index_exact_words presets example 2 years ago
ghost 1e2736d67b skip empty mime type index 2 years ago
ghost c5e25d17fb prevent page ban when its MIME is in the whitelist, skip only the steps below (make multimedia/streaming resources visible in search results) 2 years ago
ghost 4fa33afe40 prevent infinite connection when streaming resources are detected 2 years ago
ghost 345c59b5f4 collect target location links when a page redirect is available 2 years ago
ghost 5d7f2bf68c fix snap foreign keys deletion 2 years ago
ghost 242e0abd86 ban pages on data type error codes only 2 years ago
ghost 62a4f33b53 load missing dependency 2 years ago
ghost 512bd56056 ban page that throws the error and stalls the crawl queue 2 years ago
ghost 5a47c66e55 fix readme description 2 years ago
ghost f49076bb0c index homepages and shorter URL with higher priority 2 years ago