135 Commits (6e03a76ed83e40ecf8f88131a3b9eb4b20acb04f)

Author SHA1 Message Date
ghost 004a5336de remove HTML pages ban on title tag not available 1 year ago
ghost de28d85a71 add connection exceptions 1 year ago
ghost d46c4921c5 add page break 1 year ago
ghost d024ffd770 implement unlimited settings customization for each host 1 year ago
ghost ab6c0379c8 implement hosts crawl queue, move robots, sitemaps, manifests to this task 1 year ago
ghost 6ee5e53ef4 show sitemaps processed debug 1 year ago
ghost 71724ae33f refactor manifest crawling 1 year ago
ghost efbbf19601 fix multimedia snaps 1 year ago
ghost b24d31f360 refactor cleaner, delegate tasks to crawler, init hostSetting table 1 year ago
ghost fd90e2d517 keep banned pages data 1 year ago
ghost 11e02da66d memory usage optimization, rename methods, remove memcached dependency from the model 1 year ago
ghost cbabea595b rename method name 1 year ago
ghost 1249e8d29c fix CRAWL_PAGE_RANK_UPDATE condition 1 year ago
ghost 5df59661d8 add optional page rank update in the crawl queue 1 year ago
ghost 1d7deffc4c update PR generation, delegate PR value from redirecting pages, update method names 1 year ago
ghost 1655ec63b2 skip xmpp links 1 year ago
ghost 06c136f05c fix meta/nofollow attribute processing 1 year ago
ghost 43776b5ff4 fix semaphores 1 year ago
ghost fd3444a379 change timestamp sort order 1 year ago
ghost 9c0f361601 refactor snap storage 1 year ago
ghost 547cd6717b prevent scheduled execution on cli/yggo running 1 year ago
ghost 3e3b7ee2ef optimize snaps, delete unused constructions 1 year ago
ghost 307eb03600 build host/host page URL in SQL query 1 year ago
ghost 1f33205236 add script tag support 1 year ago
ghost b433fa6b3c add link tag support 1 year ago
ghost 79d07dd1a5 fix snap fs init 1 year ago
ghost 6eb45fdad2 fix snap crc32name index 1 year ago
ghost 712d67f6bf implement unlimited snap storage mirrors, delete megaCMD integration 1 year ago
ghost 2c17c93e2f fix broken snaps autodeletion 1 year ago
ghost 1dd0a8ee2c make page rank procedural, optimize performance 1 year ago
ghost 2e2501b437 implement sitemap support 1 year ago
ghost 4cb27f563f fix meta index/nofollow processing 1 year ago
ghost b7c415a8b0 crawl host page DOM selectors on meta robots:index/follow condition enabled only 1 year ago
ghost 443eaec64e autodelete failed snaps 1 year ago
ghost 4298203cab make paths absolute 2 years ago
ghost 3218add372 add custom home page reindex settings 2 years ago
ghost d912caeb0c fix variable name 2 years ago
ghost 5346b13602 implement custom hostPageDom elements index 2 years ago
ghost 5df598a1d4 fix variable name 2 years ago
ghost e16a7b8171 fix HY000/1366 error processing 2 years ago
ghost dc2d971ba0 clean up banned pages extra data 2 years ago
ghost d96abb8ea8 ban host page on encoding not detected 2 years ago
ghost d2469e9adc fix meta variables overwrite 2 years ago
ghost 1d5d5ead5d fix DomDocument initiation without encoding provided 2 years ago
ghost 8a747de341 fix HTML/multimedia content detection 2 years ago
ghost 93c6067fd9 fix host page mime detection 2 years ago
ghost 80d3912bc7 allow x-raw-image links 2 years ago
ghost b23f550a1b skip magnet links 2 years ago
ghost acba2816e2 remove transaction from tables optimization case 2 years ago
ghost b2cf9fc6a5 do table optimization in separate transaction 2 years ago