37 Commits (ca92db88264fc19e7fae58b447ae2a5306437688)

Author SHA1 Message Date
ghost eccb7ea241 refactor hostPageDom tables, add multiple selectors and children values support 1 year ago
ghost 2b49ff5f6a move hostPageDescription.data field data to hostPageDom.value 1 year ago
ghost d024ffd770 implement unlimited settings customization for each host 1 year ago
ghost 71724ae33f refactor manifest crawling 1 year ago
ghost b24d31f360 refactor cleaner, delegate tasks to crawler, init hostSetting table 1 year ago
ghost 3e3b7ee2ef optimize snaps, delete unused constructions 1 year ago
ghost 712d67f6bf implement unlimited snap storage mirrors, delete megaCMD integration 1 year ago
ghost 1dd0a8ee2c make page rank procedural, optimize performance 1 year ago
ghost 5346b13602 implement custom hostPageDom elements index 2 years ago
ghost 0949d7f871 set default encoding 2 years ago
ghost ab78e17ca8 add hostPage.size collection 2 years ago
ghost 7892784f5c add httpCode column to hostPageSnapDownload table 2 years ago
ghost 0af5d165d3 remove unused logCrawler column 2 years ago
ghost 81f7ea1e1e implement multi-storage snap downloads 2 years ago
ghost 1969707eeb integrate optional MEGA/cmd snap storage 2 years ago
ghost 0d19004e86 optimize local snap storage 2 years ago
ghost 2f7d99079d implement local snaps 2 years ago
ghost d98b8f5c94 remove `hostPageToHostPage`.`quantity` field because it counts duplicates incorrectly on reindex 2 years ago
ghost db0e66c846 refactor to mime-based content index #1 2 years ago
ghost 2c5ca1b630 fix image description duplicate 2 years ago
ghost 1c7cca1446 fix UNIQUE index relation 2 years ago
ghost 28bf526d53 add host nsfw settings 2 years ago
ghost 23ead4e12c update page / image description models, implement history snap crawling 2 years ago
ghost 0e9d29675f implement host page description history crawling 2 years ago
ghost 25b6bce2ec add crawler/cleaner logs 2 years ago
ghost b6605b9132 implement timed ban for unreachable resources to prevent extra HTTP requests 2 years ago
ghost 702a14b634 add mime content type crawling #1 2 years ago
ghost 5999fb3a73 add distributed hosts crawling using yggo nodes manifest 2 years ago
ghost d4f66c83e7 fix image crawling errors 2 years ago
ghost 68581960a3 add image.data field 2 years ago
ghost 0741a3e9ef implement image crawler 2 years ago
ghost 78931ebc74 normalize host image description storage 2 years ago
ghost db617f9939 refactor image storage model 2 years ago
ghost 6d8f4f4882 create manifests registry 2 years ago
ghost 11aa404807 add metaYggo field index 2 years ago
ghost df6f2a1869 implement CRAWL_ROBOTS_POSTFIX_RULES configuration #5 2 years ago
ghost 2495a2bbc7 implement MySQL/Sphinx data model #3, add basic robots.txt support #2 2 years ago