138 Commits (c5ae6974bd09837e68a92ea0e8f17fed4666653c)

Author SHA1 Message Date
ghost f4bf6b9fa4 fix crawl queue message 2 years ago
ghost 29197ab904 remove testing construction 2 years ago
ghost ed240d53b0 show available snaps only 2 years ago
ghost 5346b13602 implement custom hostPageDom elements index 2 years ago
ghost 1c5346bc07 remove single char words 2 years ago
ghost dc2d971ba0 clean up banned pages extra data 2 years ago
ghost a657d31e1d fix enum data type 2 years ago
ghost f3475035c2 show page size in explorer view, hide unavailable data 2 years ago
ghost ab78e17ca8 add hostPage.size collection 2 years ago
ghost 7892784f5c add httpCode column to hostPageSnapDownload table 2 years ago
ghost edec590e09 fix MAYBE filter in the default search mode 2 years ago
ghost e1fb7f8c17 change query separators to the MAYBE operator in default search mode 2 years ago
ghost 0af5d165d3 remove logCrawler column not in use 2 years ago
ghost 4fa33afe40 prevent infinite connection when streaming resources are detected 2 years ago
ghost 345c59b5f4 collect target location links when a page redirect is available 2 years ago
ghost f49076bb0c index homepages and shorter URLs with higher priority 2 years ago
ghost 81f7ea1e1e implement multi-storage snap downloads 2 years ago
ghost 1969707eeb integrate optional MEGA/cmd snap storage 2 years ago
ghost 50c9066f62 add tables optimization to the cron/cleaner task 2 years ago
ghost 0d19004e86 make local snap storage optimization 2 years ago
ghost 2f7d99079d implement local snaps 2 years ago
ghost d98b8f5c94 remove `hostPageToHostPage`.`quantity` field because it counted duplicates incorrectly on reindex 2 years ago
ghost eeeb3dceac implement index explorer 2 years ago
ghost 377b519a2c implement host page info mode 2 years ago
ghost 371670fadf add media referrers info 2 years ago
ghost 4486bdc215 show mime type options that match search results only 2 years ago
ghost 307ebcf0b1 add page description when title | description | keywords are not empty, remove deprecated constructions 2 years ago
ghost 7c5ba050b2 fix media crawling 2 years ago
ghost 0fed16621a fix mime content type update 2 years ago
ghost db0e66c846 refactor to mime-based content index #1 2 years ago
ghost 0ffcee1efb fix image description updates timing 2 years ago
ghost 2c5ca1b630 fix image description duplicate 2 years ago
ghost 28bf526d53 add host nsfw settings 2 years ago
ghost 8ce0324e94 convert page data to string 2 years ago
ghost dfca5570c6 remove unused construction 2 years ago
ghost d186fff48f skip curl download when the response data size is reached 2 years ago
ghost ef4de6b245 fix image search page errors 2 years ago
ghost 23ead4e12c update page / image description models, implement history snap crawling 2 years ago
ghost 0e9d29675f implement host page description history crawling 2 years ago
ghost 32d0f390d3 update http code and mime type on page/image ban event 2 years ago
ghost 8fbd7f3516 count totals using sphinx index instead of database 2 years ago
ghost 25b6bce2ec add crawler/cleaner logs 2 years ago
ghost ea04220de3 add curl requests debug 2 years ago
ghost 6c41dd5831 fix ban time update / count affected rows only 2 years ago
ghost b6605b9132 implement unreachable resources ban feature with timeout to prevent extra HTTP requests 2 years ago
ghost 702a14b634 add mime content type crawling #1 2 years ago
ghost f88d2ee9ff implement MIME content-type crawler filter 2 years ago
ghost bed5d3f149 fix offset out of bounds error 2 years ago
ghost 5999fb3a73 add distributed hosts crawling using yggo nodes manifest 2 years ago
ghost f0b2eb1613 show images total instead of pages in placeholder on image search page 2 years ago