Commit Graph

86 Commits

Author  SHA1        Message  Date
ghost  db0e66c846  refactor to mime-based content index #1  2023-05-10 12:47:36 +03:00
ghost  0ffcee1efb  fix image description updates timing  2023-05-09 15:53:21 +03:00
ghost  2c5ca1b630  fix image description duplicate  2023-05-09 15:23:32 +03:00
ghost  28bf526d53  add host nsfw settings  2023-05-09 13:26:19 +03:00
ghost  dfca5570c6  remove unused construction  2023-05-09 12:10:42 +03:00
ghost  ef4de6b245  fix image search page errors  2023-05-09 08:53:33 +03:00
ghost  23ead4e12c  update page / image description models, implement history snap crawling  2023-05-09 08:19:49 +03:00
ghost  0e9d29675f  implement host page description history crawling  2023-05-09 01:29:32 +03:00
ghost  32d0f390d3  update http code and mime type on page/image ban event  2023-05-08 14:13:53 +03:00
ghost  8fbd7f3516  count totals using sphinx index instead of database  2023-05-08 12:28:49 +03:00
ghost  25b6bce2ec  add crawler/cleaner logs  2023-05-08 11:04:59 +03:00
ghost  6c41dd5831  fix ban time update / count affected rows only  2023-05-06 10:11:25 +03:00
ghost  b6605b9132  implement not reachable resources ban feature with timeout to prevent extra http requests  2023-05-06 08:45:37 +03:00
ghost  702a14b634  add mime content type crawling #1  2023-05-06 07:25:54 +03:00
ghost  5999fb3a73  add distributed hosts crawling using yggo nodes manifest  2023-05-05 05:26:53 +03:00
ghost  f0b2eb1613  show images total instead of pages in placeholder on image search page  2023-05-05 01:42:44 +03:00
ghost  297563d4a5  display related pages in priority to the unique host by rank, rand() order  2023-05-04 10:53:37 +03:00
ghost  34b7291228  add related to image hostpages limit  2023-05-04 10:17:47 +03:00
ghost  adc791f378  fix updateTime init  2023-05-04 10:11:13 +03:00
ghost  d4f66c83e7  fix image crawling errors  2023-05-04 08:51:45 +03:00
ghost  73f212e3d7  set crawler queue order priority to item rank, rand()  2023-05-04 06:55:05 +03:00
ghost  9ed8411d2f  add image queue crawler  2023-05-04 06:45:04 +03:00
ghost  d905e33b4f  update host images info on search requests  2023-05-04 06:12:51 +03:00
ghost  68581960a3  add image.data field  2023-05-04 05:19:29 +03:00
ghost  250e20bbcd  remove separator  2023-05-04 04:19:38 +03:00
ghost  6b18202588  implement proxied image search #1  2023-05-04 03:48:57 +03:00
ghost  0741a3e9ef  implement image crawler  2023-05-04 01:04:39 +03:00
ghost  6d8f4f4882  create manifests registry  2023-05-03 09:22:14 +03:00
ghost  a5f5541395  skip robots:noindex page without extra actions  2023-04-29 08:58:48 +03:00
ghost  11aa404807  add metaYggo field index  2023-04-25 21:10:59 +03:00
ghost  8671fc4bde  implement page ranking  2023-04-25 16:54:01 +03:00
ghost  9916fb701f  implement basic api  2023-04-23 03:01:51 +03:00
ghost  5c8d299a4a  add meta:robots tag support #2  2023-04-09 03:28:31 +03:00
ghost  8e8d89db0e  implement database cleaner  2023-04-09 00:06:28 +03:00
ghost  df6f2a1869  implement CRAWL_ROBOTS_POSTFIX_RULES configuration #5  2023-04-08 22:28:31 +03:00
ghost  2495a2bbc7  implement MySQL/Sphinx data model #3, add basical robots.txt support #2  2023-04-07 04:04:24 +03:00