ghost | 7c5ba050b2 | fix media crawling | 2023-05-10 18:35:18 +03:00
ghost | 0fed16621a | fix mime content type update | 2023-05-10 14:47:33 +03:00
ghost | db0e66c846 | refactor to mime-based content index #1 | 2023-05-10 12:47:36 +03:00
ghost | 0ffcee1efb | fix image description updates timing | 2023-05-09 15:53:21 +03:00
ghost | 2c5ca1b630 | fix image description duplicate | 2023-05-09 15:23:32 +03:00
ghost | 28bf526d53 | add host nsfw settings | 2023-05-09 13:26:19 +03:00
ghost | 8ce0324e94 | convert page data to string | 2023-05-09 12:52:07 +03:00
ghost | d186fff48f | skip curl download on response data size reached | 2023-05-09 10:21:37 +03:00
ghost | d7a5f7ef84 | remove content filter, snap raw the data | 2023-05-09 09:02:17 +03:00
ghost | 23ead4e12c | update page / image description models, implement history snap crawling | 2023-05-09 08:19:49 +03:00
ghost | 0e9d29675f | implement host page description history crawling | 2023-05-09 01:29:32 +03:00
ghost | 6371def666 | fix attributes passing | 2023-05-08 17:52:17 +03:00
ghost | 32d0f390d3 | update http code and mime type on page/image ban event | 2023-05-08 14:13:53 +03:00
ghost | 84dcecf50b | add svg images support, fix mime validation | 2023-05-08 13:12:16 +03:00
ghost | bf1eeb332c | fix page/image mime content type detection | 2023-05-08 12:10:57 +03:00
ghost | 25b6bce2ec | add crawler/cleaner logs | 2023-05-08 11:04:59 +03:00
ghost | dcdc2c50ad | update debug string names | 2023-05-08 08:31:34 +03:00
ghost | ea04220de3 | add curl requests debug | 2023-05-08 08:27:21 +03:00
ghost | 1aba060d34 | fix variable name | 2023-05-08 07:23:50 +03:00
ghost | fdd18de373 | remove abstraction | 2023-05-06 14:03:43 +03:00
ghost | 6c41dd5831 | fix ban time update / count affected rows only | 2023-05-06 10:11:25 +03:00
ghost | 20514c455f | add banned items counters | 2023-05-06 08:50:41 +03:00
ghost | b6605b9132 | implement not reachable resources ban feature with timeout to prevent extra http requests | 2023-05-06 08:45:37 +03:00
ghost | 702a14b634 | add mime content type crawling #1 | 2023-05-06 07:25:54 +03:00
ghost | 0bd95d7f4d | fix comments | 2023-05-05 21:39:48 +03:00
ghost | f88d2ee9ff | implement MIME content-type crawler filter | 2023-05-05 21:25:57 +03:00
ghost | 5999fb3a73 | add distributed hosts crawling using yggo nodes manifest | 2023-05-05 05:26:53 +03:00
ghost | 5297e6e918 | fix condition error | 2023-05-04 11:35:22 +03:00
ghost | 0cc712f24e | fix variable definition | 2023-05-04 09:24:21 +03:00
ghost | d4f66c83e7 | fix image crawling errors | 2023-05-04 08:51:45 +03:00
ghost | baa8b0d2f0 | fix data type formatting | 2023-05-04 07:58:07 +03:00
ghost | 79878d17fe | add crawler / proxy user agent settings | 2023-05-04 07:38:22 +03:00
ghost | 9ed8411d2f | add image queue crawler | 2023-05-04 06:45:04 +03:00
ghost | d905e33b4f | update host images info on search requests | 2023-05-04 06:12:51 +03:00
ghost | 0741a3e9ef | implement image crawler | 2023-05-04 01:04:39 +03:00
ghost | 1ee2ac4f0b | add yggo:manifest namespace | 2023-05-03 09:38:58 +03:00
ghost | f8e0a50db6 | add manifest url filter | 2023-05-03 09:26:48 +03:00
ghost | 6d8f4f4882 | create manifests registry | 2023-05-03 09:22:14 +03:00
ghost | eb3e70a7b7 | fix robots.txt conditions | 2023-05-03 04:17:58 +03:00
ghost | a5f5541395 | skip robots:noindex page without extra actions | 2023-04-29 08:58:48 +03:00
ghost | e418ddcd32 | fix data type | 2023-04-25 21:20:35 +03:00
ghost | 11aa404807 | add metaYggo field index | 2023-04-25 21:10:59 +03:00
ghost | 5875dd58c9 | fix PR update condition | 2023-04-25 18:19:22 +03:00
ghost | 8671fc4bde | implement page ranking | 2023-04-25 16:54:01 +03:00
ghost | 5936fa9a30 | fix quota check condition | 2023-04-23 04:31:32 +03:00
ghost | 8dbb4a06af | add disk quota validation | 2023-04-23 04:05:00 +03:00
ghost | dfbc6132c9 | fix robots:noindex condition, add robots:nofollow attribute support | 2023-04-09 15:25:15 +03:00
ghost | 5c8d299a4a | add meta:robots tag support #2 | 2023-04-09 03:28:31 +03:00
ghost | 0484d43482 | fix trim path levels in the relative links | 2023-04-08 23:52:46 +03:00
ghost | df6f2a1869 | implement CRAWL_ROBOTS_POSTFIX_RULES configuration #5 | 2023-04-08 22:28:31 +03:00