ghost | d4f66c83e7 | fix image crawling errors | 2023-05-04 08:51:45 +03:00
ghost | baa8b0d2f0 | fix data type formatting | 2023-05-04 07:58:07 +03:00
ghost | 79878d17fe | add crawler / proxy user agent settings | 2023-05-04 07:38:22 +03:00
ghost | 9ed8411d2f | add image queue crawler | 2023-05-04 06:45:04 +03:00
ghost | d905e33b4f | update host images info on search requests | 2023-05-04 06:12:51 +03:00
ghost | 0741a3e9ef | implement image crawler | 2023-05-04 01:04:39 +03:00
ghost | 1ee2ac4f0b | add yggo:manifest namespace | 2023-05-03 09:38:58 +03:00
ghost | f8e0a50db6 | add manifest url filter | 2023-05-03 09:26:48 +03:00
ghost | 6d8f4f4882 | create manifests registry | 2023-05-03 09:22:14 +03:00
ghost | eb3e70a7b7 | fix robots.txt conditions | 2023-05-03 04:17:58 +03:00
ghost | a5f5541395 | skip robots:noindex page without extra actions | 2023-04-29 08:58:48 +03:00
ghost | e418ddcd32 | fix data type | 2023-04-25 21:20:35 +03:00
ghost | 11aa404807 | add metaYggo field index | 2023-04-25 21:10:59 +03:00
ghost | 5875dd58c9 | fix PR update condition | 2023-04-25 18:19:22 +03:00
ghost | 8671fc4bde | implement page ranking | 2023-04-25 16:54:01 +03:00
ghost | 5936fa9a30 | fix quota check condition | 2023-04-23 04:31:32 +03:00
ghost | 8dbb4a06af | add disk quota validation | 2023-04-23 04:05:00 +03:00
ghost | dfbc6132c9 | fix robots:noindex condition, add robots:nofollow attribute support | 2023-04-09 15:25:15 +03:00
ghost | 5c8d299a4a | add meta:robots tag support #2 | 2023-04-09 03:28:31 +03:00
ghost | 8e8d89db0e | implement database cleaner | 2023-04-09 00:06:28 +03:00
ghost | 0484d43482 | fix trim path levels in the relative links | 2023-04-08 23:52:46 +03:00
ghost | df6f2a1869 | implement CRAWL_ROBOTS_POSTFIX_RULES configuration #5 | 2023-04-08 22:28:31 +03:00
ghost | b3c668706b | trim path levels in the relative links | 2023-04-08 19:14:04 +03:00
ghost | 71a3e7dd0e | skip x-raw-image links crawl | 2023-04-08 19:11:12 +03:00
ghost | 9b9d40a97c | skip javascript/mailto links index | 2023-04-07 05:19:32 +03:00
ghost | 2a843449e0 | add process locked notice to the debug output | 2023-04-07 04:58:56 +03:00
ghost | ce509ec0a8 | remove debug row | 2023-04-07 04:39:25 +03:00
ghost | 2495a2bbc7 | implement MySQL/Sphinx data model #3, add basical robots.txt support #2 | 2023-04-07 04:04:24 +03:00
ghost | 79663c84db | add CRAWL_META_ONLY option | 2023-04-03 03:07:54 +03:00
ghost | 04dbbc3adf | make url/src column ukeys digital by using crc32 | 2023-04-02 18:56:56 +03:00
ghost | b218b8bbc3 | make url/src columns unique keys, add insert/ignore construction | 2023-04-02 18:09:44 +03:00
ghost | 1485983b3a | lock multi-thread execution | 2023-04-02 00:27:33 +03:00
ghost | 72985eaf9e | initial commit | 2023-04-01 19:29:39 +03:00