Commit Graph

581 Commits

Author SHA1 Message Date
ghost
1c4904d333 update readme 2023-04-23 03:08:49 +03:00
ghost
14ba97f46a update readme 2023-04-23 03:04:01 +03:00
ghost
5b16d83ca1 update readme 2023-04-23 03:03:27 +03:00
ghost
9916fb701f implement basic api 2023-04-23 03:01:51 +03:00
ghost
81cb970248 add options documentation 2023-04-23 01:54:10 +03:00
ghost
8da150b295 add options documentation 2023-04-23 01:46:34 +03:00
ghost
8f09db5045 add options documentation 2023-04-23 01:32:34 +03:00
ghost
c4dfb58fe3 add options documentation 2023-04-23 01:14:31 +03:00
ghost
24472ea452 update readme 2023-04-12 13:11:09 +03:00
ghost
921317c667 update readme 2023-04-12 13:09:46 +03:00
ghost
7104cf19b7 update readme 2023-04-12 12:53:51 +03:00
ghost
fb18a9b955 update README.md 2023-04-10 03:24:10 +03:00
ghost
352466ad03 update host.robotsPostfix registry 2023-04-10 03:19:08 +03:00
ghost
e6b1e8029c add missed regex replacement rule 2023-04-10 03:18:50 +03:00
ghost
dfbc6132c9 fix robots:noindex condition, add robots:nofollow attribute support 2023-04-09 15:25:15 +03:00
ghost
5c8d299a4a add meta:robots tag support #2 2023-04-09 03:28:31 +03:00
ghost
6550eb310f update host.robotsPostfix rules 2023-04-09 03:10:42 +03:00
ghost
6cee58214e update host.robotsPostfix rules 2023-04-09 03:05:43 +03:00
ghost
6f4daf7a25 update host.robotsPostfix rule 2023-04-09 02:19:07 +03:00
ghost
f4db66d53f add new host.robotsPostfix rules 2023-04-09 02:14:13 +03:00
ghost
9018acd0e2 update meta tags 2023-04-09 01:22:36 +03:00
ghost
139e2c88eb add robots.txt 2023-04-09 01:16:53 +03:00
ghost
e505c76aaa update roadmap item by #5 answer 2023-04-09 00:37:19 +03:00
ghost
be7eae501b add host.status registry #1, #5 2023-04-09 00:28:51 +03:00
ghost
bee5086f22 add crontab configuration example, check roadmap item 2023-04-09 00:07:06 +03:00
ghost
8e8d89db0e implement database cleaner 2023-04-09 00:06:28 +03:00
ghost
3c9bc1adaa add required user-agent construction #5 2023-04-09 00:02:31 +03:00
ghost
0484d43482 fix trim path levels in the relative links 2023-04-08 23:52:46 +03:00
ghost
b819fda025 init yggdrasil robots.txt registry #5 2023-04-08 22:29:33 +03:00
ghost
df6f2a1869 implement CRAWL_ROBOTS_POSTFIX_RULES configuration #5 2023-04-08 22:28:31 +03:00
ghost
505544c8c9 add affiliate link 2023-04-08 20:13:13 +03:00
ghost
b3c668706b trim path levels in the relative links 2023-04-08 19:14:04 +03:00
ghost
71a3e7dd0e skip x-raw-image links crawl 2023-04-08 19:11:12 +03:00
ghost
50b6e90380 Merge branch 'main' of https://github.com/YGGverse/YGGo into main 2023-04-08 18:23:51 +03:00
ghost
8d102ecdf7 index hosts with enabled status only 2023-04-08 18:23:48 +03:00
ghost
0b12e872a3 add host name to the search index 2023-04-08 18:22:53 +03:00
d47081
a29d6d5d0a Update README.md 2023-04-07 18:24:50 +03:00
ghost
ab71b3823a update readme 2023-04-07 15:03:00 +03:00
ghost
e98146b78b index only 200 http code pages 2023-04-07 05:34:45 +03:00
ghost
9b9d40a97c skip javascript/mailto links index 2023-04-07 05:19:32 +03:00
ghost
2a843449e0 add process locked notice to the debug output 2023-04-07 04:58:56 +03:00
ghost
0f2b772fa8 remove not indexed pages from the search index 2023-04-07 04:50:01 +03:00
ghost
ce509ec0a8 remove debug row 2023-04-07 04:39:25 +03:00
ghost
2495a2bbc7 implement MySQL/Sphinx data model #3, add basical robots.txt support #2 2023-04-07 04:04:24 +03:00
d47081
a14d18fedb Update README.md 2023-04-05 19:28:58 +03:00
d47081
4bb3e26c7b Update README.md 2023-04-05 19:22:58 +03:00
d47081
9b8bd6d277 Update README.md 2023-04-05 19:20:51 +03:00
d47081
f25e95cb79 Update README.md 2023-04-05 19:19:39 +03:00
d47081
ceed482bd4 Update README.md 2023-04-05 19:18:51 +03:00
d47081
006460381b Update README.md 2023-04-05 17:54:46 +03:00