ghost | 0484d43482 | fix trim path levels in the relative links | 2023-04-08 23:52:46 +03:00 |
ghost | df6f2a1869 | implement CRAWL_ROBOTS_POSTFIX_RULES configuration #5 | 2023-04-08 22:28:31 +03:00 |
ghost | b3c668706b | trim path levels in the relative links | 2023-04-08 19:14:04 +03:00 |
ghost | 71a3e7dd0e | skip x-raw-image links crawl | 2023-04-08 19:11:12 +03:00 |
ghost | 9b9d40a97c | skip javascript/mailto links index | 2023-04-07 05:19:32 +03:00 |
ghost | 2a843449e0 | add process locked notice to the debug output | 2023-04-07 04:58:56 +03:00 |
ghost | ce509ec0a8 | remove debug row | 2023-04-07 04:39:25 +03:00 |
ghost | 2495a2bbc7 | implement MySQL/Sphinx data model #3, add basic robots.txt support #2 | 2023-04-07 04:04:24 +03:00 |
ghost | 79663c84db | add CRAWL_META_ONLY option | 2023-04-03 03:07:54 +03:00 |
ghost | 04dbbc3adf | make url/src column ukeys digital by using crc32 | 2023-04-02 18:56:56 +03:00 |
ghost | b218b8bbc3 | make url/src columns unique keys, add insert/ignore construction | 2023-04-02 18:09:44 +03:00 |
ghost | 1485983b3a | lock multi-thread execution | 2023-04-02 00:27:33 +03:00 |
ghost | 72985eaf9e | initial commit | 2023-04-01 19:29:39 +03:00 |