24 Commits (6d8f4f4882b47d0d071b849416d03684abfeaca7)

Author SHA1 Message Date
ghost 6d8f4f4882 create manifests registry 2 years ago
ghost eb3e70a7b7 fix robots.txt conditions 2 years ago
ghost a5f5541395 skip robots:noindex page without extra actions 2 years ago
ghost e418ddcd32 fix data type 2 years ago
ghost 11aa404807 add metaYggo field index 2 years ago
ghost 5875dd58c9 fix PR update condition 2 years ago
ghost 8671fc4bde implement page ranking 2 years ago
ghost 5936fa9a30 fix quota check condition 2 years ago
ghost 8dbb4a06af add disk quota validation 2 years ago
ghost dfbc6132c9 fix robots:noindex condition, add robots:nofollow attribute support 2 years ago
ghost 5c8d299a4a add meta:robots tag support #2 2 years ago
ghost 0484d43482 fix trim path levels in the relative links 2 years ago
ghost df6f2a1869 implement CRAWL_ROBOTS_POSTFIX_RULES configuration #5 2 years ago
ghost b3c668706b trim path levels in the relative links 2 years ago
ghost 71a3e7dd0e skip x-raw-image links crawl 2 years ago
ghost 9b9d40a97c skip javascript/mailto links index 2 years ago
ghost 2a843449e0 add process locked notice to the debug output 2 years ago
ghost ce509ec0a8 remove debug row 2 years ago
ghost 2495a2bbc7 implement MySQL/Sphinx data model #3, add basic robots.txt support #2 2 years ago
ghost 79663c84db add CRAWL_META_ONLY option 2 years ago
ghost 04dbbc3adf make url/src column ukeys digital by using crc32 2 years ago
ghost b218b8bbc3 make url/src columns unique keys, add insert/ignore construction 2 years ago
ghost 1485983b3a lock multi-thread execution 2 years ago
ghost 72985eaf9e initial commit 2 years ago