129 Commits

Author SHA1 Message Date
ghost
e953c01eaa update debug message 2023-08-05 21:55:37 +03:00
ghost
bd212edb97 update debug message 2023-08-05 21:52:26 +03:00
ghost
1b287c8d28 update debug message 2023-08-05 21:40:59 +03:00
ghost
562b97ba8f update debug message 2023-08-05 21:39:44 +03:00
ghost
7ddb47619a update debug message 2023-08-05 21:17:05 +03:00
ghost
513addc7af add query totals counting, update crawler debug 2023-08-05 21:03:45 +03:00
ghost
004a5336de remove htmls pages ban on title tag not available 2023-08-05 20:01:31 +03:00
ghost
de28d85a71 add connection exceptions 2023-08-05 19:39:49 +03:00
ghost
d46c4921c5 add page break 2023-08-05 19:24:32 +03:00
ghost
d024ffd770 implement unlimited settings customization for each host 2023-08-05 19:06:39 +03:00
ghost
ab6c0379c8 implement hosts crawl queue, move robots, sitemaps, manifests to this task 2023-08-04 09:32:12 +03:00
ghost
6ee5e53ef4 show sitemaps processed debug 2023-08-04 09:07:46 +03:00
ghost
71724ae33f refactor manifest crawling 2023-08-04 09:00:03 +03:00
ghost
efbbf19601 fix multimedia snaps 2023-08-03 17:41:55 +03:00
ghost
b24d31f360 refactor cleaner, delegate tasks to crawler, init hostSetting table 2023-08-03 15:25:38 +03:00
ghost
11e02da66d memory usage optimization, rename methods, remove memcached dependency from the model 2023-08-03 10:48:27 +03:00
ghost
cbabea595b rename method name 2023-08-03 10:26:37 +03:00
ghost
1249e8d29c fix CRAWL_PAGE_RANK_UPDATE condition 2023-08-02 21:22:31 +03:00
ghost
5df59661d8 add page rank update optional in the crawl queue 2023-08-02 21:21:23 +03:00
ghost
1d7deffc4c update PR generation, delegate PR value from redirecting pages, update method names 2023-08-02 15:43:44 +03:00
ghost
1655ec63b2 skip xmpp links 2023-08-02 11:57:54 +03:00
ghost
06c136f05c fix meta/nofollow attribute processing 2023-08-02 10:56:25 +03:00
ghost
43776b5ff4 fix semaphores 2023-08-01 17:53:14 +03:00
ghost
fd3444a379 change timestamp sort order 2023-07-31 14:25:38 +03:00
ghost
9c0f361601 refactor snap storage 2023-07-31 13:33:30 +03:00
ghost
547cd6717b prevent scheduled execution on cli/yggo running 2023-07-30 21:47:09 +03:00
ghost
3e3b7ee2ef optimize snaps, delete unused constructions 2023-07-30 19:09:41 +03:00
ghost
307eb03600 build host/host page URL in SQL query 2023-07-30 13:02:24 +03:00
ghost
1f33205236 add script tag support 2023-07-30 00:52:55 +03:00
ghost
b433fa6b3c add link tag support 2023-07-30 00:17:28 +03:00
ghost
79d07dd1a5 fix snap fs init 2023-07-29 19:53:31 +03:00
ghost
6eb45fdad2 fix snap crc32name index 2023-07-29 19:38:09 +03:00
ghost
712d67f6bf implement unlimited snap storage mirrors, delete megaCMD integration 2023-07-29 14:37:01 +03:00
ghost
1dd0a8ee2c make page rank procedural, optimize performance 2023-07-28 12:49:43 +03:00
ghost
2e2501b437 implement sitemap support 2023-07-27 11:44:42 +03:00
ghost
4cb27f563f fix meta index/nofollow processing 2023-07-12 12:27:30 +03:00
ghost
b7c415a8b0 crawl host page DOM selectors on meta robots:index/follow condition enabled only 2023-07-12 12:16:26 +03:00
ghost
4298203cab make paths absolute 2023-06-30 14:38:29 +03:00
ghost
3218add372 add custom home page reindex settings 2023-06-30 13:28:22 +03:00
ghost
5346b13602 implement custom hostPageDom elements index 2023-06-25 22:10:47 +03:00
ghost
5df598a1d4 fix variable name 2023-06-24 15:21:47 +03:00
ghost
e16a7b8171 fix HY000/1366 error processing 2023-06-17 11:33:32 +03:00
ghost
d96abb8ea8 ban host page on encoding not detected 2023-06-16 13:23:52 +03:00
ghost
d2469e9adc fix meta variables overwrite 2023-06-14 02:53:14 +03:00
ghost
1d5d5ead5d fix DomDocument initiation without encoding provided 2023-06-14 02:20:00 +03:00
ghost
8a747de341 fix HTML/multimedia content detection 2023-06-13 23:09:44 +03:00
ghost
93c6067fd9 fix host page mime detection 2023-06-13 22:29:28 +03:00
ghost
80d3912bc7 allow x-raw-image links 2023-06-13 20:26:17 +03:00
ghost
b23f550a1b skip magnet links 2023-06-13 20:25:37 +03:00
ghost
ab78e17ca8 add hostPage.size collection 2023-06-13 12:45:12 +03:00