Commit Graph

113 Commits

Author SHA1 Message Date
ghost
307eb03600 build host/host page URL in SQL query 2023-07-30 13:02:24 +03:00
ghost
1f33205236 add script tag support 2023-07-30 00:52:55 +03:00
ghost
b433fa6b3c add link tag support 2023-07-30 00:17:28 +03:00
ghost
79d07dd1a5 fix snap fs init 2023-07-29 19:53:31 +03:00
ghost
6eb45fdad2 fix snap crc32name index 2023-07-29 19:38:09 +03:00
ghost
712d67f6bf implement unlimited snap storage mirrors, delete megaCMD integration 2023-07-29 14:37:01 +03:00
ghost
2c17c93e2f fix broken snaps autodeletion 2023-07-28 12:54:15 +03:00
ghost
1dd0a8ee2c make page rank procedural, optimize performance 2023-07-28 12:49:43 +03:00
ghost
2e2501b437 implement sitemap support 2023-07-27 11:44:42 +03:00
ghost
4cb27f563f fix meta index/nofollow processing 2023-07-12 12:27:30 +03:00
ghost
b7c415a8b0 crawl host page DOM selectors on meta robots:index/follow condition enabled only 2023-07-12 12:16:26 +03:00
ghost
443eaec64e autodelete failed snaps 2023-07-07 12:30:07 +03:00
ghost
4298203cab make paths absolute 2023-06-30 14:38:29 +03:00
ghost
3218add372 add custom home page reindex settings 2023-06-30 13:28:22 +03:00
ghost
d912caeb0c fix variable name 2023-06-27 13:01:46 +03:00
ghost
5346b13602 implement custom hostPageDom elements index 2023-06-25 22:10:47 +03:00
ghost
5df598a1d4 fix variable name 2023-06-24 15:21:47 +03:00
ghost
e16a7b8171 fix HY000/1366 error processing 2023-06-17 11:33:32 +03:00
ghost
dc2d971ba0 clean up banned pages extra data 2023-06-16 16:53:14 +03:00
ghost
d96abb8ea8 ban host page on encoding not detected 2023-06-16 13:23:52 +03:00
ghost
d2469e9adc fix meta variables overwrite 2023-06-14 02:53:14 +03:00
ghost
1d5d5ead5d fix DomDocument initiation without encoding provided 2023-06-14 02:20:00 +03:00
ghost
8a747de341 fix HTML/multimedia content detection 2023-06-13 23:09:44 +03:00
ghost
93c6067fd9 fix host page mime detection 2023-06-13 22:29:28 +03:00
ghost
80d3912bc7 allow x-raw-image links 2023-06-13 20:26:17 +03:00
ghost
b23f550a1b skip magnet links 2023-06-13 20:25:37 +03:00
ghost
acba2816e2 remove transaction from tables optimization case 2023-06-13 17:45:02 +03:00
ghost
b2cf9fc6a5 do table optimization in separated transaction 2023-06-13 16:51:16 +03:00
ghost
ab78e17ca8 add hostPage.size collection 2023-06-13 12:45:12 +03:00
ghost
0af5d165d3 remove logCrawler column not in use 2023-06-05 22:06:55 +03:00
ghost
4b16b41440 make transaction for each item in crawl queue 2023-06-05 22:01:22 +03:00
ghost
b585b16d31 fix datatype error detection 2023-06-05 21:02:18 +03:00
ghost
c5e25d17fb prevent page ban when its MIME is in the whitelist, skip steps below only (make multimedia/streaming resources visible in search results) 2023-06-04 17:44:09 +03:00
ghost
4fa33afe40 prevent infinite connection when streaming resources are detected 2023-06-04 17:02:32 +03:00
ghost
345c59b5f4 collect target location links on page redirect available 2023-06-04 14:58:33 +03:00
ghost
5d7f2bf68c fix snap foreign keys deletion 2023-06-04 13:39:47 +03:00
ghost
242e0abd86 ban pages on data type error codes only 2023-06-04 13:10:32 +03:00
ghost
62a4f33b53 load missed dependency 2023-06-04 12:27:20 +03:00
ghost
512bd56056 ban page that throws the error and stalls the crawl queue 2023-06-04 12:04:41 +03:00
ghost
45c4f7b7b0 add database optimization settings 2023-05-29 22:13:41 +03:00
ghost
81f7ea1e1e implement multi-storage snap downloads 2023-05-15 09:18:18 +03:00
ghost
1969707eeb integrate optional MEGA/cmd snap storage 2023-05-14 19:41:20 +03:00
ghost
bd99dcb023 add leading zero to mkdir access code 2023-05-14 05:43:03 +03:00
ghost
48664f0caf fix zip close, loop break condition 2023-05-14 04:33:35 +03:00
ghost
50c9066f62 add tables optimization to the cron/cleaner task 2023-05-14 02:39:32 +03:00
ghost
0d19004e86 make local snap storage optimization 2023-05-14 01:45:55 +03:00
ghost
efc66d5dab update local snap storage paths 2023-05-13 11:06:40 +03:00
ghost
2f7d99079d implement local snaps 2023-05-13 10:15:07 +03:00
ghost
9477d87b2e change strpos to stripos 2023-05-13 01:28:50 +03:00
ghost
28e8bcf8d7 add audio/video media crawl support 2023-05-13 01:23:09 +03:00