ghost
|
b24d31f360
|
refactor cleaner, delegate tasks to crawler, init hostSetting table
|
2023-08-03 15:25:38 +03:00 |
|
ghost
|
fd90e2d517
|
keep banned pages data
|
2023-08-03 14:31:06 +03:00 |
|
ghost
|
11e02da66d
|
memory usage optimization, rename methods, remove memcached dependency from the model
|
2023-08-03 10:48:27 +03:00 |
|
ghost
|
cbabea595b
|
rename method name
|
2023-08-03 10:26:37 +03:00 |
|
ghost
|
1249e8d29c
|
fix CRAWL_PAGE_RANK_UPDATE condition
|
2023-08-02 21:22:31 +03:00 |
|
ghost
|
5df59661d8
|
add optional page rank update in the crawl queue
|
2023-08-02 21:21:23 +03:00 |
|
ghost
|
1d7deffc4c
|
update PR generation, delegate PR value from redirecting pages, update method names
|
2023-08-02 15:43:44 +03:00 |
|
ghost
|
1655ec63b2
|
skip xmpp links
|
2023-08-02 11:57:54 +03:00 |
|
ghost
|
06c136f05c
|
fix meta/nofollow attribute processing
|
2023-08-02 10:56:25 +03:00 |
|
ghost
|
43776b5ff4
|
fix semaphores
|
2023-08-01 17:53:14 +03:00 |
|
ghost
|
fd3444a379
|
change timestamp sort order
|
2023-07-31 14:25:38 +03:00 |
|
ghost
|
9c0f361601
|
refactor snap storage
|
2023-07-31 13:33:30 +03:00 |
|
ghost
|
547cd6717b
|
prevent scheduled execution while cli/yggo is running
|
2023-07-30 21:47:09 +03:00 |
|
ghost
|
3e3b7ee2ef
|
optimize snaps, delete unused constructions
|
2023-07-30 19:09:41 +03:00 |
|
ghost
|
307eb03600
|
build host/host page URL in SQL query
|
2023-07-30 13:02:24 +03:00 |
|
ghost
|
1f33205236
|
add script tag support
|
2023-07-30 00:52:55 +03:00 |
|
ghost
|
b433fa6b3c
|
add link tag support
|
2023-07-30 00:17:28 +03:00 |
|
ghost
|
79d07dd1a5
|
fix snap fs init
|
2023-07-29 19:53:31 +03:00 |
|
ghost
|
6eb45fdad2
|
fix snap crc32name index
|
2023-07-29 19:38:09 +03:00 |
|
ghost
|
712d67f6bf
|
implement unlimited snap storage mirrors, delete megaCMD integration
|
2023-07-29 14:37:01 +03:00 |
|
ghost
|
2c17c93e2f
|
fix broken snaps autodeletion
|
2023-07-28 12:54:15 +03:00 |
|
ghost
|
1dd0a8ee2c
|
make page rank procedural, optimize performance
|
2023-07-28 12:49:43 +03:00 |
|
ghost
|
2e2501b437
|
implement sitemap support
|
2023-07-27 11:44:42 +03:00 |
|
ghost
|
4cb27f563f
|
fix meta index/nofollow processing
|
2023-07-12 12:27:30 +03:00 |
|
ghost
|
b7c415a8b0
|
crawl host page DOM selectors only when the meta robots:index/follow condition is enabled
|
2023-07-12 12:16:26 +03:00 |
|
ghost
|
443eaec64e
|
autodelete failed snaps
|
2023-07-07 12:30:07 +03:00 |
|
ghost
|
4298203cab
|
make paths absolute
|
2023-06-30 14:38:29 +03:00 |
|
ghost
|
3218add372
|
add custom home page reindex settings
|
2023-06-30 13:28:22 +03:00 |
|
ghost
|
d912caeb0c
|
fix variable name
|
2023-06-27 13:01:46 +03:00 |
|
ghost
|
5346b13602
|
implement custom hostPageDom elements index
|
2023-06-25 22:10:47 +03:00 |
|
ghost
|
5df598a1d4
|
fix variable name
|
2023-06-24 15:21:47 +03:00 |
|
ghost
|
e16a7b8171
|
fix HY000/1366 error processing
|
2023-06-17 11:33:32 +03:00 |
|
ghost
|
dc2d971ba0
|
clean up banned pages extra data
|
2023-06-16 16:53:14 +03:00 |
|
ghost
|
d96abb8ea8
|
ban host page on encoding not detected
|
2023-06-16 13:23:52 +03:00 |
|
ghost
|
d2469e9adc
|
fix meta variables overwrite
|
2023-06-14 02:53:14 +03:00 |
|
ghost
|
1d5d5ead5d
|
fix DomDocument initialization when no encoding is provided
|
2023-06-14 02:20:00 +03:00 |
|
ghost
|
8a747de341
|
fix HTML/multimedia content detection
|
2023-06-13 23:09:44 +03:00 |
|
ghost
|
93c6067fd9
|
fix host page mime detection
|
2023-06-13 22:29:28 +03:00 |
|
ghost
|
80d3912bc7
|
allow x-raw-image links
|
2023-06-13 20:26:17 +03:00 |
|
ghost
|
b23f550a1b
|
skip magnet links
|
2023-06-13 20:25:37 +03:00 |
|
ghost
|
acba2816e2
|
remove transaction from tables optimization case
|
2023-06-13 17:45:02 +03:00 |
|
ghost
|
b2cf9fc6a5
|
do table optimization in a separate transaction
|
2023-06-13 16:51:16 +03:00 |
|
ghost
|
ab78e17ca8
|
add hostPage.size collection
|
2023-06-13 12:45:12 +03:00 |
|
ghost
|
0af5d165d3
|
remove unused logCrawler column
|
2023-06-05 22:06:55 +03:00 |
|
ghost
|
4b16b41440
|
make transaction for each item in crawl queue
|
2023-06-05 22:01:22 +03:00 |
|
ghost
|
b585b16d31
|
fix datatype error detection
|
2023-06-05 21:02:18 +03:00 |
|
ghost
|
c5e25d17fb
|
prevent page ban when its MIME type is in the whitelist, only skip the steps below (make multimedia/streaming resources visible in search results)
|
2023-06-04 17:44:09 +03:00 |
|
ghost
|
4fa33afe40
|
prevent infinite connection when streaming resources are detected
|
2023-06-04 17:02:32 +03:00 |
|
ghost
|
345c59b5f4
|
collect target location links when a page redirect is available
|
2023-06-04 14:58:33 +03:00 |
|
ghost
|
5d7f2bf68c
|
fix snap foreign keys deletion
|
2023-06-04 13:39:47 +03:00 |
|