Commit Graph

517 Commits

Author  SHA1  Message  Date
ghost  b3ec1d42a7  fix empty URI processing  2023-08-05 21:31:33 +03:00
ghost  7ddb47619a  update debug message  2023-08-05 21:17:05 +03:00
ghost  9fe33a3b2c  update CLI roadmap  2023-08-05 21:16:09 +03:00
ghost  6e069a86e5  update readme  2023-08-05 21:11:40 +03:00
ghost  513addc7af  add query totals counting, update crawler debug  2023-08-05 21:03:45 +03:00
ghost  6e03a76ed8  add CURLOPT_SSL_VERIFYHOST/CURLOPT_SSL_VERIFYPEER options  2023-08-05 20:24:47 +03:00
ghost  004a5336de  remove htmls pages ban on title tag not available  2023-08-05 20:01:31 +03:00
ghost  f9774f2431  add innodb_buffer_pool_size default value  2023-08-05 19:51:30 +03:00
ghost  de28d85a71  add connection exceptions  2023-08-05 19:39:49 +03:00
ghost  142d496108  fix SQL syntax error  2023-08-05 19:31:29 +03:00
ghost  d46c4921c5  add page break  2023-08-05 19:24:32 +03:00
ghost  80b33f619c  fix PAGES_LIMIT condition  2023-08-05 19:24:21 +03:00
ghost  d024ffd770  implement unlimited settings customization for each host  2023-08-05 19:06:39 +03:00
ghost  ab6c0379c8  implement hosts crawl queue, move robots, sitemaps, manifests to this task  2023-08-04 09:32:12 +03:00
ghost  6ee5e53ef4  show sitemaps processed debug  2023-08-04 09:07:46 +03:00
ghost  71724ae33f  refactor manifest crawling  2023-08-04 09:00:03 +03:00
ghost  cb37c57bc4  rename example files  2023-08-03 18:49:29 +03:00
ghost  68d5820f30  reserve one hour for huge load operations  2023-08-03 18:47:39 +03:00
ghost  efbbf19601  fix multimedia snaps  2023-08-03 17:41:55 +03:00
ghost  6862fb35cd  update readme  2023-08-03 15:33:34 +03:00
ghost  282a6d609d  update manifest API  2023-08-03 15:31:57 +03:00
ghost  b24d31f360  refactor cleaner, delegate tasks to crawler, init hostSetting table  2023-08-03 15:25:38 +03:00
ghost  fd90e2d517  keep banned pages data  2023-08-03 14:31:06 +03:00
ghost  ab8b6f6315  rename variables  2023-08-03 14:24:37 +03:00
ghost  02612d098b  delete getFoundHostPage method, update API version  2023-08-03 14:08:45 +03:00
ghost  11e02da66d  memory usage optimization, rename methods, remove memchached dependency from the model  2023-08-03 10:48:27 +03:00
ghost  cbabea595b  rename method name  2023-08-03 10:26:37 +03:00
ghost  7e3248ca2c  rename method name  2023-08-03 10:26:14 +03:00
ghost  772975059c  add mysql conf example  2023-08-03 09:25:43 +03:00
ghost  7c407e0d1f  update crontab example  2023-08-03 08:34:51 +03:00
ghost  1249e8d29c  fix CRAWL_PAGE_RANK_UPDATE condition  2023-08-02 21:22:31 +03:00
ghost  5df59661d8  add page rank update optional in the crawl queue  2023-08-02 21:21:23 +03:00
ghost  a5a2ec233e  unify mime-based search results template  2023-08-02 17:29:02 +03:00
ghost  6d5901c101  display shortened page URL instead of host address, change column name  2023-08-02 15:47:44 +03:00
ghost  1d7deffc4c  update PR generation, delegate PR value from redirecting pages, update method names  2023-08-02 15:43:44 +03:00
ghost  bba718c901  remove host pages total column  2023-08-02 15:36:26 +03:00
ghost  b7a48b905e  update method names  2023-08-02 14:25:48 +03:00
ghost  e65c24f6f3  uodate roadmap  2023-08-02 12:44:10 +03:00
ghost  1655ec63b2  skip xmpp links  2023-08-02 11:57:54 +03:00
ghost  06c136f05c  fix meta/nofollow attribute processing  2023-08-02 10:56:25 +03:00
ghost  39ba77fce5  fix page info conditions  2023-08-01 22:17:54 +03:00
ghost  ef170f62f3  update cli  2023-08-01 21:55:18 +03:00
ghost  43776b5ff4  fix semaphores  2023-08-01 17:53:14 +03:00
ghost  48e0482dbd  update Filter::searchQuery method  2023-08-01 17:20:42 +03:00
ghost  cc0cca346b  allow empty search queries  2023-08-01 16:47:39 +03:00
ghost  d119756a41  fix index size  2023-08-01 16:23:40 +03:00
ghost  662351cc46  make meta fields index separated, set search priority by document title  2023-08-01 14:15:14 +03:00
ghost  5791877a4e  update Filter::searchQuery method, fix search by URL  2023-08-01 13:50:07 +03:00
ghost  0bda87fbe6  fix priority calculation on zero value in PR  2023-08-01 11:17:29 +03:00
ghost  bf69d894ca  change search results piority, add PR to the page weight  2023-08-01 11:13:06 +03:00