87 Commits

Author SHA1 Message Date
yggverse 2257ce771f apply cleaner to the current url configuration 2024-03-20 20:18:55 +02:00
yggverse 3884f375d4 save document body text to index 2024-03-20 19:31:56 +02:00
ghost 1f27a7e105 trim extra spaces before query escape 2024-02-25 09:11:12 +02:00
ghost d6b5f8b210 build combined search query 2024-02-25 09:07:57 +02:00
ghost 1c2e8dafb2 collect keywords from document headers 2024-01-23 02:49:52 +02:00
ghost cfbc84cbaf sort queue by rank asc 2024-01-23 02:19:35 +02:00
ghost db9dc8d4ba force results to string 2024-01-23 01:55:28 +02:00
ghost ff8461835d calculate initial rank 2024-01-22 23:03:33 +02:00
ghost 50dc9d315a add rank field 2024-01-22 22:56:36 +02:00
ghost 6f4abe4729 set crc32url as document id 2024-01-22 22:52:37 +02:00
ghost 93baed4b90 delete deprecated documents with HTTP code not 200 on second scan 2023-12-20 08:44:35 +02:00
ghost 17d6171d95 fix directory existion check #2 2023-12-13 00:36:50 +02:00
ghost 100806af02 complete local snaps feature #2 2023-12-13 00:29:34 +02:00
ghost 33cc778999 crawl newest pages by rand in queue 2023-12-10 00:29:18 +02:00
ghost 811c700049 add http code notice 2023-12-03 01:14:06 +02:00
ghost 35ad144a9e add stripos url rules for crawl snaps 2023-12-02 22:15:44 +02:00
ghost 0e06ff3c0f fix debug message 2023-12-02 21:18:57 +02:00
ghost e066223bd2 fix link container 2023-12-02 20:59:40 +02:00
ghost 51d52dea7d fix destination name 2023-12-02 20:12:03 +02:00
ghost 87ca594860 add debug levels 2023-12-02 16:04:22 +02:00
ghost 33d657cb72 apply sleep on timeout value provided only 2023-12-02 15:03:51 +02:00
ghost bc00f0c851 make tmp subfolders storage optimization 2023-12-02 14:39:11 +02:00
ghost f613b44d3f disable sort by RAND() in crawler queue 2023-12-02 14:22:50 +02:00
ghost fa3c0491e2 fix chromium -webkit-autofill input colors 2023-12-01 23:56:57 +02:00
ghost 9087c4b0d7 add footer links settings, implement nodes registry with database download list 2023-12-01 23:47:15 +02:00
ghost 4cec81c893 make extended search mode disabled by default #7 2023-12-01 21:26:12 +02:00
ghost f0da3caaf5 add extended search mode option 2023-12-01 20:05:38 +02:00
ghost d3f8d1c0e3 fix result output 2023-11-30 02:59:07 +02:00
ghost 86b20cbc51 add debug output on skip condition 2023-11-30 02:36:25 +02:00
ghost 3306dc1961 add skip url filter by stripos condition 2023-11-30 02:24:02 +02:00
ghost ee074b684a add semaphore namespace prefix 2023-11-30 00:51:42 +02:00
ghost 880764aa49 make paths relative 2023-11-29 23:13:16 +02:00
ghost 24904f667e add Utils::escape note 2023-11-29 22:51:14 +02:00
ghost 27946ff27c define missed crc32url field value 2023-11-27 21:03:38 +02:00
ghost 38fbc32151 fix document fields update 2023-11-27 20:55:10 +02:00
ghost 08995e6199 randomize new pages queue 2023-11-27 20:24:46 +02:00
ghost 3f7eb2f0e3 show random results on empty search request 2023-11-27 20:18:43 +02:00
ghost 6a9117757b reset http code to 404 on page index initiation 2023-11-27 19:44:14 +02:00
ghost 015221eafb fix semaphore condition #5 2023-11-27 19:34:14 +02:00
ghost a499c363f6 prevent multi-thread execution #5 2023-11-27 19:31:03 +02:00
ghost 2961045c76 implement index cleaner tool #5 2023-11-27 19:29:17 +02:00
ghost 7ea9cbffcd fix found totals 2023-11-27 17:27:40 +02:00
ghost 02dd3649a7 add CURL options that prevent crawl queue stuck 2023-11-27 16:54:26 +02:00
ghost 783c60fd25 update snaps list 2023-11-27 15:55:31 +02:00
ghost d247338c13 update download snap filename 2023-11-26 22:25:59 +02:00
ghost 82de3c73ab remove unixtime 2023-11-26 22:16:20 +02:00
ghost 5b166a6245 implement remote snap download API #2 2023-11-26 22:14:28 +02:00
ghost 94a5a82a56 add remote snaps list #2 2023-11-26 21:45:04 +02:00
ghost 349f26f5ea update option name 2023-11-26 21:33:34 +02:00
ghost 133548a98c fix url check conditions 2023-11-26 20:53:31 +02:00