68 Commits

Author SHA1 Message Date
yggverse
8651eb4d97 fix argument check condition 2024-03-21 23:24:32 +02:00
yggverse
cd88a971e7 validate arguments 2024-03-21 23:22:50 +02:00
yggverse
e01ad2ebdb remove webui features from cli 2024-03-21 23:20:31 +02:00
yggverse
7390178376 apply search options 2024-03-21 22:09:11 +02:00
yggverse
e9d745a932 make search query by substring 2024-03-21 19:07:46 +02:00
yggverse
2a68c959e6 remove exact condition 2024-03-21 19:01:39 +02:00
yggverse
3f1d19ad18 add random expression 2024-03-21 19:00:38 +02:00
yggverse
be7c63e68a make sure document contains exact substring in URL 2024-03-21 18:58:30 +02:00
yggverse
22c73fb922 update snap location check 2024-03-21 18:50:14 +02:00
yggverse
5e4494c9e8 use PHP 8 str_starts_with function 2024-03-21 18:47:11 +02:00
yggverse
79b82d46e1 add cleanup limit argument 2024-03-21 18:41:33 +02:00
yggverse
e635ec6dc9 enable cleanup on configuration update, delete snap match cleanup conditions 2024-03-21 18:23:07 +02:00
yggverse
900e3a453f Disable keywords collection from headers as body index enabled 2024-03-21 03:46:58 +02:00
yggverse
1f3ee435e9 fix custom encoding conversion 2024-03-21 03:38:46 +02:00
yggverse
e09440b44a strip code content 2024-03-21 00:38:24 +02:00
yggverse
b5cd219f47 strip css content from index 2024-03-21 00:34:25 +02:00
yggverse
b440e6edff disable configuration changes cleanup 2024-03-20 22:41:12 +02:00
yggverse
ad3fd31f67 update cleanup condition 2024-03-20 22:35:33 +02:00
yggverse
dd914e0e1b fix cleanup query 2024-03-20 22:33:11 +02:00
yggverse
36972cab19 implement alter index tool 2024-03-20 21:06:18 +02:00
yggverse
2257ce771f apply cleaner to the current url configuration 2024-03-20 20:18:55 +02:00
yggverse
3884f375d4 save document body text to index 2024-03-20 19:31:56 +02:00
ghost
1c2e8dafb2 collect keywords from document headers 2024-01-23 02:49:52 +02:00
ghost
cfbc84cbaf sort queue by rank asc 2024-01-23 02:19:35 +02:00
ghost
db9dc8d4ba force results to string 2024-01-23 01:55:28 +02:00
ghost
50dc9d315a add rank field 2024-01-22 22:56:36 +02:00
ghost
6f4abe4729 set crc32url as document id 2024-01-22 22:52:37 +02:00
ghost
93baed4b90 delete deprecated documents with HTTP code not 200 on second scan 2023-12-20 08:44:35 +02:00
ghost
33cc778999 crawl newest pages by rand in queue 2023-12-10 00:29:18 +02:00
ghost
35ad144a9e add stripos url rules for crawl snaps 2023-12-02 22:15:44 +02:00
ghost
0e06ff3c0f fix debug message 2023-12-02 21:18:57 +02:00
ghost
51d52dea7d fix destination name 2023-12-02 20:12:03 +02:00
ghost
87ca594860 add debug levels 2023-12-02 16:04:22 +02:00
ghost
33d657cb72 apply sleep on timeout value provided only 2023-12-02 15:03:51 +02:00
ghost
bc00f0c851 make tmp subfolders storage optimization 2023-12-02 14:39:11 +02:00
ghost
f613b44d3f disable sort by RAND() in crawler queue 2023-12-02 14:22:50 +02:00
ghost
d3f8d1c0e3 fix result output 2023-11-30 02:59:07 +02:00
ghost
86b20cbc51 add debug output on skip condition 2023-11-30 02:36:25 +02:00
ghost
3306dc1961 add skip url filter by stripos condition 2023-11-30 02:24:02 +02:00
ghost
ee074b684a add semaphore namespace prefix 2023-11-30 00:51:42 +02:00
ghost
27946ff27c define missing crc32url field value 2023-11-27 21:03:38 +02:00
ghost
38fbc32151 fix document fields update 2023-11-27 20:55:10 +02:00
ghost
08995e6199 randomize new pages queue 2023-11-27 20:24:46 +02:00
ghost
6a9117757b reset http code to 404 on page index initiation 2023-11-27 19:44:14 +02:00
ghost
015221eafb fix semaphore condition #5 2023-11-27 19:34:14 +02:00
ghost
a499c363f6 prevent multi-thread execution #5 2023-11-27 19:31:03 +02:00
ghost
2961045c76 implement index cleaner tool #5 2023-11-27 19:29:17 +02:00
ghost
02dd3649a7 add CURL options that prevent crawl queue stuck 2023-11-27 16:54:26 +02:00
ghost
349f26f5ea update option name 2023-11-26 21:33:34 +02:00
ghost
133548a98c fix url check conditions 2023-11-26 20:53:31 +02:00