yggverse | 7cf10079c6 | update mime on progress function event | 2024-03-23 03:31:27 +02:00 |
yggverse | 3a28bf5967 | reset index time | 2024-03-23 03:26:25 +02:00 |
yggverse | 722de9175a | reset index time | 2024-03-23 03:25:36 +02:00 |
yggverse | 62149220b9 | update http code even progress function fails | 2024-03-23 03:16:01 +02:00 |
yggverse | 34fe26fcf9 | disable document autodelete | 2024-03-23 03:15:01 +02:00 |
yggverse | c4df3f3237 | improve notice level debug | 2024-03-23 01:00:49 +02:00 |
yggverse | 3a9efeabc5 | add snaps update by timeout feature | 2024-03-23 00:47:08 +02:00 |
yggverse | ebeef559ba | rename index action dependencies | 2024-03-22 23:50:39 +02:00 |
yggverse | fef2b1abec | implement reindex by request feature | 2024-03-22 22:50:52 +02:00 |
yggverse | fae43d54e5 | enable xhtml parser | 2024-03-22 19:11:27 +02:00 |
yggverse | f2dbd1599c | fix tags replacement condition | 2024-03-22 03:02:57 +02:00 |
yggverse | 5e4494c9e8 | use PHP 8 str_starts_with function | 2024-03-21 18:47:11 +02:00 |
yggverse | 900e3a453f | Disable keywords collection from headers as body index enabled | 2024-03-21 03:46:58 +02:00 |
yggverse | 1f3ee435e9 | fix custom encoding conversion | 2024-03-21 03:38:46 +02:00 |
yggverse | e09440b44a | strip code content | 2024-03-21 00:38:24 +02:00 |
yggverse | b5cd219f47 | strip css content from index | 2024-03-21 00:34:25 +02:00 |
yggverse | 3884f375d4 | save document body text to index | 2024-03-20 19:31:56 +02:00 |
ghost | 1c2e8dafb2 | collect keywords from document headers | 2024-01-23 02:49:52 +02:00 |
ghost | cfbc84cbaf | sort queue by rank asc | 2024-01-23 02:19:35 +02:00 |
ghost | db9dc8d4ba | force results to string | 2024-01-23 01:55:28 +02:00 |
ghost | 50dc9d315a | add rank field | 2024-01-22 22:56:36 +02:00 |
ghost | 6f4abe4729 | set crc32url as document id | 2024-01-22 22:52:37 +02:00 |
ghost | 93baed4b90 | delete deprecated documents with HTTP code not 200 on second scan | 2023-12-20 08:44:35 +02:00 |
ghost | 33cc778999 | crawl newest pages by rand in queue | 2023-12-10 00:29:18 +02:00 |
ghost | 35ad144a9e | add stripos url rules for crawl snaps | 2023-12-02 22:15:44 +02:00 |
ghost | 0e06ff3c0f | fix debug message | 2023-12-02 21:18:57 +02:00 |
ghost | 51d52dea7d | fix destination name | 2023-12-02 20:12:03 +02:00 |
ghost | 87ca594860 | add debug levels | 2023-12-02 16:04:22 +02:00 |
ghost | 33d657cb72 | apply sleep on timeout value provided only | 2023-12-02 15:03:51 +02:00 |
ghost | bc00f0c851 | make tmp subfolders storage optimization | 2023-12-02 14:39:11 +02:00 |
ghost | f613b44d3f | disable sort by RAND() in crawler queue | 2023-12-02 14:22:50 +02:00 |
ghost | d3f8d1c0e3 | fix result output | 2023-11-30 02:59:07 +02:00 |
ghost | 86b20cbc51 | add debug output on skip condition | 2023-11-30 02:36:25 +02:00 |
ghost | 3306dc1961 | add skip url filter by stripos condition | 2023-11-30 02:24:02 +02:00 |
ghost | ee074b684a | add semaphore namespace prefix | 2023-11-30 00:51:42 +02:00 |
ghost | 27946ff27c | define missed crc32url field value | 2023-11-27 21:03:38 +02:00 |
ghost | 38fbc32151 | fix document fields update | 2023-11-27 20:55:10 +02:00 |
ghost | 08995e6199 | randomize new pages queue | 2023-11-27 20:24:46 +02:00 |
ghost | 6a9117757b | reset http code to 404 on page index initiation | 2023-11-27 19:44:14 +02:00 |
ghost | 02dd3649a7 | add CURL options that prevent crawl queue stuck | 2023-11-27 16:54:26 +02:00 |
ghost | 349f26f5ea | update option name | 2023-11-26 21:33:34 +02:00 |
ghost | 133548a98c | fix url check conditions | 2023-11-26 20:53:31 +02:00 |
ghost | dfb2c06738 | add crc32url filter | 2023-11-25 18:10:23 +02:00 |
ghost | 192e45103d | add index settings support | 2023-11-25 16:01:46 +02:00 |
ghost | 875382c56e | implement FTP snaps | 2023-11-25 03:19:54 +02:00 |
ghost | 72f2fdaeca | change config location | 2023-11-25 00:16:08 +02:00 |
ghost | c6e9ba9d09 | implement local storage feature with tar.gz compression | 2023-11-24 19:51:43 +02:00 |
ghost | 01753b0557 | add crawl queue delay support | 2023-11-20 00:06:17 +02:00 |
ghost | 13cf61b42c | fix debug output | 2023-11-19 23:34:13 +02:00 |
ghost | 7dfc800a67 | initial commit | 2023-11-19 23:00:51 +02:00 |