Commit Graph

128 Commits

Author    SHA1        Message  Date
yggverse  e09440b44a  strip code content  2024-03-21 00:38:24 +02:00
yggverse  b5cd219f47  strip css content from index  2024-03-21 00:34:25 +02:00
yggverse  5588668728  update link rules  2024-03-21 00:33:46 +02:00
yggverse  2909091e72  update link rules  2024-03-20 23:04:05 +02:00
yggverse  19272733e4  update url rules  2024-03-20 22:53:36 +02:00
yggverse  b440e6edff  disable configuration changes cleanup  2024-03-20 22:41:12 +02:00
yggverse  ad3fd31f67  update cleanup condition  2024-03-20 22:35:33 +02:00
yggverse  dd914e0e1b  fix cleanup query  2024-03-20 22:33:11 +02:00
yggverse  25fed9f1dc  add new link rules  2024-03-20 21:31:12 +02:00
yggverse  3c62dc0fd5  add new url blacklist rule  2024-03-20 21:08:00 +02:00
yggverse  36972cab19  implement alter index tool  2024-03-20 21:06:18 +02:00
yggverse  44e2836de4  add new link rules  2024-03-20 20:33:08 +02:00
yggverse  2257ce771f  apply cleaner to the current url configuration  2024-03-20 20:18:55 +02:00
yggverse  d9bc24c8f8  add url substrings skip rules  2024-03-20 19:46:43 +02:00
yggverse  3884f375d4  save document body text to index  2024-03-20 19:31:56 +02:00
ghost     1f27a7e105  trim extra spaces before query escape  2024-02-25 09:11:12 +02:00
ghost     d6b5f8b210  build combined search query  2024-02-25 09:07:57 +02:00
ghost     1c2e8dafb2  collect keywords from document headers  2024-01-23 02:49:52 +02:00
ghost     cfbc84cbaf  sort queue by rank asc  2024-01-23 02:19:35 +02:00
ghost     db9dc8d4ba  force results to string  2024-01-23 01:55:28 +02:00
ghost     ff8461835d  calculate initial rank  2024-01-22 23:03:33 +02:00
ghost     50dc9d315a  add rank field  2024-01-22 22:56:36 +02:00
ghost     6f4abe4729  set crc32url as document id  2024-01-22 22:52:37 +02:00
ghost     93baed4b90  delete deprecated documents with HTTP code not 200 on second scan  2023-12-20 08:44:35 +02:00
ghost     17d6171d95  fix directory existion check #2  2023-12-13 00:36:50 +02:00
ghost     100806af02  complete local snaps feature #2  2023-12-13 00:29:34 +02:00
ghost     3be2f3ce09  ignore all config files in this folder  2023-12-12 23:30:15 +02:00
ghost     33cc778999  crawl newest pages by rand in queue  2023-12-10 00:29:18 +02:00
ghost     811c700049  add http code notice  2023-12-03 01:14:06 +02:00
ghost     35ad144a9e  add stripos url rules for crawl snaps  2023-12-02 22:15:44 +02:00
ghost     0e06ff3c0f  fix debug message  2023-12-02 21:18:57 +02:00
ghost     e066223bd2  fix link container  2023-12-02 20:59:40 +02:00
ghost     51d52dea7d  fix destination name  2023-12-02 20:12:03 +02:00
ghost     87ca594860  add debug levels  2023-12-02 16:04:22 +02:00
ghost     33d657cb72  apply sleep on timeout value provided only  2023-12-02 15:03:51 +02:00
ghost     bc00f0c851  make tmp subfolders storage optimization  2023-12-02 14:39:11 +02:00
ghost     f613b44d3f  disable sort by RAND() in crawler queue  2023-12-02 14:22:50 +02:00
ghost     646269c4d9  fix link name  2023-12-02 00:34:49 +02:00
ghost     761cac9f3e  remove target="_blank"  2023-12-02 00:22:01 +02:00
ghost     fa3c0491e2  fix chromium -webkit-autofill input colors  2023-12-01 23:56:57 +02:00
ghost     9087c4b0d7  add footer links settings, implement nodes registry with database download list  2023-12-01 23:47:15 +02:00
ghost     4cec81c893  make extended search mode disabled by default #7  2023-12-01 21:26:12 +02:00
ghost     f0da3caaf5  add extended search mode option  2023-12-01 20:05:38 +02:00
ghost     2f2eea6821  fix registry  2023-11-30 21:00:31 +02:00
ghost     5a730c09fc  update readme  2023-11-30 20:55:01 +02:00
ghost     aa24abf005  update readme  2023-11-30 20:54:04 +02:00
ghost     d37856fb1d  add index download links  2023-11-30 16:18:44 +02:00
ghost     1ea44573ea  update readme  2023-11-30 14:38:58 +02:00
ghost     25bbd94b74  update readme  2023-11-30 14:37:54 +02:00
ghost     16ae4dadaa  update readme  2023-11-30 14:33:57 +02:00