Commit Graph

485 Commits

Author SHA1 Message Date
ghost
ed240d53b0 show available snaps only 2023-06-25 23:29:30 +03:00
ghost
2c5128382b fix semaphore ID 2023-06-25 22:16:05 +03:00
ghost
a79943dbae update readme 2023-06-25 22:12:07 +03:00
ghost
ebc1a573dc initiate CLI tool 2023-06-25 22:11:49 +03:00
ghost
5346b13602 implement custom hostPageDom elements index 2023-06-25 22:10:47 +03:00
ghost
5df598a1d4 fix variable name 2023-06-24 15:21:47 +03:00
ghost
1c5346bc07 remove single char words 2023-06-22 13:37:12 +03:00
ghost
e16a7b8171 fix HY000/1366 error processing 2023-06-17 11:33:32 +03:00
ghost
dc2d971ba0 clean up banned pages extra data 2023-06-16 16:53:14 +03:00
ghost
a657d31e1d fix enum data type 2023-06-16 16:32:46 +03:00
ghost
d96abb8ea8 ban host page on encoding not detected 2023-06-16 13:23:52 +03:00
ghost
d2469e9adc fix meta variables overwrite 2023-06-14 02:53:14 +03:00
ghost
0949d7f871 set default encoding 2023-06-14 02:20:09 +03:00
ghost
1d5d5ead5d fix DomDocument initialization when no encoding is provided 2023-06-14 02:20:00 +03:00
ghost
fcda6b9885 remove MIME filters from explorer search form 2023-06-13 23:25:17 +03:00
ghost
f3475035c2 show page size in explorer view, hide not available data 2023-06-13 23:20:22 +03:00
ghost
8a747de341 fix HTML/multimedia content detection 2023-06-13 23:09:44 +03:00
ghost
93c6067fd9 fix host page mime detection 2023-06-13 22:29:28 +03:00
ghost
c07d6af52f add new mime preset 2023-06-13 21:57:01 +03:00
ghost
052b08ea26 show results quantity in the mime filter titles 2023-06-13 21:20:02 +03:00
ghost
80d3912bc7 allow x-raw-image links 2023-06-13 20:26:17 +03:00
ghost
b23f550a1b skip magnet links 2023-06-13 20:25:37 +03:00
ghost
acba2816e2 remove transaction from tables optimization case 2023-06-13 17:45:02 +03:00
ghost
be81299c84 update readme 2023-06-13 17:35:47 +03:00
ghost
b2cf9fc6a5 do table optimization in a separate transaction 2023-06-13 16:51:16 +03:00
ghost
ab78e17ca8 add hostPage.size collection 2023-06-13 12:45:12 +03:00
ghost
830e96b03d increase minimum requirements 2023-06-13 03:16:29 +03:00
ghost
7892784f5c add httpCode column to hostPageSnapDownload table 2023-06-12 13:34:25 +03:00
ghost
20726fca45 update readme 2023-06-10 00:20:39 +03:00
ghost
dd736c7923 crontab schedule optimization 2023-06-10 00:19:27 +03:00
ghost
a79993a94b add an mk 2023-06-08 00:11:02 +03:00
ghost
edec590e09 fix MAYBE filter in the default search mode 2023-06-06 00:36:13 +03:00
ghost
e1fb7f8c17 change query separators to the MAYBE operator in default search mode 2023-06-05 23:33:07 +03:00
ghost
9379809261 colorize meow 2023-06-05 23:08:29 +03:00
ghost
0af5d165d3 remove logCrawler column not in use 2023-06-05 22:06:55 +03:00
ghost
4b16b41440 make transaction for each item in crawl queue 2023-06-05 22:01:22 +03:00
ghost
b585b16d31 fix datatype error detection 2023-06-05 21:02:18 +03:00
ghost
8726512cf0 change morphology from stem_enru to lemmatize_ru_all/lemmatize_en_all 2023-06-05 18:20:49 +03:00
ghost
8bc8a943e7 add lemmatize_de_all 2023-06-05 18:13:31 +03:00
ghost
3d3fcdda87 update readme 2023-06-05 13:44:23 +03:00
ghost
a1249859a8 update readme 2023-06-05 13:42:19 +03:00
ghost
eef23cc830 update readme 2023-06-05 13:40:35 +03:00
ghost
17f69b9661 add min_word_len, min_prefix_len, html_strip, index_exact_words presets example 2023-06-05 13:36:15 +03:00
ghost
1e2736d67b skip empty mime type index 2023-06-04 18:10:59 +03:00
ghost
c5e25d17fb prevent page ban when its MIME is in the whitelist, only skip the steps below (make multimedia/streaming resources visible in search results) 2023-06-04 17:44:09 +03:00
ghost
4fa33afe40 prevent infinite connection when streaming resources are detected 2023-06-04 17:02:32 +03:00
ghost
345c59b5f4 collect target location links when a page redirect is available 2023-06-04 14:58:33 +03:00
ghost
5d7f2bf68c fix snap foreign keys deletion 2023-06-04 13:39:47 +03:00
ghost
242e0abd86 ban pages on data type error codes only 2023-06-04 13:10:32 +03:00
ghost
62a4f33b53 load missing dependency 2023-06-04 12:27:20 +03:00