ghost | 23ead4e12c | update page / image description models, implement history snap crawling | 2023-05-09 08:19:49 +03:00
ghost | 77bd25f587 | add line separators | 2023-05-09 01:39:56 +03:00
ghost | 0e9d29675f | implement host page description history crawling | 2023-05-09 01:29:32 +03:00
ghost | e9d5137dfe | allow svg images mime content type | 2023-05-08 13:00:37 +03:00
ghost | 25b6bce2ec | add crawler/cleaner logs | 2023-05-08 11:04:59 +03:00
ghost | fdd18de373 | remove abstraction | 2023-05-06 14:03:43 +03:00
ghost | 4801360a51 | update api version | 2023-05-06 13:55:05 +03:00
ghost | b6605b9132 | implement not reachable resources ban feature with timeout to prevent extra http requests | 2023-05-06 08:45:37 +03:00
ghost | f88d2ee9ff | implement MIME content-type crawler filter | 2023-05-05 21:25:57 +03:00
ghost | 5999fb3a73 | add distributed hosts crawling using yggo nodes manifest | 2023-05-05 05:26:53 +03:00
ghost | 297563d4a5 | display related pages in priority to the unique host by rank, rand() order | 2023-05-04 10:53:37 +03:00
ghost | 834ac68cce | create separated pagination settings for page/image search types | 2023-05-04 09:20:34 +03:00
ghost | 79878d17fe | add crawler / proxy user agent settings | 2023-05-04 07:38:22 +03:00
ghost | 9ed8411d2f | add image queue crawler | 2023-05-04 06:45:04 +03:00
ghost | d905e33b4f | update host images info on search requests | 2023-05-04 06:12:51 +03:00
ghost | 63b51f71c6 | fix space offset | 2023-05-04 04:20:54 +03:00
ghost | f980b6318c | add page meta to the image index | 2023-05-04 04:20:20 +03:00
ghost | baf78e2bf5 | add hostImage examples to sphinx configuration | 2023-05-04 01:34:12 +03:00
ghost | 0741a3e9ef | implement image crawler | 2023-05-04 01:04:39 +03:00
ghost | 56c79d8f3a | update config documentation | 2023-05-03 09:31:40 +03:00
ghost | 6d8f4f4882 | create manifests registry | 2023-05-03 09:22:14 +03:00
ghost | 219a56d6cd | update manifest API | 2023-05-03 05:47:02 +03:00
ghost | d7bbf1d96a | update default settings preset | 2023-05-03 04:17:13 +03:00
ghost | 0a199fce72 | add project description and support links | 2023-04-25 20:33:06 +03:00
ghost | a16a13b395 | add application mode settings | 2023-04-25 20:25:12 +03:00
ghost | a2fc14c8cf | implement manifest API | 2023-04-25 19:35:52 +03:00
ghost | 74dd15e544 | add page rank sort order attribute | 2023-04-25 17:07:57 +03:00
ghost | d20487acfd | add stem_enru, stem_cz, stem_ar morphology support | 2023-04-25 16:10:44 +03:00
ghost | 2a79671cf1 | add missed option example | 2023-04-25 16:09:38 +03:00
ghost | afd4375e4d | add hostPagesTotal info to the hosts API | 2023-04-25 16:01:55 +03:00
ghost | 1d7031e4f7 | make protocol settings adaptive | 2023-04-24 02:32:03 +03:00
ghost | 3917ca8d4f | move crontab configuration example to the config directory | 2023-04-23 09:07:06 +03:00
ghost | 8dbb4a06af | add disk quota validation | 2023-04-23 04:05:00 +03:00
ghost | 13431008c4 | add options documentation | 2023-04-23 03:16:54 +03:00
ghost | 9916fb701f | implement basic api | 2023-04-23 03:01:51 +03:00
ghost | 81cb970248 | add options documentation | 2023-04-23 01:54:10 +03:00
ghost | 8da150b295 | add options documentation | 2023-04-23 01:46:34 +03:00
ghost | 8f09db5045 | add options documentation | 2023-04-23 01:32:34 +03:00
ghost | c4dfb58fe3 | add options documentation | 2023-04-23 01:14:31 +03:00
ghost | 8e8d89db0e | implement database cleaner | 2023-04-09 00:06:28 +03:00
ghost | df6f2a1869 | implement CRAWL_ROBOTS_POSTFIX_RULES configuration #5 | 2023-04-08 22:28:31 +03:00
ghost | 8d102ecdf7 | index hosts with enabled status only | 2023-04-08 18:23:48 +03:00
ghost | 0b12e872a3 | add host name to the search index | 2023-04-08 18:22:53 +03:00
ghost | e98146b78b | index only 200 http code pages | 2023-04-07 05:34:45 +03:00
ghost | 0f2b772fa8 | remove not indexed pages from the search index | 2023-04-07 04:50:01 +03:00
ghost | 2495a2bbc7 | implement MySQL/Sphinx data model #3, add basical robots.txt support #2 | 2023-04-07 04:04:24 +03:00
ghost | a07ca1dce1 | add ipv6 example | 2023-04-04 01:39:48 +03:00
ghost | 79663c84db | add CRAWL_META_ONLY option | 2023-04-03 03:07:54 +03:00
ghost | ff95df72c1 | implement hostname identicons | 2023-04-03 01:30:09 +03:00
ghost | 4ea01bf8b4 | implement search results pagination | 2023-04-02 23:36:35 +03:00