| ghost | 29197ab904 | remove testing construction | 2023-06-26 15:59:08 +03:00 |
| ghost | ed240d53b0 | show available snaps only | 2023-06-25 23:29:30 +03:00 |
| ghost | 5346b13602 | implement custom hostPageDom elements index | 2023-06-25 22:10:47 +03:00 |
| ghost | 1c5346bc07 | remove single char words | 2023-06-22 13:37:12 +03:00 |
| ghost | dc2d971ba0 | clean up banned pages extra data | 2023-06-16 16:53:14 +03:00 |
| ghost | a657d31e1d | fix enum data type | 2023-06-16 16:32:46 +03:00 |
| ghost | f3475035c2 | show page size in explorer view, hide not available data | 2023-06-13 23:20:22 +03:00 |
| ghost | ab78e17ca8 | add hostPage.size collection | 2023-06-13 12:45:12 +03:00 |
| ghost | 7892784f5c | add httpCode column to hostPageSnapDownload table | 2023-06-12 13:34:25 +03:00 |
| ghost | edec590e09 | fix MAYBE filter in the default search mode | 2023-06-06 00:36:13 +03:00 |
| ghost | e1fb7f8c17 | change query separators to the MAYBE operator in default search mode | 2023-06-05 23:33:07 +03:00 |
| ghost | 0af5d165d3 | remove logCrawler column not in use | 2023-06-05 22:06:55 +03:00 |
| ghost | 4fa33afe40 | prevent infinitive connection on streaming resources detected | 2023-06-04 17:02:32 +03:00 |
| ghost | 345c59b5f4 | collect target location links on page redirect available | 2023-06-04 14:58:33 +03:00 |
| ghost | f49076bb0c | index homepages and shorter URL with higher priority | 2023-06-04 11:38:56 +03:00 |
| ghost | 81f7ea1e1e | implement multi-storage snap downloads | 2023-05-15 09:18:18 +03:00 |
| ghost | 1969707eeb | integrate optional MEGA/cmd snap storage | 2023-05-14 19:41:20 +03:00 |
| ghost | 50c9066f62 | add tables optimization to the cron/cleaner task | 2023-05-14 02:39:32 +03:00 |
| ghost | 0d19004e86 | make local snap storage optimization | 2023-05-14 01:45:55 +03:00 |
| ghost | 2f7d99079d | implement local snaps | 2023-05-13 10:15:07 +03:00 |
| ghost | d98b8f5c94 | remove hostPageToHostPage .quantity field because of implements wrong duplicates counting on reindex | 2023-05-13 06:30:40 +03:00 |
| ghost | eeeb3dceac | implement index explorer | 2023-05-13 05:54:15 +03:00 |
| ghost | 377b519a2c | implement host page info mode | 2023-05-13 03:51:34 +03:00 |
| ghost | 371670fadf | add media referrers info | 2023-05-13 03:01:00 +03:00 |
| ghost | 4486bdc215 | show mime type options that match search results only | 2023-05-10 20:37:05 +03:00 |
| ghost | 307ebcf0b1 | add page description on title \| description \| keywords not empty, remove deprecated constructions | 2023-05-10 19:35:01 +03:00 |
| ghost | 7c5ba050b2 | fix media crawling | 2023-05-10 18:35:18 +03:00 |
| ghost | 0fed16621a | fix mime content type update | 2023-05-10 14:47:33 +03:00 |
| ghost | db0e66c846 | refactor to mime-based content index #1 | 2023-05-10 12:47:36 +03:00 |
| ghost | 0ffcee1efb | fix image description updates timing | 2023-05-09 15:53:21 +03:00 |
| ghost | 2c5ca1b630 | fix image description duplicate | 2023-05-09 15:23:32 +03:00 |
| ghost | 28bf526d53 | add host nsfw settings | 2023-05-09 13:26:19 +03:00 |
| ghost | 8ce0324e94 | convert page data to string | 2023-05-09 12:52:07 +03:00 |
| ghost | dfca5570c6 | remove unused construction | 2023-05-09 12:10:42 +03:00 |
| ghost | d186fff48f | skip curl download on response data size reached | 2023-05-09 10:21:37 +03:00 |
| ghost | ef4de6b245 | fix image search page errors | 2023-05-09 08:53:33 +03:00 |
| ghost | 23ead4e12c | update page / image description models, implement history snap crawling | 2023-05-09 08:19:49 +03:00 |
| ghost | 0e9d29675f | implement host page description history crawling | 2023-05-09 01:29:32 +03:00 |
| ghost | 32d0f390d3 | update http code and mime type on page/image ban event | 2023-05-08 14:13:53 +03:00 |
| ghost | 8fbd7f3516 | count totals using sphinx index instead of database | 2023-05-08 12:28:49 +03:00 |
| ghost | 25b6bce2ec | add crawler/cleaner logs | 2023-05-08 11:04:59 +03:00 |
| ghost | ea04220de3 | add curl requests debug | 2023-05-08 08:27:21 +03:00 |
| ghost | 6c41dd5831 | fix ban time update / count affected rows only | 2023-05-06 10:11:25 +03:00 |
| ghost | b6605b9132 | implement not reachable resources ban feature with timeout to prevent extra http requests | 2023-05-06 08:45:37 +03:00 |
| ghost | 702a14b634 | add mime content type crawling #1 | 2023-05-06 07:25:54 +03:00 |
| ghost | f88d2ee9ff | implement MIME content-type crawler filter | 2023-05-05 21:25:57 +03:00 |
| ghost | bed5d3f149 | fix offset out of bounds error | 2023-05-05 15:16:36 +03:00 |
| ghost | 5999fb3a73 | add distributed hosts crawling using yggo nodes manifest | 2023-05-05 05:26:53 +03:00 |
| ghost | f0b2eb1613 | show images total instead of pages in placeholder on image search page | 2023-05-05 01:42:44 +03:00 |
| ghost | 297563d4a5 | display related pages in priority to the unique host by rank, rand() order | 2023-05-04 10:53:37 +03:00 |