ghost
|
d96abb8ea8
|
ban host page when encoding is not detected
|
2023-06-16 13:23:52 +03:00 |
|
ghost
|
d2469e9adc
|
fix meta variables overwrite
|
2023-06-14 02:53:14 +03:00 |
|
ghost
|
1d5d5ead5d
|
fix DomDocument initiation without encoding provided
|
2023-06-14 02:20:00 +03:00 |
|
ghost
|
8a747de341
|
fix HTML/multimedia content detection
|
2023-06-13 23:09:44 +03:00 |
|
ghost
|
93c6067fd9
|
fix host page mime detection
|
2023-06-13 22:29:28 +03:00 |
|
ghost
|
80d3912bc7
|
allow x-raw-image links
|
2023-06-13 20:26:17 +03:00 |
|
ghost
|
b23f550a1b
|
skip magnet links
|
2023-06-13 20:25:37 +03:00 |
|
ghost
|
ab78e17ca8
|
add hostPage.size collection
|
2023-06-13 12:45:12 +03:00 |
|
ghost
|
0af5d165d3
|
remove logCrawler column not in use
|
2023-06-05 22:06:55 +03:00 |
|
ghost
|
4b16b41440
|
make transaction for each item in crawl queue
|
2023-06-05 22:01:22 +03:00 |
|
ghost
|
b585b16d31
|
fix datatype error detection
|
2023-06-05 21:02:18 +03:00 |
|
ghost
|
c5e25d17fb
|
prevent page ban when its MIME is in the whitelist, only skip the steps below (make multimedia/streaming resources visible in search results)
|
2023-06-04 17:44:09 +03:00 |
|
ghost
|
4fa33afe40
|
prevent infinite connection when streaming resources are detected
|
2023-06-04 17:02:32 +03:00 |
|
ghost
|
345c59b5f4
|
collect target location links when a page redirect is available
|
2023-06-04 14:58:33 +03:00 |
|
ghost
|
242e0abd86
|
ban pages on data type error codes only
|
2023-06-04 13:10:32 +03:00 |
|
ghost
|
512bd56056
|
ban page that throws an error and blocks the crawl queue
|
2023-06-04 12:04:41 +03:00 |
|
ghost
|
81f7ea1e1e
|
implement multi-storage snap downloads
|
2023-05-15 09:18:18 +03:00 |
|
ghost
|
1969707eeb
|
integrate optional MEGA/cmd snap storage
|
2023-05-14 19:41:20 +03:00 |
|
ghost
|
bd99dcb023
|
add leading zero to mkdir access code
|
2023-05-14 05:43:03 +03:00 |
|
ghost
|
48664f0caf
|
fix zip close, loop break condition
|
2023-05-14 04:33:35 +03:00 |
|
ghost
|
0d19004e86
|
make local snap storage optimization
|
2023-05-14 01:45:55 +03:00 |
|
ghost
|
efc66d5dab
|
update local snap storage paths
|
2023-05-13 11:06:40 +03:00 |
|
ghost
|
2f7d99079d
|
implement local snaps
|
2023-05-13 10:15:07 +03:00 |
|
ghost
|
9477d87b2e
|
change strpos to stripos
|
2023-05-13 01:28:50 +03:00 |
|
ghost
|
28e8bcf8d7
|
add audio/video media crawl support
|
2023-05-13 01:23:09 +03:00 |
|
ghost
|
307ebcf0b1
|
add page description on title | description | keywords not empty, remove deprecated constructions
|
2023-05-10 19:35:01 +03:00 |
|
ghost
|
7c5ba050b2
|
fix media crawling
|
2023-05-10 18:35:18 +03:00 |
|
ghost
|
0fed16621a
|
fix mime content type update
|
2023-05-10 14:47:33 +03:00 |
|
ghost
|
db0e66c846
|
refactor to mime-based content index #1
|
2023-05-10 12:47:36 +03:00 |
|
ghost
|
0ffcee1efb
|
fix image description updates timing
|
2023-05-09 15:53:21 +03:00 |
|
ghost
|
2c5ca1b630
|
fix image description duplicate
|
2023-05-09 15:23:32 +03:00 |
|
ghost
|
28bf526d53
|
add host nsfw settings
|
2023-05-09 13:26:19 +03:00 |
|
ghost
|
8ce0324e94
|
convert page data to string
|
2023-05-09 12:52:07 +03:00 |
|
ghost
|
d186fff48f
|
skip curl download on response data size reached
|
2023-05-09 10:21:37 +03:00 |
|
ghost
|
d7a5f7ef84
|
remove content filter, snap the raw data
|
2023-05-09 09:02:17 +03:00 |
|
ghost
|
23ead4e12c
|
update page / image description models, implement history snap crawling
|
2023-05-09 08:19:49 +03:00 |
|
ghost
|
0e9d29675f
|
implement host page description history crawling
|
2023-05-09 01:29:32 +03:00 |
|
ghost
|
6371def666
|
fix attributes passing
|
2023-05-08 17:52:17 +03:00 |
|
ghost
|
32d0f390d3
|
update http code and mime type on page/image ban event
|
2023-05-08 14:13:53 +03:00 |
|
ghost
|
84dcecf50b
|
add svg images support, fix mime validation
|
2023-05-08 13:12:16 +03:00 |
|
ghost
|
bf1eeb332c
|
fix page/image mime content type detection
|
2023-05-08 12:10:57 +03:00 |
|
ghost
|
25b6bce2ec
|
add crawler/cleaner logs
|
2023-05-08 11:04:59 +03:00 |
|
ghost
|
dcdc2c50ad
|
update debug string names
|
2023-05-08 08:31:34 +03:00 |
|
ghost
|
ea04220de3
|
add curl requests debug
|
2023-05-08 08:27:21 +03:00 |
|
ghost
|
1aba060d34
|
fix variable name
|
2023-05-08 07:23:50 +03:00 |
|
ghost
|
fdd18de373
|
remove abstraction
|
2023-05-06 14:03:43 +03:00 |
|
ghost
|
6c41dd5831
|
fix ban time update / count affected rows only
|
2023-05-06 10:11:25 +03:00 |
|
ghost
|
20514c455f
|
add banned items counters
|
2023-05-06 08:50:41 +03:00 |
|
ghost
|
b6605b9132
|
implement ban feature with timeout for unreachable resources to prevent extra HTTP requests
|
2023-05-06 08:45:37 +03:00 |
|
ghost
|
702a14b634
|
add mime content type crawling #1
|
2023-05-06 07:25:54 +03:00 |
|