Pieter Wuille
10 years ago
40 changed files with 602 additions and 282 deletions
@@ -0,0 +1,36 @@
# Contributing

We'd love to accept your code patches! However, before we can take them, we
have to jump a couple of legal hurdles.

## Contributor License Agreements

Please fill out either the individual or corporate Contributor License
Agreement as appropriate.

* If you are an individual writing original source code and you're sure you
own the intellectual property, then sign an [individual CLA](https://developers.google.com/open-source/cla/individual).
* If you work for a company that wants to allow you to contribute your work,
then sign a [corporate CLA](https://developers.google.com/open-source/cla/corporate).

Follow either of the two links above to access the appropriate CLA and
instructions for how to sign and return it.

## Submitting a Patch

1. Sign the Contributor License Agreement above.
2. Decide which code you want to submit. A submission should be a set of changes
that addresses one issue in the [issue tracker](https://github.com/google/leveldb/issues).
Please don't mix more than one logical change per submission, because it makes
the history hard to follow. If you want to make a change
(e.g. add a sample or feature) that doesn't have a corresponding issue in the
issue tracker, please create one.
3. **Submitting**: When you are ready to submit, send us a Pull Request. Be
sure to include the issue number you fixed and the name you used to sign
the CLA.

## Writing Code

If your contribution contains code, please make sure that it follows
[the style guide](http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml).
Otherwise we will have to ask you to make changes, and that's no fun for anyone.
@@ -0,0 +1,138 @@
**LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.**

Authors: Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

# Features
* Keys and values are arbitrary byte arrays.
* Data is stored sorted by key.
* Callers can provide a custom comparison function to override the sort order.
* The basic operations are `Put(key,value)`, `Get(key)`, `Delete(key)`.
* Multiple changes can be made in one atomic batch.
* Users can create a transient snapshot to get a consistent view of data.
* Forward and backward iteration is supported over the data.
* Data is automatically compressed using the [Snappy compression library](http://code.google.com/p/snappy).
* External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions.
* [Detailed documentation](http://htmlpreview.github.io/?https://github.com/google/leveldb/blob/master/doc/index.html) about how to use the library is included with the source code.
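
To make the ordering features above concrete, here is a minimal sketch that uses `std::map` as a stand-in for an ordered key-value store. It is an analogy only and does not use the LevelDB API: `OrderedStore`, `KeyOrder`, and the helper functions are hypothetical names introduced for illustration. It shows sorted storage, a caller-supplied sort order, and forward and backward iteration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Analogy only: std::map stands in for an ordered key-value store.
// OrderedStore, KeyOrder, and the helpers below are hypothetical names,
// not part of the LevelDB API.
using KeyOrder = std::function<bool(const std::string&, const std::string&)>;
using OrderedStore = std::map<std::string, std::string, KeyOrder>;

// Store whose keys sort in std::string's default lexicographic order.
OrderedStore MakeDefaultStore() {
  return OrderedStore(KeyOrder(std::less<std::string>()));
}

// Store with a caller-supplied order: here, descending keys.
OrderedStore MakeDescendingStore() {
  return OrderedStore(KeyOrder(std::greater<std::string>()));
}

// Collects keys by iterating forward over the store.
std::vector<std::string> ForwardKeys(const OrderedStore& store) {
  std::vector<std::string> keys;
  for (auto it = store.begin(); it != store.end(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}

// Collects keys by iterating backward over the store.
std::vector<std::string> BackwardKeys(const OrderedStore& store) {
  std::vector<std::string> keys;
  for (auto it = store.rbegin(); it != store.rend(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}

// Exercises the helpers; the asserts document the expected orderings.
void DemoOrdering() {
  OrderedStore store = MakeDefaultStore();
  store["b"] = "2";  // insertion order does not matter
  store["a"] = "1";
  store["c"] = "3";
  assert((ForwardKeys(store) == std::vector<std::string>{"a", "b", "c"}));
  assert((BackwardKeys(store) == std::vector<std::string>{"c", "b", "a"}));

  store.erase("b");  // analogous to deleting a key
  assert((ForwardKeys(store) == std::vector<std::string>{"a", "c"}));
}
```

The comparison function is fixed when the store is created, which mirrors the idea that a sort order belongs to the database as a whole rather than to individual operations.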

# Limitations
* This is not a SQL database. It does not have a relational data model, it does not support SQL queries, and it has no support for indexes.
* Only a single process (possibly multi-threaded) can access a particular database at a time.
* There is no client-server support built in to the library. An application that needs such support will have to wrap its own server around the library.

# Performance

Here is a performance report (with explanations) from the run of the
included db_bench program. The results are somewhat noisy, but should
be enough to get a ballpark performance estimate.

## Setup

We use a database with a million entries. Each entry has a 16-byte
key and a 100-byte value. Values used by the benchmark compress to
about half their original size.

    LevelDB:    version 1.1
    Date:       Sun May 1 12:11:26 2011
    CPU:        4 x Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    CPUCache:   4096 KB
    Keys:       16 bytes each
    Values:     100 bytes each (50 bytes after compression)
    Entries:    1000000
    Raw Size:   110.6 MB (estimated)
    File Size:  62.9 MB (estimated)
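
The two estimated sizes can be reproduced from the other setup values. The sketch below is a worked calculation, not code from db_bench: `EstimatedSizeMB` and `RoundToTenth` are hypothetical helpers, and it assumes that 1 MB means 1048576 bytes and that each entry contributes only its key and value bytes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: estimated data size in MB, assuming 1 MB = 1048576
// bytes and counting only key and value bytes per entry.
double EstimatedSizeMB(std::uint64_t entries, double key_bytes,
                       double value_bytes) {
  return static_cast<double>(entries) * (key_bytes + value_bytes) / 1048576.0;
}

// Rounds to one decimal place, matching the precision shown in the report.
double RoundToTenth(double x) { return std::round(x * 10.0) / 10.0; }

// Checks the two figures from the setup block.
void DemoSizes() {
  // Raw size: 16-byte keys plus 100-byte values.
  assert(std::fabs(RoundToTenth(EstimatedSizeMB(1000000, 16, 100)) - 110.6) <
         1e-9);
  // File size: values assumed to compress to 50 bytes.
  assert(std::fabs(RoundToTenth(EstimatedSizeMB(1000000, 16, 50)) - 62.9) <
         1e-9);
}
```

Under these assumptions, 1,000,000 × 116 bytes gives the 110.6 MB raw size, and 1,000,000 × 66 bytes gives the 62.9 MB file size.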

## Write performance

The "fill" benchmarks create a brand new database, in either
sequential or random order. The "fillsync" benchmark flushes data
from the operating system to the disk after every operation; the other
write operations leave the data sitting in the operating system buffer
cache for a while. The "overwrite" benchmark does random writes that
update existing keys in the database.

    fillseq      :       1.765 micros/op;   62.7 MB/s
    fillsync     :     268.409 micros/op;    0.4 MB/s (10000 ops)
    fillrandom   :       2.460 micros/op;   45.0 MB/s
    overwrite    :       2.380 micros/op;   46.5 MB/s

Each "op" above corresponds to a write of a single key/value pair.
That is, the random write benchmark runs at approximately 400,000 writes per second.
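
The writes-per-second figure follows directly from the per-operation latency. The sketch below shows the conversion; `OpsPerSecond` and `MBPerSecond` are hypothetical helpers, and the 116 bytes per operation (a 16-byte key plus a 100-byte value, with 1 MB taken as 1048576 bytes) is an assumption that appears consistent with the MB/s column above.

```cpp
#include <cassert>
#include <cmath>

// Converts a latency in microseconds per operation to operations per second.
double OpsPerSecond(double micros_per_op) {
  assert(micros_per_op > 0.0);
  return 1e6 / micros_per_op;
}

// Converts a latency to throughput in MB/s, assuming 1 MB = 1048576 bytes
// and a fixed number of payload bytes per operation.
double MBPerSecond(double micros_per_op, double bytes_per_op) {
  return OpsPerSecond(micros_per_op) * bytes_per_op / 1048576.0;
}

// Checks the conversions against the figures in the write benchmark table.
void DemoThroughput() {
  // fillrandom: 2.460 micros/op is roughly 406,500 writes per second.
  assert(std::fabs(OpsPerSecond(2.460) - 406504.0) < 1.0);
  // fillseq: 1.765 micros/op at 116 bytes per op is roughly 62.7 MB/s.
  assert(std::fabs(MBPerSecond(1.765, 116.0) - 62.7) < 0.05);
}
```

The same conversion applied to the "fillsync" row gives roughly 3,700 synchronous writes per second.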

Each "fillsync" operation costs much less (about 0.3 milliseconds)
than a disk seek (typically 10 milliseconds). We suspect that this is
because the hard disk itself is buffering the update in its memory and
responding before the data has been written to the platter. This may
or may not be safe, depending on whether the hard disk has enough
power to save its memory in the event of a power failure.

## Read performance

We list the performance of reading sequentially in both the forward
and reverse direction, and also the performance of a random lookup.
Note that the database created by the benchmark is quite small.
Therefore the report characterizes the performance of leveldb when the
working set fits in memory. The cost of reading a piece of data that
is not present in the operating system buffer cache will be dominated
by the one or two disk seeks needed to fetch the data from disk.
Write performance will be mostly unaffected by whether or not the
working set fits in memory.

    readrandom   :      16.677 micros/op;  (approximately 60,000 reads per second)
    readseq      :       0.476 micros/op;  232.3 MB/s
    readreverse  :       0.724 micros/op;  152.9 MB/s

LevelDB compacts its underlying storage data in the background to
improve read performance. The results listed above were measured
immediately after a lot of random writes. The results after
compactions (which are usually triggered automatically) are better.

    readrandom   :      11.602 micros/op;  (approximately 85,000 reads per second)
    readseq      :       0.423 micros/op;  261.8 MB/s
    readreverse  :       0.663 micros/op;  166.9 MB/s

Some of the high cost of reads comes from repeated decompression of blocks
read from disk. If we supply enough cache to leveldb so it can hold the
uncompressed blocks in memory, the read performance improves again:

    readrandom   :       9.775 micros/op;  (approximately 100,000 reads per second before compaction)
    readrandom   :       5.215 micros/op;  (approximately 190,000 reads per second after compaction)

## Repository contents

See doc/index.html for more explanation. See doc/impl.html for a brief overview of the implementation.

The public interface is in include/*.h. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.

Guide to header files:

* **include/db.h**: Main interface to the DB. Start here.

* **include/options.h**: Control over the behavior of an entire database,
and also control over the behavior of individual reads and writes.

* **include/comparator.h**: Abstraction for user-specified comparison function.
If you want just bytewise comparison of keys, you can use the default
comparator, but clients can write their own comparator implementations if they
want custom ordering (e.g. to handle different character encodings).

* **include/iterator.h**: Interface for iterating over data. You can get
an iterator from a DB object.

* **include/write_batch.h**: Interface for atomically applying multiple
updates to a database.

* **include/slice.h**: A simple module for maintaining a pointer and a
length into some other byte array.

* **include/status.h**: Status is returned from many of the public interfaces
and is used to report success and various kinds of errors.

* **include/env.h**:
Abstraction of the OS environment. A POSIX implementation of this interface is
in util/env_posix.cc.

* **include/table.h, include/table_builder.h**: Lower-level modules that most
clients probably won't use directly.
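
The slice idea in the header guide, a pointer and a length into some other byte array, together with bytewise key comparison, can be sketched as follows. `ByteView` is a hypothetical stand-in written for illustration; it is not the class declared in include/slice.h, and its method names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for illustration; not the class in include/slice.h.
// A ByteView refers to bytes owned by someone else and never copies them.
class ByteView {
 public:
  ByteView(const char* data, std::size_t size) : data_(data), size_(size) {}

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }

  // Copies the referenced bytes into an owning string.
  std::string ToString() const { return std::string(data_, size_); }

  // Bytewise three-way comparison: negative, zero, or positive when this
  // view sorts before, equal to, or after `other`.
  int Compare(const ByteView& other) const {
    const std::size_t min_len = size_ < other.size_ ? size_ : other.size_;
    int r = std::memcmp(data_, other.data_, min_len);
    if (r == 0) {
      if (size_ < other.size_) {
        r = -1;  // a strict prefix sorts first
      } else if (size_ > other.size_) {
        r = 1;
      }
    }
    return r;
  }

 private:
  const char* data_;  // not owned; must outlive this view
  std::size_t size_;
};

// Exercises the view; the asserts document the expected behavior.
void DemoByteView() {
  const std::string backing = "hello world";
  ByteView word(backing.data() + 6, 5);  // refers to "world" without copying
  assert(word.size() == 5);
  assert(word.ToString() == "world");

  ByteView abc("abc", 3);
  ByteView abd("abd", 3);
  assert(abc.Compare(abd) < 0);
  assert(abc.Compare(abc) == 0);
}
```

Because the view does not own its bytes, the backing array must stay alive and unchanged for as long as the view is in use.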
@@ -0,0 +1,225 @@
// Copyright (c) 2012 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include <stdio.h>
#include "db/dbformat.h"
#include "db/filename.h"
#include "db/log_reader.h"
#include "db/version_edit.h"
#include "db/write_batch_internal.h"
#include "leveldb/env.h"
#include "leveldb/iterator.h"
#include "leveldb/options.h"
#include "leveldb/status.h"
#include "leveldb/table.h"
#include "leveldb/write_batch.h"
#include "util/logging.h"

namespace leveldb {

namespace {

bool GuessType(const std::string& fname, FileType* type) {
  size_t pos = fname.rfind('/');
  std::string basename;
  if (pos == std::string::npos) {
    basename = fname;
  } else {
    basename = std::string(fname.data() + pos + 1, fname.size() - pos - 1);
  }
  uint64_t ignored;
  return ParseFileName(basename, &ignored, type);
}

// Notified when log reader encounters corruption.
class CorruptionReporter : public log::Reader::Reporter {
 public:
  WritableFile* dst_;
  virtual void Corruption(size_t bytes, const Status& status) {
    std::string r = "corruption: ";
    AppendNumberTo(&r, bytes);
    r += " bytes; ";
    r += status.ToString();
    r.push_back('\n');
    dst_->Append(r);
  }
};

// Print contents of a log file. (*func)() is called on every record.
Status PrintLogContents(Env* env, const std::string& fname,
                        void (*func)(uint64_t, Slice, WritableFile*),
                        WritableFile* dst) {
  SequentialFile* file;
  Status s = env->NewSequentialFile(fname, &file);
  if (!s.ok()) {
    return s;
  }
  CorruptionReporter reporter;
  reporter.dst_ = dst;
  log::Reader reader(file, &reporter, true, 0);
  Slice record;
  std::string scratch;
  while (reader.ReadRecord(&record, &scratch)) {
    (*func)(reader.LastRecordOffset(), record, dst);
  }
  delete file;
  return Status::OK();
}

// Called on every item found in a WriteBatch.
class WriteBatchItemPrinter : public WriteBatch::Handler {
 public:
  WritableFile* dst_;
  virtual void Put(const Slice& key, const Slice& value) {
    std::string r = " put '";
    AppendEscapedStringTo(&r, key);
    r += "' '";
    AppendEscapedStringTo(&r, value);
    r += "'\n";
    dst_->Append(r);
  }
  virtual void Delete(const Slice& key) {
    std::string r = " del '";
    AppendEscapedStringTo(&r, key);
    r += "'\n";
    dst_->Append(r);
  }
};


// Called on every log record (each one of which is a WriteBatch)
// found in a kLogFile.
static void WriteBatchPrinter(uint64_t pos, Slice record, WritableFile* dst) {
  std::string r = "--- offset ";
  AppendNumberTo(&r, pos);
  r += "; ";
  if (record.size() < 12) {
    r += "log record length ";
    AppendNumberTo(&r, record.size());
    r += " is too small\n";
    dst->Append(r);
    return;
  }
  WriteBatch batch;
  WriteBatchInternal::SetContents(&batch, record);
  r += "sequence ";
  AppendNumberTo(&r, WriteBatchInternal::Sequence(&batch));
  r.push_back('\n');
  dst->Append(r);
  WriteBatchItemPrinter batch_item_printer;
  batch_item_printer.dst_ = dst;
  Status s = batch.Iterate(&batch_item_printer);
  if (!s.ok()) {
    dst->Append(" error: " + s.ToString() + "\n");
  }
}

Status DumpLog(Env* env, const std::string& fname, WritableFile* dst) {
  return PrintLogContents(env, fname, WriteBatchPrinter, dst);
}

// Called on every log record (each one of which is a WriteBatch)
// found in a kDescriptorFile.
static void VersionEditPrinter(uint64_t pos, Slice record, WritableFile* dst) {
  std::string r = "--- offset ";
  AppendNumberTo(&r, pos);
  r += "; ";
  VersionEdit edit;
  Status s = edit.DecodeFrom(record);
  if (!s.ok()) {
    r += s.ToString();
    r.push_back('\n');
  } else {
    r += edit.DebugString();
  }
  dst->Append(r);
}

Status DumpDescriptor(Env* env, const std::string& fname, WritableFile* dst) {
  return PrintLogContents(env, fname, VersionEditPrinter, dst);
}

Status DumpTable(Env* env, const std::string& fname, WritableFile* dst) {
  uint64_t file_size;
  RandomAccessFile* file = NULL;
  Table* table = NULL;
  Status s = env->GetFileSize(fname, &file_size);
  if (s.ok()) {
    s = env->NewRandomAccessFile(fname, &file);
  }
  if (s.ok()) {
    // We use the default comparator, which may or may not match the
    // comparator used in this database. However this should not cause
    // problems since we only use Table operations that do not require
    // any comparisons. In particular, we do not call Seek or Prev.
    s = Table::Open(Options(), file, file_size, &table);
  }
  if (!s.ok()) {
    delete table;
    delete file;
    return s;
  }

  ReadOptions ro;
  ro.fill_cache = false;
  Iterator* iter = table->NewIterator(ro);
  std::string r;
  for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
    r.clear();
    ParsedInternalKey key;
    if (!ParseInternalKey(iter->key(), &key)) {
      r = "badkey '";
      AppendEscapedStringTo(&r, iter->key());
      r += "' => '";
      AppendEscapedStringTo(&r, iter->value());
      r += "'\n";
      dst->Append(r);
    } else {
      r = "'";
      AppendEscapedStringTo(&r, key.user_key);
      r += "' @ ";
      AppendNumberTo(&r, key.sequence);
      r += " : ";
      if (key.type == kTypeDeletion) {
        r += "del";
      } else if (key.type == kTypeValue) {
        r += "val";
      } else {
        AppendNumberTo(&r, key.type);
      }
      r += " => '";
      AppendEscapedStringTo(&r, iter->value());
      r += "'\n";
      dst->Append(r);
    }
  }
  s = iter->status();
  if (!s.ok()) {
    dst->Append("iterator error: " + s.ToString() + "\n");
  }

  delete iter;
  delete table;
  delete file;
  return Status::OK();
}

}  // namespace

Status DumpFile(Env* env, const std::string& fname, WritableFile* dst) {
  FileType ftype;
  if (!GuessType(fname, &ftype)) {
    return Status::InvalidArgument(fname + ": unknown file type");
  }
  switch (ftype) {
    case kLogFile: return DumpLog(env, fname, dst);
    case kDescriptorFile: return DumpDescriptor(env, fname, dst);
    case kTableFile: return DumpTable(env, fname, dst);
    default:
      break;
  }
  return Status::InvalidArgument(fname + ": not a dump-able file type");
}

}  // namespace leveldb
@@ -0,0 +1,25 @@
// Copyright (c) 2014 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#ifndef STORAGE_LEVELDB_INCLUDE_DUMPFILE_H_
#define STORAGE_LEVELDB_INCLUDE_DUMPFILE_H_

#include <string>
#include "leveldb/env.h"
#include "leveldb/status.h"

namespace leveldb {

// Dump the contents of the file named by fname in text format to
// *dst. Makes a sequence of dst->Append() calls; each call is passed
// the newline-terminated text corresponding to a single item found
// in the file.
//
// Returns a non-OK result if fname does not name a leveldb storage
// file, or if the file cannot be read.
Status DumpFile(Env* env, const std::string& fname, WritableFile* dst);

}  // namespace leveldb

#endif  // STORAGE_LEVELDB_INCLUDE_DUMPFILE_H_
@@ -0,0 +1,54 @@
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "util/hash.h"
#include "util/testharness.h"

namespace leveldb {

class HASH { };

TEST(HASH, SignedUnsignedIssue) {
  const unsigned char data1[1] = {0x62};
  const unsigned char data2[2] = {0xc3, 0x97};
  const unsigned char data3[3] = {0xe2, 0x99, 0xa5};
  const unsigned char data4[4] = {0xe1, 0x80, 0xb9, 0x32};
  const unsigned char data5[48] = {
    0x01, 0xc0, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x14, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x04, 0x00,
    0x00, 0x00, 0x00, 0x14,
    0x00, 0x00, 0x00, 0x18,
    0x28, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x02, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
  };

  ASSERT_EQ(Hash(0, 0, 0xbc9f1d34), 0xbc9f1d34);
  ASSERT_EQ(
      Hash(reinterpret_cast<const char*>(data1), sizeof(data1), 0xbc9f1d34),
      0xef1345c4);
  ASSERT_EQ(
      Hash(reinterpret_cast<const char*>(data2), sizeof(data2), 0xbc9f1d34),
      0x5b663814);
  ASSERT_EQ(
      Hash(reinterpret_cast<const char*>(data3), sizeof(data3), 0xbc9f1d34),
      0x323c078f);
  ASSERT_EQ(
      Hash(reinterpret_cast<const char*>(data4), sizeof(data4), 0xbc9f1d34),
      0xed21633a);
  ASSERT_EQ(
      Hash(reinterpret_cast<const char*>(data5), sizeof(data5), 0x12345678),
      0xf333dabb);
}

}  // namespace leveldb

int main(int argc, char** argv) {
  return leveldb::test::RunAllTests();
}
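
The test name, SignedUnsignedIssue, and the inputs containing bytes of 0x80 and above (0xc3, 0x97, 0xe2, and so on) suggest that the hash must treat input bytes as unsigned regardless of whether plain `char` is signed on the platform. The sketch below illustrates that pitfall in isolation. It is not the code behind util/hash.h: `WidenAsSigned` and `WidenAsUnsigned` are hypothetical helpers, and the signed result assumes a two's-complement platform.

```cpp
#include <cassert>
#include <cstdint>

// Widens one byte to 32 bits after reinterpreting it as a signed char.
// On two's-complement platforms, bytes >= 0x80 become negative and are
// sign-extended, which sets the upper 24 bits of the result.
std::uint32_t WidenAsSigned(unsigned char byte) {
  return static_cast<std::uint32_t>(static_cast<signed char>(byte));
}

// Widens one byte to 32 bits as an unsigned value; the upper bits stay zero.
std::uint32_t WidenAsUnsigned(unsigned char byte) {
  return static_cast<std::uint32_t>(byte);
}

// Shows where the two widenings agree and where they diverge.
void DemoWidening() {
  // Bytes below 0x80 widen identically either way.
  assert(WidenAsSigned(0x62) == WidenAsUnsigned(0x62));
  // Bytes of 0x80 and above diverge.
  assert(WidenAsUnsigned(0xc3) == 0x000000c3u);
  assert(WidenAsSigned(0xc3) == 0xffffffc3u);
}
```

A hash that mixes in bytes through the signed path would produce different values on platforms that differ in the signedness of `char`, which is the kind of divergence that fixed expected values in a test can catch.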