Implementing a key-value database in Golang - Part 1
This is the first of several blog posts about implementing a simple key-value store in Golang. All code described in this blog post can be found here: https://github.com/olif/kvdb.
Other posts in this series:
Intro
I had just started to read the fantastic book Designing Data-Intensive Applications by Martin Kleppman. Chapter 3 covers Storage and Retrieval and among other things it explains how many databases work under the hood. The chapter starts out with a very simple example of key-value store written in bash.
db_set () {
echo "$1,$2" >> database
}
db_get () {
grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
Even though the code above is just a simple toy example it is functioning and has surprisingly good write performance (read performance not so much) since it's just appending data to a file.
Another simple example of a key-value store with good performance is a hashtable. Consider the map implementation in golang:
var store = map[string][]byte{}
func set(key string, val []byte) {
store[key] = val
}
func get(key) []byte) ([]byte, bool) {
val, ok := table[key]
return val, ok
}
Which naively has O(1) performance for both write and read operations (can't find any guarantees about the actual map performance).
Both these implementations suffers from shortcomings like:
-
both implementations lacks concurrency control
-
the bash implementation has O(n) read performance
-
the hashtable implementation is not durable since all data is lost in case of a crash
-
the hashtable implementation will have large costs associated to it for more than a trivial amount of data (disk space is cheaper than RAM).
These issues can be resolved and this is what this and the follow-up blog-posts are about. I will start out with laying the groundwork of the database and then provide a very simple in-memory implementation which I over the course of this blog-series will improve in terms of durability, reliability and performance รก la Designing Data-Intensive Applications.
Requirements
The database I am going to develop will be named KVDB (Key-Value Database). The implementation will be done in Golang since it has great concurerncy primitives, a good stdlib and is garbage collected (and I haven't tried Rust yet). Moreover it should
-
be importable as a library from other applications
-
expose the functionality through an optional HTTP API
-
provide [GET, PUT, DELETE] operations, no differentiation between creating and updating a value
-
the key type should be limited to strings
. Note that building a production ready database is not a goal of this project which means that simplicity and readability of the code will take precedence over micro-optimizations and/or handling certain edge cases.
Foundation
For the rest of this post, I will lay the foundaton of the database, create the project, define the database interface and make a very simple in-memory implementation.
The database interface
What is the minimal interface for our key-value store? At least, it should support the [GET, PUT, DELETE] operations. Also, I need a way to signal that a value was not found and perhaps also some mechanism to signal other consumer/client errors.
Let's make an ansatz of the database interface in terms of a Golang interface.
// Store defines the kvdb public interface
type Store interface {
// Get returns the value for the given key or any error encountered. If the
// key was not found it will return a NotFoundError.
Get(key string) ([]byte, error)
// Put stores the value. It will return an BadRequestError if the provided
// data was invalid or any other error encountered.
Put(key string, value []byte) error
// Delete deletes the value for the given key
Delete(key string) error
// Close closes the database and returns when all internal processes
// has stopped. It returns any error encountered.
Close() error
// Returns true if the error signals that the given key was not found.
IsNotFoundError(err error) bool
// Returns true if the error signals that the consumer did something wrong.
IsBadRequestError(err error) bool
}
Besides the already stated operations I added methods for checking the type of
some special errors and a Close
method.
HTTP server
Now that I have defined the contract for our database I will also expose it
over a http api. Since the keys are restricted to be of string
type, I can
just let the key act as a REST-api resource:
# Get value, return 404 if the key was not found
$> curl -X "GET" host:port/{key}
value
# Add/update value
$> curl -X "POST" -d 'value' host:port/{key}
$> curl -X "PUT" -d 'value' host:port/{key}
# Delete value
$> curl -X "DELETE" host:port/{key}
In memory implementation
Our first attempt at an implementation of the Store
interface will be a simple
in-memory engine. Storing data in volatile memory makes up a bad database but
can have its uses like a cache for instance (like a non-distributed
Memcached).
The in-memory implementation is pretty straightforward:
package inmemory
import (
"fmt"
"log"
"sync"
"github.com/olif/kvdb/pkg/kvdb"
)
// Store is a simple key value store which keeps the state in-memory. It
// has the performance characteristics of go:s map structure
type Store struct {
maxRecordSize int
logger *log.Logger
sync.RWMutex
table map[string][]byte
}
// Config contains the configuration properties for the inmemory store
type Config struct {
MaxRecordSize int
Logger *log.Logger
}
// NewStore returns a new memory store
func NewStore(config Config) *Store {
return &Store{
maxRecordSize: config.MaxRecordSize,
logger: config.Logger,
table: map[string][]byte{},
}
}
// Get returns the value associated with the key or a kvdb.NotFoundError if the
// key was not found, or any other error encountered
func (s *Store) Get(key string) ([]byte, error) {
s.RLock()
v, ok := s.table[key]
s.RUnlock()
if !ok {
return nil, kvdb.NewNotFoundError(key)
}
return v, nil
}
// Put saves the value to the database and returns any error encountered
func (s *Store) Put(key string, value []byte) error {
size := len([]byte(key)) + len(value)
if size > s.maxRecordSize {
msg := fmt.Sprintf("key-value too big, max size: %d", s.maxRecordSize)
return kvdb.NewBadRequestError(msg)
}
s.Lock()
s.table[key] = value
s.Unlock()
return nil
}
// Delete removes the value from the store
func (s *Store) Delete(key string) error {
s.Lock()
delete(s.table, key)
s.Unlock()
return nil
}
// Close closes the store
func (s *Store) Close() error {
s.logger.Print("Closing database")
return nil
}
// IsNotFoundError returns true if the error signales that a non-existing key
// was requested
func (s *Store) IsNotFoundError(err error) bool {
return kvdb.IsNotFoundError(err)
}
// IsBadRequestError returns true if the error, or any of the wrapped errors
// is of type BadRequestError
func (s *Store) IsBadRequestError(err error) bool {
return kvdb.IsBadRequestError(err)
}
. It's just a wrapper around Go:s built-in map
data structure with locking.
Note that I choose to not use go:s built-in sync.Map
which is safe for
concurrent use without the use of additional locking. The reasons for this is
that sync.Map
is non-generic, it can only store interface{}
which means that
I loose type safety.
The other reason described in
this blog-post is that the performane actually is worse than the built-in map
with
RwMutex
when utilizing a small number of cores. As always this is a trade-off
and now I choose to keep the type-safety at the expense of better performance
gains when adding cores.
Summary
I have defined an interface for our key-value database, exposed it via a http api and made a simple in-memory implementation as a first attempt of a database engine.
This implementation has very good performance since all data resides in memory and is stored in a hashtable. It is not durable however and in the event of a crash all data will be lost. This could be fine for a cache but it is a show-stopper for an actual database.
In the next blog post I will make the database durable by making another implementation of the interface but as you will see, it will come at the expense of performance.