Implementing a key-value database in Golang - Part 1

/images/post/kvdb-1.jpg

This is the first of several blog posts about implementing a simple key-value store in Golang. All code described in this blog post can be found here: https://github.com/olif/kvdb.

Other posts in this series:

Intro

I had just started to read the fantastic book Designing Data-Intensive Applications by Martin Kleppman. Chapter 3 covers Storage and Retrieval and among other things it explains how many databases work under the hood. The chapter starts out with a very simple example of key-value store written in bash.

db_set () {
    echo "$1,$2" >> database
}

db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}

Even though the code above is just a simple toy example it is functioning and has surprisingly good write performance (read performance not so much) since it's just appending data to a file.

Another simple example of a key-value store with good performance is a hashtable. Consider the map implementation in golang:

var store = map[string][]byte{}

func set(key string, val []byte) {
	store[key] = val
}

func get(key) []byte) ([]byte, bool) {
	val, ok := table[key]
	return val, ok
}

Which naively has O(1) performance for both write and read operations (can't find any guarantees about the actual map performance).

Both these implementations suffers from shortcomings like:

These issues can be resolved and this is what this and the follow-up blog-posts are about. I will start out with laying the groundwork of the database and then provide a very simple in-memory implementation which I over the course of this blog-series will improve in terms of durability, reliability and performance รก la Designing Data-Intensive Applications.

Requirements

The database I am going to develop will be named KVDB (Key-Value Database). The implementation will be done in Golang since it has great concurerncy primitives, a good stdlib and is garbage collected (and I haven't tried Rust yet). Moreover it should

. Note that building a production ready database is not a goal of this project which means that simplicity and readability of the code will take precedence over micro-optimizations and/or handling certain edge cases.

Foundation

For the rest of this post, I will lay the foundaton of the database, create the project, define the database interface and make a very simple in-memory implementation.

The database interface

What is the minimal interface for our key-value store? At least, it should support the [GET, PUT, DELETE] operations. Also, I need a way to signal that a value was not found and perhaps also some mechanism to signal other consumer/client errors.

Let's make an ansatz of the database interface in terms of a Golang interface.

// Store defines the kvdb public interface
type Store interface {
	// Get returns the value for the given key or any error encountered. If the
	// key was not found it will return a NotFoundError.
	Get(key string) ([]byte, error)
	// Put stores the value. It will return an BadRequestError if the provided
	// data was invalid or any other error encountered.
	Put(key string, value []byte) error
	// Delete deletes the value for the given key
	Delete(key string) error
	// Close closes the database and returns when all internal processes
	// has stopped. It returns any error encountered.
	Close() error
	// Returns true if the error signals that the given key was not found.
	IsNotFoundError(err error) bool
	// Returns true if the error signals that the consumer did something wrong.
	IsBadRequestError(err error) bool
}

Besides the already stated operations I added methods for checking the type of some special errors and a Close method.

HTTP server

Now that I have defined the contract for our database I will also expose it over a http api. Since the keys are restricted to be of string type, I can just let the key act as a REST-api resource:

# Get value, return 404 if the key was not found
$> curl -X "GET" host:port/{key}
value

# Add/update value
$> curl -X "POST" -d 'value' host:port/{key}
$> curl -X "PUT" -d 'value' host:port/{key}

# Delete value
$> curl -X "DELETE" host:port/{key}

In memory implementation

Our first attempt at an implementation of the Store interface will be a simple in-memory engine. Storing data in volatile memory makes up a bad database but can have its uses like a cache for instance (like a non-distributed Memcached).

The in-memory implementation is pretty straightforward:

package inmemory

import (
	"fmt"
	"log"
	"sync"

	"github.com/olif/kvdb/pkg/kvdb"
)

// Store is a simple key value store which keeps the state in-memory. It
// has the performance characteristics of go:s map structure
type Store struct {
	maxRecordSize int
	logger        *log.Logger

	sync.RWMutex
	table map[string][]byte
}

// Config contains the configuration properties for the inmemory store
type Config struct {
	MaxRecordSize int
	Logger        *log.Logger
}

// NewStore returns a new memory store
func NewStore(config Config) *Store {
	return &Store{
		maxRecordSize: config.MaxRecordSize,
		logger:        config.Logger,

		table: map[string][]byte{},
	}
}

// Get returns the value associated with the key or a kvdb.NotFoundError if the
// key was not found, or any other error encountered
func (s *Store) Get(key string) ([]byte, error) {
	s.RLock()
	v, ok := s.table[key]
	s.RUnlock()
	if !ok {
		return nil, kvdb.NewNotFoundError(key)
	}

	return v, nil
}

// Put saves the value to the database and returns any error encountered
func (s *Store) Put(key string, value []byte) error {
	size := len([]byte(key)) + len(value)
	if size > s.maxRecordSize {
		msg := fmt.Sprintf("key-value too big, max size: %d", s.maxRecordSize)
		return kvdb.NewBadRequestError(msg)
	}
	s.Lock()
	s.table[key] = value
	s.Unlock()
	return nil
}

// Delete removes the value from the store
func (s *Store) Delete(key string) error {
	s.Lock()
	delete(s.table, key)
	s.Unlock()
	return nil
}

// Close closes the store
func (s *Store) Close() error {
	s.logger.Print("Closing database")
	return nil
}

// IsNotFoundError returns true if the error signales that a non-existing key
// was requested
func (s *Store) IsNotFoundError(err error) bool {
	return kvdb.IsNotFoundError(err)
}

// IsBadRequestError returns true if the error, or any of the wrapped errors
// is of type BadRequestError
func (s *Store) IsBadRequestError(err error) bool {
	return kvdb.IsBadRequestError(err)
}

. It's just a wrapper around Go:s built-in map data structure with locking. Note that I choose to not use go:s built-in sync.Map which is safe for concurrent use without the use of additional locking. The reasons for this is that sync.Map is non-generic, it can only store interface{} which means that I loose type safety.

The other reason described in this blog-post is that the performane actually is worse than the built-in map with RwMutex when utilizing a small number of cores. As always this is a trade-off and now I choose to keep the type-safety at the expense of better performance gains when adding cores.

Summary

I have defined an interface for our key-value database, exposed it via a http api and made a simple in-memory implementation as a first attempt of a database engine.

This implementation has very good performance since all data resides in memory and is stored in a hashtable. It is not durable however and in the event of a crash all data will be lost. This could be fine for a cache but it is a show-stopper for an actual database.

In the next blog post I will make the database durable by making another implementation of the interface but as you will see, it will come at the expense of performance.

comments powered by Disqus