Andrew Plewe

Hello World

Foreword, and an introduction to WellDB

Greetings! If you are here you probably followed some breadcrumb I left out randomly on the fractured interwebs. I have debated internally about whether to Substack; it is nice to have someone else take care of archiving and to “ride the wave”. So, in probably some gross violation of founder rules or whatever, I’m going to do my posting here on the website for my company (shock!). What’s a bit of SEO between friends, right?

I’ve realized that sometimes my writing can be cryptic, especially when my mind is going a million miles per hour and I’m working through an idea. Over the last few months I’ve built out, essentially, my “dream team” of tooling that makes managing and working with test data somewhat possible and, ideally, a thing that takes less than six months to accomplish. However, much like Frankenstein, it can come across… a little rough. Everyone uses data in different ways; what SmallMinds has created will, if there’s enough magic fairy dust, provide you with tools you can use to make your life better.

So, welcome! In the coming weeks I will lay out all my Freudian and Orwellian plans to rise up against our “AI” overlords and reclaim the power of Machine Learning for the rest of us. It has been a very unexpected journey getting to this point, but if I expected it then it probably would have never happened.

So, with that in mind, I will start with a brief introduction to “the one thing that will change the world!”, for which I don’t even have a product page yet:

Spooky lights mean data

Introducing WellDB

Hydra grew as a perfect parasite. That’s actually a weird and creepy story; this is nothing like that.

WellDB, however, does take advantage of several nifty things that are true to create a real live Generative Database. This isn’t a front-end for LLMs or anything like that. It is, rather, the marrying of statistical modeling with the powers of SQL, a very shallow (this is a good thing) and transactional relationship meant to generate all of the data you need for testing without:

a.) Storage costs. Data for tables comes from DataFlood models and is generated on-the-fly. A typical DataFlood model is anywhere from a few kilobytes up to a couple of megabytes, and the model size for any particular dataset doesn’t change much over time.

b.) Licensing costs. The only elements WellDB uses from the different kinds of databases are their wire protocols. Therefore you don’t have to burn an expensive software license on your test server. And you can have way more than one without breaking the bank!

c.) The performance impacts of large datasets. The problem with a fully functional, operational database of sufficient size to do real load testing is that you almost have to do more engineering to get really good test data in the necessary quantities than you normally would otherwise. There is great value in those exercises, but there is, thanks to DataFlood, a much easier and cheaper way. DataFlood won’t bog down if you need gazillions of rows; it has to generate that much data, but it doesn’t need to store that much data.

The main drawback to this approach is that WellDB is not for performance-testing SQL queries and the databases themselves. For that purpose you can use DataFlood to generate a gazillion rows and insert them into an actual database. WellDB is more for integration testing, when performance testing of the database isn’t on the menu.

Nevermind the Fluff, Give It To Us Plain (TL;DR)

WellDB is an emulated database server. You can run SQL against it and it will respond with data generated on the spot using DataFlood models. It can show up as one or more different kinds of databases on your network, and we’re ensuring that it can “speak” like all of the databases that it emulates. Your connection code and everything else you use right now (including Entity Framework and LINQ and other ORMs) to communicate with your database will work with WellDB.
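For example, here is a minimal sketch of what that looks like in practice: ordinary SQLAlchemy code pointed at a WellDB instance that is emulating Postgres. The host, credentials, and table/column names are placeholders I’ve made up for illustration; nothing in the client code has to know it isn’t a real database.

from sqlalchemy import create_engine, text

# Hypothetical endpoint: a WellDB instance answering on the Postgres wire
# protocol. The connection string never mentions WellDB; the driver thinks
# it is talking to an ordinary Postgres server.
engine = create_engine("postgresql+psycopg2://test:test@welldb.local:5432/demo_db")

with engine.connect() as conn:
    # WellDB generates these rows on the fly from a DataFlood model.
    result = conn.execute(text("SELECT customer_id, email FROM customers LIMIT 10"))
    for row in result:
        print(row.customer_id, row.email)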

So, imagine a scenario where you need to orchestrate moving data between three or four different systems (this never happens in real life…). One might be a Postgres server, one may speak MSSQL, and one may be MySQL or a document store like MongoDB. All of them can “live” within a single instance of WellDB, but you could also have multiple instances running to represent each database. They can share tables, or you can create tables that exist only for, say, the MySQL server. Using Tides configuration files you can specify foreign key relationships, generate sequencing data (such as step counters and specific state flows), and generally control all aspects of how your data is laid out inside each kind of database. Then you can run your SQL queries. Here is what happens with each basic type of query (a rough code sketch follows the list):

1.) Select queries will generate data to satisfy the select statements. Depending on the Where clause, different things can happen — for instance “Where [a] = [b]” can be used to ensure that element [a] is assigned the value of [b] in the result set. This can be useful for fixing an id field to a specific value. Other kinds of Where clauses will operate against the statistical model for an element.

2.) Insert queries will create a new DataFlood model (or models, for multi-table inserts) on the server for that kind of database if the table doesn’t yet exist. Otherwise, an insert will “train” the model or models, depending on your SQL statement.

3.) Update queries will modify the statistics of a model. For databases that support “update if exists, otherwise insert” semantics, the insert will “train” the model and the update will modify it if the value already exists within the range of a particular field/element.

4.) Delete queries will remove values from a model, which may decrease “seen” counts or remove a value entirely, depending on the state of the model.
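To make those four behaviors concrete, here is the rough sketch promised above: exercising each statement type from Python against a WellDB instance emulating Postgres. Everything here (host, credentials, and the “orders” table and its columns) is a made-up example, not part of WellDB itself; the only assumption is that a plain Postgres driver such as psycopg2 can talk to the emulated server.

import psycopg2  # ordinary Postgres driver; WellDB answers on the same wire protocol

# Hypothetical connection details for a WellDB instance emulating Postgres.
conn = psycopg2.connect(host="welldb.local", port=5432, dbname="demo_db",
                        user="test", password="test")
cur = conn.cursor()

# 1.) Select: rows are generated from the DataFlood model for "orders";
#     the Where clause pins order_id to 42 in the result set.
cur.execute("SELECT order_id, status, total FROM orders WHERE order_id = %s", (42,))
rows = cur.fetchall()

# 2.) Insert: creates the model if "orders" doesn't exist yet, otherwise trains it.
cur.execute("INSERT INTO orders (order_id, status, total) VALUES (%s, %s, %s)",
            (43, "shipped", 19.99))

# 3.) Update: adjusts the model's statistics for the matching field.
cur.execute("UPDATE orders SET status = %s WHERE order_id = %s", ("delivered", 43))

# 4.) Delete: decreases "seen" counts or drops the value from the model entirely.
cur.execute("DELETE FROM orders WHERE order_id = %s", (43,))

conn.commit()
cur.close()
conn.close()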

In this manner you can “exercise” your code against fake databases that act like real databases for testing, without having to set up a real database in a test environment. Because DataFlood models are small, for CI/CD purposes your databases can live entirely in code (without Terraform or other infrastructure-as-code elements) and be deployed as code to a WellDB server on-the-fly. When it’s time to test the IaC elements, you can use DataFlood models as part of the CI/CD flow within those environments to “hydrate” the databases with data after deployment.
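As a sketch of the CI/CD idea, the snippet below shows one way a test suite could be pointed at a WellDB endpoint instead of a real database. The environment variable name, host, and fixture are assumptions for illustration; the only claim from the post is that your existing connection code doesn’t need to change.

import pytest

# Hypothetical pytest fixture: in CI, the application reads its connection
# string from an environment variable, and we point it at a WellDB instance
# (checked-in DataFlood models supply the data). The variable name and host
# are made up for this sketch.
@pytest.fixture(autouse=True)
def point_app_at_welldb(monkeypatch):
    monkeypatch.setenv(
        "APP_DATABASE_URL",
        "postgresql://test:test@welldb-ci.internal:5432/integration_db",
    )
    yield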

Tune in next week

So, that is a brief introduction to WellDB. Over the next few posts I will walk through exercises for using all of the DataFlood elements in various ways, with worked examples and screenshots and video walk-throughs. Thanks for reading!
