Subtopic: Database & storage


Better search results with character n-grams

Michael Argentini, Wednesday, February 11, 2026

Vector search has been around for a long time. For example, Google has been using it since the late 1990s. This powerful, almost magical technology serves as a core component of most web and app services today, including modern AI-powered search using retrieval augmented generation (RAG). It provides a low-power way to leverage AI's natural language processing to find data with semantic context.

What is a vector database?

A vector database stores data as numerical embeddings that capture the semantic meaning of text, images, or other content, and it retrieves results by finding the closest vectors using similarity search rather than exact matches. Sounds like an AI large language model (LLM), right? This makes it great for web and app content because users can search by meaning and intent, not just keywords, so synonyms and loosely related concepts still match. It also scales efficiently, which is one reason large organizations like Google have used it.
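
To make "closest vectors" concrete, here is a minimal cosine similarity sketch (an illustration only, with made-up vectors; real vector databases use optimized approximate nearest neighbor indexes rather than a loop like this):

```csharp
using System;

// Cosine similarity: 1.0 = same direction (very similar), 0.0 = unrelated.
// Vector databases rank results by scores like this one.
var queryVec = new float[] { 0.9f, 0.1f };
var docVec = new float[] { 0.8f, 0.2f };

Console.WriteLine(Cosine(queryVec, docVec)); // close to 1.0: a strong match

static double Cosine(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;

    for (var i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        magA += (double)a[i] * a[i];
        magB += (double)b[i] * b[i];
    }

    return magA <= 0 || magB <= 0
        ? 0
        : dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}
```

A search simply scores the query embedding against every candidate's stored embedding and returns the highest scorers.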

A happy side effect of how these platforms work is that they're also good at handling misspellings, to a point. To get truly robust handling of spelling variations, however, two strategies are commonly used:

  1. Spell correct the actual search text before using it

  2. Include character n-grams in your vector database entries

What are character n-grams?

Character n-grams break text into overlapping sequences of characters, which can then be embedded as vectors alongside the commonly used semantic embeddings. This allows vector search systems to better match terms despite typos, inflections, or spelling variations. Without these n-grams, a misspelled query like "saracha sauce" would likely return a higher score for "hot sauce" entries. But by including character n-grams, a combined (fused) search would more consistently return a higher score for items with the correct spelling "sriracha sauce".

Using these n-grams can better handle searches with:

  • typos

  • missing letters

  • swapped letters

  • phonetic-ish variants

  • common misspellings

How does this work? At a high level, it adds a character match capability to the standard semantic search used by most vector database implementations. Here's a quick example of what happens under the hood. Take the first word in our previous example:

sriracha

  • 3-grams: sri, rir, ira, rac, ach, cha

  • 4-grams: srir, rira, irac, rach, acha

saracha

  • 3-grams: sar, ara, rac, ach, cha

  • 4-grams: sara, arac, rach, acha

Shared grams:

  • shared 3-grams: rac, ach, cha

  • shared 4-grams: rach, acha

So even though the beginning is wrong (sri vs sar), the ending chunks that carry a lot of the distinctive shape of "sriracha" survive (rach, acha, cha). And since the second word is the same, they have even more matching grams.
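
The overlap above can be reproduced in a few lines of C# (a standalone sketch; the production helper later in this post normalizes the text and hashes the grams instead of keeping them as strings):

```csharp
using System;
using System.Collections.Generic;

var shared = NGrams("sriracha", 3);
shared.IntersectWith(NGrams("saracha", 3));

// shared now contains: rac, ach, cha
Console.WriteLine(string.Join(", ", shared));

static HashSet<string> NGrams(string word, int n)
{
    var grams = new HashSet<string>(StringComparer.Ordinal);

    // slide a window of n characters across the word
    for (var i = 0; i + n <= word.Length; i++)
        grams.Add(word.Substring(i, n));

    return grams;
}
```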

When these matches are fused with semantic matches, it adds weight to the correctly spelled "sriracha sauce" entry, yielding a better match set.

How to use character n-grams

When it comes to including character n-grams, there are only a couple of changes you need to make to a standard semantic vector database implementation:

  1. When you generate embeddings, you also need to generate character n-gram embeddings; this is true both when you store data in the database, and when you search.

  2. When searching, you need to execute a search both on the semantic vectors and the n-gram vectors, then fuse the results using Reciprocal Rank Fusion (RRF), which is a great way to merge disparate result sets and combine the scores.
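
RRF itself is just arithmetic: each item's fused score is the sum of 1 / (k + rank) over every result list it appears in, where rank is 1-based and k is a constant (60 is a common default). The ranks below are hypothetical, purely to show the mechanics:

```csharp
using System;

const int k = 60;

// "sriracha sauce": rank 2 in the semantic list, rank 1 in the n-gram list.
var fused = 1.0 / (k + 2) + 1.0 / (k + 1);

// A "hot sauce" entry ranked 1st semantically but absent from the
// n-gram list only earns a single term.
var semanticOnly = 1.0 / (k + 1);

// Appearing in both lists beats a single first-place rank.
Console.WriteLine(fused > semanticOnly); // True
```

This is why a correctly spelled entry that both retrievers agree on can outrank an entry that only one retriever loves.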

The following samples will fill those gaps. They are written with C# for .NET, which is part of a common stack we use to build cross-platform, secure, high-performance web and mobile apps and services for our clients. We also tend to prefer the vector database Qdrant for its performance, maintainability, and open source model. So that is also referenced in the samples.

References to AiService.GenerateEmbeddingsAsync() are not covered here. Essentially, it's a method that generates standard semantic embeddings; replace it with your own (likely existing) method. And references to QdrantService.Client are merely references to a standard Qdrant client provided by the Qdrant NuGet package.

Note: Some of the code was generated by AI, but was reviewed and refactored by an actual human developer (me!).

Character n-gram helper

First, you need a way to create n-grams. The CharNGramEmbedding class below will fill that gap. It allows you to generate character n-grams for a given string, and it also provides a method for fusing the semantic and n-gram search results into a single, weighted result set.

using System.Globalization;
using System.Text;

namespace MyApp.Extensions;

/// <summary>
/// Generates a typo-robust, fixed-length dense vector representation of text
/// using hashed character n-grams.
/// </summary>
public static class CharNGramEmbedding
{
    /// <summary>
    /// Generates a normalized dense embedding vector for the specified text
    /// using hashed character n-grams.
    /// </summary>
    /// <param name="text">
    /// The input text to embed.
    /// </param>
    /// <param name="dims">
    /// The dimensionality of the output vector. Higher values reduce hash
    /// collisions at the cost of additional memory and storage.
    /// A value of 256 is a good default for typo-robust search.
    /// </param>
    /// <param name="minGram">
    /// The minimum character n-gram size to generate.
    /// Smaller values increase recall but may introduce noise.
    /// </param>
    /// <param name="maxGram">
    /// The maximum character n-gram size to generate.
    /// Larger values emphasize longer, more specific substrings.
    /// </param>
    public static float[] Embed(string text, int dims = 256, int minGram = 3, int maxGram = 4)
    {
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(dims);

        var v = new float[dims];
        var normalized = Normalize(text);

        if (normalized.Length == 0)
            return v;

        // Add boundary markers so "sriracha" and "sriracha sauce"
        // still share useful grams
        var s = $"^{normalized}$";

        for (var n = minGram; n <= maxGram; n++)
        {
            if (s.Length < n)
                continue;

            for (var i = 0; i <= s.Length - n; i++)
            {
                var gram = s.AsSpan(i, n);

                // Hash n-gram → index
                var h = Fnv1A32(gram);
                var idx = (int)(h % (uint)dims);

                // Optional sign-hash reduces collision bias
                var sign = ((h & 1u) == 0u) ? 1f : -1f;

                v[idx] += sign;
            }
        }

        // L2 normalize for cosine similarity
        // (or dot product on normalized vectors)
        L2NormalizeInPlace(v);

        return v;

        static string Normalize(string input)
        {
            if (string.IsNullOrWhiteSpace(input))
                return string.Empty;

            // lowercase + strip accents + keep letters/digits/spaces
            var lower = input.ToLowerInvariant().Normalize(NormalizationForm.FormD);
            var sb = new StringBuilder(lower.Length);

            foreach (var ch in lower)
            {
                var uc = CharUnicodeInfo.GetUnicodeCategory(ch);
            
                if (uc == UnicodeCategory.NonSpacingMark)
                    continue;

                // ignore punctuation
                if (char.IsLetterOrDigit(ch))
                    sb.Append(ch);
                else if (char.IsWhiteSpace(ch) || ch == '-' || ch == '_')
                    sb.Append(' ');
            }

            // collapse spaces
            return string.Join(' ', sb.ToString().Split(' ', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries));
        }

        static uint Fnv1A32(ReadOnlySpan<char> s)
        {
            const uint offset = 2166136261;
            const uint prime = 16777619;

            var hash = offset;
            
            for (var i = 0; i < s.Length; i++)
            {
                // hash UTF-16 chars (fine for this purpose)
                hash ^= s[i];
                hash *= prime;
            }

            return hash;
        }

        static void L2NormalizeInPlace(float[] v)
        {
            double sumSq = 0;

            for (var i = 0; i < v.Length; i++)
                sumSq += (double)v[i] * v[i];
            
            if (sumSq <= 0)
                return;

            var inv = (float)(1.0 / Math.Sqrt(sumSq));
            
            for (var i = 0; i < v.Length; i++)
                v[i] *= inv;
        }
    }
    
    /// <summary>
    /// Fuses multiple ranked result lists using <b>Reciprocal Rank Fusion (RRF)</b>.
    /// RRF is robust when combining heterogeneous retrieval signals (e.g. semantic
    /// embeddings and character n-gram embeddings) whose raw scores are not directly
    /// comparable.
    /// </summary>
    /// <param name="a">
    /// The first ranked result list (e.g. results from a semantic embedding search),
    /// ordered from best to worst. The list should already be truncated to a reasonable
    /// top-K size.
    /// </param>
    /// <param name="b">
    /// The second ranked result list (e.g. results from a character n-gram or typo-robust
    /// search), ordered from best to worst. The list should already be truncated to a
    /// reasonable top-K size.
    /// </param>
    /// <param name="getId">
    /// A function that extracts a stable, unique identifier from a result item.
    /// This identifier is used to merge and score items that appear in multiple lists.
    /// </param>
    /// <param name="take">
    /// The maximum number of fused results to return after applying Reciprocal Rank Fusion.
    /// </param>
    /// <param name="k">
    /// The RRF rank constant. Higher values reduce the impact of rank position differences
    /// between lists. Typical values range from 50 to 100; a default of 60 is commonly used
    /// in practice.
    /// </param>
    /// <returns>
    /// A list of fused results ordered by descending RRF score, containing at most
    /// <paramref name="take"/> items.
    /// </returns>
    public static IReadOnlyList<TPoint> FuseScoredPoints<TPoint>(
        IReadOnlyList<TPoint> a,
        IReadOnlyList<TPoint> b,
        Func<TPoint, string> getId,
        int take,
        int k = 60)
    {
        var scores = new Dictionary<string, double>(StringComparer.Ordinal);
        var best = new Dictionary<string, TPoint>(StringComparer.Ordinal);

        Add(a);
        Add(b);

        return scores
            .OrderByDescending(kvp => kvp.Value)
            .Take(take)
            .Select(kvp => best[kvp.Key])
            .ToList();

        void Add(IReadOnlyList<TPoint> list)
        {
            for (var i = 0; i < list.Count; i++)
            {
                var p = list[i];
                var id = getId(p);
                
                if (scores.TryGetValue(id, out var s) == false)
                    s = 0;

                // rank is i+1 (1-based)
                s += 1.0 / (k + (i + 1));
                scores[id] = s;

                // keep a representative point object
                best.TryAdd(id, p);
            }
        }
    }
}

Example upsert to Qdrant

Now that you have the character n-gram generation and fusion handled, following is an example of performing a Qdrant upsert of a sample food object, including both sets of vectors.

/// <summary>
/// Generates embeddings (semantic and character n-grams), and upserts data to Qdrant.
/// </summary>
/// <param name="food">The food item to upsert; requires a non-null description.</param>
/// <returns>true when the upsert completes successfully</returns>
public async Task<bool> UpsertFoodItemAsync(SampleFoodItem? food)
{
    if (food?.Description is null)
        return false;
    
    var semantic = await AiService.GenerateEmbeddingsAsync(food.Description) ?? [];
    var chargram = CharNGramEmbedding.Embed(food.Description);
    
    if (semantic.Length != AiService.SemanticEmbeddingSize || chargram.Length != AiService.CharGramEmbeddingSize)
        return false;

    var point = new PointStruct
    {
        Id = food.Id,
        Vectors = new Dictionary<string, float[]>
        {
            ["semantic"] = semantic,
            ["chargram"] = chargram,
        },
        Payload = 
        {
            ["description"] = food.Description
        }                
    };

    var result = await QdrantService.Client.UpsertAsync("food-collection", [point]);

    return result.Status == UpdateStatus.Completed;
}

Example Qdrant search

Lastly, the following example shows how you can search the Qdrant data using both sets of vectors. Embeddings (semantic and character n-grams) for the prompt are generated and used in the search.

For the best fused results, each search (semantic and n-gram) needs to return 3-5 times the number of items in the final result set. This is because you're trying to recover a good final top-K from two imperfect retrievers. If each retriever only returns exactly K (or close to it), you often don't have enough overlap and near misses to let fusion do its job, especially when the two methods return different items and rank positions aren't directly comparable.

/// <summary>
/// Search food data items.
/// </summary>
/// <param name="prompt">
/// Search text prompt can be a question or just search text (e.g. keywords)
/// </param>
/// <param name="cancellationToken"></param>
/// <returns></returns>
public async Task<List<ScoredPoint>> SearchFoodItemsAsync(string prompt, CancellationToken cancellationToken = default)
{
    const int MaxSearchResults = 5;

    var semantic = await AiService.GenerateEmbeddingsAsync(prompt) ?? [];
    var chargram = CharNGramEmbedding.Embed(prompt);

    var semanticHits = await QdrantService.Client.SearchAsync(
        "food-collection",
        semantic,
        limit: MaxSearchResults * 5, // extra results padding for fusing
        vectorName: "semantic",
        cancellationToken: cancellationToken
    );

    var chargramHits = await QdrantService.Client.SearchAsync(
        "food-collection",
        chargram,
        limit: MaxSearchResults * 5, // extra results padding for fusing
        vectorName: "chargram",
        cancellationToken: cancellationToken
    );
    
    // Fused results are already ordered by RRF score; re-sorting by the raw
    // Score would mix values from the two searches that aren't comparable.
    return CharNGramEmbedding.FuseScoredPoints(
        semanticHits,
        chargramHits,
        getId: p => p.Id.ToString(),
        take: MaxSearchResults
    ).ToList();
}

Want to know more?

There's usually more to the story so if you have questions or comments about this post let us know!

Do you need a new software development partner for an upcoming project? We would love to work with you! From websites and mobile apps to cloud services and custom software, we can help!

Datoids: a private data storage service for websites and apps

Michael Argentini, Thursday, September 18, 2025

Cloud storage services, like Google Cloud Firestore, are a common solution for scalable website and app data storage. But sometimes there are compliance mandates that require data services to be segregated, self-hosted, or otherwise provide enhanced security. There are also performance benefits when your data service is on the same subnet as your website or API. This is why we built the Datoids data service.

The Datoids data service is a standalone platform that can be hosted on Linux, macOS, or Windows. It uses Microsoft SQL Server as the database engine, and provides a gRPC API that is super fast because commands and data transfers are binary streams sent over an HTTP/2 connection. In addition to read/update/store functionality, it also provides a freetext search. We've been using it in production environments with great success.

Management

Although the API can be used to completely control the platform, Datoids also includes a separate web management interface. It provides a way to configure collections, API keys, and even browse and search data, and add/edit/delete items. We've embedded Microsoft's simple but powerful Monaco editor (the same one used for VS Code) for editing data.

Datoids Manager Tour

The architecture is clean. Projects are organizational structures like folders in a file system. Collections act like spreadsheets (or tables in SQL parlance) filled with your data. There are also service accounts that are used to access the data from your website or app.

Usage

To make using it as easy as possible, we built a .NET client package that can be included in any .NET project, so that using Datoids requires no knowledge of gRPC or HTTP/2, since reading and storing data is done using models or anonymous types.

Getting a value from Datoids is simple:

var result = await DatoidsClient.Documents.GetAsync("Quotepedia", "People", peopleId);

if (result.Success == false)
    return Result.Fail(result.Message);
        
var person = result.GetDocument<Person?>();

/* Or get items by native query and order by clauses */

var result = await DatoidsClient.Documents.GetAsync("Quotepedia", "People", new QueryRequest
{
    NativeQuery = "$$.firstName = 'Al'",
    NativeOrderBy = "$$.lastName"
});

Likewise, storing data is just as easy.

var result = await DatoidsClient.Documents.AddAsync("Quotepedia", "People", new Person
{
    FirstName = "Al",
    LastName = "Dente"
});

You can also modify data without replacing the entire object.

var result = await DatoidsClient.Documents.ModifyAsync("Quotepedia", "People", personId, new JsonModifyRequest
{
    Path = "$.firstName",
    Value = "Albert"
});

There are plenty of other ways to read and write data as well, combining primary key and native query options. You can even perform bulk transactions.

If you have a website or app platform that needs a robust and performant data service, let us know! We can provide a demo and answer any questions.


Improve SSD performance and reliability

Michael Argentini, Wednesday, May 28, 2025

Modern computers, laptops, and mobile devices use solid state drive (SSD) storage, which is power efficient and fast! Since SSDs have no moving parts they're also more durable than hard disk technology in mobile scenarios.

But SSD storage does have limitations. Two primary concerns are:

  • Data can only be written a finite number of times
  • Data is not reliably stored for long periods of time when powered off

How SSDs work

Essentially, data is written to SSD storage as static charges in individual cells. These cells would normally hold a charge for a very long time, but the act of charging the cell is destructive. It takes a high voltage to weaken the cell barrier before the cell can be charged. And every time a cell is written the barrier is permanently weakened. Eventually the cell will not be able to reliably store a charge.

SSDs manage this problem in a few ways. One tactic is wear leveling: instead of repeatedly writing to the same cells, the drive writes to fresh cells whenever possible, which levels out the wear across all cells. Another strategy is to keep a bank of extra (hidden) cells available. When the SSD sees that a cell has gone sufficiently "bad", one of the "backup" cells takes its place. All of this happens in the background.

The problem

As cells lose their ability to hold a charge, the first symptom is a slowdown in reads. The SSD will try to read a cell, which sometimes returns a bad value (according to an ECC check), so it has to read it again, likely at a different voltage. Eventually the cell returns the correct value. But these repeated read attempts noticeably slow overall drive performance.

For computers and SSD drives that stay powered off for extended periods, you'll see advice recommending that you turn on the device every so often. But all that really does is give the SSD a chance to mark bad cells, and only if the device happens to read or write that bad cell in the first place. Some high-end SSDs will perform periodic cell rewrites on their own to refresh the data, but consumer SSDs don't typically do this. To be clear: powering up an SSD does not recharge the cells or truly address these issues.

The solution

New SSDs can reliably store data for several years without power. But after actively using an SSD for months or years, it makes sense to begin periodically refreshing the cells. This not only ensures more reliable storage over time, it can also noticeably speed up SSD performance.

I ran some tests on my local workstation to verify these claims. I used a 2-year-old MacBook Pro with an SSD boot drive that has remained more than half empty, ensuring lots of fresh cells were available for writes. It has had several OS upgrades and a couple of format/rebuilds.

That Mac booted to login in 16.6 seconds. After refreshing the SSD with the same data, it booted to login in 14 seconds, which is over 15% faster. This suggests that overall performance should also improve, at least with regard to storage transfers. So even on a relatively current machine there was a noticeable speed increase. As a software developer, though, the biggest benefit for me was the improved reliability.

So, if you want to refresh an SSD, following are some quick guides to help you through the process.

Refresh a Windows SSD

The easiest way to refresh your SSD on Windows is to use SpinRite (https://www.grc.com/sr/spinrite.htm). This is a time-tested, rock solid utility for hard disk maintenance and recovery, which can also handle SSD storage. Run this tool on level 3 to refresh all the cells and map out any bad cells. It will also work wonders on your hard disks.

Note: you need a computer with an x86 (Intel or AMD) processor; SpinRite will not run on Arm.

Another way to do this without additional software is to make a system image of your drive using the poorly named "Backup and Restore (Windows 7)" control panel. This clones your entire drive (even the recovery partition) to a USB flash drive or other external media. You can then boot into recovery mode and restore the entire drive from that system image. You'll end up with the same PC with all your files intact. And you will have a backup of your drive for future use.

  1. Choose the poorly named Backup and Restore (Windows 7) option.

  2. Use the "Create a system image" option in the left column.

  3. When your image is created, use the "Restart now" option under "Advanced startup", and when the computer restarts, choose the advanced option and recover from a system image.

Both of these methods will return your SSD to like-new performance, and ensure longer data retention.

Refresh a Mac SSD

Unlike with Windows, there are no great utilities like SpinRite for modern Apple Silicon Macs. But fear not! There is a way to refresh SSD cells using the built-in Time Machine feature. And it's pretty easy to use. You will be backing up your Mac, then erasing it, reinstalling macOS, and then restoring the backup.

Connect an external storage device to your Mac and configure it in Time Machine as your backup device. Then run a backup.

Time Machine

Note: some applications, like Docker, do not allow Time Machine to back up their data by default. In the case of Docker there is an option to enable this.

Once you have a complete backup, restart your Mac into recovery mode. On modern Apple Silicon Macs, shut down the computer, then turn it back on by pressing and holding the power button until the Mac tells you it is loading startup options.

Use Disk Utility to erase the SSD, and then choose to reinstall macOS.

The recovery menu provides access to Disk Utility and macOS reinstallation.

After the OS is installed it will restart and run Migration Assistant.

Migration Assistant

Choose to transfer files from Time Machine, and follow the instructions. It will show you all Time Machine backups for connected drives. Choose the latest backup entry for your backup drive, and let Migration Assistant do its thing. You will be left with a refreshed SSD with all your files intact.

Refresh schedule

The research in this area is nascent, so the optimal frequency for refreshing your SSD cells really depends on how well it is performing, how many writes have been made, and how full it is on average. On my server data drive I rarely write new files, but the data is very important, so I'm planning on refreshing the cells yearly just to be safe.

So how often should you run this process? If your SSD averages under 50% usage and is under 10 years old, a yearly refresh is a reasonable cadence. As your SSD ages (or if it stays mostly full) it may be better to run it more frequently.


SqlPkg for Microsoft SqlPackage

Michael Argentini, Monday, July 1, 2024

SqlPkg is a 64-bit .NET command line (CLI) wrapper for the Microsoft SqlPackage CLI tool with the goal of making common backup and restore operations easier and more powerful. It does this through new Backup and Restore actions that provide additional features like the exclusion of specific table data in backups and destination prep prior to restore.

Visit the repository to see how you can install this tool to begin using it right away.

New action modes:

/Action:backup
This mode is equivalent to /Action:Export to create a .bacpac file, with the following differences.

  • Specify one or more /p:ExcludeTableData= properties to exclude specific table data from the bacpac file. The table name format is the same as the /p:TableData= property.
  • /SourceTrustServerCertificate: defaults to true.
  • /SourceTimeout: defaults to 30.
  • /CommandTimeout: defaults to 120.
  • /p:VerifyExtraction= defaults to false.
  • Destination file paths will be created if they do not exist.

/Action:restore
This mode is equivalent to /Action:Import to restore a .bacpac file, with the following differences.

  • The destination database will be purged of all user objects (tables, views, etc.) before the restoration.
  • If the destination database doesn't exist it will be created.
  • /TargetTrustServerCertificate: defaults to true.
  • /TargetTimeout: defaults to 30.
  • /CommandTimeout: defaults to 120.
  • Destination file paths will be created if they do not exist.

/Action:backup-all
This mode will back up all user databases on a server.

  • Provide a source connection to the master database.
  • Provide a target file path ending with 'master.bacpac'. The path will be used as the destination for each database backup file, ignoring 'master.bacpac'.
  • Optionally provide a log file path ending with 'master.log'. The path will be used as the destination for each database backup log file, ignoring 'master.log'.
  • Accepts all arguments that the Backup action mode accepts.

/Action:restore-all
This mode will restore all *.bacpac files in a given path to databases with the same names as the filenames.

  • Provide a source file path to 'master.bacpac' in the location of the bacpac files. The path will be used as the source location for each database backup file to restore, ignoring 'master.bacpac'.
  • Provide a target connection to the master database.
  • Optionally provide a log file path ending with 'master.log'. The path will be used as the destination for each database backup log file, ignoring 'master.log'.
  • Accepts all arguments that the Restore action mode accepts.

When not using SqlPkg special action modes, the entire argument list is simply piped to SqlPackage and will run normally. So you can use sqlpkg everywhere SqlPackage is used.
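
As an example, a backup that skips a table's data might look like the following. This is a hypothetical invocation: substitute your own server, credentials, paths, and table name. The connection-string and /TargetFile arguments mirror standard SqlPackage parameters, combined with the /p:ExcludeTableData= property described above.

```shell
# Hypothetical example; adjust the connection string, file path, and table name
sqlpkg /Action:backup \
  /SourceConnectionString:"Server=localhost;Database=MyDb;User Id=sa;Password=<password>" \
  /TargetFile:"backups/MyDb.bacpac" \
  /p:ExcludeTableData="dbo.AuditLog"
```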

Installation

1. Install Microsoft .NET

SqlPkg requires the .NET 8.0 runtime, which you can get at https://dotnet.microsoft.com/en-us/download.

Because SqlPkg wraps Microsoft SqlPackage, you will also need the .NET 6.0 runtime and the SqlPackage tool itself:

dotnet tool install -g microsoft.sqlpackage

2. Install SqlPkg

Run the following command in your command line interface (e.g. cmd, PowerShell, Terminal, bash, etc.):

dotnet tool install --global fynydd.sqlpkg

Later you can update SqlPkg with the following command:

dotnet tool update --global fynydd.sqlpkg

Uninstall

If you need to completely uninstall SqlPkg, use the command below:

dotnet tool uninstall --global fynydd.sqlpkg


DataStore

Michael Argentini, Thursday, February 15, 2024

The DataStore project is a high performance JSON object store (ORM) for SQL Server.

DataStore automatically creates and manages a pre-defined SQL Server data structure that can coexist with existing database objects. All database operations are performed with the DataStore helper class.

Your models are stored in the database as JSON text so you can have most any kind of object structure, provided your models inherit from DsObject.

Basic example

Instantiating DataStore with settings is non-destructive. Any existing DataStore tables are left untouched. Methods to delete all or unused schema objects are provided for those edge cases.

Models

Instantiate DataStore with a settings object and the database schema will be created for all classes that inherit from DsObject. The following attributes can be used in your classes:

  • DsNoDatabaseTable prevents DataStore from creating a table for the class.
  • DsUseLineageFeatures enables lineage features for that table; add to the class itself.
  • DsSerializerContext(typeof(...)) to provide a de/serialization speed boost by using source generator JsonSerializationContext classes for each table; add to the class itself.
  • DsIndexedColumn generates a SQL computed column with index for faster queries on that data; add to properties and fields.
  • DsIndexedColumn("Food","Email") generates indexed SQL computed columns for faster queries on the dictionary key names specified; add to Dictionary properties and fields.

[DsUseLineageFeatures]
[DsSerializerContext(typeof(UserJsonSerializerContext))]
public class User : DsObject
{
    [DsIndexedColumn]
    public string FirstName { get; set; }
    
    [DsIndexedColumn]
    public string LastName { get; set; }
    
    [DsIndexedColumn]
    public int Age { get; set; }
    
    public List<Permission> Permissions { get; set; }
    
    [DsIndexedColumn("Food", "Color")]
    public Dictionary<string, string> Favorites { get; set; } = new();
}

[JsonSerializable(typeof(User))]
[JsonSourceGenerationOptions(WriteIndented = false)]
internal partial class UserJsonSerializerContext : JsonSerializerContext
{ }

Construction

You can create a DataStore instance anywhere in your code:

var dataStore = new DataStore(new DataStoreSettings {
    SqlConnectionString = sqlConnectionString,
    UseIndexedColumns = true
});

You can also use DataStore as a singleton service:

services.AddSingleton<DataStore>((factory) => new DataStore(new DataStoreSettings {
    SqlConnectionString = sqlConnectionString,
    UseIndexedColumns = true
}));

Create and save objects

Creating and saving a DataStore object is simple:

var user = new User
{
    FirstName = "Michael",
    LastName = "Fynydd",
    Age = 50,
    Permissions = new List<Permission>
    {
        new() { Role = "user" },
        new() { Role = "admin" },
        // etc.
    }
};

await dataStore.SaveAsync(user);

After a save, the object is updated in place with any generated changes, like lineage and depth information and creation or last-update dates. You can also pass a list of objects to save them all in one call.
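For example, a batch save might look like the following sketch; the list overload is described above, and the property values are purely illustrative:

```csharp
// Save several users in a single call; each saved object is
// updated in place with lineage, depth, and timestamp data.
var users = new List<User>
{
    new() { FirstName = "Michael", LastName = "Fynydd", Age = 50 },
    new() { FirstName = "Sarah", LastName = "Fynydd", Age = 48 }
};

await dataStore.SaveAsync(users);
```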

Read objects

Querying the database for objects is simple too. In any read call you can pass a DsQuery object and build your query with a fluent-style pattern. In the query you can specify property names as strings with dot notation:

var users = await dataStore.GetManyAsync<User>(
    page: 1,
    perPage: 50,
    new DsQuery()
        .StringProp("LastName").EqualTo("Fynydd")
        .AND()
        .StringProp("Permissions.Role").EqualTo("admin")
        .AND()
        .GroupBegin()
            .NumberProp<int>("Age").EqualTo(50)
            .OR()
            .NumberProp<int>("Age").EqualTo(51)
        .GroupEnd(),
    new DsOrderBy()
        .Prop<int>("Age").Ascending()
);

Or you can use the model structure to specify names, which makes code refactoring easier:

var users = await dataStore.GetManyAsync<User>(
    page: 1,
    perPage: 50,
    new DsQuery()
        .StringProp<User>(u => u.LastName).EqualTo("Fynydd")
        .AND()
        .StringProp<User, Permission>(u => u.Permissions, p => p.Role).EqualTo("admin")
        .AND()
        .GroupBegin()
            .NumberProp<User,int>(u => u.Age).EqualTo(50)
            .OR()
            .NumberProp<User,int>(u => u.Age).EqualTo(51)
        .GroupEnd(),
    new DsOrderBy()
        .Prop<User>(o => o.Age).Ascending()
);

Dynamic property access

If you need to access object properties without knowing the object type, DsObject exposes JSON features that allow you to access property values using standard JSON path syntax:

var users = await dataStore.GetManyAsync<User>(
    page: 1,
    perPage: 50
);

foreach (DsObject dso in users)
{
    dso.Serialize(dataStore);

    var lastName = dso.Value<string>("$.LastName");
    var roles = dso.Values(typeof(string), "$.Permissions.Role");

    // etc.
}

Remember: these JSON features are read-only. If you change a property value in the DsObject you will need to call Serialize() again to update the JSON representation.
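A minimal sketch of that workflow, assuming Value<T> works for numeric properties the same way it does for strings:

```csharp
var user = users.First();

user.Serialize(dataStore);              // build the JSON snapshot
var age = user.Value<int>("$.Age");     // read from the snapshot

user.Age = 51;                          // change the model...
user.Serialize(dataStore);              // ...then refresh the snapshot
age = user.Value<int>("$.Age");         // now reflects the change
```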


Want to know more?

There's usually more to the story so if you have questions or comments about this post let us know!

Do you need a new software development partner for an upcoming project? We would love to work with you! From websites and mobile apps to cloud services and custom software, we can help!

Fynydd presents at 2012 UK Semantic Tech & Business Conference

Michael Argentini · Friday, June 22, 2012

We'll be presenting the following topic at the 2012 UK Semantic Tech & Business (SemTech) Conference:

Building a Semantic Enterprise Content Management System from Scratch
2012 UK Semantic Tech & Business Conference
Millennium Gloucester Hotel, London, England
Thursday, September 20, 3:45-4:15 PM

Detail on our program is available on the SemTechBiz UK 2012 site. The conference runs September 19-20. We hope to see you there!


Fynydd presents at 2012 Semantic Tech & Business Conference in San Francisco, CA

Michael Argentini · Sunday, June 10, 2012

In early June, Fynydd was invited to speak at the 2012 Semantic Tech & Business Conference in San Francisco, California. By all accounts, Fynydd’s message was unique, informative, and overall very well received by conference attendees.

The topic "How To Build a Semantic Content Management System From Scratch" garnered much interest even before the conference began. During our presentation, we outlined a case study based on a recent Fynydd client engagement. We walked through the successes and challenges Fynydd faced in designing and implementing a semantic prototype system for a large financial institution.

Though primarily invited as a speaker, Fynydd also secured a location on the exhibit floor of the conference so that we could present our full breadth of services to conference attendees. Activity at the exhibit was bustling. Attendees remarked about Fynydd's unique set of service offerings, as well as how we worked with other exhibitor products to build holistic solutions for our clients. Conversations focused heavily on user interface design and overall user experience as well as implementation of semantic technologies across a wide variety of industries and projects.

With the success of this conference, Fynydd was asked to present at the SemTechBiz satellite conference in London later this year. You can find out more by visiting the Fynydd SemTech page and the SemTechBiz London site.
