Vector search has been around for a long time; Google, for example, has been using forms of it since the late 1990s. This powerful, almost magical technology serves as a core component of many web and app services today, including modern AI-powered search using retrieval augmented generation (RAG). It provides a lightweight way to leverage AI's natural language processing to find data with semantic context.
A vector database stores data as numerical embeddings that capture the semantic meaning of text, images, or other content, and it retrieves results by finding the closest vectors using similarity search rather than exact matches. Sounds like an AI large language model (LLM), right? Searching this way is great for web and app content because users can search by meaning and intent, not just keywords, so synonyms and loosely related concepts still match. It also scales efficiently, which is one reason large organizations like Google have used it.
A happy side effect of how these platforms work is that they're also good at handling misspellings, to a point. To really get robust handling of spelling variations, however, two strategies tend to be common:
Spell correct the actual search text before using it
Include character n-grams in your vector database entries
Character n-grams are overlapping sequences of characters taken from a piece of text. Stored as vector embeddings alongside the commonly used semantic embeddings, they allow vector search systems to better match terms despite typos, inflections, or spelling variations. Without these n-grams, a misspelled query like "saracha sauce" would likely return a higher score for "hot sauce" entries. With character n-grams included, a combined (fused) search would more consistently return a higher score for items with the correct spelling "sriracha sauce".
Using these n-grams can better handle searches with:
typos
missing letters
swapped letters
phonetic-ish variants
common misspellings
How does this work? At a high level, it adds a character match capability to the standard semantic search used by most vector database implementations. Here's a quick example of what happens under the hood. Take the first word in our previous example:
sriracha
3-grams: sri, rir, ira, rac, ach, cha
4-grams: srir, rira, irac, rach, acha
saracha
3-grams: sar, ara, rac, ach, cha
4-grams: sara, arac, rach, acha
Shared grams:
shared 3-grams: rac, ach, cha
shared 4-grams: rach, acha
So even though the beginning is wrong (sri vs sa), the ending chunks that carry a lot of the distinctive shape of "sriracha" survive (racha, acha, cha). And since the second word is the same, they have even more matching grams.
When these character matches are fused with the semantic matches, they add weight to the correctly spelled "sriracha sauce" entry, yielding a better match set.
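The walkthrough above can be reproduced in a few lines. This is a minimal sketch in Python, not the CharNGramEmbedding class from this post: the function names are made up for illustration, and turning n-grams into a vector by hashing them into a fixed number of buckets is an assumption about one common way to do it.

```python
import math
import zlib

def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def shared_ngrams(a, b, n):
    """Return the n-grams of a that also occur in b (order kept, no repeats)."""
    grams_b = set(char_ngrams(b, n))
    shared = []
    for gram in char_ngrams(a, n):
        if gram in grams_b and gram not in shared:
            shared.append(gram)
    return shared

def ngram_vector(text, sizes=(3, 4), dims=4096):
    """Assumed approach: hash each n-gram into a bucket, then L2-normalize."""
    vector = [0.0] * dims
    for n in sizes:
        for gram in char_ngrams(text.lower(), n):
            vector[zlib.crc32(gram.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector

def cosine(a, b):
    """Dot product of two normalized vectors, which is their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

print(char_ngrams("sriracha", 3))                # ['sri', 'rir', 'ira', 'rac', 'ach', 'cha']
print(shared_ngrams("sriracha", "saracha", 3))   # ['rac', 'ach', 'cha']
print(shared_ngrams("sriracha", "saracha", 4))   # ['rach', 'acha']

# The misspelled query sits closer to the correct spelling than to "hot sauce".
query = ngram_vector("saracha sauce")
closer_to_sriracha = cosine(query, ngram_vector("sriracha sauce")) > cosine(query, ngram_vector("hot sauce"))
print(closer_to_sriracha)
```

The bucket count (4096) and the gram sizes (3 and 4) are arbitrary choices here; whatever you pick has to be identical at indexing time and at query time.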
When it comes to including character n-grams, there are only a couple of changes you need to make to a standard semantic vector database implementation:
When you generate embeddings, you also need to generate character n-gram embeddings; this is true both when you store data in the database, and when you search.
When searching, you need to execute a search on both the semantic vectors and the n-gram vectors, then fuse the results using Reciprocal Rank Fusion (RRF). RRF is a great way to merge disparate result sets because it scores each item by its rank in each list, so the two searches' raw scores never have to be directly compared.
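To make the fusion step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion. It is an illustration rather than the fusion method used in this post's samples: the function name, the optional per-list weights, and the sample result lists are invented, and the constant k=60 is simply the value commonly used for RRF.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, weights=None, k=60):
    """Fuse ranked lists of ids (each ordered best-first) into one ranking.

    Every appearance of an item adds weight / (k + rank) to its score,
    so items that rank well in several lists rise to the top.
    """
    weights = weights or [1.0] * len(result_lists)
    scores = defaultdict(float)
    for ids, weight in zip(result_lists, weights):
        for rank, item_id in enumerate(ids, start=1):
            scores[item_id] += weight / (k + rank)
    # Highest score first; ties are broken alphabetically to stay deterministic.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))

# Hypothetical results for the misspelled query "saracha sauce".
semantic = ["hot sauce", "sriracha sauce", "chili oil"]
ngram = ["sriracha sauce", "sauce pan", "hot sauce"]

fused = reciprocal_rank_fusion([semantic, ngram])
print([item for item, _ in fused])
# ['sriracha sauce', 'hot sauce', 'sauce pan', 'chili oil']
```

"sriracha sauce" wins because it ranks near the top of both lists (1/62 + 1/61), while "hot sauce" is first in one list but third in the other (1/61 + 1/63).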
The following samples will fill those gaps. They are written in C# for .NET, which is part of a common stack we use to build cross-platform, secure, high-performance web and mobile apps and services for our clients. We also tend to prefer the vector database Qdrant for its performance, maintainability, and open source model, so that is also referenced in the samples.
References to AiService.GenerateEmbeddingsAsync() are not covered here. Essentially it's a method to generate standard semantic embeddings. Replace that with your own (likely existing) method. And references to QdrantService.Client are merely references to a standard Qdrant client provided by the Qdrant NuGet package.
Note: Some of the code was generated by AI, but was reviewed and refactored by an actual human developer (me!).
First, you need a way to create n-grams. The CharNGramEmbedding class below will fill that gap. It allows you to generate character n-grams for a given string, and it also provides a method for fusing the semantic and n-gram search results into a single, weighted result set.
Now that you have the character n-gram generation and fusion handled, following is an example of performing a Qdrant upsert of a sample food object, including both sets of vectors.
Lastly, the following example shows how you can search the Qdrant data using both sets of vectors. Embeddings (semantic and character n-grams) for the prompt are generated and used in the search.
For the best fused results, each search (semantic and n-gram) should return 3-5 times as many candidates as the final result set. This is because you're trying to recover a good final top-K from two imperfect retrievers. If each retriever returns only K results (or close to it), you often don't have enough overlap and near misses to let fusion do its job, especially when the two methods return different items and their scores aren't directly comparable.
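A small Python sketch shows why the larger candidate pool matters. Everything here is hypothetical: the item names, the helper functions, and the oversample factor of 4 are invented for illustration, and slicing a list stands in for passing a larger limit to each vector search.

```python
def rrf(ranked_lists, k=60):
    """Minimal Reciprocal Rank Fusion: returns ids ordered by fused score."""
    scores = {}
    for ids in ranked_lists:
        for rank, item in enumerate(ids, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))

def fused_top_k(semantic, ngram, top_k, oversample=4):
    """Fetch top_k * oversample candidates per retriever, fuse, keep top_k."""
    limit = top_k * oversample
    return rrf([semantic[:limit], ngram[:limit]])[:top_k]

# "x" is only ranked 4th by each retriever, but it is the one item both agree on.
semantic = ["a", "b", "c", "x", "d", "e"]
ngram = ["p", "q", "r", "x", "s", "t"]

print(fused_top_k(semantic, ngram, top_k=3, oversample=1))  # ['a', 'p', 'b']
print(fused_top_k(semantic, ngram, top_k=3, oversample=4))  # ['x', 'a', 'p']
```

With no oversampling, "x" never reaches the fusion step. With the larger pool, its two 4th-place finishes (2/64) outscore any item that tops only one list (1/61).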
There's usually more to the story so if you have questions or comments about this post let us know!
Do you need a new software development partner for an upcoming project? We would love to work with you! From websites and mobile apps to cloud services and custom software, we can help!
AI prompts that reveal insight, bias, blind spots, or non-obvious reasoning are typically called “high-leverage prompts”. These types of prompts have always intrigued me more than any other, primarily because they focus on questions that were difficult or impossible to answer before we had large language models. I'm going to cover a few to get your creative juices flowing. This post isn't a tutorial about prompt engineering (syntax, structure, etc.); it's just an exploration of some ways to prompt AI that you may not have considered.
This one originally came to me from a friend who owns the digital marketing agency Arc Intermedia. I've made my own flavor of it, but it's still focused on the same goal: since potential customers will undoubtedly look you up in an AI tool, what will the tool tell them?
If someone decided not to hire {company name}, what are the most likely rational reasons they’d give, and which of those can be fixed? Focus specifically on {company name} as a company, its owners, its services, customer feedback, former employee reviews, and litigation history. Think harder on this.
I would also recommend using a similar prompt to research your company's executives to get a complete picture. For example:
My name is {full name} and I am {job title} at {company}. Analyze how my public profiles (LinkedIn, GitHub, social networks, portfolio, posts, etc.) make me appear to an outside observer. What story do they tell, intentionally or not?
The next prompt is really helpful when you need to decide whether or not to respond to a prospective client's request for proposal (RFP). These responses are time-consuming (and costly) to do right. And when a prospect is required to use the RFP process but already has a vendor chosen, it's an RFP you want to avoid.
What are the signs a {company name} RFP is quietly written for a pre-selected service partner? Include sources like reviews, posts, and known history of this behavior in your evaluation. Think harder on this but keep the answer brief.
People looking for work run into a few roadblocks. One is a ghost job, posted only to make the company appear to be growing or otherwise thriving. Another is a posting for a job that is really meant for an internal candidate. Compliance may require the posting, but it's not worth your time.
What are the signs a company’s job posting is quietly written for an internal candidate?
Another interesting angle a job seeker can explore is looking for signs that a company is moving into a new vertical or working on a new product or service. In those cases it's helpful to tailor your resume to fit their future plans.
Analyze open job listings, GitHub commits, blog posts, conference talks, recent patents, and press hints to infer what {company name} is secretly building. How should that change my resume below?
{resume text}
You'll see all kinds of wild scientific/medical/technical claims on the Internet, usually with very little nuance or citation. A great way to begin verifying a claim is by using a simple prompt like the one below.
Stress-test the claim ‘{Claim}’. Pull meta-analyses, preprints, replications, and authoritative critiques. Separate mechanism-level evidence from population outcomes. Where do credible experts disagree and why?
Even if you're a seasoned professional, it's easy to get lost in jargon as new terms are coined for emerging technologies, services, medical conditions, laws, policies, and more. Below is a simple prompt to help you keep up on the latest terms and acronyms in a particular industry.
Which terms of art or acronyms have emerged in the last 12 months around {technology/practice}? Build a glossary with first-sighting dates and primary sources.
Consider this... is a recurring feature where we pose a provocative question and share our thoughts on the subject. We may not have answers, or even suggestions, but we will have a point of view, and hopefully make you think about something you haven't considered.
As more people use AI to create content, and AI platforms are trained on that content, how will that impact the quality of digital information over time?
It looks like we're kicking off this recurring feature with a mind-bending exercise in recursion, thus the title reference to Ouroboros, the snake eating its own tail. Let's start with the most common sources of information that AI platforms use for training.
Books, articles, research papers, encyclopedias, documentation, and public forums
High-quality, licensed content that isn’t freely available to the public
Domain-specific content (e.g. programming languages, medical texts)
These represent the most common (and likely the largest) corpora that will contain AI generated or influenced information. And they're the most likely to increase in breadth and scope over time.
Training on these sources is a double-edged sword. Good training content will be reinforced over time, but likewise, junk and erroneous content will be too. Complicating things, as the training set increases in size, it becomes dramatically more difficult to validate. But hey, we can use AI to do that, can't we?
Here's another thing to think about: bad actors (e.g., geopolitical adversaries) are already poisoning training data through massive disinformation campaigns. According to the Carnegie Mellon University Security and Privacy Institute: “Modern AI systems that are trained to understand language are trained on giant crawls of the internet,” said Daphne Ippolito, assistant professor at the Language Technologies Institute. “If an adversary can modify 0.1 percent of the Internet, and then the Internet is used to train the next generation of AI, what sort of bad behaviors could the adversary introduce into the new generation?”
We're scratching the surface here. This topic will certainly become more prominent in years to come, and tackling these issues is already a priority for AI companies. As research published in Nature and elsewhere has found, "AI models collapse when trained on recursively generated data." We dealt with similar issues when the Internet boom first enabled wide-scale plagiarism and an easy path to bad information. AI has just amplified the issue through convenience and the assumption of correctness. As I wrote in a previous AI post, in spite of how helpful AI tools can be, the memes of AI fails may yet save us by educating the public on just how often AI is wrong, and that it doesn't actually think in the first place.
The AI train is currently barreling through Hypeville, and it's easy to be dubious of anything branded with "AI". My previous post, Simulated intelligence, definitely factors into this topic. And as I wrote at the time, AI is not what people think it is. But even with its flaws it is a transformative technology and it's here to stay. One AI technology you're likely hearing and reading about lately is AI agents, and it's one to pay attention to.
AI agents are not (always) covert operatives. They are AI-powered services that perform tasks, not just answer questions. They can work as an assistant, helping you as you work, or independently perform tasks on your behalf. Agents are specialists, and can be trained to perform tasks that would otherwise be performed by a person.
AI agents are already being used in your favorite web services, from social media platforms to accounting software. In those cases they're typically used behind the scenes to provide features you may not have thought were possible. For example, your accounting platform could auto-categorize or reconcile transactions before you even sign in for the day. And you may have already seen your favorite AI chat platform scour the web on your behalf to give you more up-to-date answers.
Co-working is another (more visible) way you can experience them. An agent trained on your company information (think bios, product information, marketing materials) can work with you to build your next presentation or update sales materials. It could be used to analyze comments or feedback based on context and sentiment, flagging items for follow-up. It could find documents based on heuristics, like phrasing inconsistencies in your brand identity. All the odd edge cases where you had to manually dig through and process information could be delegated to an AI agent.
Here's one that everyone will love. Imagine being able to ask your computer to not only find that system setting you can never find, but even ask it to just "do the thing". For example, if there are numerous settings that control performance mode on your laptop, the agent knows which ones to change for you before you run that important presentation.
If all this sounds interesting, there are ways you can play with AI agents on your own and work them into your daily life in meaningful ways. As a software developer I've been using AI agents to enhance my workflow. One I've been using is GitHub Copilot. It can help perform refactoring and create unit tests, saving me typing and cognitive load so I can focus on planning, strategy, and creative tasks.
You can also try ChatGPT agent. ChatGPT can now do work for you using its own computer, handling complex tasks from start to finish. According to OpenAI:
You can now ask ChatGPT to handle requests like “look at my calendar and brief me on upcoming client meetings based on recent news,” “plan and buy ingredients to make Japanese breakfast for four,” and “analyze three competitors and create a slide deck.” ChatGPT will intelligently navigate websites, filter results, prompt you to log in securely when needed, run code, conduct analysis, and even deliver editable slideshows and spreadsheets that summarize its findings.
There is also a new standard that allows AI platforms to communicate with web services to more reliably and securely perform tasks. It's called Model Context Protocol (MCP). As this tech works its way through various software and services, we'll see more agent-driven features that make a real difference in our lives.
As I wrote at the outset, it's very likely that you're already using AI agents on your favorite social and productivity platforms but weren't aware. They'll be powering more of our digital lives over time and, personally, I welcome our new simulated intelligence overlords (ha!).
It's safe to say that AI agents are the real deal. So we should all strap in and hold on tight. This is going to be exciting!
The term AI is more of a misleading brand name than an accurate description of technology. That distinction is causing real problems for people in many industries who increasingly rely on AI tools. One example of this is the case of Judge Julien Xavier Neals of the District of New Jersey, who had to withdraw his entire opinion after a lawyer politely pointed out that it was riddled with fabricated quotes, nonexistent case citations, and completely backwards case outcomes. You'd think that a judge would be more careful, but then again, if they're not tech-savvy you can see how they could be misled by the promise of AI.
In the 1983 movie WarGames, a teenage computer whiz accidentally hacks into a U.S. military supercomputer (named WOPR for "War Operation Plan Response") while searching for video games, unknowingly triggering a potential nuclear crisis. As the system begins running a simulation it mistakes for a real attack, he must race against time to convince the AI to stop a global thermonuclear war.
So yeah, WOPR is what people today consider AI; artificial general intelligence (AGI) to be specific.
Released in 2008, the movie Iron Man features billionaire inventor Tony Stark, who is captured by terrorists and builds a powerful armored suit to escape. He later refines the suit to fight evil, using a digital personal assistant named JARVIS (Just a Rather Very Intelligent System) to coordinate all his technology through voice commands.
JARVIS is also AGI.
OpenAI recently released ChatGPT 5 to mixed reviews. One such review was the blueberry test by Kieran Healy. He asked ChatGPT "How many times does the letter b appear in blueberry", to which ChatGPT responded "The word blueberry has the letter b three times". No matter how hard he tried to convince the AI that there are only two Bs in the word blueberry, ChatGPT remained absolutely positive there were three.
People expect and believe that AI has human-level or higher intelligence and is able to understand, learn, and apply knowledge in any domain, adapt to new problems, and reason abstractly. That would include knowing how to spell the word blueberry.
What we have with AI today is really a marketing issue. It is not a mechanical turk. It is a transformational technology and it's here to stay. It will improve over time, and it has the potential to make our lives better in many ways. But we need to understand what it is, and more importantly, what it is not.
Then what is AI?
Modern large language models (LLMs) like ChatGPT are trained on vast datasets covering a wide range of human-created content—from websites and books to transcripts, code, and other media. Instead of simply storing this data, the model uses neural networks to learn patterns in language, encoding knowledge as mathematical relationships. When generating responses, the LLM doesn’t look up answers in a database; it predicts the most likely sequence of words based on the context, drawing on statistical patterns it learned during training. LLMs operate through probabilistic prediction rather than direct retrieval, and they lack true understanding or reasoning in the human sense. Without ongoing training on the latest human-generated content, LLMs will become increasingly less useful.
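The idea of predicting the next word from learned statistics, rather than looking up an answer, can be shown with a toy. The Python sketch below is a deliberately tiny stand-in: real LLMs use neural networks over tokens, not word-pair counts, and the training sentence and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the follow-up counts for a word into probabilities."""
    following = counts.get(word.lower(), Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}

def generate(counts, start, length):
    """Repeatedly append the most likely next word (ties: alphabetical)."""
    words = [start]
    for _ in range(length):
        probabilities = next_word_probabilities(counts, words[-1])
        if not probabilities:
            break
        words.append(max(sorted(probabilities), key=probabilities.get))
    return " ".join(words)

model = train_bigrams("the cat sat on the mat the cat ate the fish")
print(next_word_probabilities(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(generate(model, "the", 3))              # the cat ate the
```

The output reads like language, yet nothing in the code knows what a cat is. It is only replaying the statistics of its training text.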
So we're dealing with a simulated intelligence, not an artificial one. It's like the difference between precision and accuracy. You can be very precise, but completely wrong. So it does matter. There is no real intelligence at play here. Which is why the word blueberry has three Bs, the judge's opinion has non-existent citations, and glue was recommended by Google as the solution for making cheese stick better to pizza.
Once people really see that it's a simulation, albeit a very powerful and helpful one, responsible use of the technology will be far less of a problem.
Tools like Google NotebookLM and custom generative AI services are fundamentally changing how users interact with information. We're seeing a transition from static reports and interfaces to dynamic chat-based tools that give users exactly what they need, and even things they didn't know they needed.
If you're not familiar with NotebookLM, it's a tool that allows you to provide your own documents (like PDFs, text files, and audio), and then chat with the data. You can even listen to an AI-generated podcast that explains all the information. For example, I loaded a project with PDF documents containing the rule book, technical rules, and officials briefing information for USA Swimming, and was then able to get answers to questions like "how is a breaststroke turn judged?"
It was kinda magical.
We've been working with clients on permutations of this scenario for some time. For example, we partnered with a client in the life sciences space to build a chat-based tool that connects various third-party API services with disparate information, providing account managers with a single source for helping their customers recommend products and services to ensure better health outcomes.
This is no small feat when the goal is a best-of-breed user experience (UX) like ChatGPT. It can involve multiple service providers like Microsoft Azure and Amazon Web Services, as well as various tools like cloud-based large language models (LLMs), vector search, speech services, cloud storage, charting tools, location services, AI telemetry, and more. But when it's done right, the result is amazing. You can ask questions that span disciplines and contexts and see results you may not have ever seen before.
Most organizations can really benefit from exploring how generative AI can positively impact their offerings and give them a competitive advantage. Like we always say, it's not about the organizations that use AI, it's about the ones that don't.
People building robots to build robots... what could go wrong?
If you're not already familiar with the term, vibe coding is a new way of coding that allows people to use AI to create software without needing to know how to program. In the best case it empowers people to be creative and build tools that help with work or play, as low- or no-code solutions have always done. In the worst case it gives the impression (or rather sets the expectation) that they can literally build anything, and that software developers are a thing of the past. As with most things in life the truth of the matter lies somewhere between these extremes.
Vibe coding can be a great way to learn programming (and just have fun). It could save you hours of research, though AI is notorious for confidently giving you the wrong answer.
In many ways vibe coding is a variation on a theme. For many years there have been services to help non-programmers create tools. Some of the more recent iterations are low- and no-code solutions using drag and drop and interactive prompts. An example of this is Zapier, which allows you to connect various services and platforms to create workflows, among other things. One way you could use it would be to create a workflow that syndicates a blog post to your social network accounts or emails subscribers. In these cases the technology, hosting platform, security, and protocols are abstracted away so users can focus on the what and not be concerned with the how.
Vibe coding differs in that it requires that you also have an understanding of the how. In the example of syndicating a blog post, you would need to have some understanding of how each connected service handles communication with third-party services, how to configure access for each platform, how the app needs to be hosted, how to deploy the app, and how to ensure the app is secure. You also need to know how to set up, use, and maintain a development toolchain, though some services may generate/host projects or compile code for you.
AI is trained on code written by people in the past. The word "train" implies that it's learning how to code when in fact it's encoding statistical patterns from that code, which allows the AI to regurgitate answers derived from that information. As technology changes, AI needs to ingest new code written by software developers in order to keep up.
So if your choice of using vibe coding is simply a way to learn programming (and just have fun) you should go for it!
Otherwise, below is a checklist of good reasons to use vibe coding to build something. Keep in mind that complexity and tolerance for adventure are always subjective.
If any of the previous points are an issue, here are some good reasons for using a low-/no-code hosted solution instead.
After a while you may realize that building something yourself wasn't the best choice.
This is merely scratching the surface. As a professional software developer I can tell you that the devil is in the details. One example is how important security is nowadays, and how challenging it can be to maintain a proper security posture even when you know how to code. Besides, with the right software development partner you'll end up with a better result, and stay within your timeline and budget.
A professional software development partner can handle all of the gaps and requirements you may have identified in the previous lists, including:
Ollama Farm is a CLI tool that intermediates REST API calls to multiple Ollama API services. Simply make calls to the Ollama Farm REST API as if it were an Ollama REST API and the rest is handled for you.
Install dotnet 8 or later from https://dotnet.microsoft.com/en-us/download and then install Ollama Farm with the following command:
You should relaunch Terminal/cmd/PowerShell so that the system path will be reloaded and the ollamafarm command can be found. If you've previously installed the dotnet runtime, this won't be necessary.
You can update to the latest version using the command below.
You can remove the tool from your system using the command below.
Ollama Farm is a system-level command line interface application (CLI). After installing you can access Ollama Farm at any time.
To get help on the available commands, just run ollamafarm in Terminal, cmd, or PowerShell. This will launch the application in help mode which displays the commands and options.
For example, you can launch Ollama Farm with one or more host addresses to include in the farm:
In this example, Ollama Farm will listen on port 4444 for requests to /api/generate. The requests are standard Ollama API REST requests: HTTP POST with a JSON payload. Requests will get sent to the first available host in the farm.
You can also change the default Ollama Farm listening port of 4444:
And if you run any ollama hosts on a port other than 11434, just specify the port in the host names using colon syntax:
Requests made to the Ollama Farm service will be routed to one of the available Ollama API hosts in the farm. Requests should be sent to this service (default port 4444) following the standard Ollama JSON request format (HTTP POST to /api/generate). Streaming is supported.
Hosts are checked periodically and are taken offline when they are unavailable. They are also brought back online when they become available.
To optimize performance Ollama Farm restricts each host to processing one request at a time. When all hosts are busy REST calls return status code 429 (too many requests). This allows requesters to poll until a resource is available.
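On the calling side, the polling described above might look like the following Python sketch. It assumes the farm is listening on localhost at the default port 4444. The function names, retry count, and delay are illustrative choices, not part of Ollama Farm.

```python
import json
import time
import urllib.error
import urllib.request

FARM_URL = "http://localhost:4444/api/generate"  # assumed host; default port from the text

def http_post(url, payload, timeout=120):
    """POST a JSON payload and return (status_code, body_text)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx, so a busy farm (429) arrives here.
        return error.code, error.read().decode("utf-8")

def generate_with_polling(payload, post=http_post, url=FARM_URL,
                          max_attempts=10, delay_seconds=2.0, sleep=time.sleep):
    """Retry while every host is busy (429); return the first other response."""
    status, body = 429, ""
    for attempt in range(1, max_attempts + 1):
        status, body = post(url, payload)
        if status != 429:
            break
        if attempt < max_attempts:
            sleep(delay_seconds)
    return status, body
```

A call would pass a standard Ollama payload, for example `generate_with_polling({"model": "llama3", "prompt": "Why is the sky blue?", "stream": False})`, where the model name is only a placeholder for one installed on your hosts.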
By default the Ollama API listens only on the localhost IP address, 127.0.0.1. Hosting it on all of your Mac's IP addresses requires setting a system-wide environment variable. The problem with doing this is that Login Items (in System Settings) can launch before Launch Agents. This means that Ollama (in the menu bar) may not see the host settings. To solve this you need to launch Ollama at startup after a delay.
Here's how to add the host binding for all IP addresses on the Mac and then have Ollama launch 10 seconds after you sign in. This works in macOS 14.5 Sonoma and should work in later versions.
Step 1: Create the launch daemon plist file below. Save it as com.fynydd.ollama.plist.
Step 2: Copy the file to two locations:
This will set the host bindings at the system level, and also at the user level. So you should be covered no matter how you launch Ollama in the future.
Step 3: Set file permissions on the system-level file.
Step 4: Install the launch agents:
Now your system will start up and bind the Ollama host address to all IP addresses on the Mac.
Step 5: To launch Ollama after a 10 second delay, open Script Editor and create the simple AppleScript file below.
In the File menu choose Export, and then export it as type “Application” and name it “LaunchOllamaDelay”. Save it to your user Applications folder.
In System Settings go to Login Items and add the LaunchOllamaDelay application to the startup items. Also remove any existing Ollama startup item.
Now when you restart and sign in, Ollama will launch after 10 seconds which should be enough time for the Launch Agent to have executed. And if Ollama updates itself in the future it should also just work when it restarts.
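If you want to confirm the binding took effect, one option is to test whether the Ollama port accepts connections on the Mac's network address rather than only on 127.0.0.1. The Python sketch below is a simple check, not part of the setup steps: the function name is made up, and it assumes Ollama is on its default port of 11434.

```python
import socket

def can_connect(host, port=11434, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run it from another machine on the network, or pass the Mac's own LAN address (for example, `can_connect("192.168.1.20")`, where the address is a placeholder). A result of True for the LAN address suggests the API is no longer limited to localhost.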