Cost Effective Azure Cosmos DB
- Author: Simon Waight (Mastodon: @simonwaight)
Before I dive into how to use Cosmos DB cost-effectively, let me take a short detour!
Many people new to cloud technology arrive with expectations about both the elasticity of the services on offer and the price points they are offered at.
As an industry we have highlighted cost savings resulting from migrating workloads to the cloud as a selling point of cloud technology, but over time the actual way you save costs has become a little less clear.
When we talk about cost savings by moving to the cloud what we are typically talking about is:
- on-demand services that are pay-per-use. Many people have forgotten the "pay-per-use" component of this and table flip when they end up paying more than they expected for resources in the cloud. My observation is that this is typically the result of continuing to use the same processes to manage "pay-per-use" cloud environments as were used to manage "fixed-cost" data centres.
- long-term cost savings in capital expenditure (capex). No more three-to-five year data centre refresh cycles and no need to purchase fixed-term software licenses. These savings don't always show up immediately unless you are avoiding a refresh/renewal when migrating.
- increased solution delivery velocity. Now that you have a range of on-demand services the cost of innovation and solution delivery is reduced. No more need to spend time re-inventing the wheel to stand up high availability storage, networking, compute, databases, etc. They are already there for you to use.
So how do all of the above relate to cost effective use of Cosmos DB?
I see many people start working with Cosmos DB without understanding the value they are getting beyond Cosmos DB being part of the cool new "NoSQL" revolution that everyone wants to be a part of.
The other behaviour I see is people making no attempt to design their data structures and simply treating Cosmos DB as an unstructured data store. There is nothing wrong with this approach per se, but you are trading the flexibility of no up-front design against a potentially higher cost of operation later. Even a small amount of design is likely to save you money in the longer term.
If you have never worked with Cosmos DB before or you are more of a hands-on person then there are a couple of options for you to build familiarity - you can try Cosmos DB for free for a month or do Cosmos DB modules on Microsoft Learn. Microsoft Learn is great because you don't need an Azure Subscription!
In the remainder of this post I'm going to look at how you can do more with Cosmos DB without needing to have an elastic wallet!
Starting point: understand Cosmos DB features and cost model
There are two important documents to read:
- Overview of Cosmos DB's features. At a high level: a schema-less, geo-replicated data store with an availability SLA of 99.999% (multi-region), performance SLAs and support for multiple client APIs. 😎
- Understanding Request Units (RUs). RUs are key to understanding and controlling performance and cost in Cosmos DB. There is also some documentation on cost-effective reads and writes with Cosmos DB.
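To make the RU idea a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes items of around 1 KB, where a point read costs about 1 RU. The write charge, the 100 RU/s increment and the 400 RU/s minimum are my assumptions for illustration only, so measure the actual request charges of your own workload before relying on any of it.

```python
import math

# Assumed per-operation charges for ~1 KB items (illustrative only).
RU_PER_POINT_READ = 1.0  # baseline: a 1 KB point read costs about 1 RU
RU_PER_WRITE = 5.0       # assumption: rough charge for a 1 KB write


def required_ru_per_second(reads_per_sec: float, writes_per_sec: float) -> float:
    """Estimate the steady-state RU/s consumed by a simple read/write mix."""
    return reads_per_sec * RU_PER_POINT_READ + writes_per_sec * RU_PER_WRITE


def round_up_to_increment(ru_per_sec: float, increment: int = 100, minimum: int = 400) -> int:
    """Round an estimate up to a provisionable value.

    The increment and minimum are assumptions; check the current limits.
    """
    rounded = math.ceil(ru_per_sec / increment) * increment
    return max(rounded, minimum)


needed = required_ru_per_second(reads_per_sec=150, writes_per_sec=20)
print(f"Estimated need: {needed} RU/s, provision: {round_up_to_increment(needed)} RU/s")
```

The point is less the exact numbers and more the habit: estimate your operation mix first, then provision, rather than guessing a throughput figure and finding out on the invoice.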
You should really read that documentation, as I think it's important you know what you get when you create and configure a Database or Container in Cosmos DB. If you read the documentation and conclude that much of it will never apply to your scenario, then don't be afraid to select one of the other data services in Azure!
Using NoSQL data stores is a bit like using microservices... just because it's the "in" thing doesn't mean you need to use it!
If your application is only occasionally used, or doesn't have demanding throughput requirements, and its data model fits the Cosmos DB Table API, then consider using Azure Table Storage instead. Massive services like "Have I been pwned?" make good use of Table Storage so it's a viable option.
Over time your application's demands may change and Table Storage may no longer suit your scenario, at which point you can switch to the Cosmos DB Table API and migrate your data if required. This is where some planning upfront comes in handy as you can identify the inflection point where switching makes sense.
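One way to do that planning is to write the inflection point down as an explicit check. The sketch below is a hypothetical helper, not an official sizing tool: the `WorkloadProfile` fields, the per-partition throughput figure and the headroom factor are all assumptions I have picked for illustration, so swap in the current Table Storage scalability targets and your own requirements.

```python
from dataclasses import dataclass
from typing import List

# Assumption: an illustrative per-partition ceiling for Table Storage.
# Check the current published scalability targets before using this.
ASSUMED_PARTITION_OPS_LIMIT = 2000


@dataclass
class WorkloadProfile:
    peak_ops_per_sec_per_partition: float
    needs_geo_replication: bool  # a Cosmos DB feature called out earlier
    needs_performance_sla: bool  # likewise


def reasons_to_switch(profile: WorkloadProfile, headroom: float = 0.7) -> List[str]:
    """Return the reasons (if any) to move from Table Storage to the Table API."""
    reasons = []
    if profile.peak_ops_per_sec_per_partition > ASSUMED_PARTITION_OPS_LIMIT * headroom:
        reasons.append("peak partition throughput is close to the assumed limit")
    if profile.needs_geo_replication:
        reasons.append("geo-replication is required")
    if profile.needs_performance_sla:
        reasons.append("a performance SLA is required")
    return reasons


today = WorkloadProfile(500, needs_geo_replication=False, needs_performance_sla=False)
print(reasons_to_switch(today) or "stay on Table Storage for now")
```

An empty list means stay where you are. Anything else is a prompt to revisit the decision with real metrics in hand.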
Local development
There is a Cosmos DB emulator that supports local development with the core Cosmos DB APIs. If you're using one of the APIs that is unsupported by the emulator then you can spin up a local development version of your preferred native storage provider.
Table, Graph and Cassandra all have downloads you can run locally to help with development.
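If you go the emulator route, it helps to keep the endpoint and key out of your code so the same build can point at either the emulator or Azure. The following is a minimal sketch of that idea using only the Python standard library. The environment variable names (`APP_ENV`, `COSMOS_ENDPOINT`, `COSMOS_KEY`) are hypothetical names of my own choosing, and the default endpoint assumes the emulator is listening on its usual local port.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: the emulator is running locally on its default port.
EMULATOR_ENDPOINT = "https://localhost:8081/"


def cosmos_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Pick connection settings for local development or the cloud.

    All variable names here are hypothetical; rename them to suit your app.
    """
    env = os.environ if env is None else env
    if env.get("APP_ENV", "development") == "development":
        # Local development: fall back to the emulator endpoint.
        return {
            "endpoint": env.get("COSMOS_ENDPOINT", EMULATOR_ENDPOINT),
            "key": env.get("COSMOS_KEY", ""),
        }
    # Anywhere else: fail fast (KeyError) if the settings are missing.
    return {"endpoint": env["COSMOS_ENDPOINT"], "key": env["COSMOS_KEY"]}


print(cosmos_settings({})["endpoint"])
```

Failing fast outside of development is deliberate: you don't want a misconfigured production deployment quietly trying to talk to localhost.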
To the cloud!
Once you're ready to deploy into Azure you have a few options to drive cost efficiencies.
If you chose to build on top of Azure Table Storage then that's your initial starting point, potentially with a later migration to Cosmos DB as per the paragraph above. If you chose one of the other client-compatible APIs (Mongo, Graph or Cassandra) you can always run MongoDB, Gremlin or Cassandra on Virtual Machines if you really want to, but as a developer you should strongly consider the value of doing that - do you want to feed and maintain those servers and make sure they scale on demand and that the data is backed up effectively? Me neither. I'm pretty sure you'd surpass Cosmos DB's costs pretty quickly at scale with VMs as well!
Which brings us to using Cosmos DB proper.
The Cosmos DB team has worked hard to give you options to save cost:
- Use Provisioned Throughput at the Database level to control costs, with the trade-off that the throughput is shared across all Containers in the Database, so no single Container has guaranteed performance. Until recently, provisioning at the Container level (a Collection if you are using the SQL API for Cosmos DB) was the only choice you had. At the Microsoft Connect(); 2018 event Microsoft announced that you can now provision throughput at the Database level. I have been using this and it has brought the cost of one of my Cosmos DB deployments down to under $30 US per month, which is perfect for my scenario.
- The other option you have is to use Reserved Capacity which can, depending on your scenario, drop your Cosmos DB costs by up to 65%. I would strongly recommend you don't start here - this is a good place to get to after you have been running Cosmos DB for a while. By then you will have validated it as part of your architecture and you will also have metrics which will allow you to determine the appropriate reservation to make.
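To see why these two options matter, here is a small cost sketch. Everything numeric in it is an assumption for illustration: the hourly price per 100 RU/s, the 730-hour month, the five-container layout and the idea that one shared 400 RU/s allocation is enough for all of them. The 65% figure is the "up to" upper bound mentioned above, not a promise, and the sketch ignores reservation terms and minimum purchase sizes. Check the current pricing page before budgeting.

```python
HOURS_PER_MONTH = 730                  # common billing approximation
ASSUMED_PRICE_PER_100_RU_HOUR = 0.008  # assumption: USD, illustrative only


def monthly_cost(ru_per_sec: float,
                 price_per_100_ru_hour: float = ASSUMED_PRICE_PER_100_RU_HOUR,
                 reserved_discount: float = 0.0) -> float:
    """Estimate the monthly cost of provisioned throughput in USD."""
    if not 0.0 <= reserved_discount < 1.0:
        raise ValueError("reserved_discount must be in [0, 1)")
    hourly = ru_per_sec / 100 * price_per_100_ru_hour
    return hourly * HOURS_PER_MONTH * (1.0 - reserved_discount)


# Five containers at a dedicated 400 RU/s each, versus one shared allocation.
dedicated = monthly_cost(5 * 400)
shared = monthly_cost(400)

# A larger, steady workload with and without the best-case reservation discount.
on_demand = monthly_cost(10_000)
reserved = monthly_cost(10_000, reserved_discount=0.65)

print(f"dedicated: ${dedicated:.2f}  shared: ${shared:.2f}")
print(f"on-demand: ${on_demand:.2f}  reserved: ${reserved:.2f}")
```

Under these assumed numbers the shared Database-level figure lands in the same ballpark as my own under-$30 deployment, which is the whole appeal if your Containers are rarely busy at the same time.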
So there we are. Hopefully you found this insightful in uncovering ways you can work with Cosmos DB (and related services) without needing to spend the equivalent of a small country's GDP. If you have any questions post them in the comments or ping me on Twitter.
Happy days 😎