AWS Insiders

An Inside Look at The Planet-Scale Architecture of DynamoDB with Alex DeBrie, Author of "The DynamoDB Book" (Part Two)

Episode Summary

On this episode, Alex dives deep into the intricacies of Amazon DynamoDB, a planet-scale NoSQL database service that supports key-value and document data structures. Alex discusses the consistency and predictability designed into DynamoDB's performance, and how best to utilize it. Alex is the author of "The DynamoDB Book," a comprehensive guide to data modeling with DynamoDB, and he has been recognized as an AWS Data Hero.

Episode Notes

On this episode, Alex dives deep into the intricacies of Amazon DynamoDB, a planet-scale NoSQL database service that supports key-value and document data structures. Alex is the author of "The DynamoDB Book," a comprehensive guide to data modeling with DynamoDB, and he has been recognized as an AWS Data Hero. Alex discusses the consistency and predictability designed into DynamoDB's performance, and how best to utilize it.

--------

“I think the most important thing to know about Dynamo is that it has that primary key structure that we talked about, and every single item is going to have what's called a partition key. What this partition key is doing is basically helping decide to which shard or node a particular item is going to go. So when a request comes into DynamoDB, the first thing it's going to hit is this global request router that's deployed across the entire region. It handles all the tables within a particular region, and it gets that item, pulls up the metadata for that table, hashes the partition key value that was sent in for that particular item, and then, based on that, it knows which node to go to in the fleet.” Alex DeBrie, Principal at DeBrie Advisory and Author of “The DynamoDB Book.”

--------

Time Stamps

* (01:50) DynamoDB scale cases

* (05:49) The architecture of DynamoDB that allows it to theoretically scale

* (09:37) The top three tips for DynamoDB customers

* (11:51) The value of doing the math for your application

* (13:14) What mistakes to avoid to keep your costs low

* (16:28) Examples of managing costs with DynamoDB  

 

--------

Sponsor

This podcast is presented by CloudFix.

 

CloudFix automates easy, no-risk AWS cost savings, so your team can focus on changing the world.

--------

Links

Connect with Alex DeBrie on LinkedIn

Connect with Rahul Subramaniam on LinkedIn

Episode Transcription

Speaker 1:

Hello and welcome to AWS Insiders. On this podcast, we'll uncover how today's tech leaders can stay ahead of the constantly evolving pace of innovation at AWS. You'll hear the secrets and strategies of Amazon's top product managers on how to reduce costs and improve performance. Today's episode features an interview with Alex DeBrie, principal at DeBrie Advisory and author of The DynamoDB Book. On this episode, Alex dives deep into the intricacies of Amazon DynamoDB, a planet-scale NoSQL database service that supports key-value and document data structures. Alex discusses the consistency and predictability designed into DynamoDB's performance and how best to utilize it. But before we get into it, here's a brief word from our sponsor.

Speaker 2:

This podcast is brought to you by CloudFix. Ready to save money on your AWS bill? CloudFix finds and implements 100% safe, AWS-recommended account fixes that can save you 10 to 20% on your AWS bill. Visit cloudfix.com for a free savings assessment.

Speaker 1:

And now, here's your host, AWS superfan and CTO of ESW Capital, Rahul Subramaniam.

Rahul Subramaniam:

Welcome back to AWS Insiders. On the previous episode, we started talking about DynamoDB with Alex DeBrie. We discussed its architecture and the unique characteristics that allow it to be a planet-scale data store. Thanks for joining us again, Alex, and welcome back. I want to jump right in and pick up from where we left the conversation on the last episode. I'd love for you to share with us one or two examples of use cases where the scale of the problem made DynamoDB the only real choice.

Alex DeBrie:

Yep, absolutely. It's hard to pick just one or two, but a few interesting ones come to mind. One is a billing and charging system for telecoms. So you think of all these cell phones, or even internet-connected devices, sending 3G, 4G, 5G data all over the place, and all the tiny little charging requests they generate on a per-second, per-minute basis, where they're saying, "Hey, can I make this phone call?" and maybe you authorize them: "Yeah, you can have it for the next five minutes, check in if that phone call is still going," and then they tell us how long that phone call actually lasted or how much data was actually sent. Just think of the scale of these telecom systems, keeping track of all those tiny little charge requests and making that happen.

Alex DeBrie:

And I think a lot of the older-generation systems are built on relational databases, and they can do that, but there's more latency, there are more limits on how many concurrent transactions they can handle, or it starts to get quite expensive. Whereas this one we designed from scratch using DynamoDB, and you can roll it out. It can handle global telecom requests, and it can handle pretty significant amounts of traffic at a really low, predictable latency. We can optimize for the really important paths we need to handle to authorize these particular charge requests, make sure they're good, and allow them, because you don't want someone trying to make a call and having it take 30 seconds to figure out whether they have enough credit on their bill or whatever.

Alex DeBrie:

So if you really want a global-scale solution there, I think Dynamo is really going to be the only option for you. Another one I think is pretty interesting: I can't get into super specifics, but I had a customer in an industry that was heavily affected, positively I'd say, by COVID, where usage went up quite a bit because of people's changing patterns. They knew a big selling event was coming down the road, and they also had very cyclical usage patterns. During the day, usage was much higher than at night. During the week, usage was much higher than on the weekend. And then certain periods of the year would be much higher, so they had times when usage would be much, much lower and times when it would be much higher.

Alex DeBrie:

And they were in one of these low-usage periods, but they were saying, "Hey, in a few months it's really going to ramp up, and we need to know what we can get out of it." Just having the predictability around DynamoDB of saying, "We know it's going to be able to scale to this. We've done the math. We've calculated this out. We're not going to hit partition throughput limits. We're not going to hit general scale limits. We can actually do this thing," is really great. And then also having that ability to scale up and down during the day, during the week, during periods of the year, and save a lot of money on their bill that way, I think was really beneficial to them and something that would have been hard with a relational database.
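
To make "do the math" concrete, here is a rough sketch of the kind of back-of-the-envelope check Alex describes. It is illustrative only: the per-partition limits are the commonly documented figures of roughly 3,000 read units and 1,000 write units per second, and the table size and traffic numbers in the example are hypothetical, not from the episode.

```python
# Rough back-of-the-envelope partition check, illustrating "do the math."
# The per-partition limits are the commonly documented DynamoDB figures;
# the table size and peak traffic below are made-up example numbers.

PARTITION_READ_LIMIT = 3000    # read capacity units per second, per partition
PARTITION_WRITE_LIMIT = 1000   # write capacity units per second, per partition
PARTITION_SIZE_GB = 10         # approximate data held by one partition

def partitions_needed(table_size_gb: float, peak_rcu: float, peak_wcu: float) -> float:
    """Estimate how many partitions a table needs for its size and peak throughput."""
    by_size = table_size_gb / PARTITION_SIZE_GB
    by_reads = peak_rcu / PARTITION_READ_LIMIT
    by_writes = peak_wcu / PARTITION_WRITE_LIMIT
    return max(by_size, by_reads, by_writes)

# Hypothetical: 500 GB of data, seasonal peak of 120,000 reads/sec and 30,000 writes/sec.
estimate = partitions_needed(table_size_gb=500, peak_rcu=120_000, peak_wcu=30_000)
print(f"Roughly {estimate:.0f} partitions; with well-distributed keys, "
      "no single partition should hit its throughput limit.")
```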

Rahul Subramaniam:

I agree. Another example that stands out for me is applications that support custom fields. Relational data stores require you to declare a static schema up front, and the need for custom fields throws a wrench in those works. The most common workaround is to create a separate table that stores both your custom fields and their values, but the challenge is that every query you run then requires a join with that custom table. You really start to see the performance degradation as the size of that table increases and the joins get more and more expensive. DynamoDB, on the other hand, seems like such a natural fit for such applications at scale. So staying on the subject of scale, I actually wanted to bring up the conversation around infinite scale. Every time I read the blog post about Amazon Prime Day statistics, I'm astounded by the kind of transaction throughput they push through DynamoDB. And these numbers are orders of magnitude higher than what any relational data store could possibly do today.
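
To illustrate the custom-fields point: because items in a DynamoDB table don't have to share a schema, user-defined fields can live directly on the item, with no join against a separate custom-fields table. A minimal boto3 sketch; the table, key, and attribute names are hypothetical.

```python
import boto3

table = boto3.resource("dynamodb").Table("Customers")  # hypothetical table name

# Two customers with completely different custom fields in the same table.
# No schema migration and no join against an EAV-style custom-fields table.
table.put_item(Item={
    "PK": "CUSTOMER#123",
    "SK": "PROFILE",
    "name": "Acme Corp",
    "custom_fields": {"industry": "Telecom", "account_tier": "Gold"},
})
table.put_item(Item={
    "PK": "CUSTOMER#456",
    "SK": "PROFILE",
    "name": "Globex",
    "custom_fields": {"region": "EMEA", "renewal_quarter": "Q3"},
})

# A single GetItem returns the customer together with all of its custom fields.
resp = table.get_item(Key={"PK": "CUSTOMER#123", "SK": "PROFILE"})
print(resp["Item"]["custom_fields"])
```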

Rahul Subramaniam:

And despite proof points like that, most people find it really hard to digest the AWS claim that DynamoDB has potentially infinite scale. What do you tell people about the architecture of DynamoDB that makes this theoretical infinite scale possible?

Alex DeBrie:

Yeah, I think that's one of the really cool things about DynamoDB. And I love those Prime Day posts that Amazon comes out with, saying, "Hey, this is what we drove on Prime Day," and the numbers, like you're saying, are truly mind-boggling. But again, the principles beneath Dynamo are pretty simple, and you can see how it's just going to scale as your data grows. I think the most important thing to know about Dynamo is that it has that primary key structure we talked about, and every single item is going to have what's called a partition key. What this partition key is doing is basically helping decide to which shard or node a particular item is going to go. So when a request comes into DynamoDB, the first thing it's going to hit is this global request router that's deployed across the entire region.

Alex DeBrie:

It handles all the tables within a particular region, and it gets that item, pulls up the metadata for that table, and hashes the partition key value that was sent in for that particular item. Then, based on that, it knows which node to go to in the fleet. Each node in the fleet is going to hold about 10 gigabytes of data, so if you have a hundred-gig table, you're going to have 10 different primaries behind the scenes serving that data. And in that request router, it's going to hash that partition key, say, "Oh, this item is on primary four," and go directly there. So that's an O(1), constant-time lookup to figure out which record in the hash map this basically belongs to. But now as you go from a hundred gigs to a terabyte, you have a hundred different partitions.

Alex DeBrie:

It's still going to be that constant-time lookup at the front end to say, "Hey, which partition does that belong to?" Whether we have 10 partitions, a hundred partitions, or a thousand partitions, it's going to take the same amount of time to get to the specific partition, which again is only holding 10 gigs of data. And then within that partition, locating your particular item or locating a range of items is going to be very efficient for you. That's how they can continue to scale those out. I want to push on that a little further. One thing that was interesting to me, talking to a Dynamo person at re:Invent this year, is just how important the size of that partition is. That size, it's only 10 gigs that they're holding on these different machines. They've got a giant fleet of machines, 10-gig partitions plus secondaries and all that.

Alex DeBrie:

But that size makes it very easy for them to recover from instance failures. If an instance fails, they can promote a secondary, but it's also very quick to just replicate that 10 gigs somewhere else. If you have a one-gig network link there, it takes 10 seconds to replicate that somewhere else. Whereas I worked for a company where we had these giant MongoDB deployments, and each shard would be 150, 200 gigs sometimes. Just because of the maintenance for us, we didn't want to have a thousand shards; it's easier to have 10 shards of 200 gigs rather than a thousand shards of whatever. But it also means it's a much bigger deal if one goes down; it's harder to bootstrap a new one and get a new secondary back online, whatever that is. So I think that partitioning scheme that DynamoDB has, but then also being a fully managed service where they can put all the customers across a particular region on these tiny little partitions sharing the hardware, I think that really helps what they can do there.
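
A toy sketch of the routing idea Alex describes: hash the partition key and use the hash to pick one of the table's partitions. This is purely illustrative; DynamoDB's actual request router and hashing scheme are internal. It just shows why the lookup stays constant-time whether the table has 10 partitions or a thousand.

```python
import hashlib

def pick_partition(partition_key: str, num_partitions: int) -> int:
    """Toy version of the request router's job: hash the partition key and map
    it to one of the table's partitions. DynamoDB's real scheme is internal;
    this only illustrates the constant-time lookup."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Whether the table has 10 partitions (~100 GB) or 1,000 (~10 TB), locating
# the right partition is one hash plus a metadata lookup, not a scan.
for n in (10, 100, 1000):
    print(n, pick_partition("CUSTOMER#123", n))
```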

Rahul Subramaniam:

Yeah. It's amazing how that simplicity of structure has actually allowed DynamoDB to scale like it does. If they had made it any more complex than it is today, I don't think they would have been able to build a service as scalable and flexible as DynamoDB. That simplicity is key over there. So that brings me to the next section, where I'd love to know what your top three tips for DynamoDB customers would be.

Alex DeBrie:

Top three tips, man, that's a tough one. There are so many, and it really depends on where you are in the learning process. Are you brand new? Then I have different tips than if you're intermediate or an expert, things like that. I would break it into three different learning levels. The biggest one, if you are brand new to DynamoDB and this is your first time doing Dynamo or NoSQL data modeling, my tip would be: don't just treat it like a relational database. I think that's going to get you into trouble. Instead, understand how you think about DynamoDB data modeling, the principles of single-table design, access-pattern-first design, whatever you want to call it. The errors you see there would be normalizing your data too much.

Alex DeBrie:

Using that relational model, trying to do joins in your application code rather than pre-joining your data to handle your access patterns, things like that. So that's the first one. If you're brand new, the biggest thing I would say is understand that it's different, understand how single-table design works, because that at least teaches you, "Hey, this is different, and something else is going on here." If you're a little further down that path, in the intermediate level, I would say be careful about going too far on the other end of the spectrum. Sometimes, like you're saying, denormalizing everything can be a problem too. I also see people in that middle range, maybe this is the first data model where they're picking up single-table design, who put all sorts of items for a customer in the same partition, all with the same partition key, even if they're not accessing them together.

Alex DeBrie:

So going back to our example before: customers, addresses, orders, order items. Giving them all the same partition key groups them all together, even though you're never fetching a customer and the order items in the same request. Those should be under different partition keys, so make sure you're not overloading that too much. And then, if you've reached hopefully the highest level of enlightenment and you're really starting to understand DynamoDB, I would just say make sure you're really thinking about the specifics of your application: do the math. That's the biggest thing I tell people. "Hey, do the math and figure out what is going to be optimal for your exact application," because it's hard to give very generic advice. I can tell you the three or four or five factors you need to consider, but you need to do the math and think about what makes sense for your application.
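
A sketch of the partition-key advice above, using hypothetical customer and order entities: items that are fetched together share a partition key, while items that are never fetched together (a customer profile and an order's line items) get different partition keys even though they live in the same table.

```python
# Hypothetical single-table layout illustrating the partition-key advice.
# Items read together in one request share a partition key; items that are
# never read together do not, even though they sit in the same table.

items = [
    # Access pattern 1: fetch a customer and their addresses in one Query.
    {"PK": "CUSTOMER#123", "SK": "PROFILE", "name": "Acme Corp"},
    {"PK": "CUSTOMER#123", "SK": "ADDRESS#billing", "city": "Omaha"},

    # Access pattern 2: fetch an order and its line items in one Query.
    # The order items deliberately do NOT share the customer's partition key,
    # because a customer profile and order items are never fetched together.
    {"PK": "ORDER#9001", "SK": "ORDER", "customer_id": "123", "total": 240},
    {"PK": "ORDER#9001", "SK": "ITEM#1", "sku": "WIDGET", "qty": 3},
    {"PK": "ORDER#9001", "SK": "ITEM#2", "sku": "GADGET", "qty": 1},
]

# Each access pattern then maps to a single Query on its own partition key:
#   Key("PK").eq("CUSTOMER#123")  -> customer profile + addresses
#   Key("PK").eq("ORDER#9001")    -> order header + line items
```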

Rahul Subramaniam:

I agree. I think doing the math is great advice. Talking about math, the pricing of AWS services can get pretty complex at times, since the pricing happens along multiple different dimensions. What is your advice to the audience about mistakes they should avoid, or best practices they should follow, to ensure they don't have an out-of-control bill at the end of the month?

Alex DeBrie:

I think one place you might get into trouble is if you have too many secondary indexes, global secondary indexes. We haven't talked about that too much here, but basically in Dynamo, we talked about the importance of the primary key, but you might have an item that you need to access in different ways. What you can do is set up what are called secondary indexes, and those will re-index the data. It's sort of like a read-only copy of your table that supports these additional access patterns, and whenever you write an item to your table, it's going to get replicated out to those secondary indexes. Super useful, but watch out for having too many of those GSIs. If you have an item that you're indexing six, seven, eight different ways, every time you do a write, every time you update that item, you're going to have to pay for it eight different times, which is going to cost you a lot there, especially if it's a bigger item.

Alex DeBrie:

Because they're going to charge you based on the size of that item. So I think that can be a sneaky one that sneaks up on people, where they have this write amplification cost that they didn't expect, and it can be costly there. So look into that one. Again, it goes back to understanding the core principles and, again, doing the math. I hate to keep reusing that phrase, but that's the big one. Understanding the principles is like, "Hey, Dynamo is very much focused on the primary key." That's how you access your data, and you want to be fetching exactly the data you need. Dynamo's charging you for the data you're accessing, but the great thing is that you can retrieve the exact items you want in almost all cases. So it's pretty efficient that way, and it matches your costs.
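
To put rough numbers on the write-amplification point (an illustration, not figures from the episode): each write to the base table is repeated to every global secondary index that projects the item, so an item with eight GSIs pays for roughly nine writes. The price below is an assumed on-demand write rate; check current DynamoDB pricing for your region.

```python
import math

# Illustrative GSI write-amplification math. The price is an assumption
# (roughly the long-standing on-demand write rate in many regions);
# verify against current DynamoDB pricing before relying on it.
WRITE_PRICE_PER_MILLION = 1.25  # USD per million write request units (assumed)

def monthly_write_cost(writes_per_second: float, item_size_kb: float, num_gsis: int) -> float:
    # One write request unit covers up to 1 KB, and the write is repeated
    # to the base table plus every GSI that projects the item.
    units_per_write = math.ceil(item_size_kb)
    copies = 1 + num_gsis
    writes_per_month = writes_per_second * 60 * 60 * 24 * 30
    total_units = writes_per_month * units_per_write * copies
    return total_units / 1_000_000 * WRITE_PRICE_PER_MILLION

print(monthly_write_cost(500, item_size_kb=2, num_gsis=0))  # base table only
print(monthly_write_cost(500, item_size_kb=2, num_gsis=8))  # 8 GSIs: roughly 9x the cost
```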

Alex DeBrie:

So make sure you're modeling in a way that works. I would say avoid filter expressions, generally. That's a more advanced topic, but a lot of people think they can use filter expressions to get more efficient data access, and that's actually not going to save you in the long run, because they're going to charge you for the data that you read, even if some of it is filtered out by your filter expression. So try to make it so that the actual data you read, the primary keys that you're targeting, are the actual items you want to get. That'll help keep your costs in check. At a more fine-grained level, this isn't going to save you a ton, but it could save you 30, 40% on your bill: a lot of people start with on-demand capacity with DynamoDB.
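
A sketch of the filter-expression point with boto3 (table, key, and attribute names are hypothetical): a FilterExpression is applied after the items are read, so you pay for everything the Query reads, whereas narrowing the key condition means only the items you actually want are read and billed.

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table name

# Costlier pattern: read the whole partition, then filter server-side.
# You are billed for every item read, including those the filter discards.
filtered = table.query(
    KeyConditionExpression=Key("PK").eq("CUSTOMER#123"),
    FilterExpression=Attr("status").eq("SHIPPED"),
)

# Cheaper pattern: fold the condition into the key design itself, e.g. a
# hypothetical sort key like "ORDER#SHIPPED#2023-01-15", so the Query reads
# only the items you actually want.
targeted = table.query(
    KeyConditionExpression=Key("PK").eq("CUSTOMER#123")
    & Key("SK").begins_with("ORDER#SHIPPED#"),
)
```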

Alex DeBrie:

That's really great. You can do pay-per-use pricing; you don't have to pre-provision capacity, things like that. But at some point, if your bill gets large enough and you have predictable traffic, it's going to pay to switch to provisioned capacity, where you say how much capacity you want in advance rather than paying per use. That's going to be a lot cheaper on a fully utilized basis. It's hard to get full utilization, so that's why I say make sure you know your traffic first, especially if it's pretty predictable, and then switch over to that. But don't prematurely optimize there. I think it's fine to start with pay-per-use, especially if your bill is, like you're saying, in the free-tier level or less than a hundred bucks a month on that DynamoDB bill. Don't spend a bunch of your engineers' time working on that; wait until your bill gets a little bigger.
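
A rough comparison of the on-demand versus provisioned trade-off Alex describes. The prices are assumptions (approximately the long-standing us-east-1 list prices) and the workload is hypothetical; verify against current DynamoDB pricing before acting on numbers like these.

```python
# Rough on-demand vs provisioned comparison for a steady read workload.
# Prices are assumptions (approximate historical us-east-1 list prices);
# check current DynamoDB pricing before making a decision.

ON_DEMAND_READ_PER_MILLION = 0.25   # USD per million read request units (assumed)
PROVISIONED_RCU_PER_HOUR = 0.00013  # USD per read capacity unit-hour (assumed)
HOURS_PER_MONTH = 730

def monthly_costs(avg_reads_per_second: float, provisioned_rcu: int) -> tuple[float, float]:
    reads_per_month = avg_reads_per_second * 3600 * HOURS_PER_MONTH
    on_demand = reads_per_month / 1_000_000 * ON_DEMAND_READ_PER_MILLION
    provisioned = provisioned_rcu * HOURS_PER_MONTH * PROVISIONED_RCU_PER_HOUR
    return on_demand, provisioned

# Hypothetical: a steady 1,000 reads/sec, provisioned at 1,200 RCU for headroom.
on_demand, provisioned = monthly_costs(1000, provisioned_rcu=1200)
print(f"On-demand ~ ${on_demand:,.0f}/month, provisioned ~ ${provisioned:,.0f}/month")
```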

Rahul Subramaniam:

There are a couple of other examples we encountered that I wanted to share with the audience. I find the concept of TTL in DynamoDB a very useful mechanism for managing costs, because a lot of the time, especially for DynamoDB use cases, what you really want is the data from the last day, maybe the last week, and anything older than that you're never really going to access. So you might as well delete it from your table rather than keep paying for all that storage. That's one. And then, of course, if you do need to keep that data around, you could decide to dump it in S3, which is what we typically did. But the other alternative now is the new infrequent access tier that you have for storage, which seems like a really good option. Are you seeing adoption of that being pretty common?

Alex DeBrie:

Yeah, sure. I've definitely seen a few people who are interested in that. Just for background, for folks who are listening: at re:Invent this year, they came out with a new storage tier for Dynamo, which is pretty interesting and unique. In Dynamo, you're basically charged on two axes. You're charged based on read and write units, the transactions you're actually doing against your database, and you're also charged for the storage of your items. So however big your items are, they'll figure out how many gigs that is and charge you a quarter per gig per month, something like that. And like you're saying, Rahul, a lot of people end up with years and years of data, of which only the most recent stuff is really looked at, but maybe they need to keep it around just in case the customer asks for it or something like that.

Alex DeBrie:

But you see people where the storage cost actually exceeds the transaction cost, or at least becomes a bigger chunk of it. So they have this new storage tier, this infrequent access storage tier, where they reduce the storage cost on that particular table and proportionally increase the transaction cost. For some people, if you have a lot of historical data you need to keep around just in case, this could be a lot cheaper: it reduces that storage cost and increases your transaction cost a little bit, but not enough to offset the storage benefit. It's fairly niche; I wouldn't say super niche, but you probably have to have a pretty big table and a few years of data for this to make sense. It can really help in those situations.
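
For reference, the storage tier Alex mentions is set per table via the table class. A minimal boto3 sketch (the table name is hypothetical) switching an existing table to the Standard-Infrequent Access class:

```python
import boto3

client = boto3.client("dynamodb")

# Switch an existing table (hypothetical name) to the Standard-IA table class,
# which lowers the per-GB storage price and raises the per-request price.
client.update_table(
    TableName="OrderHistory",
    TableClass="STANDARD_INFREQUENT_ACCESS",
)

# New tables can also be created in this class with the TableClass parameter
# on create_table; switching back later uses TableClass="STANDARD".
```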

Alex DeBrie:

And then, like you're saying, TTL is a great thing. If you actually don't need to show that data to anyone ever, like it's not relevant at all after a week or a month or whatever, you can throw a TTL on that item and just set it to 30 days after it was created. Dynamo will periodically go through and expire those items for you, so you're no longer paying for them, and it moves the burden off you; instead of you having to scan over your table and delete old stuff, it moves that onto Dynamo. They'll look through it, they'll maintain that for you. You just pay the right cost to delete it. And that's it.
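
A minimal sketch of the TTL pattern Alex describes, using boto3 (table, key, and attribute names are hypothetical): enable TTL once on the table, then write an epoch-seconds expiry attribute on each item, and DynamoDB's background process removes the item some time after it expires.

```python
import time
import boto3

client = boto3.client("dynamodb")
table = boto3.resource("dynamodb").Table("Sessions")  # hypothetical table name

# One-time setup: tell DynamoDB which attribute holds the expiry timestamp.
client.update_time_to_live(
    TableName="Sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Each item carries its own expiry as an epoch-seconds number; here, 30 days out.
table.put_item(Item={
    "PK": "SESSION#abc123",
    "SK": "DATA",
    "payload": "...",
    "expires_at": int(time.time()) + 30 * 24 * 60 * 60,
})
# DynamoDB expires the item in the background some time after expires_at,
# so old data stops accruing storage charges without a manual sweep job.
```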

Rahul Subramaniam:

Yep. With that, I'd like to thank you, Alex, for coming on the show. I loved all the insights you shared. I've loved working with you on a lot of these very interesting use cases and problems, and I've learned a lot from you through that process. So thank you once again, thanks for sharing all of your knowledge around this, and thanks for writing the book in the first place. Once again, thanks.

Alex DeBrie:

Absolutely, Rahul. Thanks for having me on. It's always great to see you, and I love hearing about your experience. You've seen a lot of cool stuff, and hearing how you balance what you've seen in relational databases against Dynamo, I think that's really great. So thanks for having me on. It's been great.

Rahul Subramaniam:

Great. Thanks everyone. That's a wrap.

Speaker 2:

We hope you enjoyed this episode of AWS Insiders. If so, please take a moment to rate and review the show. For more information on how to implement 100% safe, AWS-recommended account fixes that can save you 10 to 20% off your AWS bill, visit cloudfix.com. Join us again next time for more secrets and strategies from top Amazon insiders and experts. Thank you for listening.