If you are new to AWS (Amazon Web Services), the sheer number of services on offer can be a bit confusing.
If you just want to launch a virtual server in the cloud, that part is relatively straightforward: use Amazon’s EC2 service.
But when it comes to storage and databases for your virtual instance, Amazon offers multiple choices.
In this tutorial, we list the most popular storage and database services available from Amazon.
1. Amazon S3
- Amazon S3 stands for Simple Storage Service (the three S’s give it the name S3).
- S3 is an object store. You can store any kind of file in it.
- The individual file size can be from 0 bytes to 5 TB.
- For simple S3 file management, you can use the Amazon S3 web interface.
- For enterprise applications, you can use the REST APIs provided by Amazon from your application code to manage the files that are stored in S3.
- In S3, Amazon has the concept of buckets, and you can put multiple objects in a bucket.
- For security, you can assign permissions at both the bucket level and the object level. You can also assign permissions at the user level.
- One nice feature of S3 is static website hosting: if you have a static website, you can host it directly on S3. In that case, S3 not only stores your HTML files in a bucket, it also acts as the web server and serves that content.
- You can also enable versioning for objects that are stored in S3 buckets.
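
The bucket-level permissions and static website hosting described above are often combined through a bucket policy. As a minimal sketch, the following builds the JSON document for a policy that allows anonymous read access to every object in a bucket, which a public static site would typically need. The bucket name `example-bucket` and the helper name `public_read_policy` are hypothetical; the keys follow the standard AWS policy document format.

```python
import json


def public_read_policy(bucket_name):
    """Build a bucket policy document allowing anonymous reads of all objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",  # anyone, including unauthenticated users
                "Action": "s3:GetObject",  # read-only access to objects
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "example-bucket" is a placeholder name used only for illustration.
print(json.dumps(public_read_policy("example-bucket"), indent=2))
```

The printed JSON is what you would attach to the bucket as its policy. Only use a policy like this for content that is genuinely meant to be public.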
2. Amazon Glacier
- Amazon Glacier is for archival storage. Use it only in situations where you don’t need to retrieve the data frequently. For example, you can store backups in Glacier.
- Storage in Glacier costs much less than in S3. But you won’t be able to get to your data quickly, as the retrieval process in Glacier can take hours.
- Glacier is tightly integrated with S3, which is great when you want to move old data from S3 to Glacier to save cost.
- In S3, you can set up lifecycle management to automatically move files that are older than X days from S3 to Glacier.
- Similar to buckets in S3, in Glacier you create vaults to store the data. You can assign permissions on vaults to restrict access.
- For your enterprise data, you can use the REST API from your application to archive data directly to Glacier. The AWS SDKs for Java and .NET also provide Glacier APIs.
- Keep in mind that while the storage cost in Glacier is very low, there is a separate cost for data retrieval.
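
The S3-to-Glacier lifecycle rule mentioned above can be expressed as a small configuration document. The sketch below builds one in Python. The helper name `glacier_lifecycle_rule`, the `backups/` prefix, and the 30-day threshold are all assumptions made for illustration; the dictionary is shaped like the lifecycle configuration accepted by the S3 API (for example through an SDK call such as boto3's `put_bucket_lifecycle_configuration`), but check the current documentation before relying on it.

```python
def glacier_lifecycle_rule(prefix, days):
    """Build a lifecycle configuration that moves objects under `prefix`
    to the GLACIER storage class once they are `days` days old."""
    if days < 0:
        raise ValueError("days must be non-negative")
    label = prefix.strip("/") or "all"
    return {
        "Rules": [
            {
                "ID": f"archive-{label}-after-{days}d",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Hypothetical example: archive everything under backups/ after 30 days.
config = glacier_lifecycle_rule("backups/", 30)
print(config["Rules"][0]["ID"])
```

Once a rule like this is attached to a bucket, S3 performs the transition on its own; your application does not need to copy anything to Glacier.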
3. Amazon EBS
- Amazon EBS stands for Elastic Block Store. This is block-level storage that can be attached to an EC2 instance that you spin up in AWS.
- One of the great advantages of EBS is that you can detach a volume from one EC2 instance and attach it to another without losing the data stored on it.
- There are three types of EBS storage:
  - Magnetic volumes, with a maximum throughput of 40 MiB/s; use these for applications with low I/O requirements.
  - General Purpose SSD, with 160 MiB/s; use this for most database applications that require good I/O performance.
  - Provisioned IOPS SSD, with 320 MiB/s; use this for business-critical applications that require heavy I/O.
- You can take a backup (snapshot) of your EBS volume; snapshots are stored in S3.
- Amazon also provides the option of creating encrypted EBS volumes, which is helpful when you want to encrypt your data at rest.
- EBS volumes are exposed to your operating system as block devices, which you can then mount appropriately. For example, on a Linux EC2 instance a volume can be /dev/sdb (or /dev/xvdb), and on Windows it can be the C: or D: drive.
- You can also set up RAID on an EC2 instance using multiple EBS volumes.
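
To make the choice between the three volume types concrete, here is a minimal sketch that picks the first type whose maximum throughput covers a given requirement. The throughput figures are the ones quoted in this article and may not match current AWS limits; the function name `pick_volume_type` is hypothetical, and a real decision would also weigh IOPS and cost.

```python
# Maximum throughput per volume type in MiB/s, as quoted in this article.
# Treat these as illustrative figures, not current AWS limits.
EBS_MAX_THROUGHPUT_MIB_S = [
    ("Magnetic", 40),
    ("General Purpose SSD", 160),
    ("Provisioned IOPS SSD", 320),
]


def pick_volume_type(required_mib_s):
    """Return the first volume type that meets the required throughput,
    or None if no single volume is fast enough."""
    for name, limit in EBS_MAX_THROUGHPUT_MIB_S:
        if required_mib_s <= limit:
            return name
    return None


for need in (30, 100, 300, 500):
    print(need, "MiB/s ->", pick_volume_type(need))
```

When the function returns `None`, no single volume in this table is enough, which is the kind of case where striping several EBS volumes with RAID becomes interesting.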
4. EC2 Instance Store
- An Amazon EC2 instance store uses the disks that are physically attached to the host on which the EC2 instance is running.
- But be very careful when using an instance store, as this is temporary storage.
- Any data stored in the instance store is lost when you stop or terminate the instance, or when the underlying disk or host fails. It does survive an ordinary reboot of the operating system.
- You cannot detach an instance store and move it to another instance.
- The size of the instance store volumes you get depends on the instance type. For example, an m1.small comes with a 160 GB instance store volume.
- Some instance types (for example: C3, G2, HI1, I2, M3, and R3) support SSD instance storage.
- Again, an instance store is different from EBS. Use an instance store only for temporary data that you can afford to lose.
5. AWS Storage Gateway
- For most enterprise applications, you’ll probably already have some kind of storage solution on site.
- In that case, AWS Storage Gateway lets you connect your on-site storage infrastructure with AWS storage services.
- For this, you install the AWS Storage Gateway software appliance, which is delivered as a VM, in your datacenter.
- Once it is connected to AWS, you can use the AWS console to create three types of storage gateway volumes and mount them on servers in your datacenter:
  - Gateway-Cached volumes: These use S3 to store your primary data, while keeping a copy of the frequently used data locally in your datacenter.
  - Gateway-Stored volumes: These store the primary data locally in your datacenter, and in parallel back it up to Amazon S3 in the form of EBS snapshots.
  - Gateway-Virtual Tape Library: This replaces your local physical tape library with a virtual tape library backed by Amazon S3 or Glacier.
6. Amazon RDS
- Amazon RDS stands for Relational Database Service. In RDS you create a DB instance with a specific database engine, and select the compute and storage options based on your requirements.
- The DB instance can run any of the most popular databases: MySQL, MariaDB, Oracle, SQL Server, PostgreSQL, or Aurora.
- RDS automatically installs the database, configures it, and performs routine DB maintenance tasks such as backups and patching.
- You can manage your DB instance from the AWS management console.
- The advantage of using RDS is that you don’t need to be a DBA to successfully run your enterprise application on a database.
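
Creating a DB instance comes down to choosing an engine, a compute class, and a storage size. The sketch below collects those choices into a parameter dictionary. The key names mirror the parameters of the RDS `CreateDBInstance` API as exposed by the AWS SDKs, but the helper `db_instance_params`, the instance identifier, and the reduced engine list are assumptions for illustration only.

```python
# A small illustrative subset of RDS engine identifiers; the service
# supports more engines than are listed here.
SUPPORTED_ENGINES = {"mysql", "mariadb", "postgres"}


def db_instance_params(identifier, engine, instance_class, storage_gib,
                       username, password):
    """Validate the basic choices and return them as request parameters."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine in this sketch: {engine}")
    if storage_gib <= 0:
        raise ValueError("storage_gib must be positive")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": instance_class,  # compute size
        "AllocatedStorage": storage_gib,    # storage size in GiB
        "MasterUsername": username,
        "MasterUserPassword": password,
    }


# Hypothetical values; never hard-code a real password like this.
params = db_instance_params(
    "example-db", "postgres", "db.t3.micro", 20, "admin_user", "change-me"
)
print(sorted(params))
```

With an SDK installed and credentials configured, a dictionary like this would be passed as keyword arguments to the create call; everything after that (installation, backups, patching) is handled by RDS.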
7. Amazon DynamoDB
- DynamoDB is Amazon’s fully managed NoSQL database service (similar in purpose to MongoDB).
- DynamoDB also provides a downloadable version that you can install locally on your own server during the development and testing phase. When you are ready for deployment, you can move to the Amazon DynamoDB service.
- AWS SDKs allow developers to access DynamoDB and manipulate the data from various programming languages including Java, .NET and PHP.
- From the AWS management console, you can create DynamoDB tables, load data, create queries, and perform all typical NoSQL operations from the GUI directly.
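
At the API level, DynamoDB items are sent as typed attribute values, where each value is wrapped in a one-key map naming its type (`S` for string, `N` for number, `BOOL`, `NULL`, `L` for list, `M` for map). The AWS SDKs normally do this conversion for you; the sketch below does it by hand to show the format. The helper names `to_attribute_value` and `to_item` and the sample record are hypothetical, and binary and set types are left out.

```python
def to_attribute_value(value):
    """Convert a plain Python value to DynamoDB's typed attribute format."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must be checked before int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(record):
    """Convert a whole record (a dict) into a DynamoDB item."""
    return {name: to_attribute_value(value) for name, value in record.items()}


# Hypothetical record used only for illustration.
print(to_item({"user_id": "u-1", "age": 30, "active": True}))
```

The same item format is used whether you talk to the hosted service or to a local copy during development.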
8. Amazon SQS
- Amazon SQS stands for Simple Queue Service.
- This is a fully managed message queue service from Amazon.
- Using SQS, you can move data or messages between different applications without requiring those applications to be up and running at the same time.
- SQS can be used to pass messages between components that use other AWS services, including S3, EC2, and DynamoDB. You can also use the Java Message Service (JMS) API with SQS.
- Using SQS, you can configure Dead Letter Queues, first-in-first-out (FIFO) access for your messages, etc.
- The maximum visibility timeout for a message in the SQS queue is 12 hours.
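
The visibility timeout is easier to understand with a small model. The sketch below is a toy in-memory queue, not the SQS API: after a consumer receives a message, the message is hidden from other consumers for the timeout period, and it reappears if the consumer does not delete it in time. The class name `ToyQueue` and its methods are hypothetical, time is passed in explicitly as seconds, and real SQS issues a fresh receipt handle on every receive rather than reusing one ID.

```python
import itertools

MAX_VISIBILITY_TIMEOUT = 12 * 60 * 60  # 12 hours, in seconds


class ToyQueue:
    """In-memory model of a queue with a visibility timeout."""

    def __init__(self, visibility_timeout=30):
        if not 0 <= visibility_timeout <= MAX_VISIBILITY_TIMEOUT:
            raise ValueError("visibility timeout must be between 0 and 12 hours")
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time it becomes visible]
        self._ids = itertools.count(1)

    def send(self, body, now=0):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, now]
        return msg_id

    def receive(self, now):
        """Return (id, body) of the oldest visible message and hide it."""
        for msg_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Acknowledge successful processing so the message never returns."""
        self._messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
job = queue.send("resize-image-42")
print(queue.receive(now=0))   # the message is handed to a consumer
print(queue.receive(now=10))  # hidden while the first consumer works on it
print(queue.receive(now=30))  # visible again because it was never deleted
```

This is why a consumer should delete a message only after it has finished processing it: if the consumer crashes, the message simply becomes visible again for another consumer.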
9. Amazon ElastiCache
- ElastiCache is Amazon’s in-memory caching service in the cloud. It currently supports both Memcached and Redis.
- Using this, you can improve application performance by caching the results of I/O- and CPU-intensive queries in memory for faster retrieval.
- ElastiCache integrates with other AWS services such as Amazon RDS and EC2. Just like other AWS services, you can manage ElastiCache either from the management console UI or through the API.
- You can also run an ElastiCache cluster in your Amazon VPC (Virtual Private Cloud).
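
A common way to use an in-memory cache in front of a slower data source is the cache-aside pattern: look in the cache first, and only run the expensive query on a miss. The sketch below shows the pattern with a plain Python dictionary standing in for a Memcached or Redis client; the class `CacheAside` and the function `slow_query` are hypothetical names, and a real deployment would also set expiry times on cached entries.

```python
class CacheAside:
    """Cache-aside lookup: check the cache, fall back to the loader on a miss."""

    def __init__(self, load, cache=None):
        self._load = load  # the expensive function, e.g. a database query
        self._cache = {} if cache is None else cache  # stand-in for a cache client
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value  # store the result for later requests
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the underlying data changes."""
        self._cache.pop(key, None)


def slow_query(customer_id):
    # Placeholder for an I/O- or CPU-intensive database query.
    return {"customer_id": customer_id, "orders": 3}


lookup = CacheAside(slow_query)
lookup.get("c-1")     # miss: runs slow_query
lookup.get("c-1")     # hit: served from memory
print(lookup.misses)  # number of times slow_query actually ran
```

The application code only talks to `get` and `invalidate`, so swapping the dictionary for a real cache client changes the storage but not the pattern.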
10. Amazon Redshift
- Amazon Redshift is a fully managed data warehouse solution for your enterprise business intelligence applications.
- Redshift provides access to your structured data from your own existing SQL-based clients by using either JDBC or ODBC.
- When a huge query is executed on Redshift, it is distributed among multiple nodes for parallel operations.
- Depending on your needs, you can control how many nodes you want in your Redshift cluster. The number of nodes can be changed dynamically through an API call, depending on the queries you are planning to run.
- There are three advantages of Redshift:
  - Columnar data storage: Instead of storing your data in rows, it stores it by columns. Column-based systems are faster for data warehouse workloads.
  - Advanced compression: Similar data is stored sequentially on disk, so automatic compression works well and data retrieval is faster.
  - Massively parallel processing: Data and queries are distributed across multiple nodes for faster processing. The number of nodes can be easily controlled.
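
To see why column storage and compression go together, consider the small sketch below. It turns a list of rows into per-column lists, sums one column without touching the others, and applies run-length encoding to a column with repeated values. This is only an illustration in plain Python: the sample data and function names are made up, and run-length encoding is used here as a simple stand-in for whatever encodings Redshift chooses internally.

```python
from itertools import groupby

# Hypothetical sales data stored row by row.
rows = [
    {"region": "east", "product": "widget", "amount": 10},
    {"region": "east", "product": "gadget", "amount": 20},
    {"region": "west", "product": "widget", "amount": 30},
    {"region": "west", "product": "widget", "amount": 40},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total = sum(columns["amount"])

# Similar values sit next to each other, so they compress well.
encoded_region = run_length_encode(columns["region"])

print(total)
print(encoded_region)
```

In a row-oriented layout, the same sum would have to read every field of every row, and the repeated region names would be scattered between other values instead of sitting side by side.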