Sunday, January 13, 2008

Going Bankrupt With Amazon S3

You had everything planned out: your new video hosting service rocks and people love you from day one. You chose S3 for its cheap traffic and figured ads would cover all expenses. Your traffic grows rapidly and every day more sites link in, bringing swarms of new visitors. Life is just great. Your TV set has just been repossessed by the bank. Hold on, what the hell just happened?

Let's rewind and replay slowly.

Since its introduction, S3 has been widely used by new players for file storage and hosting, thanks to its low prices and lack of upfront payments. But if you don't take AWS pirates into consideration, S3 can end up very expensive and hazardous to your health.

5 Rules of Thumb: ignore at your peril.

1. Never give anonymous access to your files on S3
There is never a reason to, is there? This directly translates to letting people you don't know consume bandwidth you pay for, without being able to defend yourself. It's not worth it. (Scary thought: what if a competitor wants to drive you broke?)

2. Enable access logs on your S3 bucket
Track down leechers as soon as possible. Amazon's access logs are similar to Apache's and are delivered on a best-effort basis only, so a record may occasionally arrive late or not at all. Keep a record of the bandwidth used by each file you host, and block those that go over quota.
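
Here is a minimal sketch of that bookkeeping in Python. It assumes the space-delimited S3 server access log layout, with the object key as the 8th field and bytes sent as the 12th; check those positions against your own logs before trusting the numbers. The names bytes_per_key and over_quota are made up for this example.

```python
import re
from collections import defaultdict

# A field is a [bracketed] timestamp, a "quoted" string, or a bare word.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

KEY_FIELD = 7     # assumed position of the object key (0-based)
BYTES_FIELD = 11  # assumed position of the bytes-sent counter (0-based)


def bytes_per_key(lines):
    """Sum the bytes sent for every object key found in the log lines."""
    totals = defaultdict(int)
    for line in lines:
        fields = TOKEN.findall(line)
        if len(fields) <= BYTES_FIELD:
            continue  # truncated or unrecognized line
        sent = fields[BYTES_FIELD]
        if sent.isdigit():  # "-" means nothing was sent
            totals[fields[KEY_FIELD]] += int(sent)
    return dict(totals)


def over_quota(totals, quota_bytes):
    """Return the keys whose total transfer exceeds the quota, sorted."""
    return sorted(key for key, sent in totals.items() if sent > quota_bytes)
```

Feed it the lines of the log files S3 drops into your logging bucket, then pass the totals to over_quota with whatever per-file limit you can afford. Since delivery is best-effort, treat the totals as a lower bound.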

3. When possible, pass Expires to S3
Limit access to S3 storage by signing URLs and appending an Expires parameter. Users will then have to get each link from your servers instead of hitting S3 directly, which gives you more control over who gets what and when.
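
For illustration, here is a rough sketch of building such a link by hand in Python, using S3's original query-string authentication (an HMAC-SHA1 signature over the request, the scheme now known as Signature Version 2). The function name, bucket, key, and credentials are placeholders, and newer setups would normally let an SDK produce a Signature Version 4 URL instead, so take this as a picture of the idea rather than production code.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def signed_url(bucket, key, access_key, secret_key, ttl_seconds=300, now=None):
    """Build a GET link that S3 stops honoring after ttl_seconds."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)  # absolute Unix timestamp
    resource = "/%s/%s" % (bucket, quote(key))

    # Legacy string-to-sign: verb, Content-MD5, Content-Type, Expires, resource.
    string_to_sign = "GET\n\n\n%d\n%s" % (expires, resource)
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="")

    return ("https://s3.amazonaws.com%s?AWSAccessKeyId=%s&Expires=%d&Signature=%s"
            % (resource, quote(access_key, safe=""), expires, signature))
```

Your application hands out a fresh link only after its own checks pass (login, referrer, quota), and a short lifetime keeps a leaked link from being useful for long.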

4. Serve files from your own servers
Most hosting packages come with a large bandwidth quota, which can be expanded later if required. GoDaddy offers an additional 500GB of traffic for $20 (traffic is calculated as rx and tx combined); that's $0.04/GB, instead of S3's $0.18/GB out.
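
To see how quickly the gap adds up, here is a back-of-the-envelope comparison in Python using only the prices quoted above. The traffic volumes are arbitrary examples, and the hosting figure is optimistic because it ignores the fact that the host counts inbound traffic too.

```python
S3_OUT_PER_GB = 0.18  # S3 outbound price quoted in this post, USD per GB


def price_per_gb(package_price, package_gb):
    """Effective USD-per-GB rate of a flat-priced traffic package."""
    return package_price / package_gb


def transfer_cost(gigabytes, rate_per_gb):
    """Cost in USD of pushing the given volume at the given rate."""
    return gigabytes * rate_per_gb


hosting_rate = price_per_gb(20.0, 500.0)  # the $20 / 500GB add-on
for gb in (500, 2500, 10000):             # example monthly volumes
    print("%6d GB: S3 $%.2f, hosting add-on $%.2f"
          % (gb, transfer_cost(gb, S3_OUT_PER_GB), transfer_cost(gb, hosting_rate)))
```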

5. Use a reverse proxy in front of S3
Harness the power of S3 as a secondary storage platform. Configure a reverse proxy to download locally unavailable files from S3 and serve them locally. Squid has a killer solution with LRU caching.
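
This is not Squid's configuration, just a toy Python sketch of the idea behind it: answer from the local cache when you can, go to S3 only on a miss, and throw out the least recently used file when space runs out. The class name and the fetch_from_origin callable (anything that takes a key and returns the file's bytes) are assumptions made for this example.

```python
from collections import OrderedDict


class LruCache:
    """Serve bodies locally; fall back to the origin (e.g. S3) on a miss."""

    def __init__(self, fetch_from_origin, max_bytes):
        self._fetch = fetch_from_origin  # hypothetical callable: key -> bytes
        self._max_bytes = max_bytes
        self._items = OrderedDict()      # oldest entry first, newest last
        self._size = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]

        body = self._fetch(key)           # the only place S3 bandwidth is paid
        if len(body) <= self._max_bytes:  # files larger than the cache pass through
            self._items[key] = body
            self._size += len(body)
            while self._size > self._max_bytes:
                _, evicted = self._items.popitem(last=False)  # drop the oldest
                self._size -= len(evicted)
        return body
```

A popular file is downloaded from S3 once and then served from your own bandwidth for as long as it stays hot. A real proxy would keep the bodies on disk rather than in memory.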

Amazon S3 Bankruptcy


9 comments:

willCode4Beer said...

Rule 1: Never serve video or audio unless it's from an unlimited bandwidth plan.

Gil Megidish said...

@willcode4beer:

I received an email yesterday from one of those Internet-Superman-Marketers-Gurus with a link to an flv that is stored on S3. He claims that his mailing list is tens of thousands of entries long, and the video was around 250MB; that's potentially 2.5TB, and that's one email :). Just proves that he really made a lot of money ;)

(oh, and no cookie / Expires)

Mondain said...

Thanks for the tips, I am glad that I read this before I put my application online.

willCode4Beer said...

I think Amazon's service is pretty cool but, whenever you serve large files, it is very easy for bandwidth costs to get out of hand fast.

The company I use for hosting, JVDS, has un-metered plans starting at 35 bucks. There are also companies that specialize in hosting video and maintain streaming servers.

My point was, that for a modest fee, you should be able to find a service that gives you unlimited bandwidth. Ideally you want to be able to predict your expenses.

Larry said...

This comment has been removed by the author.

alphafoobar said...

I've been interested in using Amazon web services in projects, but I'm not sure how sensible it is running an application on Amazon. Is the data persistent, private and ultimately yours?

What project are you running using Amazon services?

Gil Megidish said...

@alphafoobar:

There's a persistentfs module (userland fs that uses fuse) that can make a mounting point persistent. All it does is make sure all writes are written to S3. No random-access writes, must always flush the entire file.

I wouldn't recommend using EC2 for anything other than slaves. Remember you don't have a phone number or email support (read the forums on the AWS website: people beg for Amazon to restart their service or tell them why it's down).

I only use EC2 for slaves. Maybe I'll write a post about it. I turned my website into ec2 slaves, I can fire up 1 or more if I want, and they all play nicely. I haven't restricted myself to EC2/S3 at all, I have cool message queues (another post) that keep me and aws unmarried. Heck, men are afraid of commitment, right? :)

Cheers.

Allen said...

@gilmegidish:

PersistentFS is a complete POSIX-compliant file system that uses Amazon S3 as a backing store. It is fast and efficient, and supports random access reads and writes, renames, and all other POSIX operations. A PersistentFS file system can be mounted on any Linux computer that can access S3 via the internet.

Shalom Carmel said...

TANSTAAFL. When bandwidth is 'unlimited' you will discover that you are aggressively throttled.

Seek out bandwidth abuse, but if your bandwidth is high for valid reasons take it like a man - get your check book.