Quelques digressions sous GPL...


Friday, January 13, 2012

Amazon AWS: Return of Experience

After spending a good part of 2011 working with Amazon AWS, EC2 and so on (and ending up not migrating to it for reasons unrelated to the quality of their service, which is good), I've been asked to write up what I learned from it.

I haven't been very talkative on this blog in the past few months, so I figured I could share it to the world.

Disclaimer: It's biased, and only my own personal opinion. Take it as it is, and feel free to disagree.


This recommended setup gives a good picture of what AWS is all about, but it doesn't tell you two things:

  1. Your reliability is going to go down, big time. You need to design for failure not because it's the right thing to do, but because it's going to happen. An EC2 instance can go down at any moment, often without warning. It's no different from a regular server, but the frequency is higher.
  2. The performance of a single EC2 instance is a lot lower than its equivalent in regular servers. That's not an issue for web nodes, because you can just scale them horizontally, but if you have a monolithic database it will drive you crazy.

That said, AWS gives you all the tools needed to work around those issues. I like the setup shown in this diagram, but I would make a few changes to it:

  • Do the DNS yourself, with a 3rd party like DNS Made Easy. DNS is easy enough, and important enough, that you don't want to hand it over to Amazon.
  • The AWS load balancer is nowhere close to a good old haproxy. I would run a frontend haproxy+varnish (reverse proxy cache) in front of a farm of apache+php nodes, with your memcache instances next to them, and the DB in the back.

Speaking of the DB: in my experience, this is the most difficult part to migrate to AWS. EC2 instances give you 2 types of storage: ephemeral (local disks attached to the instance at startup and flushed at shutdown) and EBS (persistent, network-attached storage). Ephemeral disks have roughly the performance of a regular desktop SATA drive, and EBS volumes fluctuate anywhere between a floppy disk and a SATA drive. If you have one big monolithic DB and you need more than 10,000 IOPS to keep your response time below 5 seconds, you'll suffer. My advice:

  1. Reduce the use of databases, relational or key-value, to a minimum. If something looks like a file and can be stored as a file, then it's a file. Put it in S3 and access it through the S3 API (or, if it's a public file, serve it via the CloudFront CDN). You also want to look into s3fs, which lets you mount an S3 bucket as a regular Linux mount: very handy to rsync local storage with S3 without having to code an API client.
  2. If you must have a relational database, try to use either SimpleDB or RDS. I've spent way too much time trying to find the right combination of EBS volumes, aggregating them in RAID1 or RAID10. Same with ephemeral volumes, except that they are not persistent: they are lent to you by the hardware your instance is running on at a given moment. I never really managed to move our huge Oracle DB to EC2. It worked, with EBS volumes, but the performance impact was so high that we pretty much gave up. EBS volumes would have to improve by at least 500% before you could consider that option. RDS is different: I suspect that Amazon does not virtualize their RDS environments. They give you a database that has been designed to run on their infrastructure, and it's always going to be faster than running your own. Plus, they take care of the replication, which is nice.

Regarding EBS volumes: I'm guessing that at some point you will want to benchmark those yourself, so maybe my notes will help. http://wiki.linuxwall.info/doku.php/en:ressources:articles:benchmark_ebs

A word about the network: when AWS was first designed, they had this ridiculous idea that instances should get a random hostname and a random IP at startup, and that customers shouldn't have any control over that. Well, in the real world, it sucks! There have been enough complaints about it that Amazon released the VPC, a private network inside their EC2 environment. It's basically free (as in: no additional charge) and you can control the subnets and the IPs of your instances. You still have access to the same features a regular EC2 setup provides, with the exception of some instance types (the cluster ones) that are not available there. It takes a little time to figure out the configuration of the routing and the internet gateway, because not all instances can access the internet by default. The VPC behaves more like a regular datacenter network, where you have an entry point that receives the traffic and routes it to your nodes. You can then decide to protect your backend nodes from internet traffic (something you cannot really do with regular AWS), and even connect your office network to the VPC using a regular pptp endpoint. I haven't looked recently, but I think each availability zone has its own VPC these days. So you can (and should) have a mirror of your active VPC environment in a passive availability zone, ready for failover.
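Since the VPC lets you choose the address plan, it helps to be able to check which subnet an address belongs to when you write routing or firewall rules. Below is a minimal bash sketch using only shell arithmetic (no AWS tooling), assuming a hypothetical layout: a public subnet 10.1.1.0/24 for the frontend and a private subnet 10.1.2.0/24 for the backend nodes. The function names are made up for the example.

```bash
#!/usr/bin/env bash
# Check whether an IPv4 address belongs to a CIDR block, using bash arithmetic only.
# The subnet layout below is a hypothetical example, not an AWS default.

# Convert dotted-quad notation to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr <ip> <network/bits>: exit status 0 if the address is inside the block.
ip_in_cidr() {
    local ip net bits mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # Netmask with the top $bits bits set (a /0 gives an empty mask).
    mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    (( (ip & mask) == (net & mask) ))
}

# Hypothetical VPC layout: public frontend subnet, private backend subnet.
declare -A SUBNETS=( [public]=10.1.1.0/24 [private]=10.1.2.0/24 )

# subnet_of <ip>: print the name of the matching subnet, or "none".
subnet_of() {
    local name
    for name in "${!SUBNETS[@]}"; do
        if ip_in_cidr "$1" "${SUBNETS[$name]}"; then
            echo "$name"
            return 0
        fi
    done
    echo "none"
    return 1
}
```

For example, `subnet_of 10.1.2.20` prints `private`, so that node should only be reachable through the routes you set up from the public side.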

Regarding firewalling and security groups: this is one really nice feature of their infrastructure. You build your firewall policy into security groups. For example, you might have 4 security groups:

  • frontend
  • web-nodes
  • memcache-nodes
  • storage-nodes

And your policy will look like:

  • frontend SG: accept from 0.0.0.0/0 to ports 80, 443
  • web-nodes SG: accept from frontend to port 80
  • memcache-nodes SG: accept from web-nodes to port 1234
  • storage-nodes SG: accept from web-nodes and memcache-nodes to port 6543

And you can dynamically add instances into each group without having to specify their individual IPs. The firewall policy is completely abstracted from the physical implementation. It's very nice and flexible to use. I don't have any particular comment to make on that, except to keep it simple and clean (but that's true for every firewall).
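To make the "groups, not IPs" idea concrete, here is a toy model in bash of the four-group policy above. It is only an illustration of the logic, not the AWS API: instances are tagged with a group, rules reference groups, and an instance added to a group is covered by the existing rules without any rule change. Ports 1234 and 6543 are the example ports from the list; "any" stands for 0.0.0.0/0, and the instance IDs in the usage note are made up.

```bash
#!/usr/bin/env bash
# Toy model of group-based firewall rules (bash 4+ for associative arrays).
# Not the AWS API: it only shows why rules never need individual IPs.

# Rules: "source-group destination-group port". "any" means 0.0.0.0/0.
RULES=(
    "any frontend 80"
    "any frontend 443"
    "frontend web-nodes 80"
    "web-nodes memcache-nodes 1234"
    "web-nodes storage-nodes 6543"
    "memcache-nodes storage-nodes 6543"
)

# Instance -> security group membership.
declare -A GROUP_OF=()

add_instance() {   # add_instance <instance-id> <group>
    GROUP_OF["$1"]="$2"
}

# allowed <source-instance|any> <destination-instance> <port>: exit 0 if a rule matches.
allowed() {
    local src_group dst_group rule r_src r_dst r_port
    if [[ "$1" == any ]]; then src_group=any; else src_group=${GROUP_OF[$1]:-}; fi
    dst_group=${GROUP_OF[$2]:-}
    for rule in "${RULES[@]}"; do
        read -r r_src r_dst r_port <<< "$rule"
        if [[ "$r_dst" == "$dst_group" && "$r_port" == "$3" ]] &&
           [[ "$r_src" == any || "$r_src" == "$src_group" ]]; then
            return 0
        fi
    done
    return 1
}
```

After `add_instance i-web2 web-nodes`, `allowed i-web2 <some memcache instance> 1234` succeeds immediately: no rule was edited, only group membership changed, which is exactly the property that makes security groups pleasant to use.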

About the AMIs: I chose to roll my own, based on a basic CentOS image. Building AMIs is easy enough if you start from an existing AMI (preferably EBS-backed) and customize it. The Amazon 64-bit AMI is a clone of RHEL and is good enough. You can probably reuse some of the init scripts available out there to populate your AMI at startup, but soon enough you will want to write your own. It works as follows: when you launch an instance from an AMI, you can pass user-data to the instance. In the user-data field, you can put a configuration file and have a script inside the AMI download and parse it: the backend server to connect to, which services to start, where to get more configuration from, etc. There is no predefined format, so you can design your own as long as your script can parse it. The ideal setup is an init script that downloads the user-data, does some basic initialization (identify itself, download access keys, etc.) and then connects to a master server to complete the configuration of the instance using Puppet or Chef.

Each instance can access its own user-data at http://169.254.169.254/latest/user-data (provided that you passed user-data when launching the instance). User-data is private to the instance, so you can put credentials in it, but the drawback is that anybody with access to the instance can download it. So if a hacker breaks into your website and manages to run a curl command from within the web interface, he will download your user-data. Don't put your root password in there!
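Since the user-data format is up to you, here is a minimal sketch of the parsing half of such an init script, assuming a made-up format of one key=value per line with # comments. The keys (backend, services, puppet_master) and the function names are hypothetical. The fetch uses the metadata URL quoted above; the parser stores values in an associative array instead of eval-ing them, so malformed or tampered user-data cannot inject shell commands into a script that typically runs as root at boot.

```bash
#!/usr/bin/env bash
# Sketch of the parsing half of an AMI init script (bash 4+).
# Assumed, hypothetical user-data format:
#   # comment
#   backend=10.1.2.10
#   services=httpd memcached
#   puppet_master=puppet.internal

USER_DATA_URL="http://169.254.169.254/latest/user-data"

# Print the raw user-data; curl -f turns HTTP errors (e.g. a 404 when no
# user-data was passed at launch) into a non-zero exit status.
fetch_user_data() {
    curl -fs --max-time 5 "$USER_DATA_URL"
}

declare -A UD=()   # parsed settings, key -> value

# Read key=value lines from stdin into UD. Values are stored, never evaluated.
parse_user_data() {
    local line key value
    local comment_re='^[[:space:]]*(#|$)' key_re='^[a-z_][a-z0-9_]*$'
    while IFS= read -r line || [[ -n "$line" ]]; do
        [[ $line =~ $comment_re ]] && continue   # skip comments and blank lines
        [[ "$line" == *=* ]] || continue         # skip lines without '='
        key=${line%%=*}
        value=${line#*=}
        [[ $key =~ $key_re ]] || continue        # accept simple identifiers only
        UD["$key"]=$value
    done
}

# Usage at boot. Process substitution keeps UD in the current shell;
# "fetch_user_data | parse_user_data" would fill UD in a subshell and lose it.
#   parse_user_data < <(fetch_user_data)
#   echo "connecting to ${UD[backend]}"
```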

Using a custom AMI with user-data and initialization scripts is the most powerful feature of EC2. Once you can initialize instances on the fly, have them configure themselves and join the pool, the rest is easy. You plug whatever monitoring tool you use into your init script, and when the overall load of your pool is too high, you fire up a new instance that joins the pool automatically. It's not easy to achieve (and you will always have a few minutes of delay between the launch command and the availability of the instance), but it's definitely how EC2 should be used.
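The decision half of that loop can be as simple as averaging the load of the pool and comparing it to a threshold. A minimal sketch, assuming you already collect one load figure per web node (how you collect it is up to your monitoring tool). The threshold is a placeholder, and `collect_pool_loads` and `launch_web_node` in the comment are hypothetical stand-ins for your own monitoring and launch tooling.

```bash
#!/usr/bin/env bash
# Decide whether the web pool needs another instance.

# pool_average <load>...: print the mean load with two decimals.
pool_average() {
    printf '%s\n' "$@" | awk '{ sum += $1; n++ }
        END { if (n) printf "%.2f\n", sum / n; else print "0.00" }'
}

# needs_new_instance <threshold> <load>...: exit 0 when the average exceeds the threshold.
needs_new_instance() {
    local threshold=$1
    shift
    (( $# > 0 )) || return 1   # no data, no decision
    awk -v avg="$(pool_average "$@")" -v max="$threshold" \
        'BEGIN { exit !((avg + 0) > (max + 0)) }'
}

# Example wiring (hypothetical helpers, not real commands):
#   if needs_new_instance 4.0 $(collect_pool_loads); then
#       launch_web_node   # start an instance from your AMI with the pool's user-data
#   fi
```

Because new nodes take a few minutes to become available, in practice you would also want a cool-down period between two launches so that the same load spike doesn't trigger several of them.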

It also allows you to design for failure. Your instances will crash on a regular basis (Netflix even wrote a tool, the Chaos Monkey, that kills instances at random to make sure their service survives it), so you shouldn't store anything of value on them. Instead, have them download the latest revision of the PHP code as part of the initialization process, and you can restart crashed nodes almost without noticing. Whether you use the AWS load balancer or haproxy, you can add and remove nodes without service interruption. Best solution: store the user session contexts on a backend cluster (memcache, redis, whatever), so you can load-balance incoming requests to any web node without losing the session when a user switches from one node to another.

There are quite a few companies out there that provide AWS assistance. In my opinion, you will be better off architecting it yourself. Your needs are probably too specific to fit a generic template, and AWS already constrains you enough that you don't want to add another level of abstraction to your infrastructure.

That's a lot of info; I hope it clarifies your vision of AWS a bit. I should finish by saying that AWS is a lot of fun once you gain enough control over it. Let me know how it goes, and if I can help in any way.

Cheers,

Julien

Friday, July 22, 2011

Using ssh-agent with a relay to create a user account on multiple servers

Welcome to AWS and its many EC2 instances. The idea of maintaining a large number of instances is nice, until you realize that you forgot to create a user account and your environment is already running.

In that case, you have two choices: rebuild the AMI and terminate/relaunch all your instances, or use SSH to perform the task on multiple systems.

I chose the latter, essentially because, a few months back, I was interviewing with Google and one of their engineers asked me something similar (how would you edit a file on thousands of machines at once). At the time, I didn't fully know the answer. Now that I do, if anybody from Google is reading me, please call back, you have my number ;)

Back to our initial problem, the solution is a mix of SSH commands, sudo, and some bash fun.

SSH Relay

The diagram above shows the infrastructure. The idea is to create the accounts without leaving the laptop. So the first barrier we have to deal with is how to connect to a server through a relay without interacting with the relay at all.

ssh-agent magic

ssh-agent is a mandatory tool for any sysadmin. It caches your SSH keys in your session, so you don't need to unlock the key every time you connect to a destination. My ssh-agent is started with my session (more explanation here). So every time I log in, I just need to add my private key to ssh-agent. The command is simple:

julien@laptop:~$ ssh-add ~/.ssh/id_dsa_jve
Enter passphrase for /home/julien/.ssh/id_dsa_jve: 
Identity added: /home/julien/.ssh/id_dsa_jve (/home/julien/.ssh/id_dsa_jve)

And from there on, I can ssh into any destination without a password prompt:

julien@laptop:~$ ssh security-relay.linuxwall.info
Last login: Fri Jul 22 14:43:22 2011 from 62.15.98.128
[julien@security-relay ~]$

But that doesn't give me the ability to ssh into the destination server directly. If I try to chain a second SSH command after the first one, here is what happens:

julien@laptop:~$ ssh -t -t security-relay.linuxwall.info ssh 10.1.2.20
Permission denied (publickey,gssapi-with-mic).
Connection to security-relay.linuxwall.info closed.

I can successfully connect to security-relay.linuxwall.info, but the connection to 10.1.2.20 fails with a Permission denied. This makes sense because the second command is executed like any normal bash command on the security-relay machine, and because there is no ssh-agent running in my security-relay session, the command has no key to connect to 10.1.2.20.

The naive (and insecure) solution here would be to transfer my private key to the security relay, but I don't want that. SSH provides a more elegant way of doing this by forwarding your local agent to your remote session. You can activate this feature with a parameter in the ssh client configuration:

julien@laptop:~$ cat .ssh/config 

Host security-relay.linuxwall.info
    ForwardAgent yes
    ForwardX11 no
    SendEnv LANG LC_*
    HashKnownHosts no
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials no

ForwardAgent yes does the trick: from the relay, ssh can now authenticate to further servers using the private key held by the agent on your laptop.

julien@laptop:~$ ssh -t -t security-relay.linuxwall.info ssh -t -t 10.1.2.20
Last login: Fri Jul 22 15:13:58 2011 from 10.1.4.10

[julien@10.1.2.20 ~]$ 

Note: the -t option is mandatory, otherwise ssh complains about the tty.

     -t      Force pseudo-tty allocation.  This can be used to execute
             arbitrary screen-based programs on a remote machine, which
             can be very useful, e.g. when implementing menu services.
             Multiple -t options force tty allocation, even if ssh has no
             local tty.

Launch a root command remotely

Now that we can automate the connection, there is another problem to deal with: useradd requires root privileges. To get them, we need either the root password or our sudo password.

sudo has the -S option to take the password from stdin. So we can pipe the password from the command line using:

julien@laptop:~$ echo "mysudopassword"|ssh -t -t security-relay.linuxwall.info sudo -S 'tail /var/log/messages'

[... output of /var/log/messages...]

But there are two problems with this method. First, the password becomes visible on the command line and is stored in ~/.bash_history afterward. You do not want that.

We can use read instead. read prompts for the password and stores it in a variable that we can echo into our command. With -s, the password isn't even echoed in the local terminal, and it ends up neither in ~/.bash_history nor in the process list (echo is a bash builtin, so no separate process carries it as an argument).

julien@laptop:~$ read -s -p "password: " rootpassword && echo $rootpassword|ssh -t -t security-relay.linuxwall.info sudo -S 'tail /var/log/messages'

[... output of /var/log/messages...]

Now, the second problem: I haven't found a way to pass a variable through the SSH relay. I can pass one level of SSH but not the second one. I've looked around but didn't find an option for this, so if somebody knows a way, I'm all ears.

To overcome that, we have to open a little security hole, and temporarily store the password on the security-relay, then proceed in two steps instead of one:

  1. launch an ssh command that stores the password in a file on the relay
  2. launch a second command that cats the file and pipes the password to the destination sudo command

We only need to send the password once per server: the first sudo -S call opens a sudo session on the destination server, and the following sudo commands reuse that session.

The first command looks like this:

julien@laptop:~$ read -s -p "password: " rootpassword && printf '%s\n' "$rootpassword" | ssh security-relay.linuxwall.info 'umask 077; cat > /home/julien/laptopsudo && chmod 400 /home/julien/laptopsudo'

We first read the password into a local variable with read, then pipe it over SSH to the relay, where cat writes it to /home/julien/laptopsudo. Because the password travels on stdin instead of being expanded into the remote command line, it doesn't show up in the relay's process list.

It's not ideal, and I'm aware of the security issue, but we reduce the risk by keeping that file for a few seconds only (the duration of the operation) and limiting its permissions to 400.

[julien@security-relay ~]$ ls -al laptopsudo 
-r-------- 1 julien julien 11 Jul 22 17:16 laptopsudo

Note that if anybody has a better solution, I'm definitely interested.

The subsequent commands read the password from the file on the relay and execute the sudo command on the destination server. However, sudo is a bit tricky here as well. With requiretty set in /etc/sudoers (as it is on these servers), sudo refuses to run if the user doesn't own a proper tty, which is the case for the inner ssh call since it runs without -t. To allow this, we need to update /etc/sudoers. I must admit I haven't read too much about the security implications of this. But considering that my servers are not exposed, and no user is able to open a session on them, I imagine this is fine.

On the destination server, edit /etc/sudoers as follows:

[julien@10.1.2.20 ~]$ sudo sudoedit /etc/sudoers
[sudo] password for julien:

[...]
# Disable "ssh hostname sudo <cmd>", because it will show the password in clear.
#         You have to run "ssh -t hostname sudo <cmd>".
#
#Defaults    requiretty
Defaults    !requiretty
[...]

With requiretty disabled, we can now run the following command:

julien@laptop:~$ ssh -t -t security-relay.linuxwall.info "cat /home/julien/laptopsudo | ssh 10.1.2.20 sudo -S tail /var/log/messages"

Creating the user

It's a multi-step process, but now that we can launch sudo commands on the destination server, it's fairly straightforward.

  1. use sudo to create the user as root with useradd
  2. use sudo as the new user to create its .ssh directory
  3. write the public key to a file in our own home directory on the server
  4. move that file to the user's authorized_keys and fix its ownership and permissions
  5. destroy the sudo session

It can be done in a few commands. I put everything in a bash script with variables to make it look cleaner and reusable.

#!/usr/bin/env bash
# connect to the destination servers through an ssh relay to create a new user
server_list="10.1.2.20 10.1.2.21 10.1.2.22 10.1.2.23 10.1.2.24"
user_to_create=spongebob
user_group=bobleponge
user_public_key='ssh-rsa AAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxWt1RZ0n2ee3dzPepNODw== spongebob'
ssh_relay=security-relay.linuxwall.info


# temporary password file on the relay, with a unique name for each run
sudo_password_file=sudopassword$(date +%s)

echo "starting batch processing, please enter your sudo password"
read -s -p "password: " rootpassword || exit 1
echo
# send the password on stdin so it never shows up on the relay's command line
printf '%s\n' "$rootpassword" | ssh $ssh_relay "umask 077; cat > $sudo_password_file && chmod 400 $sudo_password_file" || exit 1
unset rootpassword

for server in $server_list; do

    echo "creating $user_to_create in $server"
    # first sudo call: feed the password from the relay file, which opens the sudo session
    if [ -n "$user_group" ]; then
        ssh -t -t $ssh_relay "cat $sudo_password_file | ssh $server \"sudo -S -u root /usr/sbin/useradd -d /home/$user_to_create -g $user_group -m -s /bin/bash $user_to_create\""
    else
        ssh -t -t $ssh_relay "cat $sudo_password_file | ssh $server \"sudo -S -u root /usr/sbin/useradd -d /home/$user_to_create -m -s /bin/bash $user_to_create\""
    fi

    # the following sudo calls reuse the cached credentials; .ssh must not be group/world writable
    ssh -t -t $ssh_relay "ssh $server \"sudo -u $user_to_create mkdir -m 700 /home/$user_to_create/.ssh\""

    ssh -t -t $ssh_relay "ssh $server \"echo $user_public_key > $user_to_create.authorized_keys\""

    ssh -t -t $ssh_relay "ssh $server \"sudo mv $user_to_create.authorized_keys /home/$user_to_create/.ssh/authorized_keys\""

    ssh -t -t $ssh_relay "ssh $server \"sudo chown $user_to_create:$user_group /home/$user_to_create/.ssh/authorized_keys\""

    # owner read/write only: nobody else needs to read the key file
    ssh -t -t $ssh_relay "ssh $server \"sudo chmod 600 /home/$user_to_create/.ssh/authorized_keys\""

    # drop the cached sudo credentials on the destination server
    ssh -t -t $ssh_relay "ssh $server \"sudo -k\""
done

# remove the password file from the relay
ssh $ssh_relay "rm -f $sudo_password_file"

And that's it! It took me longer than expected to get to this result (I wasn't expecting sudo and ssh to be so complex to deal with), but the result is quite pleasant. And once you have the knowledge, reusing that code for other tasks is a piece of cake.

Cheers

Monday, April 4, 2011

Bzip and my 4 cores

Intel Core i5, what a marvelous beast: 4 CPU cores in one tiny laptop. The challenge is to use them properly. And when I had to compress a 700MB log file a few days ago, I realized that not all the tools on Linux are multi-core friendly.

Today, a fellow PLUG member pointed me to lbzip2, a multi-threaded implementation of bzip2. I just gave it a quick shot and the results are interesting:

Initial file:

$ ls -s jmeter-server-node1.log --block-size=1
689274880 jmeter-server-node1.log

=== with bzip2 ====

$ time bzip2 -z -9 jmeter-server-node1.log

real	8m33.220s
user	8m31.444s
sys	0m0.880s

$ ls -s jmeter-server-node1.log.bz2 --block-size=1
1589248 jmeter-server-node1.log.bz2

$ time bunzip2 jmeter-server-node1.log.bz2

real	0m35.801s
user	0m33.662s
sys	0m0.964s

=== with lbzip2 ====

$ time lbzip2 -n 4 -z -9 -S jmeter-server-node1.log 

real	5m37.425s
user	20m57.227s
sys	0m5.016s

$ ls -s jmeter-server-node1.log.bz2 --block-size=1
1601536 jmeter-server-node1.log.bz2

$ time lbzip2 -n 4 -d jmeter-server-node1.log.bz2 

real	0m20.370s
user	1m15.697s
sys	0m1.316s

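The compression ratio is essentially the same (the lbzip2 output is about 0.8% larger). What surprises me is the cost: lbzip2 cuts the wall-clock time by about a third (roughly a 1.5x speedup), but it burns about 2.5 times as much user time as bzip2. The efficiency per core is a lot lower, but I'm happy to be using all my cores.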
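Ratios like these are easy to get wrong by hand, so here they are recomputed from the `time` figures above, converted to seconds manually (8m33.220s = 513.220 s, 20m57.227s = 1257.227 s, and so on). "Parallel efficiency" is simply the wall-clock speedup divided by the 4 threads lbzip2 was given with -n 4; the function name is made up for the example.

```bash
#!/usr/bin/env bash
# Recompute speedup and CPU cost from the `time` output (values in seconds).

# compare_runs <real_a> <user_a> <real_b> <user_b> <threads_b>
compare_runs() {
    awk -v ra="$1" -v ua="$2" -v rb="$3" -v ub="$4" -v t="$5" 'BEGIN {
        printf "wall-clock speedup: %.2fx\n", ra / rb
        printf "user-time ratio: %.2fx\n", ub / ua
        printf "parallel efficiency: %.0f%%\n", 100 * (ra / rb) / t
    }'
}

echo "compression (bzip2 vs lbzip2 -n 4):"
compare_runs 513.220 511.444 337.425 1257.227 4
echo "decompression (bunzip2 vs lbzip2 -n 4 -d):"
compare_runs 35.801 33.662 20.370 75.697 4
```

This gives a 1.52x speedup for 2.46 times the user time on compression (about 38% efficiency per thread), and a 1.76x speedup for 2.25 times the user time on decompression (about 44%).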

Thursday, March 24, 2011

KVM/Qemu lesson of the day

Young impatient jedi: the disk cache in your image XML configuration file you must remove, or network bandwidth slower than Jabba at the New York marathon experience you will!

root@jvegln:/etc/libvirt/qemu# cat OpenSolarisGLN-i386.xml

[...]

  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      [...]

Friday, March 4, 2011

Quantum Random Bit Generator Service

Just a quick note about a service I discovered today: Quantum Random Bit Generator Service http://random.irb.hr/.

It's basically a random number generation service accessible online. So you just register an account, download the command line binary and get up to 100MB of true randomness per day for free.

$ ./qrand -u julien -p secretpassword -y -c 1000 -o test 
Downloaded and saved 1.95 KiB (100.00% of requested) in 0.282 sec (0.007 MiB/s). 

julien@arael:~/Downloads/QRBG QRand Utility/bin$ file test 
test: data

julien@arael:~/Downloads/QRBG QRand Utility/bin$ cat test 
[.........bunch of random junk........]


What is really cool is that you will never get that quality of randomness from your operating system.

... we have used fast non-deterministic, stand-alone hardware number generator relying on photonic emission in semiconductors. ...

Big downside: no SSL. So you have no way to check whether bits are modified along the way...

But that could still be useful for diskless systems. I'm sure there is a way to plug that into /dev/random. Just give me a few days ;)
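Whatever the use, keep in mind that without SSL the bits cannot be authenticated, and no statistical test can tell true randomness apart from a good pseudo-random generator. A byte-frequency check is still a cheap way to catch a gross failure, such as a download that is actually an error message or a run of zeros. Below is a minimal sketch computing the Shannon entropy per byte of a file (8.00 bits is the maximum) with the standard `od` and `awk` tools; the function name is made up for the example.

```bash
#!/usr/bin/env bash
# Shannon entropy in bits per byte of a file: 0 for constant data, 8 for a
# perfectly uniform byte distribution. A crude sanity check, not a randomness test.

byte_entropy() {   # byte_entropy <file>
    # -An: no offsets, -v: don't collapse repeated lines, -tu1: one unsigned byte per field
    od -An -v -tu1 "$1" | awk '
        { for (i = 1; i <= NF; i++) { count[$i]++; total++ } }
        END {
            if (total == 0) { print "0.00"; exit }
            for (b in count) {
                p = count[b] / total
                h -= p * log(p) / log(2)
            }
            printf "%.2f\n", h
        }'
}
```

Running `byte_entropy test` on the downloaded file should give a value very close to 8.00; anything much lower means the file is not what you asked for.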
