David O'Dell's space

David O'Dell

May 17 / 10:41am

route requests by country code using nginx and the geoip module

I have a client who wanted to limit traffic to their site to visitors from the US.

Right away I told them we could use an nginx module built on the MaxMind GeoIP library.

It turns out I was right: nginx does have a GeoIP module!

Lucky for me.

Anyway, using the module is really simple.

After installing it, I simply told nginx to send all requests from the US to unicorn and redirect the rest.

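# in the http {} block: load the MaxMind country database
# (nginx must be built with --with-http_geoip_module; the .dat path varies by system)
geoip_country /usr/share/GeoIP/GeoIP.dat;

# in the server {} block; assumes an upstream named "unicorn" is defined
location / {
    # send US traffic to unicorn
    if ($geoip_country_code = US) {
        proxy_pass http://unicorn;
        break;
    }

    # all other traffic gets redirected
    rewrite ^(.*) http://foosite.com$1 permanent;
}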

BTW, yes, hacksors, I know there are a lot of ways of getting around this, but it should be effective for most requests.

 

 

Jan 8 / 11:56am

dynamic preseed file for ubuntu using sinatra

To build physical Ubuntu servers we use Ubuntu preseed.

This works great, but with a static preseed file you end up building a host that doesn't have its hostname or static IP address set. That means setting them by hand afterward, so we decided to automate it.

BTW it took us a while to figure out how to set a static ip in a preseed file. We blogged about it here: network-preseeding-debianubuntu-with-a-static

To do this we wrote a small sinatra app that dynamically generates the preseed file with the hostname and static ip address.

This is done by looking up the MAC address of the requesting host in the arp table and matching it against a pipe-delimited file that lists each host's MAC address, the static IP it should get, and its hostname.

The list is stored in a file named ip2mac.txt and was populated by a script.

The ip2mac.txt file looks like this:

172.28.0.71|a4:ba:db:35:e6:09|chi-devops11a
172.28.0.72|78:2b:cb:03:c5:44|chi-devops11b
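The lookup described above scans the file on every request. As a minimal sketch, assuming the ip|mac|hostname layout shown here, the file could instead be read once into a hash keyed by MAC address. `load_ip2mac` is a hypothetical helper, not part of the app in this post; it strips surrounding whitespace and skips blank or malformed lines.

```ruby
# Parse ip2mac.txt-style text (ip|mac|hostname per line) into
# { "mac" => { :ip => ..., :hostname => ... } }.
def load_ip2mac(text)
  table = {}
  text.each_line do |line|
    ip, mac, hostname = line.strip.split('|')
    next if ip.nil? || mac.nil? || hostname.nil? # blank or malformed line
    table[mac.downcase] = { :ip => ip, :hostname => hostname }
  end
  table
end

sample = <<-EOS
172.28.0.71|a4:ba:db:35:e6:09|chi-devops11a
  172.28.0.72|78:2b:cb:03:c5:44|chi-devops11b
EOS

hosts = load_ip2mac(sample)
puts hosts['78:2b:cb:03:c5:44'][:hostname] # prints chi-devops11b
```

In the app this would be called as `load_ip2mac(File.read('ip2mac.txt'))`, turning each lookup into a single hash access.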

Instead of pointing the pxelinux.cfg/default file at a static preseed file, we make a request to the sinatra app, which generates it dynamically. The line in the default file we use looks like this:

append console=tty0 console=ttyS1,115200n8 initrd=ubuntu-10.04-server-amd64-initrd.gz auto=true priority=critical preseed/url=http://172.27.0.115:4567/lucid-preseed-noraid interface=eth0 netcfg/dhcp_timeout=60 console-setup/ask_detect=false console-setup/layoutcode=us console-keymaps-at/keymap=us locale=en_US --

When the request is made the sinatra app does the following:

  1. looks up the MAC address of the requesting host in the arp table
  2. finds the matching line in ip2mac.txt
  3. uses that IP (plus a reverse DNS lookup for the hostname) to populate the hostname and ip variables in the preseed template
  4. returns the rendered preseed file to the host making the request
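The four steps can be sketched without Sinatra or a live arp table. This is an illustration under stated assumptions: the `arp -n` column layout is the usual Linux net-tools one, a hash stands in for ip2mac.txt, and the two-line template is hypothetical (`netcfg/get_ipaddress` and `netcfg/get_hostname` are real debian-installer preseed keys, but a working static-IP preseed needs more settings than this). `mac_from_arp` and `fill_preseed` are hypothetical names.

```ruby
require 'erb'

# Step 1: find the MAC for an IP in `arp -n` output, whose columns are
# Address, HWtype, HWaddress, Flags Mask, Iface. Returns nil if not found.
def mac_from_arp(arp_output, ip)
  arp_output.each_line do |line|
    fields = line.split
    next unless fields[0] == ip && fields[2].to_s =~ /\A\h\h(:\h\h){5}\z/
    return fields[2].downcase
  end
  nil
end

# Step 2: a hash standing in for ip2mac.txt (mac => [ip, hostname]).
HOSTS = {
  'a4:ba:db:35:e6:09' => ['172.28.0.71', 'chi-devops11a'],
  '78:2b:cb:03:c5:44' => ['172.28.0.72', 'chi-devops11b']
}

# Step 3: fill a much-shortened, hypothetical preseed template.
TEMPLATE = <<-EOS
d-i netcfg/get_ipaddress string <%= ip %>
d-i netcfg/get_hostname string <%= hostname %>
EOS

def fill_preseed(ip, hostname)
  ERB.new(TEMPLATE).result(binding)
end

arp = <<-EOS
Address                  HWtype  HWaddress           Flags Mask            Iface
172.28.0.72              ether   78:2b:cb:03:c5:44   C                     eth0
EOS

mac = mac_from_arp(arp, '172.28.0.72')
ip, hostname = HOSTS[mac]
# Step 4: in the Sinatra app this string would be the response body.
puts fill_preseed(ip, hostname)
```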

The code:

require 'rubygems' # skip this line in Ruby 1.9
require 'sinatra'
require 'erb'
require 'logger'

# one logger for the whole app instead of reopening foo.log on every call
LOG = Logger.new('foo.log')

def log(message)
  LOG.info(message)
end

# return the static ip listed for this mac in ip2mac.txt (ip|mac|hostname per line), or nil
def lookup_mac(mac)
  File.foreach('ip2mac.txt') do |line|
    list_ip, list_mac, _name = line.strip.split('|')
    return list_ip if list_mac && list_mac.downcase == mac.downcase
  end
  nil
end

# ask the arp table for the mac address of the host making the request
def get_mac_address()
  ip = @env['REMOTE_ADDR']
  cmd = "arp -n " + ip.chomp + " | grep -v Address | awk '{print \$3}'"
  `#{cmd}`.strip
end

# reverse-resolve the ip, dropping the trailing newline and dot from host's answer
def rev_lookup(ip)
  hostname = `host #{ip} | awk '{print \$5}'`
  hostname.chomp.chomp('.')
end

# shared by both routes: fill in @ip and @fqdn, then render the named template
def render_preseed(template)
  mac = get_mac_address()
  log(mac)
  @ip = lookup_mac(mac)
  log(@ip)
  halt 404, "no ip2mac.txt entry for #{mac}\n" if @ip.nil?
  @fqdn = rev_lookup(@ip)
  log(@fqdn)
  erb template
end

get '/lucid-preseed-noraid' do
  render_preseed(:lucid_preseed_noraid)
end

get '/lucid-preseed-nosrv' do
  render_preseed(:lucid_preseed_nosrv)
end

get '/' do
  "ops11"
end

To start the sinatra app just run the following:

ruby preseeder.rb
Dec 12 / 4:57pm

access nested node attributes in a chef recipe

If you use Chef, bookmark this page, as you will need to access nested keys at some point. Chef uses Ohai to build a hash for each host that chef-client is installed on. In a recipe this hash is named node. If you want the value for a key, it's simple:

ip = node['ipaddress']

Also, if you want to check whether a key exists before you try to access it, you can use attribute?():

node.attribute?('ipaddress')

BTW, in most cases you will want to make sure the key exists, because if it doesn't, chef-client will throw an error.

For nested keys it's a bit more difficult. First of all, you should always make sure the keys exist, which involves calling has_key? in nested if statements; then you can pull the value out. Below is one way to do it in a recipe. In this example I make sure the keys filesystem > /dev/sda6 > mount exist in the node hash, and once I'm sure they do, I check the value.

if node.has_key? "filesystem"
  if node["filesystem"].has_key? "/dev/sda6"
    if node["filesystem"]["/dev/sda6"].has_key? "mount"
      if node["filesystem"]["/dev/sda6"]["mount"] == "/srv"
        execute "foo" do
          command "touch /tmp/nested_keys_exist!"
          action :run
        end
      end
    end
  end
end
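The nested if statements can be folded into one small helper. A minimal sketch, shown on a plain Ruby Hash; `nested_value` is a hypothetical name, and using it on `node` assumes the node object responds to `has_key?` and `[]` the way the recipe above already relies on. On Ruby 2.3+ a plain Hash also has `dig`, which does the same walk but raises a TypeError when an intermediate value (a string, say) can't be dug into; whether a given Chef version's node attributes support `dig` is worth checking first.

```ruby
# Walk a nested hash one key at a time, returning nil as soon as a key
# is missing (or a value isn't hash-like) instead of raising.
def nested_value(hash, *keys)
  keys.inject(hash) do |current, key|
    return nil unless current.respond_to?(:has_key?) && current.has_key?(key)
    current[key]
  end
end

node_data = {
  'filesystem' => {
    '/dev/sda6' => { 'mount' => '/srv' }
  }
}

puts nested_value(node_data, 'filesystem', '/dev/sda6', 'mount').inspect # prints "/srv"
puts nested_value(node_data, 'filesystem', '/dev/sda7', 'mount').inspect # prints nil
```

In a recipe the check could then read `if nested_value(node, 'filesystem', '/dev/sda6', 'mount') == '/srv'`.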
Dec 2 / 1:05pm

riak cluster backup script with compression #riak

This script creates a compressed backup of a riak cluster and keeps the previous day's copy. I still have to add restoring as an option.

#!/usr/bin/env ruby

BACKUP_DIR = "/net/fs11/srv/posterous/nfs/riak"

t = Time.new
f = t - 86400 # same time yesterday
today = t.strftime("%Y-%m-%d")
yesterday = f.strftime("%Y-%m-%d")

# remove the backup that was rotated to -old on the previous run
def delete_old()
  old = Dir.glob(BACKUP_DIR + "/*-old")
  unless old.empty?
    puts "deleting oldest backup"
    old.each { |path| File.delete(path) }
  end
end

# rename yesterday's compressed backup to <name>-old
def rotate_last(yesterday)
  f = BACKUP_DIR + "/riak_backup-" + yesterday + ".bz2"
  if File.exist?(f)
    puts "rotating last backup to old"
    File.rename(f, f + "-old")
  end
end

# dump the whole cluster (node riak@172.27.0.113, cookie riak) to today's file
def run_backup(today)
  puts "creating backup file"
  `/usr/sbin/riak-admin backup riak@172.27.0.113 riak #{BACKUP_DIR}/riak_backup-#{today} all`
end

# pbzip2 writes <file>.bz2 and removes the uncompressed dump
def compress(today)
  puts "compressing backup"
  `/usr/bin/pbzip2 #{BACKUP_DIR}/riak_backup-#{today}`
end

delete_old()
rotate_last(yesterday)
run_backup(today)
compress(today)
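The script computes yesterday as `t - 86400`, which assumes every day is 24 hours long. Around a daylight saving change that can land on the wrong calendar day: run at 00:30 the day after a 23-hour day, it lands two days back. A sketch using Ruby's Date class instead, which steps back exactly one calendar day; `backup_names` is a hypothetical helper that builds the same file names the script uses.

```ruby
require 'date'

BACKUP_DIR = '/net/fs11/srv/posterous/nfs/riak'

# Date arithmetic counts calendar days, so "yesterday" is unaffected by
# days that are 23 or 25 hours long.
def backup_names(today = Date.today)
  yesterday = today - 1
  {
    :today     => File.join(BACKUP_DIR, "riak_backup-#{today.strftime('%Y-%m-%d')}"),
    :yesterday => File.join(BACKUP_DIR, "riak_backup-#{yesterday.strftime('%Y-%m-%d')}.bz2")
  }
end

names = backup_names(Date.new(2011, 3, 14))
puts names[:today]     # prints /net/fs11/srv/posterous/nfs/riak/riak_backup-2011-03-14
puts names[:yesterday] # prints /net/fs11/srv/posterous/nfs/riak/riak_backup-2011-03-13.bz2
```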
Oct 25 / 3:35pm

How to create custom graphs with Munin

Munin lacks many features that Cacti has, but one thing it's really good at is creating custom graphs. Basically all you need is a script, written in any language, that prints out the values when run and prints the graph config when given the config argument. In the example below I am graphing the number of unicorn processes running on a box and the number of those that are busy. The values:

./unicorn_inuse 
cap.value 21
inuse.value 11

You can see above that I am getting 2 values to graph: cap.value is the total number of unicorn processes running and inuse.value is the number that are busy.

The config:

./unicorn_inuse config
graph_title Total Unicorns in use
inuse.type GAUGE
inuse.label Unicorns in use
inuse.draw LINE1
graph_category Unicorn
graph_args --base 1000 -l 0
graph_scale no
cap.label Total Unicorns
cap.draw LINE2
cap.type GAUGE

There aren't many details in the config, but note graph_category: it's how you put graphs in a specific bucket in the munin UI.
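Because the plugin protocol is just lines of text, the config and value output can be generated from data instead of a hand-written `puts` per line. A sketch of that idea; `GRAPH`, `FIELDS`, `config_lines` and `value_lines` are hypothetical names, and the numbers printed on a normal run are placeholders copied from the example output above.

```ruby
# Graph-level settings and per-field attributes for the unicorn graph.
GRAPH = {
  'graph_title'    => 'Total Unicorns in use',
  'graph_category' => 'Unicorn',
  'graph_args'     => '--base 1000 -l 0',
  'graph_scale'    => 'no'
}

FIELDS = {
  'inuse' => { 'type' => 'GAUGE', 'label' => 'Unicorns in use', 'draw' => 'LINE1' },
  'cap'   => { 'type' => 'GAUGE', 'label' => 'Total Unicorns',  'draw' => 'LINE2' }
}

# Lines printed for "config": graph settings, then "field.attribute value".
def config_lines(graph, fields)
  lines = graph.map { |key, value| "#{key} #{value}" }
  fields.each do |name, attrs|
    attrs.each { |attr, value| lines << "#{name}.#{attr} #{value}" }
  end
  lines
end

# Lines printed on a normal run: "field.value N".
def value_lines(values)
  values.map { |name, value| "#{name}.value #{value}" }
end

if ARGV[0] == 'config'
  puts config_lines(GRAPH, FIELDS)
else
  # placeholder numbers; a real plugin would measure these
  puts value_lines('cap' => 21, 'inuse' => 11)
end
```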

The code:

#!/usr/bin/env ruby  

def get_total()
  cmd = 'ps aux| grep capuser | grep unicorn | wc -l'
  output = `#{cmd}`
  num = output.match(/\d+/)
  return num
end

def get_chillin()
  cmd = "ps aux| grep capuser | grep unicorn | grep 'chillin'| wc -l"
  output = `#{cmd}`
  num = output.match(/\d+/)
  return num
end

def config()     
  puts 'graph_title Total Unicorns in use'    
  puts 'inuse.type GAUGE'   
  puts 'inuse.label Unicorns in use'
  puts 'inuse.draw LINE1'
  puts 'graph_category Unicorn'   
  puts 'graph_args --base 1000 -l 0'    
  puts 'graph_scale no'  
  puts 'cap.label Total Unicorns'
  puts 'cap.draw LINE2'
  puts 'cap.type GAUGE'
end    

argu =  ARGV[0]     
if argu == 'config'     
  config()     
else     
  total = get_total()    
  chillin = get_chillin()    
  inuse = total[0].to_i - chillin[0].to_i
  puts "cap.value " + total[0].to_s     
  puts "inuse.value " + inuse.to_s     
end
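One caveat with counting via `ps aux | grep ... | wc -l`: backticks run that pipeline through `sh -c`, and that shell's own command line contains both `capuser` and `unicorn`, so it can end up in the count (in `get_chillin` it contains `chillin` as well). A sketch that parses ps output in Ruby instead, assuming the input comes from `ps -eo user,args`, that unicorn's process titles start with `unicorn` (or `unicorn_rails`), and that idle workers carry `chillin` in their title, as the original grep implies. `unicorn_counts` is a hypothetical helper and the sample is made up.

```ruby
# Count unicorn processes owned by user, and how many of them are busy.
# Requiring the title to *start* with unicorn skips grep, sh -c and the
# plugin itself (e.g. "ruby ./unicorn_inuse").
def unicorn_counts(ps_output, user = 'capuser')
  total = 0
  idle = 0
  ps_output.each_line do |line|
    owner, args = line.strip.split(/\s+/, 2)
    next unless owner == user && args.to_s =~ /\Aunicorn(_rails)?\b/
    total += 1
    idle += 1 if args.include?('chillin')
  end
  { :cap => total, :inuse => total - idle }
end

sample = <<-EOS
USER     COMMAND
capuser  unicorn master -c config/unicorn.rb
capuser  unicorn worker[0] chillin
capuser  unicorn worker[1] GET /posts
nobody   grep unicorn
EOS

counts = unicorn_counts(sample)
puts "cap.value #{counts[:cap]}"     # prints cap.value 3
puts "inuse.value #{counts[:inuse]}" # prints inuse.value 2
```

On a live box the input would be the output of `ps -eo user,args`, captured with backticks.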
Oct 14 / 2:42pm

3 ways to push data to graylog2

If you are a sysadmin or developer and you haven't heard of graylog2, you're missing out. Graylog2 takes log data (or whatever you want to throw at it), stores it for you and lets you search it. It does this by using MongoDB as its backend and providing a web interface, written in Rails, to categorize and search it. In my case it's very useful: I manage servers in 4 physical locations, Slicehost, Rackspace, Rackspace Cloud and EC2, and I needed a way to keep all of the system logs in one place without having to work too hard at it. Graylog2 was my solution.

So far I use 3 different methods to write data to graylog2.

  1. rsyslog over UDP
  2. piping data over netcat
  3. Using the GELF gem which is specific to graylog2

(1) rsyslog over UDP

This is the easiest one by far, and it's what I use for system log data. On Ubuntu all I had to do was disable syslog, enable rsyslog and add this one line to /etc/rsyslog.conf:

*.*       @graylog2.posterfoo.com

That's all I had to do. BTW, if you want to send the same data over TCP, use this instead:

*.*       @@graylog2.posterfoo.com

(2) piping data over netcat

This one is also easy to use: just pipe data to netcat, prefixed with a syslog priority and a hostname. In the example below I am piping a log file to graylog2 with priority <7> (facility kern, severity debug) from the hostname foo.foo.com.

#!/bin/sh
tail -F -q /var/log/nginx/accesslog | \
while read -r line ; do
  echo "<7> foo.foo.com $line" | nc -w 1 -u graylog2.posterfoo.com 514
done
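The `<7>` at the front of each line is a syslog PRI value, facility * 8 + severity, so `<7>` is facility 0 with severity 7 (debug). The loop above starts one nc process per log line; a sketch of the same idea in Ruby reuses a single UDP socket. It mirrors the line format above (PRI, hostname, message, no timestamp) rather than full RFC 3164 syslog, and `syslog_line` and `forward` are hypothetical names whose defaults copy the example's priority and hostname.

```ruby
require 'socket'

# Build "<PRI> hostname message" where PRI = facility * 8 + severity.
def syslog_line(facility, severity, hostname, message)
  "<#{facility * 8 + severity}> #{hostname} #{message}"
end

# Send each line as one UDP datagram over a single reused socket.
def forward(lines, host, port = 514, facility = 0, severity = 7, hostname = 'foo.foo.com')
  sock = UDPSocket.new
  lines.each do |line|
    sock.send(syslog_line(facility, severity, hostname, line.chomp), 0, host, port)
  end
ensure
  sock.close if sock
end
```

A script that `tail -F` pipes into could call `forward(ARGF, 'graylog2.posterfoo.com')`.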

That's it. Once the data is in graylog2 you can sort and search by hostname, logging level, or a regex on the data itself.

(3) Using the GELF gem, which is specific to graylog2

This method provides the most flexibility, because it lets you create custom fields. In the example below I parse the access log before submitting it to graylog2 with the GELF gem. This results in custom fields that can be used to categorize and sort, such as method (GET, PUT, etc.), URI, size, referrer, etc.

#!/usr/bin/ruby
require 'rubygems'
require 'gelf'

# one notifier for the whole run instead of one per log line
NOTIFIER = GELF::Notifier.new("graylog2.posterfoo.com", 12201)

def send_gelf(ip, method, uri, code, size, referral)
  line = [ip, method, uri, code, size, referral].join(" ")
  NOTIFIER.notify!(:host => "prod-nginx", :level => 1, :short_message => line,
                   :_ip => ip, :_method => method, :_uri => uri, :_code => code,
                   :_size => size, :_referral => referral)
end

# these field positions depend on the nginx log_format in use
ARGF.each do |line|
  x = line.split(/\s+/)
  next if x.length < 13 # skip short or malformed lines
  send_gelf(x[0], x[7], x[8], x[10], x[11], x[12])
end
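The split indexes in the script depend on the exact nginx log_format, so any format change silently shifts them. As a sketch, assuming the log uses nginx's default `combined` format (the script's indexes suggest the real log was customized), a regex can pull the same fields out by position in the pattern instead. `parse_combined` and the extra `_time`/`_agent` fields are hypothetical, and the sample line is made up.

```ruby
# nginx "combined" log_format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"/

# Returns a hash of GELF-style custom fields, or nil if the line doesn't match.
def parse_combined(line)
  m = COMBINED.match(line)
  return nil unless m
  { :_ip => m[1], :_time => m[2], :_method => m[3], :_uri => m[4],
    :_code => m[5], :_size => m[6], :_referral => m[7], :_agent => m[8] }
end

line = '1.2.3.4 - - [14/Oct/2011:14:42:00 -0700] "GET /posts/1 HTTP/1.1" 200 512 "http://example.com/" "curl/7.21"'
fields = parse_combined(line)
puts fields[:_method] # prints GET
puts fields[:_uri]    # prints /posts/1
```

The returned hash could be merged into the options passed to `notify!` in the script above.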
Sep 22 / 9:49am

review: NFL Sunday Ticket on PS3

I'm a huge Patriots fan, but I live in California.

I have Comcast at home for cable and internet. I hear a lot of complaints about Comcast, but I've never had a poor experience, and I didn't want to switch to DirecTV just because of the Patriots.

Luckily, with Sunday Ticket becoming available on PS3 (a few days before the season began), I didn't have to switch.

The good:

  • Installation and setup was really easy: I just had to download the app, which was really small, and add money to my PlayStation wallet.
  • Every game is available except for local games, and there are inline stats while the game is on.
  • Includes NFL RedZone, which shows every touchdown live across all games.
  • Includes the ability to pause, rewind and fast-forward the game, almost like a DVR. I say almost because you can't record the game.
  • Image quality is pretty good: digital quality at minimum, throttling up to HD after a few minutes.
  • I get to watch every Patriots game!!

The bad:

  • Cost: being a father of 2 kids with a mortgage, it was a really tough decision to spend $340 to watch what amounts to 16 games. I decided not to get it until my wife reminded me that as a father I do very little for myself and told me to get it. I did.
  • Lack of pregame and postgame coverage. I didn't think I would miss this, but the broadcast only runs from kickoff to the end of the game; outside of that it's not available. It would be nice to get at least 15 minutes pre/post game.
  • Can't record games. If I'm paying for the content it shouldn't matter when I watch it; it would be nice to have the ability to record.
  • PlayStation Network: given all the security breaches on the network, I felt really uncomfortable entering my credit card data.

Summary:

The biggest problem is the cost, but there is no way around it: even if you have DirecTV, the cost is the same. This is just a convenient workaround to avoid switching providers. If you really don't want to miss games and can afford it, this is a pretty good solution; it isn't perfect, but it's less expensive than going to a local sports bar at 10am on a Sunday morning.

 

 

Sep 5 / 6:18pm

How to scale varnish horizontally with haproxy

We currently rely on varnish to serve up our posts and other pages, which are largely static. In fact, 40% of requests to our site never hit our web servers because they are served out of varnish's cache first. For redundancy, and also to scale varnish, we run two instances (soon to be three). We initially used varnish's hashing algorithm based on URI; this worked fine, and specific pages were only stored on one varnish. The problem we ran into was that when we had to purge a page, we ended up sending the command to both varnishes. This causes several problems. One is that it simply isn't scalable: imagine sending the same purge command to 10+ instances. Another is that the purge list was twice the size it should be, and if you manage varnish you know that when the purge list gets too big, varnish stops working.

What we decided to do was direct requests based on the first character of the hostname. This works for us because each user has their own subdomain. Now not only do all pages for a single user live on one varnish instance, but we can accurately direct the purge request too. One small note: if you are going to do this based on URI instead of hostname, you will need to match on the second character, since the first will always be a forward slash. In our config below, all hostnames starting with 0-9 and a-j live on web11 and everything else lives on web12; requests for posterous.com itself (which are not cacheable) are round-robined between the two webs. The haproxy config for this is below:

backend posterous_http_web11
    mode http
    server web11 web11:80 check

backend posterous_http_web12
    mode http
    server web12 web12:80 check

backend posterous_http_all
    mode http
    server web11 web11:80 check
    server web12 web12:80 check

frontend posterous_httpdoor 172.27.0.20:80
    mode http
    acl posterous_com_root hdr_beg(host) -i posterous.com
    acl a-q_hostnames hdr_beg(host) -i 0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j

    use_backend posterous_http_all if posterous_com_root
    use_backend posterous_http_web11 if a-q_hostnames

    default_backend posterous_http_web12
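Because the split is a pure function of the hostname, anything that needs to find a user's cache (a purge script, for instance) can apply the same rule. A sketch mirroring the ACLs above in Ruby; `backend_for` is a hypothetical helper, and it compares hostnames case-insensitively, as `-i` does in haproxy.

```ruby
# Same decision order as the haproxy frontend: posterous.com root first,
# then 0-9/a-j to web11, everything else to web12.
def backend_for(host)
  host = host.to_s.downcase
  return 'posterous_http_all' if host.start_with?('posterous.com')
  host =~ /\A[0-9a-j]/ ? 'posterous_http_web11' : 'posterous_http_web12'
end

%w[alice.posterous.com kevin.posterous.com posterous.com].each do |host|
  puts "#{host} -> #{backend_for(host)}"
end
```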
Aug 5 / 3:54pm

How to capture all queries on a very busy MySQL server without adding further strain.

We recently had capacity problems because too large a percentage of read queries were going to our master MySQL server instead of the read-only slaves. I needed to capture the queries on that very busy server without putting any more load on its disks. At first I used tcpdump to capture the inbound queries:

sudo tcpdump -s0 -A dst port 3306 and src host app11| strings | grep SELECT| sed 's/^.*SELECT/SELECT/'

This worked really well, but I needed to run it for an hour or so to get a decent sample size, and I couldn't use the local disks because they were already at capacity. What I ended up doing was piping the output from tcpdump through ssh to another host:

ssh $TO_HOST cat -  ">" $OUT_FILE

The whole process looks like this.

sudo tcpdump -s0 -A dst port 3306 and src host app11| strings | grep SELECT| sed 's/^.*SELECT/SELECT/' | ssh log11 cat -  ">" db11m_query_log.02AUG2011_3:30-4:00

This allowed me to capture a ~100MB file without consuming more I/O on the local disk.
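One thing to watch in the filter: `sed 's/^.*SELECT/SELECT/'` is greedy, so on a line containing a subquery it keeps only the text from the last SELECT onward. A sketch of a filter that keeps everything from the first SELECT instead; `extract_select` is a hypothetical helper and the sample lines are made up.

```ruby
# Keep everything from the FIRST "SELECT" on the line, or nil if absent.
def extract_select(line)
  i = line.index('SELECT')
  i && line[i..-1].chomp
end

lines = [
  "....E..SELECT id FROM posts WHERE user_id = 7\n",
  "garbage without a query\n",
  "..SELECT * FROM (SELECT id FROM posts) AS t\n"
]
queries = lines.map { |l| extract_select(l) }.compact
puts queries
```

In the pipeline, `grep SELECT | sed 's/^.*SELECT/SELECT/'` could be swapped for `ruby -ne 'i = $_.index("SELECT"); puts $_[i..-1] if i'`.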

Jun 14 / 10:02am

create a simple graph with MySQL and munin

Munin lacks many features that other graphing suites have, but when it comes to creating custom graphs it excels. To create a custom graph, all you need is a script written in any language that outputs the value you are trying to graph and, when you supply the config argument, returns its configuration. In this example I'll create a graph from a MySQL query that returns the total number of delayed jobs in our queue. Run the script with no argument to get the value:

$ ./delayed_jobs_total
totaljobs.value 228964

Run the script with the config argument:

$ ./delayed_jobs_total config
graph_title Total Delayed Jobs
totaljobs.type GAUGE
totaljobs.label TotalJobs 
graph_category delayed_jobs
graph_args --base 1000 -l 0
graph_scale no

The code:

#!/usr/local/bin/ruby
require 'rubygems'
require 'mysql'

hostname = '127.0.0.1'
username = 'REMOVED'
password = 'REMOVED'
databasename = 'delayed_job'
my = Mysql.new(hostname, username, password, databasename)

def total_count(my)
  rs = my.query('select count(*) from delayed_jobs')
  row = rs.fetch_row
  return row
end

def config()
  puts 'graph_title Total Delayed Jobs'
  puts 'totaljobs.type GAUGE'
  puts 'totaljobs.label TotalJobs'
  puts 'graph_category delayed_jobs'
  puts 'graph_args --base 1000 -l 0'
  puts 'graph_scale no'
end

argu = ARGV[0]
if argu == 'config'
  config()
else
  total = total_count(my)
  puts "totaljobs.value " + total[0].to_s
end
my.close
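If the connection fails or the query returns no row, the script above raises before printing anything, and munin simply gets no data for that run. A sketch of a more forgiving value step that prints munin's unknown value `U` instead; `totaljobs_line` and `report` are hypothetical helpers, and the block passed to `report` stands in for the real `select count(*)` query.

```ruby
# Format the count for munin, falling back to "U" (unknown) when the row
# is missing or isn't a plain integer.
def totaljobs_line(row)
  count = row && row[0]
  count.to_s =~ /\A\d+\z/ ? "totaljobs.value #{count}" : "totaljobs.value U"
end

# Run the query block, treating any error as "no row".
def report(&query)
  row = begin
    query.call
  rescue StandardError
    nil
  end
  totaljobs_line(row)
end

puts report { ['228964'] }              # prints totaljobs.value 228964
puts report { raise 'connection lost' } # prints totaljobs.value U
```

In the plugin, both the `Mysql.new` call and the query would move inside the block passed to `report`.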