Ansman

ClearDB MySQL + Heroku + UTF-8 = True

It all started when I wanted to select unique rows from a table.

In MySQL or SQLite you’d simply do SELECT * FROM table GROUP BY foo, but not in postgres, which is one of many reasons why I hate postgres.

In postgres you do SELECT DISTINCT ON (foo) *, something SQLite doesn’t support.
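
To make the GROUP BY trick concrete, here is a small sketch using Python's built-in sqlite3 module. The items table and its columns are made up for the example. Keep in mind that SQLite picks an arbitrary row for each group, so this only works when you don't care which duplicate survives.

```python
import sqlite3

def one_row_per_foo(values):
    """Return one (arbitrary) row per distinct foo using a bare GROUP BY."""
    conn = sqlite3.connect(':memory:')
    # Hypothetical table, only here to have something to query.
    conn.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, foo TEXT)')
    conn.executemany('INSERT INTO items (foo) VALUES (?)',
                     [(value,) for value in values])
    # SQLite accepts non-aggregated columns alongside GROUP BY.
    rows = conn.execute('SELECT * FROM items GROUP BY foo').fetchall()
    conn.close()
    return rows

unique = one_row_per_foo(['a', 'b', 'a', 'c', 'b'])
print(len(unique))  # 3
```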

So I decided to switch to ClearDB on heroku, a cloud MySQL implementation.
Now however my UTF-8 got mangled when I entered it into the database. Here’s an example:

irb(main):001:0> c = Card.new input: 'åäö'
=> #<Card id: nil, input: "åäö", output: nil, card_type: nil, stack_id: nil, created_at: nil, updated_at: nil>
irb(main):002:0> c.input.encoding
=> #<Encoding:UTF-8>
irb(main):003:0> c.save
   (2.7ms)  BEGIN
  SQL (3.6ms)  INSERT INTO `cards` (`card_type`, `created_at`, `input`, `output`, `stack_id`, `updated_at`) VALUES (NULL, '2012-05-13 13:44:44', 'åäö', NULL, NULL, '2012-05-13 13:44:44')
   (24.8ms)  COMMIT
=> true
irb(main):004:0> c = Card.find c.id
  Card Load (3.0ms)  SELECT `cards`.* FROM `cards` WHERE `cards`.`id` = 31 LIMIT 1
=> #<Card id: 31, input: "åäö", output: nil, card_type: nil, stack_id: nil, created_at: "2012-05-13 13:44:44", updated_at: "2012-05-13 13:44:44">
irb(main):005:0> c.input.encoding
=> #<Encoding:ASCII-8BIT>

It took an hour to figure out that it is the connector’s fault (mysql) and that the solution is to use the mysql2 connector.

So how is this done on heroku?

The question is how to switch to the mysql2 connector on heroku, since they inject their own database.yml.

A simple cat of database.yml showed this:

<%

require 'cgi'
require 'uri'

begin
  uri = URI.parse(ENV["DATABASE_URL"])
rescue URI::InvalidURIError
  raise "Invalid DATABASE_URL"
end

raise "No RACK_ENV or RAILS_ENV found" unless ENV["RAILS_ENV"] || ENV["RACK_ENV"]

def attribute(name, value, force_string = false)
  if value
    value_string =
      if force_string
        '"' + value + '"'
      else
        value
      end
    "#{name}: #{value_string}"
  else
    ""
  end
end

adapter = uri.scheme
adapter = "postgresql" if adapter == "postgres"

database = (uri.path || "").split("/")[1]

username = uri.user
password = uri.password

host = uri.host
port = uri.port

params = CGI.parse(uri.query || "")

%>

<%= ENV["RAILS_ENV"] || ENV["RACK_ENV"] %>:
  <%= attribute "adapter",  adapter %>
  <%= attribute "database", database %>
  <%= attribute "username", username %>
  <%= attribute "password", password, true %>
  <%= attribute "host",     host %>
  <%= attribute "port",     port %>

<% params.each do |key, value| %>
  <%= key %>: <%= value.first %>
<% end %>

The key line here is adapter = uri.scheme: you need to set the scheme of the URL to mysql2.

$ heroku config | grep CLEARDB
CLEARDB_DATABASE_URL => mysql://XXXXXXXX@us-cdbr-east.cleardb.com/heroku_XXXXXXXXXXXX?reconnect=true
$ heroku config:add DATABASE_URL="mysql2://XXXXXXXX@us-cdbr-east.cleardb.com/heroku_XXXXXXXXXXXX?reconnect=true"

And there you go. You will need to convert all the existing rows to UTF-8 though, since they were already converted on input. I just dropped my tables and loaded from fixture data since I’m in development anyway.
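
If you'd rather see the URL handling spelled out, here is a rough Python sketch of the same idea as the ERB template: swap the scheme, then split the URL into the fields that end up in database.yml. The function names and the example URL (including its credentials) are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

def with_scheme(url, scheme):
    """Return url with its scheme replaced, e.g. mysql -> mysql2."""
    return urlunsplit(urlsplit(url)._replace(scheme=scheme))

def database_settings(url):
    """Split a database URL into the fields the injected database.yml emits."""
    parts = urlsplit(url)
    settings = {
        'adapter': 'postgresql' if parts.scheme == 'postgres' else parts.scheme,
        'database': parts.path.split('/')[1] if parts.path else None,
        'username': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
    }
    # Query parameters such as reconnect=true become extra keys.
    for key, values in parse_qs(parts.query).items():
        settings[key] = values[0]
    return settings

# Placeholder URL in the same shape as the one above; not real credentials.
url = 'mysql://user:secret@us-cdbr-east.cleardb.com/heroku_db?reconnect=true'
settings = database_settings(with_scheme(url, 'mysql2'))
print(settings['adapter'], settings['database'])
```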


Fixing WiFi sync when it breaks

I got the apparently notorious problem where my iPhone doesn’t show up in the sidebar of iTunes even though my iPad shows up just fine.

I tried an array of fixes but nothing worked until I found one that did.

I plugged the phone in via USB to my laptop, unticked and then reticked the “Sync with this phone over Wi-Fi” option, and when I unplugged it the phone still showed in the sidebar.

Just hope it works the next time too…


Why I hate (most) cross platform software

All platforms have standards. Some have their window buttons on the left side, others on the right; some have built-in keychains, some don’t.

These standards are good: they make you feel familiar with your computer. When you download a new application you feel right at home with all the buttons where they should be.

There is a problem, however: following standards usually requires code that is written specifically for a certain platform.

Why cross platform software sucks

Most cross platform software developers don’t want to write specialized code for a certain platform: it’s more code to maintain and you have to learn about the platform in question. So the solution is to use a cross platform library for these types of things.

Well, this will make sure your application never follows standards.
Sure, certain standards can still be followed but overall it will never feel completely right.

Integration with platform specific services

Take Firefox for example, I think it’s really bad to be honest. It’s slow, ugly and doesn’t feel quite right on any platform in my eyes.
On OS X you have a keychain to store your passwords, quite handy. The problem is that Firefox has its own keychain.

The same even goes for the SSL certificate store, Firefox comes with its own store so if you add a certificate to the trusted list in the OS it won’t show as valid in Firefox.

How about a uniform look?

Some might argue that Firefox should look and work the same on all platforms.
I say that couldn’t be more wrong.

Most people only use one platform so they don’t care if the application looks the same on all platforms.
Besides, even if they use more than one platform they are more likely to want that app to conform to the platform’s standard rather than the application’s own.

Applications with a completely custom UI are usually really bad, so why re-invent the wheel?

An example of this is FileZilla, it works great but just doesn’t look or feel native.

This is usually why I don’t like open source software, it often looks like it is mid-development.

All of them aren’t bad

I think Google Chrome is the perfect example of cross platform software done right.

It syncs perfectly with the OS X keychain, uses the native certificate store and supports full screen in OS X.

It does have a custom settings page and has recently introduced native printing but I’ll let it slide for now.

But a lot of them are

I looked through my applications folder and found these cross platform applications:

  • Dropbox (mostly hidden, but still good and integrates nicely)
  • Eclipse (dear god, writing graphical applications in Java, seriously?)
  • Google Chrome (great)
  • FileZilla (really bad, doesn’t look right or integrate)
  • iTunes (bad, they do it the OS X style even in Windows)
  • Skype (Custom UI, bad for so many reasons)
  • Spotify (OK, custom UI though)
  • Steam (Seems to be a completely custom UI resulting in a bad UX)
  • Sublime Text 2 (fairly good, uses custom settings which is OK)
  • VLC (OK but terribly slow to implement new OS features)

Conclusion

Don’t be lazy when writing software, take the time to write it custom for each platform. This will guarantee that your users will have a good experience no matter which platform they are on.

Especially seeing as the UI is most of the time just connecting the dots to the backend.


Python, sometimes I truly hate you

I’ve spent nine months happily writing stuff like if foo is 0 which works fine, usually…
But take a look at this:

>>> x = 3
>>> y = 3
>>> x is y
True
>>> x = 4711
>>> y = 4711
>>> x is y
False

The whole problem started when I did an if response.status is 404, which never fired even when the status was 404.
I have followed the official python style guide pretty much to the letter and it clearly states that is can only be used with singletons like None, but since the first example works I assumed that integers are singletons in python too. This was a bad assumption.

After doing some digging I found this answer on StackOverflow. Turns out that CPython keeps a cache of all integers between -5 and 256, so doing if foo is 0 is fine since all 0s are the same object, but 404 is not in the cache so most 404 integers will be a new object.

Even though comparing something to 0 using is happens to work, I won’t do this after seeing how fragile it is, so take care when doing your comparisons!
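
For the record, a minimal sketch of the safe version, written for Python 3. The helper name is_not_found is made up; the point is simply to compare by value with == and keep is for None.

```python
def is_not_found(status):
    """Compare by value; identity (is) is only reliable for singletons."""
    return status == 404

# int('404') builds the number at runtime, like a parsed HTTP response would.
status = int('404')
print(is_not_found(status))   # True
# Identity depends on the interpreter: on CPython this is usually False,
# because 404 falls outside the small integer cache.
print(status is int('404'))
```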


Out with the old, in with the new

After reading this article I decided to migrate to nginx.

I had never had any problems with apache but it always annoyed me that the memory usage was so high and it’s fun to try new things.

Before I started I had a few requirements though:

  • It had to support Rails
  • It had to support PHP (I left PHP a long time ago but stuff like WordPress requires it)
  • It had to support reverse proxy (for Splunk)

After some digging I found:

  • Rails - supported via passenger or reverse proxy and mongrel
  • PHP - supported via fastcgi
  • Reverse proxy - Built in!

Getting started

Installing nginx is really easy! There is only one downside really: if you want to use passenger you can’t use ubuntu’s repositories, seeing as their nginx doesn’t come with passenger built in.
This is because, unlike with apache, all modules must be compiled into nginx.

I did find this blog though which had their own repository that supports it, awesome!

Migrating from apache

It was up and running pretty fast. Now came the task of migrating from apache to nginx.

Here is my default virtual host in apache:

<VirtualHost *:80>
        ServerName www.ansman.se
        ServerAlias ansman.se

        ErrorLog ${APACHE_LOG_DIR}/www.ansman.se-error.log
        CustomLog ${APACHE_LOG_DIR}/www.ansman.se-access.log vhost_combined

        DocumentRoot ${APACHE_WWW_DIR}/main
        <Directory ${APACHE_WWW_DIR}/main>
                Options FollowSymlinks
        </Directory>

        RewriteEngine on
        RewriteCond %{HTTP_HOST}   ^ansman\.se
        RewriteRule ^/(.*)$ http://www.ansman.se/$1 [L,R=permanent]
</VirtualHost>

My goal was to create something equivalent in nginx. This is what I came up with:

server {
  listen          80;
  server_name     ansman.se;
  rewrite ^       $scheme://www.ansman.se$request_uri? permanent;

  access_log      /var/log/nginx/www.ansman.se-access.log;
  error_log       /var/log/nginx/www.ansman.se-error.log;
}

server {
  listen          80;
  server_name     www.ansman.se;
  root            /var/www/main;

  access_log      /var/log/nginx/www.ansman.se-access.log;
  error_log       /var/log/nginx/www.ansman.se-error.log;

  include blocks/php;
  include blocks/general;
}

The only thing I really miss from apache2 is start time variables (such as the ${APACHE_WWW_DIR} above). All variables in nginx are evaluated at runtime, so you can’t use one to specify the root directory for logs, for example.

Blocks

To avoid code (conf) repetition I created a folder called blocks in my nginx directory. In there I could add files that contained things that were meant to be included from virtual hosts.

One example is the block called general. It contains:

location = /favicon.ico {
  log_not_found off;
  access_log off;
}

location = /robots.txt {
  allow all;
  log_not_found off;
  access_log off;
}

location ~ \.php$ {
  deny all;
}

# Deny all attempts to access hidden files such as .htaccess, .htpasswd, .DS_Store (Mac).
location ~ /\. {
  deny all;
  access_log off;
  log_not_found off;
}

Then in my virtual hosts I just write include blocks/general; and it will use those default values.

PHP

PHP was really easy to set up. All you need is fastcgi installed and running. My instance is listening on a unix socket at /tmp/fastcgi.socket.

All I did was add a block that contained:

location ~ \.php$ {
  allow         all;
  include       blocks/fastcgi_params;
  fastcgi_pass  unix:/tmp/fastcgi.socket;
  fastcgi_index index.php;
  try_files     $uri $uri/ =404;
}

The last line is important to avoid a security hole.

To use PHP on a server I just add include blocks/php; to my server block.

Rails

I chose passenger over mongrel as it seems to be the preferred way of deploying rails apps nowadays.

All I needed was to add this to conf.d/passenger.conf:

passenger_root /usr/local/rvm/gems/ruby-1.9.2-head/gems/passenger-3.0.9;
passenger_ruby /usr/local/rvm/wrappers/default/ruby;

And then in my server block I just write passenger_enabled on;

Reverse proxy

Again it was really simple. Here is the config for my Splunk instance:

server {
  listen          80;
  server_name     foobar.ansman.se;

  rewrite ^       https://foobar.ansman.se$request_uri? permanent;

  access_log      /var/log/nginx/splunk.ansman.se-access.log;
  error_log       /var/log/nginx/splunk.ansman.se-error.log;
}

server {
  listen          443;
  server_name     foobar.ansman.se;

  access_log      /var/log/nginx/splunk.ansman.se-access.log;
  error_log       /var/log/nginx/splunk.ansman.se-error.log;

  location / {
    proxy_pass        http://127.0.0.1:8000;
    include           blocks/proxy;

    auth_basic            "Restricted";
    auth_basic_user_file  /var/www/splunk/.htpasswd;
  }

  include blocks/general;
}

And in my proxy block I have:

proxy_redirect          default;
proxy_set_header        Host            $host;
proxy_set_header        X-Real-IP       $remote_addr;
proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
client_max_body_size    10m;
client_body_buffer_size 128k;
proxy_connect_timeout   90;
proxy_send_timeout      90;
proxy_read_timeout      90;
proxy_buffers           32 4k;

Trials and tribulations

The only “major” problem I had was passenger. Turns out you need your rails applications to require execjs and some JS runtime (I chose therubyracer) or you’ll get an exception when running the application.

Was it worth it?

I did some simple stress testing using the ab utility.

The commands I ran were:

  • ab -n 10000 -c 100 localhost/index.html
  • ab -n 10000 -c 100 localhost/index.php

Static content (index.html)

Apache

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)


Server Software:        Apache/2.2.16
Server Hostname:        localhost
Server Port:            80

Document Path:          /index.php
Document Length:        13 bytes

Concurrency Level:      100
Time taken for tests:   2.416 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      2390000 bytes
HTML transferred:       130000 bytes
Requests per second:    4139.51 [#/sec] (mean)
Time per request:       24.157 [ms] (mean)
Time per request:       0.242 [ms] (mean, across all concurrent requests)
Transfer rate:          966.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       6
Processing:     5   24   4.2     23      48
Waiting:        5   24   4.2     23      47
Total:         10   24   4.2     23      48

Percentage of the requests served within a certain time (ms)
  50%     23
  66%     26
  75%     27
  80%     27
  90%     28
  95%     30
  98%     34
  99%     39
 100%     48 (longest request)

nginx

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)


Server Software:        nginx/1.1.0
Server Hostname:        localhost
Server Port:            80

Document Path:          /index.html
Document Length:        25 bytes

Concurrency Level:      100
Time taken for tests:   0.542 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      2570000 bytes
HTML transferred:       250000 bytes
Requests per second:    18461.29 [#/sec] (mean)
Time per request:       5.417 [ms] (mean)
Time per request:       0.054 [ms] (mean, across all concurrent requests)
Transfer rate:          4633.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   0.7      2       7
Processing:     1    3   1.5      3      16
Waiting:        1    3   1.4      3      15
Total:          3    5   1.5      5      22

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      6
  90%      6
  95%      7
  98%     10
  99%     15
 100%     22 (longest request)

Dynamic content (index.php)

Apache

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)


Server Software:        Apache/2.2.16
Server Hostname:        localhost
Server Port:            80

Document Path:          /index.php
Document Length:        13 bytes

Concurrency Level:      100
Time taken for tests:   2.027 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      2390000 bytes
HTML transferred:       130000 bytes
Requests per second:    4934.59 [#/sec] (mean)
Time per request:       20.265 [ms] (mean)
Time per request:       0.203 [ms] (mean, across all concurrent requests)
Transfer rate:          1151.72 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       6
Processing:    16   20   4.1     19      63
Waiting:       16   20   4.1     19      63
Total:         16   20   4.5     19      68

Percentage of the requests served within a certain time (ms)
  50%     19
  66%     20
  75%     22
  80%     22
  90%     24
  95%     26
  98%     28
  99%     36
 100%     68 (longest request)

nginx

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)


Server Software:        nginx/1.1.0
Server Hostname:        localhost
Server Port:            80

Document Path:          /index.php
Document Length:        168 bytes

Concurrency Level:      100
Time taken for tests:   0.545 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Non-2xx responses:      10000
Total transferred:      3400000 bytes
HTML transferred:       1680000 bytes
Requests per second:    18347.55 [#/sec] (mean)
Time per request:       5.450 [ms] (mean)
Time per request:       0.055 [ms] (mean, across all concurrent requests)
Transfer rate:          6091.96 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   0.6      2       6
Processing:     1    3   0.7      3       8
Waiting:        1    3   0.9      3       8
Total:          3    5   0.7      5      11

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      6
  75%      6
  80%      6
  90%      6
  95%      6
  98%      8
  99%      8
 100%     11 (longest request)

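I think this data speaks for itself! :) One caveat: the nginx index.php run reports 10000 non-2xx responses, so it measured how quickly nginx returns an error page rather than how quickly it runs PHP.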
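
When comparing runs like these it can help to pull out the headline numbers and check that the responses were actually successful. A rough sketch in Python, assuming ab's report format as shown above. The function name summarize is mine, and the sample strings are lines copied from the reports.

```python
import re

def summarize(report):
    """Extract requests per second and the non-2xx count from ab output."""
    rps = re.search(r'^Requests per second:\s+([\d.]+)', report, re.MULTILINE)
    non_2xx = re.search(r'^Non-2xx responses:\s+(\d+)', report, re.MULTILINE)
    return {
        'requests_per_second': float(rps.group(1)) if rps else None,
        # ab only prints this line when at least one response was not 2xx.
        'non_2xx': int(non_2xx.group(1)) if non_2xx else 0,
    }

apache_static = "Requests per second:    4139.51 [#/sec] (mean)\n"
nginx_static = "Requests per second:    18461.29 [#/sec] (mean)\n"
nginx_php = ("Non-2xx responses:      10000\n"
             "Requests per second:    18347.55 [#/sec] (mean)\n")

speedup = (summarize(nginx_static)['requests_per_second']
           / summarize(apache_static)['requests_per_second'])
print(round(speedup, 1))                  # 4.5
print(summarize(nginx_php)['non_2xx'])    # 10000
```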


Coolness in Python

I have a sort of love/hate relationship with Python.

I love the readability of the code, just take:

for item in collection:

or

if 'string' not in 'some string':

for example. What I really don’t like though is the use of magic methods such as __len__, and the import system can be a pain.

But during my work I have found some pretty neat stuff with python.

Decorators

A lot of people know about decorators in python. Most of us have heard of @property or @classmethod, but I think fewer people know how they actually work.

A decorator is just a function. When the decorated function is defined, the decorator is called with that function as its argument.
The decorator should then return a function (a new one or the original), and that is what gets called whenever the decorated name is called.

I think this is easiest explained with some examples:

def example_decorator1(fn):
    print 'example_decorator1({0})'.format(fn)
    return fn

@example_decorator1
def test_function1():
    print 'test_function1'
    pass

print '-------'
test_function1()

def example_decorator2(fn):
    print 'example_decorator2({0})'.format(fn)
    def wrapper(*args, **kwargs):
        print 'wrapper'
        return fn(*args, **kwargs)
    return wrapper

@example_decorator2
def test_function2(arg1):
    print 'test_function2({0})'.format(arg1)

print '-------'
test_function2(3)

When I run this I get this output:

example_decorator1(<function test_function1 at 0x...>)
-------
test_function1
example_decorator2(<function test_function2 at 0x...>)
-------
wrapper
test_function2(3)

This allows for some really cool stuff:

def italic(fn):
    def wrapper(*args, **kwargs):
        return '<i>' + fn(*args, **kwargs) + '</i>'
    return wrapper

def bold(fn):
    def wrapper(*args, **kwargs):
        return '<b>' + fn(*args, **kwargs) + '</b>'
    return wrapper

@bold
@italic
def get_greeting(name):
    return 'Hello, {0}'.format(name)

>>> print get_greeting('world!')
<b><i>Hello, world!</i></b>
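
One thing wrappers like these lose is the name and docstring of the decorated function. Here is a small sketch of the same bold decorator using functools.wraps from the standard library, written for Python 3 (so print is a function):

```python
import functools

def bold(fn):
    @functools.wraps(fn)  # copy __name__, __doc__ etc. from fn onto wrapper
    def wrapper(*args, **kwargs):
        return '<b>' + fn(*args, **kwargs) + '</b>'
    return wrapper

@bold
def get_greeting(name):
    """Return a greeting for name."""
    return 'Hello, {0}'.format(name)

print(get_greeting('world!'))   # <b>Hello, world!</b>
print(get_greeting.__name__)    # get_greeting
```

Without the functools.wraps line, get_greeting.__name__ would be 'wrapper', which makes tracebacks and generated documentation harder to read.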

Proxy classes 

Sometimes you need to wrap an object but you don’t want to create unnecessary dependencies.

A proxy class wraps an object and forwards some or all function calls and attribute accesses to the wrapped object. All you need to do in python is this:

class RootClass(object):
    def foo(self):
        print 'RootClass#foo'
    
    @property
    def bar(self):
        print 'RootClass#bar'
        return 3
    
    @bar.setter
    def bar(self, value):
        print 'RootClass#bar.setter({0})'.format(value)

class ProxyClass(object):
    def __init__(self, object_to_wrap):
        # Bypass our own __setattr__, which would otherwise try to forward
        # this assignment to an object that has not been stored yet.
        object.__setattr__(self, '_object_to_wrap', object_to_wrap)

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for forwarded attributes.
        print 'ProxyClass#__getattr__({0})'.format(key)
        return getattr(self._object_to_wrap, key)
    
    def __setattr__(self, key, value):
        print 'ProxyClass#__setattr__({0}, {1})'.format(key, value)
        setattr(self._object_to_wrap, key, value)

There is probably something that this proxy class doesn’t handle, but it seems to handle most things pretty well.

Here are some examples using these classes:

>>> obj = ProxyClass(RootClass())
>>> obj.foo()
ProxyClass#__getattr__(foo)
RootClass#foo
>>> print obj.bar
ProxyClass#__getattr__(bar)
RootClass#bar
3
>>> obj.bar = 5
ProxyClass#__setattr__(bar, 5)
RootClass#bar.setter(5)
>>> obj.baz # Raises AttributeError
ProxyClass#__getattr__(baz)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
AttributeError: 'RootClass' object has no attribute 'baz'

Creating a factory with the __new__ function

Sometimes the proxy class doesn’t cut it. It can make the code hard to read and it makes documentation quite hard.

The __new__ function is a class level function that is called to create the requested object, while the __init__ method is used to initialize the object once it exists.

This means you can sometimes skip the proxy class altogether:

class A(object):
    def __new__(cls, b_class):
        class_to_create = cls._get_class(b_class)
        return super(A, cls).__new__(class_to_create)
    
    @classmethod
    def _get_class(cls, b_class):
        if b_class:
            return B
        else:
            return C

class B(A):
    pass

class C(A):
    pass
    
>>> print A(True)
<__main__.B object at 0x101b95ad0>
>>> print A(False)
<__main__.C object at 0x101b95ad0>

Abstract classes in Python

One thing I really missed from other languages was interfaces and abstract classes, but a few weeks ago I found the nifty abc module (kinda sounds like programming for preschool kids), which stands for Abstract Base Class. It allows you to create abstract classes with abstract methods and properties.

Here is an example:

from abc import ABCMeta, abstractmethod, abstractproperty

class Base(object):
    __metaclass__ = ABCMeta

    @abstractmethod
    def foo(self):
        pass
    
    @abstractproperty
    def bar(self):
        pass
    
class Derived(Base):
    def foo(self):
        print 'foo'
    
    @property
    def bar(self):
        print 'bar'

>>> Base()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Base with abstract methods bar, foo
>>> o = Derived()
>>> o.foo()
foo
>>> o.bar
bar
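
The example above is Python 2. In Python 3 the __metaclass__ attribute is ignored, so the same class would silently stop being abstract, and abstractproperty is deprecated in favour of stacking @property on @abstractmethod. A sketch of the Python 3 equivalent (the methods return their strings here instead of printing, which is my change):

```python
from abc import ABC, abstractmethod

class Base(ABC):
    @abstractmethod
    def foo(self):
        """Subclasses must implement foo."""

    @property
    @abstractmethod
    def bar(self):
        """Subclasses must implement the bar property."""

class Derived(Base):
    def foo(self):
        return 'foo'

    @property
    def bar(self):
        return 'bar'

try:
    Base()
except TypeError as error:
    print('cannot instantiate:', error)

print(Derived().foo(), Derived().bar)
```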

Proxy imports

I don’t know why this wasn’t obvious to me right from the beginning (since I started coding in C++ and Java) but you can do proxy imports.

If you have this setup:

bar.py

def bar():
    pass

foo.py

from bar import *

You can do:

from foo import bar
bar()

That’s it for now, I’ll be sure to post more as I discover new cool stuff.


The art of writing a kick ass framework

Don’t get me wrong, Splunk is a great product and has a great QA team but the framework we use for testing Splunk sucks. By testing framework I mean the layer between a test and Splunk, not py.test.

I could go on for hours on what’s wrong with it but the highlights are:

  • It isn’t object oriented (It’s 2011, not OOP? Really?…).
  • There is no standard, not even close.
  • Hard to debug.

I’ve spent so much time tracking down bugs in our framework. I want to be able to say “If a test breaks something in Splunk has changed/is broken”. Today it’s more like “Oh, our framework broke again”.

Road work ahead

Well the good thing is we know it’s bad and we know something needs to be done. Interns are like children, we’re honest until we’re taught not to be.
We’ll point out bugs until we’re taught it’s just the way it is.

Luckily Splunk (and Boris) are awesome so when we told Boris that we wanted to rewrite the framework he answered like he always does: make it happen.

Helmut is born

So here we are, two interns redoing something an entire team will use for the foreseeable future… damn. 

I’ve used great frameworks before (jQuery, Cocoa and Rails to mention a few) but I’ve also used bad frameworks. So what makes a framework good?

I would say a good framework does what you expect, nothing more, nothing less. It also does it in the way you think.
If a framework is consistent you will be able to guess how to do things. This is easier said than done though.

Going back to school

You should learn from the pros. You might not agree but I think Apple makes the most user friendly products out there, doesn’t matter if it’s a framework or a laptop.
The reason for this, I imagine, is that they start out with how the user should use the product, not how the product should be implemented/made.

Test driven development does just this. The basic idea is that you write tests for how you expect things to work before you actually write the code that does the thing you want. This has several advantages:

  • You find any design faults quickly.
  • You know what you need before you even start writing the implementation.
  • You have tests that you can run as soon as you’re done.

What do you mean you don’t want to do 4 lines of setup?

I’ve learned easy doesn’t have to mean less powerful. When we started writing our framework this was our initial way of doing searches:

from helmut.splunk import Splunk
from helmut.connector.sdk import SDKConnector
from helmut.manager.jobs import Jobs

splunk = Splunk('/opt/splunk')
connector = SDKConnector(splunk, username='admin', 
                         password='notchangeme', 
                         namespace='admin:search')
connector.login()
jobs = Jobs(connector)
job = jobs.create('search * | head 10')

It’s really powerful: you can change pretty much every setting imaginable, which we thought was awesome. It was not until we started writing tests for this that we became aware that this was bad design.

First of all, most tests don’t care what username/password they use or what namespace they are searching through. They also don’t care what connector they connect through, so why should every test have to deal with this?

After some discussions this is what we came up with:

from helmut.splunk import Splunk

splunk = Splunk('/opt/splunk')
job = splunk.jobs.create('search * | head 10')

Holy line count, Batman!

We just cut our example from 8 lines to just 3 lines by assuming that most people will want the default settings, something we already know is true.

Everything in the first example is still valid though, you can still override everything (or just some things) depending on what you care about.
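
To show the idea of sensible defaults that can still be overridden, here is a minimal Python sketch. It is not Helmut's actual code: the Connector, Jobs and Splunk classes below are simplified stand-ins I made up, and they only record what they were asked to do instead of talking to a real Splunk.

```python
class Connector:
    """Hypothetical stand-in for a connector; not Helmut's actual class."""
    def __init__(self, splunk, username='admin', password='notchangeme',
                 namespace='admin:search'):
        self.splunk = splunk
        self.username = username
        self.password = password
        self.namespace = namespace
        self.logged_in = False  # creating a connector does not log in

    def login(self):
        self.logged_in = True


class Jobs:
    """Hypothetical job manager that records what it was asked to run."""
    def __init__(self, connector):
        self.connector = connector

    def create(self, query):
        return {'query': query, 'user': self.connector.username,
                'namespace': self.connector.namespace}


class Splunk:
    def __init__(self, path, connector=None):
        self.path = path
        self._connector = connector  # None means "use the defaults"

    @property
    def connector(self):
        if self._connector is None:
            # Built lazily, so tests that never search pay nothing.
            self._connector = Connector(self)
            self._connector.login()
        return self._connector

    @property
    def jobs(self):
        return Jobs(self.connector)


# Short form: defaults everywhere.
job = Splunk('/opt/splunk').jobs.create('search * | head 10')

# Long form still works: override only what the test cares about.
splunk = Splunk('/opt/splunk')
custom = Connector(splunk, username='tester')
other_job = Jobs(custom).create('search * | head 10')
print(job['user'], other_job['user'])
```

The default connector is only built (and logged in) when something asks for it, while a connector you create yourself is left alone, so a test for an invalid username can still construct one without logging in.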

If we had started out writing tests we would have discovered this much sooner.

Tests = Examples

Good tests are both examples and tests.
Just like a function should do one thing only and do it well a test should test one thing only and test it well.

If you write a test well it serves as a great piece of example code for that specific functionality.

splunk.test_everything()?

When I started coding I wanted frameworks to be all knowing and all seeing, something I quickly learned was very wrong.
If a framework does too much for you it becomes hard (or even impossible) to do things your way.

This is sometimes the case with Rails (I love Rails though!): everything is peachy as long as you do as Rails wants, but when you want something else stuff might get hard.

I think a framework should help the user accomplish what they want to do (whatever that may be) but not do everything at once. The framework should do exactly what you ask of it, nothing more, nothing less.

This is especially true for a testing framework.
If the stuff the tester sees is too different from what we get back from Splunk how do you know we didn’t cause (or fix) any bugs?

When I started writing the framework I made sure the connector logged in when you created it but that is doing too much. What if you don’t want to log in? What if you are testing an invalid username?

Practice makes perfect

I’m in no way an expert, I’m learning every single day and that’s the beauty of it.
The best way is, and has always been, to practice. Every class you write teaches you something, the important thing is you learn and take care of that knowledge.

There will be more articles on Helmut as the project continues.


Internship

I mentioned in my intro post that I was doing an internship in San Francisco and maybe I should elaborate a bit more.

The company I’m interning with is called Splunk and they do some pretty amazing stuff.
I first heard about Splunk from my wonderful cousin. He had interned with Boris (the guy who got all the interns) at LucidEra back in the day and he told me that Boris now works at a company called Splunk.

Seeing as I’d never heard of Splunk I asked what they do and why I should go and do an internship with them and got a “They do cool stuff” back, fair enough. This talk became a plan and that plan became reality, and in February of 2011 away I went.

So what does Splunk do?

Splunk’s slogan is “Making machine data searchable” and that’s pretty much it in a nutshell.

Imagine that you have a couple of servers and you want to know how they are feeling. Maybe one of them has got the sniffles or maybe one is under attack; wouldn’t it be nice if you could know this?

Nowadays most actions on a server produce a log line somewhere. When you log in to a server something similar to this is printed in your logs:

Nov  4 06:39:35 Ansbox sshd[32680]: Accepted publickey for nicklas from 98.210.152.237 port 56674 ssh2
Nov  4 06:39:35 Ansbox sshd[32680]: pam_unix(sshd:session): session opened for user nicklas by (uid=0)

And when someone fails to log in, something like this is printed:

Nov  4 06:43:44 Ansbox sshd[1126]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=c-98-210-152-237.hsd1.ca.comcast.net  user=nicklas
Nov  4 06:43:46 Ansbox sshd[1126]: Failed password for nicklas from 98.210.152.237 port 56684 ssh2

So now all you need to do is set a limit. Let’s say 5 or more failed login attempts within 10 minutes requires further investigation, so we’ll just set up an alert for when this happens and you’ll get an email or a text so you can look into it.
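
To give a feel for what such an alert boils down to, here is a toy Python sketch of the rule "5 or more failed logins within 10 minutes". This is not how Splunk implements it: it is a plain sliding window over sshd lines like the ones above. The function name find_alerts and the year parameter are my own (syslog timestamps carry no year, so one has to be assumed).

```python
import re
from collections import deque
from datetime import datetime, timedelta

FAILED = re.compile(
    r'^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ sshd\[\d+\]: '
    r'Failed password for (?:invalid user )?(\S+) from (\S+)')

def find_alerts(lines, limit=5, window=timedelta(minutes=10), year=2011):
    """Return (address, time) pairs where an address hit the failure limit."""
    recent = {}   # source address -> timestamps of recent failures
    alerts = []
    for line in lines:
        match = FAILED.match(line)
        if not match:
            continue
        stamp, _user, address = match.groups()
        # Collapse the double space syslog uses for single-digit days.
        stamp = ' '.join(stamp.split())
        when = datetime.strptime('{0} {1}'.format(year, stamp),
                                 '%Y %b %d %H:%M:%S')
        failures = recent.setdefault(address, deque())
        failures.append(when)
        # Drop failures that have fallen out of the time window.
        while when - failures[0] > window:
            failures.popleft()
        if len(failures) >= limit:
            alerts.append((address, when))
    return alerts

sample = ['Nov  4 06:4{0}:46 Ansbox sshd[1126]: Failed password for nicklas '
          'from 98.210.152.237 port 56684 ssh2'.format(minute)
          for minute in range(3, 8)]
print(find_alerts(sample))
```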

How I fit in to all of this

I belong to the group called Engineering Services, which is the common name for everyone who provides services to the engineering team. This includes QA, which is really what I’m in.

Finding bugs is fun! It’s as if someone gives you a pickaxe and tells you to go nuts on their brand new car, the more damage the better.
Every time I break something that’s good, it means something isn’t working the way it should.

Like most companies nowadays Splunk promotes free thinking (I know, crazy right) and Boris loves ideas so I’ve gotten to do some pretty awesome stuff during my internship.

And my axe!

I almost forgot to mention I’m not here alone, I’ve got my trusty comrade Pärham. We used to have Tobias too but he went back to Sweden :’(

But we did get 3 new interns from Sweden around a month ago: Petter, Emre and Marcus. Together we’re the Swedish Intern Mafia, the A team, the crème de la crème.


One two, one two…

Seeing how this is my first blog post (ever) I thought I ought to introduce myself a bit.

My name is Nicklas Ansman Giertz, I’m a 22 year old aspiring software engineer from Stockholm, Sweden.
Currently I’m in San Francisco for an internship; I have been here for almost 9 months now.

When I get back to Sweden in December I can look forward to three more awesome years at The Royal Institute of Technology before I have my masters in Computer Science (yay!)

Y u no blog about good stuff

I see this as a sort of diary, except more public. This will be a place where I can dump ideas or make note of things I learned.

But if anyone finds what I write interesting/funny/educational/useful then that’s just awesome!

I find that one of the most useful sources of information on the interwebs when it comes to programming stuff is random blogs, so this is my chance to repay the favor to all those bloggers who have helped me during all these years.

Ah, sweet Content, where doth thine harbour hold?

Seeing as I’m a software engineer by profession I will of course blog about tech stuff but that doesn’t mean I won’t blog about more personal stuff too.
I’ll blog about stuff I think is interesting and useful, let’s just leave it at that.
