freenerd

i'm johan. i do things with computers. i'm from berlin

Joining Mapbox

This week I joined the team of Mapbox. Mapbox builds great tools to create fantastic maps of the world, your country or your neighbourhood. The maps are highly customizable, with beautiful styles and your own data sets. Put them into your mobile app or display them on your website. Foursquare, Pinterest and others are already using it. And you should really have a look at the great guides and get started. Below is a satellite map of Berlin-Kreuzberg with my co-working space co.up marked up:

At Mapbox I'll be working with their infrastructure team, making sure that the maps are processed and served reliably and quickly. Mapbox runs on AWS and I'm very keen on learning more about the paradigms of infrastructure engineering that entails. I'll stay in Berlin, but you'll likely be able to meet me often in Washington, DC and San Francisco.

Maps have been fascinating to me for a long time. Back when I got my first mobile phone, I was intrigued by the fact that I would have access to a searchable map of our planet at any time. Maps are the user interface to the world around us and a key part of how we discover, plan and move. They are also a prime means of revealing information. Therefore they have an immense impact on our lives. Still, working with maps is cumbersome, and this is what Mapbox is changing.

Mapbox as a company was built with two important aspects in mind: Open Source and OpenStreetMap.

The Mapbox Github org currently has over 100 public projects (like the core map design tool Mapbox Studio and sqlite3 bindings for Node.js). Beyond that, Mapbox internally organizes mainly around Github, working much like an internal Open Source project. All information is in repositories, wikis and issues. And after spending a week with it, I couldn't be happier. The level of transparency is amazing. My Github feed truly is the pulse of what everyone in the company is up to.

Mapbox is a huge consumer of and contributor to OpenStreetMap, the Wikipedia for map data. Mapbox has worked on better tooling (like the iD editor) and has its own team contributing data (like buildings in NYC). And as outlined very well here, the world needs OpenStreetMap.

So all of these aspects made it clear to me that Mapbox is a company I want to be a part of. I'm happy I get the chance to do so. And if you have any questions about maps, cloud infrastructure or working at Mapbox, keep them coming. I'm happy to answer.

(Update 13th Oct 2014: The welcome blog post on mapbox.com is online here)

Master's Thesis: On Dependability Modeling in a Deployed Microservice Architecture

Master's Thesis Cover

Today I handed in my Master's Thesis for IT-Systems Engineering. Seven months of doubt and hard work, breakthroughs and setbacks finally come to a conclusion. It's a good feeling.

I posted about the core content of the thesis earlier already. The final title is "On Dependability Modeling in a Deployed Microservice Architecture" and here is the final abstract:

The microservice architectural style is a common way of constructing software systems. Such a software system then consists of many applications that communicate with each other over a network. Each application has an individual software development lifecycle. To work correctly, applications depend on each other. In this thesis, we investigate how dependability of such a microservice architecture may be assessed. We specifically investigate how dependencies between applications may be modeled and how these models may be used to improve the dependability of a deployed microservice architecture. We evaluate dependency graphs as well as qualitative and quantitative fault trees as modeling approaches. For this work the author was embedded in the engineering team of “SoundCloud”, a “Software as a Service" company with 250 million monthly users. The modeling approaches were executed and qualitatively evaluated in the context of that real case study. As results we found that dependency graphs are a valuable tool for visualizing dependencies in microservice architectures. We also found quantitative fault trees to deliver promising results.

You can download the whole thesis here (pdf/118 pages/1.5 MB). The LaTeX source is on Github. Both the thesis itself and the source are licensed under the Creative Commons BY-NC-ND 4.0 License.

Thanks to Peter Tröger for sending me on the journey for this thesis. Thanks also to all engineers at SoundCloud, especially Peter Bourgon for embedding me in his team and Alexander Grosse for allowing me to do the research with the company. Also thanks to nxtbgthng for letting me use their office and Lea and Hannes for proof-reading.

Modeling Dependencies in a Microservice Architecture

This post summarizes my master's thesis at a high level. More info on the thesis is here.

(Note: This post is a work-in-progress. I will update it in April/May 2014, as the thesis work is concluded)

Introduction

When reasoning about the availability of a service in a microservice architecture, the basis for discussion is often the opinions of the engineers involved. The information used includes the assumptions made while building the service, experience from operations (often visualized through graphs) and the occasional diagram (of dataflow or dependencies). This data tends to be (by its nature or the way it is selected) subjective, which makes it difficult to get a holistic view.

In my thesis I tried to investigate a more structured approach that should deliver more objective data.

My starting point was to find out which services exist and how they depend on each other. This can be modeled as a directed graph of dependencies. This graph allows the construction of fault trees for individual service failures. When these are annotated with failure probabilities, it is possible to calculate the probability of failure for the service.

I investigated to which extent it is possible to run this process automatically in an existing microservice architecture. I did this in a case study with a company from Berlin.

Building a Dependency Graph

I model dependencies between applications (service providers and service consumers) as a directed graph. Nodes are applications and edges are dependencies.

A dependency from A->B implies that for A to work correctly, B has to work correctly. Following is a very simplified example of a web application:

Web -> MySQL
Web -> Recommendations
Recommendations -> ElasticSearch

Directed Graph
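Such a graph can be represented with a plain adjacency map. Here is a minimal Go sketch of the example above; the type and method names are mine, not from the thesis:

```go
package main

import "fmt"

// Graph maps each application to the applications it directly depends on.
type Graph map[string][]string

// Transitive returns all applications a service depends on,
// directly or indirectly.
func (g Graph) Transitive(app string) []string {
	seen := map[string]bool{}
	var out []string
	var visit func(string)
	visit = func(a string) {
		for _, dep := range g[a] {
			if !seen[dep] {
				seen[dep] = true
				out = append(out, dep)
				visit(dep)
			}
		}
	}
	visit(app)
	return out
}

func main() {
	g := Graph{
		"Web":             {"MySQL", "Recommendations"},
		"Recommendations": {"ElasticSearch"},
	}
	fmt.Println(g.Transitive("Web")) // [MySQL Recommendations ElasticSearch]
}
```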

So how might this graph be generated? Following are some of the approaches I tried.

Manual annotation The dependencies might be written down by the engineers themselves. Given that each application has a repository, a practical implementation of this approach could be to host a dependency annotation file in each application's repository. The problem with this method is that the correctness of the graph depends on people maintaining the annotation files. Thus there might always be an (actual or at least feared) gap between reality and the graph. It's not a technical but a human problem. In my case study, I did the manual annotation for most applications.

From source code I did not find a fitting method to derive service dependencies from source code. One approach assumed that all service dependencies are encapsulated in shared libraries. That is not the case (think HTTP calls via the standard library), but even if it were, we'd still need to map the actual service to the library, like "shared library mysql2 encapsulates a service dependency to application MySQL". Another approach depended on how service discovery happens. Given that service discovery happens via static identifiers in the code, we could parse these out. We then know the service dependency if it is possible to derive the service name from that static identifier. An example is service discovery via DNS, where it is also assumed that the domain names follow a specific schema. For my case, that schema should include the service name, like <service>.internal.example.com. In my case study, the source code mostly did not include service discovery identifiers, but got them assigned through the environment.

From application deployment environment Given that the application gets its service discovery identifiers from the environment, we can use the same mechanism as in the previous section to detect the service dependency. The environment (also called configuration) is usually compiled during the deploy of the application. Examples are chef (with configuration files read by the application) or a deployment system (with environment variables read by the application). In my case study, only some services went through service discovery, while the majority of services were addressed through "physical addresses" (hostname and port). I attribute the low coverage of service discovery to the fact that it was only recently introduced in the company and sees slow adoption by the engineers.

From network connections Given that the applications are deployed and communicate with each other on the network, we might use that traffic to identify dependencies. For example, we might capture network connections via sockstat or netstat on application hosts. A connection might then look like this: source_pid source_ip:port -> destination_ip:port. To determine the source application, we may use the source process id. In my case study, I was able to derive the application name from the process id through an implementation detail of the deployment system. To determine the destination application, we need inverse service discovery, which turns an ip:port pair into a semantic service name. As mentioned before, in my case study I found a low adoption of service discovery, therefore this approach yielded sparse results as well.
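The captured connection lines can be parsed mechanically. Here is a sketch in Go, assuming the simplified format above; the lookup tables standing in for the deployment system and for inverse service discovery are made up:

```go
package main

import (
	"fmt"
	"strings"
)

// services maps ip:port endpoints back to semantic service names;
// this is the "inverse service discovery" step (entries are illustrative).
var services = map[string]string{
	"10.0.0.5:3306": "MySQL",
	"10.0.0.9:9200": "ElasticSearch",
}

// parseConn turns a captured connection line of the simplified form
// "pid src_ip:port -> dst_ip:port" into a dependency edge, if both
// the source pid and the destination endpoint can be resolved.
func parseConn(line string, pidToApp map[string]string) (from, to string, ok bool) {
	fields := strings.Fields(line) // [pid, src, "->", dst]
	if len(fields) != 4 || fields[2] != "->" {
		return "", "", false
	}
	if from, ok = pidToApp[fields[0]]; !ok {
		return "", "", false
	}
	to, ok = services[fields[3]]
	return from, to, ok
}

func main() {
	pidToApp := map[string]string{"4711": "Web"} // from the deployment system
	from, to, ok := parseConn("4711 10.0.0.2:49152 -> 10.0.0.5:3306", pidToApp)
	fmt.Println(from, to, ok) // Web MySQL true
}
```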

To conclude, only the manual annotation resulted in a usable graph.

More approaches exist: for example, a tracing system like Google's Dapper, which tracks all network calls annotated with application identifiers, might allow the extraction of service dependencies.

Constructing a Fault Tree

Before we construct the fault tree, we have to set some assumptions: As failure semantics we assume that there is no fault tolerance. The failure of a service leads to the failure of all applications that depend on it. Also, an application might have two reasons for failure: either one of the services it depends on fails, or it fails "inherently", e.g. due to a bug in the software.

Let's construct a fault tree based on our previous dependency graph. The node whose failure we are interested in becomes the top event (in our example Web). From there, we create the "inherent" basic event for that service and an intermediate event for each dependency. All are connected to the top event via an OR-gate. We recursively continue this process for all dependencies. Here is the fault tree graphic:

Fault Tree

Due to the assumed failure semantics and the resulting algorithm, every application constitutes a single point of failure. Also the fault tree is significantly larger than the dependency graph. Thus, I conclude that such a fault tree alone is not a meaningful tool for modeling availability. But what if the Fault Tree had failure probabilities on it?

Putting numbers on a Fault Tree

Basic events in a fault tree may get failure probabilities assigned. Based on these and the structure of the fault tree, the failure probability of the top event can be calculated.
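Under the assumed failure semantics (OR-gates only, no fault tolerance), the top event probability can be computed by recursing directly over the dependency graph. Here is a sketch, assuming independent failures and tree-shaped dependencies (no shared subtrees); the probabilities are made up for illustration:

```go
package main

import "fmt"

// inherent failure probability of each application (illustrative numbers).
var inherent = map[string]float64{
	"Web": 0.01, "MySQL": 0.001,
	"Recommendations": 0.02, "ElasticSearch": 0.005,
}

// deps is the dependency graph from the example above.
var deps = map[string][]string{
	"Web":             {"MySQL", "Recommendations"},
	"Recommendations": {"ElasticSearch"},
}

// failProb computes the top event probability: an application fails if
// its inherent basic event fires OR any of its dependencies fails
// (assuming independence and tree-shaped dependencies).
func failProb(app string) float64 {
	pOK := 1 - inherent[app]
	for _, d := range deps[app] {
		pOK *= 1 - failProb(d)
	}
	return 1 - pOK
}

func main() {
	fmt.Printf("P(Web fails) = %.4f\n", failProb("Web")) // P(Web fails) = 0.0356
}
```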

I investigated two approaches:

  • Historical availability data I wrote about the problem of measuring availability here. In my case study, collecting availability data was difficult, since there is no conclusive measurement regime for internal services in place.
  • Code churn There is some evidence 1 that code churn might be a reasonable metric to predict defects in code. I'm still evaluating this approach.

Both approaches seem able to support monitoring the availability threats an application faces from the applications it depends on. This in turn could aid architectural design decisions.

Conclusion

Generating the dependency graph is the core of this modeling process. I showed several approaches that should enable the automated generation of that graph from an existing microservice architecture. In my case study, these were inhibited by a heterogeneous environment, especially in regards to the use of service discovery.

The structural fault tree seems to have less usefulness in practice than the dependency graph. On the other hand, a fault tree with failure rates might be a helpful tool in monitoring changes to the availability of an application.

The thesis operates under strong assumptions regarding application failure propagation. Extending it with fault tolerance mechanisms will be an interesting future work, as well as doing a case study in a more homogeneous environment.

References

  • 1 Nachiappan Nagappan, Thomas Ball. Use of relative code churn measures to predict system defect density. 2005. pdf

On Measuring the Availability of Services

This post is part of a series of posts in the context of my master's thesis in computer science. Check this post for an overview.

Context

Given we have a software system that is running as a Microservices Architecture (as recently summarized by Lewis & Fowler 1). Similar to their definitions, I define the following:

  • A component is a unit of software that is independently replaceable and upgradeable.
  • A service is a component that provides an interface for other out-of-process components to connect to, via a mechanism such as a web service request or a remote procedure call.
  • At runtime, each service might have many instances. That might be on the scale of only one instance to hundreds or thousands.

We assume these instances run in one network on many hosts. Services might depend on each other. For the context of this post we assume all communication happens via HTTP, but all ideas here should be independent of protocol.

Availability

When discussing the dependability of a software system, availability is a common aspect to evaluate. Let's look at one availability definition (from 2):

Availability is the readiness for a correct [behavior]. It can be also defined as the
probability that a system is able to deliver correctly its service at any given time.

Behavior here is seen as fulfilling the expectation of the user, which usually is captured in a specification. Given we have a request/response style communication, the specification would include all possible requests and their valid responses. If the service behaves in a way not specified, we speak of a failure of the service. The specification might also include failures (for example an HTTP response with a 500 status code).

When speaking about the availability of a service in practice, we usually would like to reduce that into one number. This comes out of the desire to compare availability, for example how a certain change impacts the availability of a service. In the definition above, availability is defined as a probability. When measuring availability, we usually base it on historical data with this formula:

Availability = Uptime / (Uptime + Downtime)

This will give a number between 0 (always down) and 1 (always up). Interpreted as a percentage, this yields the famous x-nines, like 99.99% ("4 nines") availability. It is important to note that availability is always defined over a period of time (called mission time), for example the last 24 hours or the last calendar month. This implies that we may look at availability only in hindsight, based on historical uptime and downtime data.

Let's look at an example of a day:

Mission Time = 24h
Uptime = 23h 50m = 85800s
Downtime = 0h 10m = 600s

Availability = 85800s / (85800s + 600s) ≈ 0.993055 ≈ 99.3 %

One assumption from the above definition is that a service at any given time is either up or down (if we allowed both at the same time, we might get availability numbers over 1). So the next question becomes: how do we practically measure this?

How do we do time?

In the above definition of availability, we used absolute numbers for representing uptime and downtime. But how do we get these numbers?

The usual representation for this is a time series. It assumes a fixed interval of time. To each interval (or rather its end) we assign the current availability state. For example, the interval could be 1 second. For each second, we would save whether the service was up or down. To calculate the uptime, we sum up all seconds with state up within our mission time.

Here is an example: we have a time interval of 1 second and look at the availability for a mission time of 8 seconds. The time series might then look like this:

Mission time = 8s

Time series (u=up/d=down):
time |1|2|3|4|5|6|7|8|
state|u|u|u|u|d|d|u|u|

#up    = 6
uptime = 6s
#down    = 2
downtime = 2s

availability = 6s / (6s + 2s) = 0.75 = 75%
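Deriving availability from such a state series is a simple count. A sketch of the example above:

```go
package main

import "fmt"

// availabilityFromSeries counts the up intervals in a series of
// per-interval states ('u' or 'd') and returns uptime/(uptime+downtime).
func availabilityFromSeries(states []byte) float64 {
	up := 0
	for _, s := range states {
		if s == 'u' {
			up++
		}
	}
	return float64(up) / float64(len(states))
}

func main() {
	series := []byte("uuuudduu") // the 8-second mission time from above
	fmt.Println(availabilityFromSeries(series)) // 0.75
}
```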

Next, we will look at the actual acts of measuring.

Heartbeat

A Heartbeat is a periodic message signaling the current state of operation. In our context, it usually involves a client (which gives the heartbeat) and a server (which collects the heartbeat). Heartbeat gives us a classic time series: the server notes the client as up when it sees a valid heartbeat message for a given period, and down when none, or only a failure heartbeat message, is seen.

There are two communication patterns for the heartbeat:

  • In a push-based heartbeat, the client reports to the server. Thus, the client has to implement the logic for sending that heartbeat message, based on the heartbeat protocol of the server. An example is an HTTP POST to an endpoint on the server.
  • In a pull-based heartbeat, the server requests the client regularly. The server might either query a dedicated heartbeat endpoint on the client or use an existing endpoint defined in the specification.

A problem with a pull-based heartbeat is that it only assures the correctness of a subset of the client's functionality. That the heartbeat endpoint works does not verify that the whole client adheres to the whole specification. For example, the endpoints a client exposes might have different external dependencies: endpoint A might depend on a database and endpoint B on an external API. If the database is unreachable, endpoint A will fail while endpoint B works as expected. Depending on which endpoint is probed, the availability measurement delivers different results.

Especially for web services, there are a multitude of companies doing heartbeats for you. An older list can be found here. A more sophisticated example is Runscope Radar, which does heartbeats by running a whole test suite against the service, therefore verifying the specification.
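A minimal pull-based heartbeat can be sketched in a few lines of Go. The endpoint URL and probe count are made up, and a real system would persist each result as one interval of the availability time series:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// probe performs one pull-based heartbeat: the service counts as up
// if the (hypothetical) health endpoint answers 200 OK within the timeout.
func probe(url string) bool {
	client := http.Client{Timeout: 2 * time.Second}
	res, err := client.Get(url)
	if err != nil {
		return false
	}
	defer res.Body.Close()
	return res.StatusCode == http.StatusOK
}

func main() {
	// Probe a few times at a fixed interval; each result is one
	// interval of the up/down time series.
	for i := 0; i < 3; i++ {
		fmt.Println("up:", probe("http://service.internal.example.com/health"))
		time.Sleep(1 * time.Second)
	}
}
```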

How do we do time with events?

Time series are based on regular time intervals. This means each interval may only get assigned one value. This is no good to us if we want to work with event data, which is a common case when request/response communication is involved. There will likely be many clients doing requests within each time interval.

To solve this problem, we may aggregate the events for a time period. As an example, let's use HTTP status codes, which have the nice property that they include codes for failure.

time |  1|  1|  3|  4|  4|  6|  7|  7|  9|
code |200|500|404|200|500|200|500|500|500|

period=5s

period[0-4]status[200] = 2
period[0-4]status[404] = 1
period[0-4]status[500] = 2

period[5-9]status[200] = 1
period[5-9]status[404] = 0
period[5-9]status[500] = 3

In this example, we summed up the status codes for each period as a counter. Each status code represents its own time series over these counts.

To use this for availability purposes, we need to condense all these time series to one number. The actual formula for this highly depends on the use case, especially on which behavior is expected and which is not. For the given example we might say that we see the service failed if there are more status codes >= 500 than status codes < 500. For the previous example, period[0-4] would be up and period[5-9] would be down.
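The aggregation and the up/down rule can be sketched in Go. The threshold rule is the illustrative one from above, not a general recommendation:

```go
package main

import "fmt"

// Event is one observed response: a timestamp in seconds and an HTTP
// status code.
type Event struct {
	Time int
	Code int
}

// periodUp applies the example rule: a period counts as up unless
// responses with status >= 500 outnumber those with status < 500.
func periodUp(events []Event) bool {
	ok, failed := 0, 0
	for _, e := range events {
		if e.Code >= 500 {
			failed++
		} else {
			ok++
		}
	}
	return failed <= ok
}

func main() {
	events := []Event{
		{1, 200}, {1, 500}, {3, 404}, {4, 200}, {4, 500},
		{6, 200}, {7, 500}, {7, 500}, {9, 500},
	}
	// aggregate events into 5-second periods
	byPeriod := map[int][]Event{}
	for _, e := range events {
		byPeriod[e.Time/5] = append(byPeriod[e.Time/5], e)
	}
	fmt.Println("period 0-4 up:", periodUp(byPeriod[0])) // true
	fmt.Println("period 5-9 up:", periodUp(byPeriod[1])) // false
}
```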

As a benefit over heartbeats, this method is based on the actual interaction with the service, therefore providing real-world testing of the specification.

So how do we get hold of these counts?

Count on the service

Responses are captured on the service instances, usually within the instance process. This has the problem that the service might fail in a way that no counts are collected anymore. An example is a kernel panic.

Count on the clients

Responses are captured on the service clients. This might happen either within the instance process or on the network path (for example on a load balancer like HAProxy). This will also detect crash failures of the service.

Both counting methods should gather their data in a central place, given that they run on instances of which we have many. One example of a program doing that aggregation is statsd. It first aggregates counts in each instance process (via a shared library statsd client), then aggregates these aggregates on a statsd server, which eventually writes the time series to a database like graphite.

Inherent problem of measuring

Whenever we measure the availability of a system, we are actually measuring many things at the same time:

  • The availability of the measured system (for example a service)
  • The availability of the communication medium (for example network, with switches on the way)
  • The availability of the measuring system (for example the heartbeat system)

In a perfect world, we assume the measuring system and the communication medium to be perfect and never break. In practice, they do fail and their failures might impact the correctness of the gathered data, especially when they are not detected and thus are assumed to be failures of the measured service.

Other ways of measuring availability

I'm sure there are more commonly used ways of measuring; please add in the comments.

References

My Master's Thesis

In December 2013 I started working on my master's thesis in IT-Systems Engineering. By April 2014, I started writing. Thoughts have the downside of being hard to verify. So I need more people to read this and give feedback, so I can better fight the neverending battle against doubt. And the most scalable way seems to be public blog posts.

This post is a Table of Contents for all the posts around the thesis.

content posts

meta posts

  • The Story of the Thesis (to be written)
  • Lessons learned (to be written)

Accessing the GitHub API with Golang

So you want to access the Github API with Go? This post should give you some pointers on how to do that. It is deliberately entry level, aimed at API beginners (mostly because I want to get better at writing posts for novices).

In this post, I'm not using the go-github client library; instead all interaction is done with Go's standard net/http library. I do that to show the exact interaction happening with the API (and because using SDKs has its own problems).

The first question you have to answer: What do you want to access?

  1. Public data
  2. Private data your Github account has access to (like private repos)
  3. Private data from other people's Github accounts

The third case is a bit more complicated, since it requires OAuth 2 for an authentication flow. I'll cover that in another blog post. So let's focus on the first two cases for now.

Reading the Github Docs

At developer.github.com you'll find the official Github API docs. Go to Documentation or Reference and you'll get information on all the HTTP endpoints that are available via the API. Find an endpoint you are interested in. Some endpoints only work when you are logged in. The docs tell you (sometimes) if authentication is needed. Sometimes they don't (note to self: submit a pull request that adds that information to the Github docs). If you are not sure, just see if that information is publicly available on the Github webpage, since the API and the website basically have the same public/private restrictions.

So you found an endpoint to query? For example the user endpoint is a nice starter.

When working with remote APIs, a good idea is to query them "by hand". Everyone's favorite tool for that is curl; use it on your command line like this:

$ curl -i https://api.github.com/users/freenerd
HTTP/1.1 200 OK
Server: GitHub.com
Date: Sat, 29 Mar 2014 22:39:51 GMT
...
{
  "login": "freenerd",
  "id": 25713,
  ...
}

The -i flag shows the headers of the Response.

Accessing public data on Github

So let's do the http request:

package main

import (
  "io/ioutil"
  "log"
  "net/http"
)

func main() {
  // request http api
  res, err := http.Get("https://api.github.com/users/freenerd")
  if err != nil {
    log.Fatal(err)
  }

  // read body
  body, err := ioutil.ReadAll(res.Body)
  res.Body.Close()
  if err != nil {
    log.Fatal(err)
  }

  if res.StatusCode != 200 {
    log.Fatal("Unexpected status code ", res.StatusCode)
  }

  log.Printf("Body: %s\n", body)
}

The body is still just raw bytes, so we have to parse it in order to access the data within. The Github API returns json, so let's use Go's encoding/json library.

// parse json
type jsonUser struct {
  Name string `json:"login"`
  Blog string `json:"blog"`
}
user := jsonUser{}

err = json.Unmarshal(body, &user)
if err != nil {
  log.Fatal(err)
}

log.Printf("Received user %s with blog %s", user.Name, user.Blog)

First we create a new struct that will be filled with the data from the json object. Then we Unmarshal the string into the struct. If the naming of the keys in the json and in the struct does not match, use the json:"<name>" annotation to fix it.

That's it for accessing public data. Check the whole code here.

When playing around with the Github API like this, you might all of a sudden be stopped in your work: instead of returning meaningful results, the Github API only returns 403 status codes with a body talking about API rate limit exceeded. To prevent abuse of their API, Github only allows a certain number (60 at time of writing) of unauthorized API calls per hour from one IP. But thankfully, the docs on that rate limiting also point to the solution: more calls per hour (5000 at time of writing) if you do authorized calls. So let's try that next ...

Authorized calls to private data on Github

Github has several ways of doing authenticated calls. Even though it might seem tempting, please don't use Basic Auth with your username and password, for many reasons, but mostly because your passwords should be secret and using them anywhere in code opens the door for making mistakes which might lead to their exposure.

Instead, we opt for using a personal access token that you can create from within the Github web application here. When generating the token, make sure you give it the appropriate scope for the endpoint you want to query.

Once you have your token (a 40-character string), you can use it with basic auth. For example, I want to look at my ssh keys on Github (which requires the read:public_key scope).

Let's first do the request via curl (where <token> is replaced by your token and the -u flag sets up basic auth):

$ curl -u <token>:x-oauth-basic https://api.github.com/user/keys
HTTP/1.1 200 OK
Server: GitHub.com
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4983
(...)
[
  {
    "id": 483166,
    (...)
  }
]

Again, I've removed some of the output. But note the X-RateLimit-Remaining header, which tells us how many calls we still have left for the next hour. Also the username:password for basic auth is <token> as username and the string x-oauth-basic as password.

Let's do all this in Go. To use basic auth, we have to create an http request object, which is then executed via an http client.

package main

import (
  "io/ioutil"
  "log"
  "net/http"
)

func main() {
  // request http api
  req, err := http.NewRequest("GET", "https://api.github.com/user/keys", nil)
  if err != nil {
    log.Fatal(err)
  }
  req.SetBasicAuth("<token>", "x-oauth-basic")

  client := http.Client{}
  res, err := client.Do(req)
  if err != nil {
    log.Fatal(err)
  }

  log.Println("StatusCode:", res.StatusCode)

  // read body
  body, err := ioutil.ReadAll(res.Body)
  res.Body.Close()
  if err != nil {
    log.Fatal(err)
  }

  log.Printf("Body: %s\n", body)
}

I'll leave the json parsing to you; it's similar to the way we did it before. The full code can be found here.

Setting up Jekyll on Github Pages with a custom domain on Gandi.net

Recently I've moved my blog to Jekyll. This included moving my domain freenerd.de, which is now registered with gandi.net and hosted on Github Pages. Here is some experience from my move:

  • Switching .de domains is fast The last time I moved a domain, I had to fax (as in fax machine) a signed KK-Antrag and wait for days until the switch happened. Today, one only has to enter an auth code and wait for the DNS TTL to switch to the new registration, boom, done.
  • Think of your mail I'm running my mail through my domain. While switching registration, I also got new mail servers. These have to be configured. Do that before the switch. Also FYI: Gandi allows to forward email (for incoming mail) and still have a mailbox (for outgoing mail) for the same email address.
  • The CNAME influences all your domain When you change the CNAME file in your Github Pages repo, all other Github Pages from all your other projects will also be redirected to your custom domain. Example: The honeypot repo pages used to live under http://freenerd.github.io/honeypot but now are at http://www.freenerd.de/honeypot. This is cool. But remember to not re-use these paths in your blog though.
  • No apex domains with Gandi I couldn't get them to work, since Gandi does not support ALIAS records. So I'm running everything under the www. subdomain now.

Github has good documentation on using a custom domain. Still, for completeness, here is the important bit of my gandi dns zone file:

@ 10800 IN A 192.30.252.153
@ 10800 IN A 192.30.252.154
www 10800 IN CNAME freenerd.github.io.

The blog refreshed

Some days ago, I've switched this blog under freenerd.de from Wordpress to Jekyll. The move had several reasons:

  • A new design I want simplicity. Content is king. It also had to read well on mobile.
  • Write posts in Markdown The expressiveness of HTML is not needed for most of my content. Speed of editing is important. And I don't like WYSIWYG.
  • No Spam Everyone who's been running a Wordpress installation over some time will have experienced hacking attempts and spam. By serving a static website, this became practically irrelevant. This also means that I don't have to spend time updating my Wordpress installation and its plugins anymore.
  • No traffic/scaling concerns Hard to believe, but I had a hosting package at a German hosting company which basically didn't bring any change in the 10 years of contract. 10 years! This included a cap on traffic. Also, their machines were rather weak. On the other hand, they did the job, so it took me a long time to change things. Well, until now. I've switched to hosting on Github Pages.

I considered moving to a blogging platform, but I see freenerd.de as my home address, which should not be subject to the product schedule of a venture-backed company [1]. If I had chosen a blog platform in 2004, it would probably have been blogger.com, and that has changed quite a lot since then ...

So I've been blogging since 2004. In these 10 years, I published nearly 800 posts. In most of them, I was trying out the medium, writing in German. During the move, I only carried over around 70 posts. Like the first one. My excitement for the first Music Hack Day Berlin in 2009. Working in South Africa. Life events like finishing my Bachelor's thesis, starting and quitting at SoundCloud. Being on StartupBus and organizing Art Hack Day Berlin.

So what was in the 700 posts that vanished? Mostly boring, maybe embarrassing notes of my past self. As a person, I think I've moved far in these years. And many things that I published back then do not contribute to my present me. Obviously, I want to convey an image of myself here (like: sophisticated tech literate). Party pictures from 9 years ago, if seen in the present context, would not contribute well.

Which brings me back to the fact that it's nice to be able to rewrite my own history, on my own terms. And I'll be able to reflect on this again in 10 years.

P.S. A shout out goes to robb for helping me with the move. Thanks man!

[1] To be fair, I'm hosting on Github now, which is venture-backed, but this is hosting only and will hopefully never interfere with the hosted content. And even if it did, moving to a new host could be done within an hour.

Bringing Art Hack Day to Berlin


So it looks like every year I need my "organizing a hackathon" fix. After MHD Berlin and MHD Reykjavík, this year it is Art Hack Day Berlin.

The gist: 50 international artists, from visual and digital to sound and performance; 48 hours to create a public exhibition; hacking starts on the evening of Thursday, September 26th; the exhibit opens on the evening of Saturday, September 28th; the space is LEAP at Alexanderplatz; the theme is "going dark"; the event is not-for-profit, run by volunteers and financed by some sponsors, in collaboration with transmediale 2014.

During past events in SF, NYC, Boston and Stockholm, great works were created, as this and this video testify. For the Berlin edition, I'm very excited about the theme of "Going Dark", playing on the general notion of growing data collection and how we react to it. I'm hoping for a political, creative exhibition, questioning the individual, society, organizations and governments.

So why am I organizing this hack day? In the past years I was very active in the Music Hack Day scene. But after attending 15 events, I feel creative fatigue. Hacks repeat over and over. The running gag is the infamous "put-tracks/gigs on a map" hack being done at every event (yes, I've done that hack myself in the past). MHDs are still great for networking and learning, but they don't stimulate me creatively anymore. For me they somehow became more legitimate work and less innocent fun. I was dreaming of a Music Creation Hack Day that would not involve platform-mashup-promo hacks. Sadly, I have not made that happen (yet).

Then Kriesse introduced me to Olof, the founder of Art Hack Day. We started plotting AHD Berlin in May, and two weeks from now it's finally happening. I love working with an amazing team, and this time I hit the jackpot again (LEAP are great, Olof is fully committed, more helping hands all around). And now the fun phase actually starts: seeing the artists discuss ideas on the mailing list, seeing PR pick up, growing excitement for the hack day to finally begin.

So, come to the exhibition! It will be amazing, no doubt.

Saturday September 28th 19:00 - 00:00 at LEAP near Alexanderplatz. Free admission. Performances, Installations, Visual Art. Plenty to watch, also something to drink. Event Link

How I learned to English

From time to time, people comment that my English skills are surprisingly good for somebody who never lived in an English-speaking country. So I thought I should write down a bit about my learning path. Please excuse me if most of it appears rather trivial.

If I remember correctly, I started learning English in school in 5th grade, just shy of 11 years old. A year later, we got internet access at home. And as I quickly discovered, understanding English was highly beneficial there. This sparked my interest in learning English a lot. On a side note: I can still remember how I once stumbled upon a French website, was amazed that I understood a bit and hoped this would spark my interest in that language. Turns out the French internet is small, and I barely passed my French class. But back to English: the amazement when I had my first online chats in English about punk music on Napster. The first time I followed an English programming tutorial and succeeded. All the video games that finally made sense once I understood the instructions and story. And of course music and lyrics.

Classes ran through 13th grade, by which time I was 19. During those 8 years of weekly education, we went on to read full books, watch movies, listen to music lyrics and write essays and stories in English. So you could say that after 8 years of school, internet and media education, I had a decent vocabulary, enough grammar and a basic cultural understanding. What I lacked was pronunciation, confidence and subtext. Not to mention idioms, puns and humor. Or time spent with actual English-speaking people.

I spent some time abroad in Poland, where English became the third means of communication after Polish and German. Next I went to study Computer Science in Berlin, where a lot of reading and writing was in English, but classes mostly stayed German. But the real kicker was joining the SoundCloud gang: diverse backgrounds, with English as the only common ground for communication. Being thrown in, having to adapt and level up. Additionally, I made some good English-only friends at that time and ended up having weeks in which my German speaking was reduced to a minimum, despite still living in Berlin. Obviously this is a huge problem for people coming to Berlin hoping to learn German, but that's another story. Pro tip: even though I never had a foreign-language partner, friends of mine took that route as a language bootcamp. It turned out quite well for them.

Anyway, my life shifted to English, leading me to often write notes to myself in English. And even my thoughts aren't necessarily in my mother tongue anymore.

But back to how I learned: movies and TV series were a big thing. Germany has a terrible culture of dubbing German audio tracks onto everything. Luckily, the advent of DVDs and the internet helped us out there. Today I cringe at dubbing and put in extra effort to see original versions, preferably with English subtitles. Like The Wire.

Obviously I'm an internet person and my day-to-day work is in English. Still, I stumble over unknown words constantly, so I made a habit of looking everything up instantly. Mac OS X's dictionary.app (in combination with Alfred) as well as the Google search spell checker (seriously) are my go-to tools here.

And the last thing is a book that strongly influenced the way I write: "Revising Prose" by Richard Lanham. It gives a magnificent view of how to write concisely and clearly. Sadly, the latest edition is overpriced, but I strongly recommend it.