The Sport of Programming

 

I’ve been envious of cryptographers for many years now, thanks to the public challenges set by bodies such as GCHQ. I have always wanted to jump in, but cryptography is not an area I have much interest in, and the barrier to entry has just been too high for me. Which is why I was delighted to see a competition in an area I do have some knowledge of: data analytics.

The Data Science Challenge, fronted by the UK Government’s Defence Science and Technology Laboratory, promised to give ordinary members of the public the chance to play with “representative” defence data. Two competitions were set: a text classification competition and a vehicle detection competition. Both took the format of providing a training data set from which to build a model, with entries scored on predictions made for an unlabelled test data set.


The text classification competition involved detecting the topics of Guardian articles from their content, whilst the vehicle detection competition involved detecting and classifying vehicles appearing in satellite images. I saw this as an excellent opportunity to practice two technologies I had not used much before: Spark and TensorFlow.

How’d I do?

Good. Tragically, the user area of the website had already been taken down by the time of writing this retrospective, so I can’t check my final standings. However, I entered both competitions and from memory finished just outside the top 20 in each (of roughly 500-800 entrants per competition).

Which I’m pretty happy with. I noted that the top 10 in each competition did not enter both competitions, so I’m happy that my skill-set is general enough to pick up technologies that are new to me quickly and still perform reasonably well, even if not quite matching those specialised in a particular area.

How I did it – Text Classification

I had been starting to learn Apache Spark in the run-up to this competition, as R was proving too difficult to parallelise efficiently for large data sets, and it seemed a natural fit here. I found the map-reduce aspect of Spark easy to pick up; it’s very similar to the functional programming and lambda calculus I studied at university many years ago, which further goes to show that nothing’s really new in IT. Even neural networks aren’t too far evolved from the hyper-heuristics of 10+ years past.

My solution was based on a comment from Dr Hannah Fry in the BBC4 documentary The Joy of Data, which I had watched a few weeks earlier, where she summarised that the less frequently a word is used, the more information it carries. For each topic I conducted a word count and compared the frequency with which a word was used within the topic against the frequency with which it was used outside the topic. The words which saw the most significant increase in usage frequency were then used to classify topics.

I found that setting thresholds on the number of distinct articles a word appeared in was key, as this prevented words used many times in a small number of articles from being selected as over-fitted keywords. Once the keywords for each topic were identified, it was easy to count them in all articles, which reduced the problem to simple classification on numerical data.
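To make that concrete, here is a minimal PySpark sketch of the keyword-selection step. It is illustrative rather than my competition code: the id/topic/text schema, the topic name, the thresholds and the unnormalised count-based comparison are all assumptions.

```python
# Minimal PySpark sketch of the keyword-selection idea (illustrative only).
# Assumed schema: id, topic, text. Thresholds are made up, and raw counts stand
# in for properly normalised frequencies for brevity.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("topic-keywords").getOrCreate()
articles = spark.read.json("articles.json")

# One row per (article, word).
words = (articles
         .select("id", "topic",
                 F.explode(F.split(F.lower("text"), r"\W+")).alias("word"))
         .filter(F.col("word") != ""))

# Usage inside the candidate topic, plus how many distinct articles use the word.
in_topic = (words.filter(F.col("topic") == "politics")
            .groupBy("word")
            .agg(F.count("*").alias("in_count"),
                 F.countDistinct("id").alias("in_articles")))

# Usage everywhere else.
out_topic = (words.filter(F.col("topic") != "politics")
             .groupBy("word")
             .agg(F.count("*").alias("out_count")))

# Require the word to appear in enough distinct articles (the over-fitting guard),
# then rank by how much heavier its use is inside the topic than outside it.
keywords = (in_topic.join(out_topic, "word", "left")
            .fillna(0, subset=["out_count"])
            .filter(F.col("in_articles") >= 20)
            .withColumn("lift", F.col("in_count") / (F.col("out_count") + 1))
            .orderBy(F.desc("lift"))
            .limit(50))
keywords.show()
```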

I experimented with a range of models, including random forests and multiple-variable linear regression; extreme gradient boosting showed the best accuracy.
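As a rough illustration of that modelling step (again, not my original code), training an XGBoost classifier over per-article keyword counts looks something like this, with dummy data standing in for the real feature matrix:

```python
# Sketch of the final classification step: keyword counts in, topic out.
# Dummy data stands in for the real per-article keyword-count features.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(1000, 50))   # dummy keyword-count matrix
y = rng.integers(0, 5, size=1000)          # dummy topic labels

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=42)

model = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
```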

At this point I was still quite far off the pace set by the leaders, so I extended my solution to also use bigrams (sequential pairs of words). This took a little more effort, particularly as punctuation now had to be accounted for whereas previously it could all be stripped, but a fun coding session later I was up and running.
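The real bigram extraction ran inside Spark, but the punctuation-handling idea can be sketched in plain Python: treat punctuation as a hard boundary rather than stripping it, so word pairs never straddle it.

```python
# Standalone sketch of punctuation-aware bigram extraction (the real version
# ran inside Spark). Punctuation acts as a boundary so pairs never cross it.
import re

def bigrams(text):
    pairs = []
    for clause in re.split(r"[.!?;:,]+", text.lower()):
        words = re.findall(r"[a-z']+", clause)
        pairs.extend(zip(words, words[1:]))
    return pairs

print(bigrams("The vote passed. Ministers welcomed the vote, unsurprisingly."))
# -> [('the', 'vote'), ('vote', 'passed'), ('ministers', 'welcomed'), ...]
```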

There are obviously far more pairs of words than there are words, and this is where I met the computational limitations of my machine. Memory was manageable, but I needed more compute to do further analysis on bigrams, and on trigrams beyond them. The majority of my code was Spark via pyspark, so moving onto AWS would have been fairly simple, but two driving forces made me stop there:

  1. There’s another competition and I really want to do both
  2. I’m a cheapskate and don’t want to pay AWS

How I did it – Vehicle Detection

Basically, I hacked something together with TensorFlow and did surprisingly well.

This was far from anything I had done before, but I consider myself a well-rounded programmer and was keen to take up the challenge. Many years ago I wrote my dissertation on computer vision and feature detection, so I had some understanding of image processing, but I had not yet touched neural networks.

With time now being of the essence, since I had spent too long on the first competition, I dived into some tutorials and worked backwards. In my eyes the problem became: find out what I can do, then hammer that into a format that answers the question.

I’d previously dabbled in a Kaggle digit recognition competition and used this as the starting point, however it was RStudio’s TensorFlow tutorial that really got me up and running. With a little code modification to account for three colour channels, I was able to pass in image “chips” labelled with what they contained (if anything; random untagged chips were also used) and use those to train a softmax model, and then a multilayer ConvNet, both using a range of different chip sizes and chip spacings to find a good balance.
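For flavour, a 1.x-style TensorFlow sketch of that softmax starting point, adapted for three colour channels, looks roughly like this; the chip size, class count and chip loader are assumptions rather than details from my entry.

```python
# Rough TensorFlow (1.x-style) sketch of the softmax starting point with three
# colour channels. Chip size, class count and the chip loader are assumptions.
import tensorflow as tf

CHIP = 32                      # pixels per side of a chip (assumed)
CLASSES = 3                    # e.g. car, truck, nothing (assumed)
FEATURES = CHIP * CHIP * 3     # three colour channels rather than one

x = tf.placeholder(tf.float32, [None, FEATURES])
y_true = tf.placeholder(tf.float32, [None, CLASSES])

W = tf.Variable(tf.zeros([FEATURES, CLASSES]))
b = tf.Variable(tf.zeros([CLASSES]))
logits = tf.matmul(x, W) + b

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # for batch_x, batch_y in chip_batches():          # chip loader not shown
    #     sess.run(train_step, feed_dict={x: batch_x, y_true: batch_y})
```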

An example source image, and two chips containing vehicles (not to scale)

As a beginner I started with the CPU-only version of TensorFlow but quickly moved to the GPU-accelerated version using NVIDIA’s cuDNN library. Wow, the improvement was staggering: the training stage was just over 7 times faster using my modest GTX 960M (4GB version) than using my i7-6700HQ alone.

Closing Thoughts

I enjoyed the challenge, but there were a couple of points which let it down. Firstly, the promise of playing with representative defence data was totally exaggerated: the data was articles from theguardian.com and Google satellite images of a UK city. It was nice to get the data in an easily machine-processable format, but this data is already publicly accessible via HTML and APIs.

Secondly, although building a community was a stated goal, the competition was not set up to facilitate that. The leaderboard was limited to viewing the top 10, and the community forums already seem to have been taken down. Hopefully they can learn from Kaggle and its thriving community here.

But I am very satisfied with how close I came to the winners in each competition and look forward to the next round. Time to see what else I can do with my growing Spark and neural network knowledge.

 

 


It’s just not worth buying computers at the moment.

In the present market, two turbulent forces entwine to null the value of anything less than monumental improvements: Brexit and memory price gouging. To qualify the clickbaity title, I specifically mean for technical computing in the UK.

Lies, damn lies, and marketing

In an earlier post I described how, although an interesting technical observation, the idea of a doubling in actual performance has been falsely perpetuated by marketing types. A 10-20% improvement is more realistic; however, incremental improvements have at least been improvements, and a new machine has been a worthwhile investment over renewing maintenance on an old machine. Plus we like shiny new machines…

But there are many ways of measuring performance, and for many workloads even a 10% generational improvement is a falsehood.

SPECfp

To test my title hypothesis, consider SPECfp. This is a computer benchmark designed to test floating-point performance; however, it differs from Linpack in that it more accurately represents scientific/technical computing applications. These tend to be very data-orientated and often push entire system bandwidths to their limits moving data on and off the CPUs.

I collected data published on spec.org and, using R, extracted a comparable set of statistics for several generations of Intel Xeon E5 CPUs, grouping them by their E5 SKU number to compare generation-on-generation performance. Anyone who has benchmarked AMD for performance applications will naturally know why I’m only looking at Intel…
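The analysis itself was done in R, but the gist of the grouping step can be sketched in Python/pandas, assuming a hypothetical CSV of SPECfp results with cpu, generation and result columns:

```python
# Rough pandas equivalent of the R analysis, assuming a hypothetical CSV of
# SPECfp results with "cpu", "generation" and "result" columns, where the
# generation labels sort chronologically (v1, v2, v3, v4).
import pandas as pd

spec = pd.read_csv("specfp_e5.csv")

# Pull out the E5 model number (e.g. "2690") so the same SKU can be tracked
# across generations (E5-2690, E5-2690 v2, v3, v4, ...).
spec["sku"] = spec["cpu"].str.extract(r"E5-(\d{4})", expand=False)

# Best published result per SKU per generation, then the gen-on-gen change.
best = spec.groupby(["sku", "generation"])["result"].max().unstack()
gen_on_gen = best.pct_change(axis=1) * 100
print(gen_on_gen.round(1))
```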

The chart below depicts those E5 numbers which occur in all four of the most recent generations being considered. As there was an uneven time gap between generations, for further comparison I have also plotted this data against time. A trend of decreasing improvements can be seen.

OK, yes, I’m exaggerating, but not by much

In fairness to Intel, using these SKUs as the comparison point is a bit of a simplification. They would say we shouldn’t compare based on the E5 label, but whichever way we look at it the same patterns emerge. The chart below takes the highest-performing CPU of each generation (including CPUs not represented on the earlier chart) and plots them against time.

I’ve taken the opportunity to join in the debunking of the “doubling in performance every two years” myth here by also plotting where that doubling in performance would have led. I give this chart the alternative title: “Where marketing think computer performance has been going compared to where it has actually been going”.


The chart below plots the performance of all E5-2600 CPUs, including those which do not occur in all generations, for a fuller comparison agnostic to the names these products are given. Again the diminishing returns are apparent.


The Intimidating Shadow of Ivy Bridge

Returning to my hypothesis, I’d specifically like to zero in on the Ivy Bridge (v2) CPUs launched towards the tail end of 2013. They were initially priced at a premium; however, as prices settled into 2014, many more were bought. Machines sold with three years of maintenance are pretty standard in IT, and so a significant number of the machines now up for maintenance renewal or replacement are Ivy Bridge.

Comparing the highest SKU of each generation, we see only a 12% increase in real performance from Ivy Bridge to Broadwell. Comparing matching SKUs across generations, we typically see around a 22% improvement.

This is most worrying because, with current memory prices and currency exchange rates, servers typically cost 20-25% more than they did 8-9 months ago.
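A quick back-of-the-envelope sum shows why this matters: taking the 12% top-SKU gain above against an illustrative 22% price rise, performance per pound actually goes backwards.

```python
# Back-of-the-envelope: a like-for-like box that is ~12% faster but ~22% more
# expensive delivers less performance per pound than the machine it replaces.
perf_gain = 1.12     # Ivy Bridge -> Broadwell, top SKU (from the chart above)
price_rise = 1.22    # illustrative mid-point of the 20-25% price increase

change = perf_gain / price_rise - 1
print(f"performance per pound: {change:+.0%}")   # roughly -8%
```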

Memory cost per GB did decrease from the Ivy Bridge era until recently, plus we now have DDR4 and SSDs are more sensible. But if you have a higher-end Ivy Bridge server falling off warranty, it’s just not worth replacing it right now. Buy maintenance instead and hope Skylake is better.

The Compute Landscape at the Beginning of 2017

For years the IT industry has accepted Intel as the only viable option. At the beginning of Intel’s reign the consensus was: “yeah, Intel CPUs are way better than anyone else’s, let’s buy lots”. But now the feeling is: “oh, another incremental upgrade from Intel. What’s AMD up to? Ah, still nothing. Fine, buy more Intel…”

To be fair, it is mean of me to wail on Intel for AMD’s failure (or refusal) to compete; Intel have still been innovating, just not at the rate we became accustomed to in the competitive years. 2017 looks to be an interesting year, a year we all get more choice.

Beyond Kidz wiv Graphics Cardz

NVIDIA have been pushing really hard for years now to establish themselves beyond gaming. Their GPU hardware offers excellent performance but despite creating a whole CUDA ecosystem to support their products, few made the leap. Incrementally faster horses were fine and we could all get on with our work.

Deep/machine learning is beginning to revolutionise IT. It’s stretching out beyond academia into more and more commercial uses; soon, if you do not have an analytics strategy, you will not be competitive. This is an excellent area in which to use GPU accelerators: many machine learning applications involve a large number of parallel computations that grows with the amount of data, and “big data” applications exploit scale-out designs beautifully.

Intel position their Phi co-processors (and lately Knights Landing processors) as a competitor to NVIDIA GPUs, but without significant direction no one really knows what to do with a large number of inferior Xeon cores in one box. Our E5 Xeons are often not at 100% utilisation, and there’s little benefit in moving to a platform with less memory per core and less network bandwidth per core.

After years of unchallenged dominance, Intel are emitting the Field of Dreams aura of “if we build it, they will come”. This works for Xeon E5 chips, as no one’s building anything else. But with NVIDIA building and aggressively supporting users’ moves to their platform, accelerator users are flocking to NVIDIA, leaving Phi and Knights Landing dead by the side of the road.

Are AMD about to ante up?

You’d think that, as Intel have been cramming more and more cores into a box, AMD should have been quite competitive; until recently AMD were exceeding Intel in this metric. But their architecture is such that two “cores” share an ALU. This makes it not too dissimilar to Intel’s Hyperthreading, where two virtual cores also time-share a physical core. Both get good utilisation out of their ALUs, but in most fair comparisons Intel outperforms AMD.

AMD have been viewed as a cheaper “also-ran”. With the major exception of cloud providers, most of the industry has been moving to do more with less hardware, and even many cloud providers are using Intel (often E3s stacked high and sold cheap).

Intel have been coasting. The time is right for AMD to get back in the game. PCIe Gen 4.0, along with a refreshed microarchitecture, could offer great potential for high-bandwidth applications: bandwidth between CPUs and accelerators, memory and the network.

Choice is Good

I’m speculating somewhat on AMD’s next platform and whether it will be any good, but NVIDIA certainly are well placed for 2017. The announcement of their Pascal architecture last year was a game changer for accelerators, the excitement of which we are still feeling. And IBM’s opening up of their historically proprietary POWER platform through the OpenPOWER Foundation opens the gates for more competitive POWER systems to break through.

I see more going on in compute now than there has been for years.

The Myths and Marketing of Moore’s Law

Moore’s Law won’t end. Even when it ends it won’t end.

The Law observes that more components can be crammed into an integrated circuit as technology develops over time. However, transistors are getting so small that current leakage becomes a greater issue. In short, this means there needs to be an amount of empty space between transistors for them to work predictably, and without predictability you can’t build computers. This “empty space” (dark silicon) means that even if we were to make transistors infinitely small, there would still be a finite limit on how many we could fit on a chip.

For electrical transistors at least, the current wording of Moore’s Law is ending. I won’t prophesy a paradigm shift to optical or quantum computers to take up the next leg; although they are on the way, they will not arrive in time. It won’t end for a much simpler reason…

What’s this doubling business?

The idea of a doubling in “performance” always was a myth. Even in the frequency-scaling heyday we saw diminishing returns, but a doubling in something sure was a good reason to buy a new computer. With recent CPU architectures we’ve only been seeing ~10% increases in performance for a die shrink and ~20% for a full microarchitecture redesign, which is why for many system owners the hardware refresh cycle can be five or more years.
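As a rough sketch of that arithmetic (assuming roughly one CPU generation every two years), compounding the realistic gains over a five-year refresh cycle shows how far reality sits from the doubling myth:

```python
# Compound the realistic 10-20% generational gains over a five-year refresh
# cycle (assuming ~one generation every two years) and compare with the
# marketing myth of doubling every two years.
years = 5
gens = years / 2

realistic_low  = 1.10 ** gens   # ~10% per generation (die shrink)
realistic_high = 1.20 ** gens   # ~20% per generation (new architecture)
mythical       = 2.00 ** gens   # "doubling every two years"

print(f"after {years} years: {realistic_low:.2f}x-{realistic_high:.2f}x real "
      f"improvement, vs {mythical:.2f}x promised")
# after 5 years: 1.27x-1.58x real improvement, vs 5.66x promised
```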

Why it won’t end:

It’s not a law governing what will happen but an observation of what has happened. The prospect of selling computers funds innovation in IT, so marketeers will just adapt the law to observe something else. We old hats know this won’t be the first time. The real-world implication of Moore’s Law is that you buy a new computer every few years, which is why, though the wording may change, The Law will continue. And the myth of doubling with it.

Time to take Java seriously again?

Like many Computer Science graduates, the first language I’d say I really learnt was Java. Sure, I’d dabbled in C and VB, but Java is where I first wrote meaningful code beyond examples from the textbook. Again like many Computer Science graduates, I turned my back on Java pretty soon after that.

My experience in video game programming, as well as my current day job around research computing (although not in a programming capacity), both feature squeezing every drop out of the hardware, which sadly leaves little space for Java. In both, code written in fast, low-level languages is optimised to exploit the hardware it will run on.


The ongoing data analytics and machine learning revolution, surely the most exciting area in IT at the moment, is bringing with it a data-centric approach of which we should all take note. The need is not to get the most out of your hardware but to get the most out of your data, as quickly and continuously as possible to retain your advantage.

Spark, for example, is written in Scala, which compiles into Java bytecode to run on the Java Virtual Machine, which itself finally runs on the hardware. Furthermore, many Spark applications are themselves written in a different language such as R or Python, which has to first interface with Spark. That is a lot of layers of abstraction, each adding overheads which would be shunned by performance-oriented programmers.


Yet when I look at these stacks I instead see wonderful things being done and begin to see past my preconceptions.

I’m also seeing containers grow in prominence, and they are a natural fit for Java development. With S2I (source-to-image) builds, developers can seamlessly inject their code from their Git repository into a Docker image and deploy that straight onto a managed system.

Whilst C++ will remain the norm for mature performance-oriented applications, hypothesis testing and prototyping to yield quick results is giving an extra life to Java.