Video: Probabilistic Modelling with Scala

In April I gave a talk at the Scalar Conference in Warsaw, Poland, on “Probabilistic Modelling with Scala”. The organizers have now released a video of the talk, which I don’t want to keep from you.

In the talk I give you an introduction to probabilistic modelling and Bayesian Networks and show you an example analysis done in Scala.

For those of you who don’t know Scala: it’s a hybrid functional and object-oriented language that allows you to write concise and efficient code. At the same time, Scala is becoming the language of choice for many data analysis tasks, especially in Big Data (e.g. Spark). Working with our PPC software camato and its bidding features involves a lot of number crunching and statistical modelling, so I found Scala to be an almost perfect fit.

Probabilistic models allow you to extract insights from your data while Bayesian Networks are one technique for modelling relations between random variables. Both are easy to grasp and effective in learning under many conditions.

Bayesian Networks allow you to model complex dependencies between observable and hidden variables in your data. For example, you could create a model for the conversion rate of combinations of phrases in your product titles and specific products that contain those phrases. This model could take into account both the combination of a phrase and a product, as well as the fact that some products have a higher conversion rate per se (i.e. independent of which product title they have) and certain phrases work better or worse than others independent of the product (e.g. “high quality”). It could also probabilistically model unobserved variables like the intent of a visitor which you can only guess from your data.
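To make the idea of a hidden variable a bit more concrete, here is a minimal sketch in plain Scala. It is not the code from the talk: the object name `ConversionModel`, the two intent states and all probabilities are made up for illustration. It assumes a toy network in which a hidden visitor intent and an observed flag (does the title contain a phrase such as “high quality”?) together determine the chance of a conversion.

```scala
object ConversionModel {
  // Hidden variable: we never observe the visitor's intent directly.
  sealed trait Intent
  case object Browsing extends Intent
  case object Buying extends Intent

  // Prior P(intent). Made-up numbers, for illustration only.
  val intentPrior: Map[Intent, Double] = Map(Browsing -> 0.8, Buying -> 0.2)

  // Conditional table P(converted | intent, hasPhrase). Also made up.
  def pConvert(intent: Intent, hasPhrase: Boolean): Double =
    (intent, hasPhrase) match {
      case (Buying, true)    => 0.30
      case (Buying, false)   => 0.20
      case (Browsing, true)  => 0.02
      case (Browsing, false) => 0.01
    }

  // Sum out the hidden intent to get P(converted | hasPhrase).
  def conversionRate(hasPhrase: Boolean): Double =
    intentPrior.map { case (intent, p) => p * pConvert(intent, hasPhrase) }.sum

  def main(args: Array[String]): Unit = {
    println(f"with phrase:    ${conversionRate(true)}%.3f")
    println(f"without phrase: ${conversionRate(false)}%.3f")
  }
}
```

With these invented numbers the expected conversion rate is 7.6% with the phrase and 4.8% without it. A real model would learn the tables from data rather than hard-code them.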

Once you have a trained Bayesian Network, you can ask it a variety of interesting questions:

    • If I sell this new product, what kind of conversion rates can I expect? (Prediction)
    • I have observed peculiar behavior for a certain product. What could be the reason? (Diagnosis)
    • Do people looking at a certain product become new customers? (Classification)
    • What variables could I change to improve the conversion rate? (Decision making)
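The prediction and diagnosis questions above can be sketched with a two-node network and Bayes’ rule. Again, this is only a toy example under stated assumptions: the object name `DiagnosisSketch`, the list of causes and every probability are hypothetical, and a trained network would supply these tables instead.

```scala
object DiagnosisSketch {
  // Prior P(cause) over hypothetical causes. Made-up numbers.
  val prior: Map[String, Double] = Map(
    "price too high" -> 0.10,
    "bad title"      -> 0.20,
    "nothing wrong"  -> 0.70
  )

  // Likelihood P(low conversion | cause). Made-up numbers.
  val pLowGivenCause: Map[String, Double] = Map(
    "price too high" -> 0.80,
    "bad title"      -> 0.50,
    "nothing wrong"  -> 0.05
  )

  // Prediction: P(low conversion), summing over all causes.
  val pLow: Double =
    prior.map { case (cause, p) => p * pLowGivenCause(cause) }.sum

  // Diagnosis: P(cause | low conversion) via Bayes' rule.
  val posterior: Map[String, Double] =
    prior.map { case (cause, p) => cause -> p * pLowGivenCause(cause) / pLow }

  def main(args: Array[String]): Unit = {
    println(f"P(low conversion) = $pLow%.3f")
    // Print the candidate causes, most probable first.
    posterior.toSeq.sortBy(-_._2).foreach { case (cause, p) =>
      println(f"$cause%-15s $p%.3f")
    }
  }
}
```

Although “nothing wrong” is the most likely state a priori, observing a low conversion rate shifts most of the probability to the two problem causes, with “bad title” ranked first for these numbers.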

Bayesian Networks are interesting for many reasons. It takes some time to get into their way of “thinking”, but it may well pay off nicely!

You can find the slides of the talk here, and the technically inclined can play with the examples from the talk by following the instructions here.


Martin is developing a bid management solution for crealytics. He is also a long-term meditator and interested in neuro-feedback.
