The pinball metaphor
Here is another interesting thing from the book Elastic Leadership 1 by Roy Osherove. It's more of an anecdote, but the analogy is funny.
He mentions another, quite old, book on management called Becoming a Technical Leader 2. In it, the author, Gerald Weinberg, tells a story about improving his high score at pinball. Here is how the score progressed over time.
Pinball score
It's easy to see that the progress does not follow a steady line: there are several stretches of slow, linear progress, then big jumps.
How to use multiple image tags with docker-compose?
The accepted answer was based on extends, but it can no longer be used in Compose file format 3.x. As suggested by a user, the extension fields capability added in Compose file format 3.4 can replace it to achieve the same goal: reusing a single definition to set several tags.
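As an illustration, here is a minimal sketch of what that can look like (service and image names are hypothetical; the anchor and merge keys are plain YAML, the x- prefix is what Compose 3.4 accepts as an extension field):

version: "3.4"

# Extension field holding the shared definition (image and context are placeholders)
x-base-service: &base-service
  build:
    context: .
  image: myorg/myapp:latest

services:
  app-latest:
    <<: *base-service
  app-1-0:
    <<: *base-service
    image: myorg/myapp:1.0.0   # same build definition, tagged under a second name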
Three phases
In his book Elastic Leadership 1, Roy Osherove talks about an interesting principle he calls the three team phases. Whether you are a team leader or just a team member, your team is in one of these three phases.
Survival phase (no time to learn): the team spends its time fighting fires or trying to meet deadlines. The team is struggling and has to use the quickest solution (certainly not the best one, but the most pragmatic) to get the work done as soon as possible.
Molecule is designed to aid in the development and testing of Ansible roles. Molecule provides support for testing with multiple instances, operating systems and distributions, virtualization providers, test frameworks and testing scenarios. Molecule encourages an approach that results in consistently developed roles that are well-written, easily understood and maintained.
Molecule
Installation
$ conda create -n molecule python=3.7
$ source activate molecule
$ conda install -c conda-forge ansible docker-py docker-compose molecule
# docker-py seems to be called docker on PyPI
$ pip install ansible docker docker-compose molecule

Main features
Cookiecutter to create roles from a standardized template.
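As an illustration of the day-to-day workflow (Molecule 2.x syntax with the Docker driver; the role name is hypothetical and the commands may differ in other versions):

$ molecule init role -r my-role -d docker   # scaffold a new role from the template
$ cd my-role
$ molecule test        # full sequence: lint, create, converge, verify, destroy
$ molecule converge    # only apply the role to the test instance(s), useful while developing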
This is the third article in the Spark on Kubernetes (K8S) series after:
Spark on Kubernetes First
Spark on Kubernetes Python and R bindings
This one is dedicated to client mode, a feature that has been introduced in Spark 2.4. In client mode, the driver runs locally (or on an external pod), which makes interactive use possible: it can be used to run a REPL like the Spark shell or Jupyter notebooks.
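For instance, launching the Spark shell against the cluster could look like this (API server address, image name, and driver host are placeholders; spark.driver.host is needed so the executor pods can reach the driver):

$ ./bin/spark-shell \
    --master k8s://https://my-apiserver:6443 \
    --deploy-mode client \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=myrepo/spark:2.4.0 \
    --conf spark.driver.host=my-driver-hostname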
Version 2.4 of Spark introduces Python and R bindings for Kubernetes, with two new images:
spark-py: the Spark image with Python bindings (including Python 2 and 3 executables)
spark-r: the Spark image with R bindings (including the R executable)
Databricks has published an article dedicated to the Spark 2.4 features for Kubernetes.
It’s exactly the same principle as already explained in my previous article. But this time we are using:
A different image: spark-py
Another example: local:///opt/spark/examples/src/main/python/pi.
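For reference, the submit command could look like this (API server address and image name are placeholders; the example path assumes the standard layout of the spark-py image):

$ ./bin/spark-submit \
    --master k8s://https://my-apiserver:6443 \
    --deploy-mode cluster \
    --name spark-pi-python \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=myrepo/spark-py:2.4.0 \
    local:///opt/spark/examples/src/main/python/pi.py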
Since version 2.3, Spark can run on a Kubernetes cluster. Let's see how to do it. In this example, I will use version 2.4. The prerequisites are:
Download and install (unzip) the corresponding Spark distribution.
For more information, there is a section on the Spark site dedicated to this use case.
Spark images
I will build and push the Spark images to make them available to the K8S cluster.
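The Spark distribution ships with a helper script for this; a minimal sketch, with the registry name as a placeholder:

$ cd spark-2.4.0-bin-hadoop2.7
$ ./bin/docker-image-tool.sh -r myregistry.example.com -t 2.4.0 build   # builds the JVM, Python and R images in Spark 2.4
$ ./bin/docker-image-tool.sh -r myregistry.example.com -t 2.4.0 push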
In this example, I want to show that, thanks to the Sparklyr package, local and distributed computing can be performed with the same syntax.
To do that, I will use the nycflights13 dataset (one of the datasets used in the Sparklyr demo) to check whether the number of flights per day varies with the period of the year (the month).
Spoiler: it varies, but not that much.
In one of my previous articles, I talked about running a standalone Spark cluster inside Docker containers using docker-spark. I was using it with the R Sparklyr framework.
However, if you want to use it from a Python environment in interactive mode (like in Jupyter notebooks, where the driver runs on the local machine while the workers run in the cluster), there are several steps to follow.
You need to run the same Python version on the driver and on the workers.
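As a sketch of what the notebook side can look like (the master URL and the Python version are assumptions tied to my setup):

import os
from pyspark.sql import SparkSession

# The workers must run the same Python version as the driver;
# here I assume Python 3.7 is available on both sides.
os.environ["PYSPARK_PYTHON"] = "python3.7"
os.environ["PYSPARK_DRIVER_PYTHON"] = "python3.7"

spark = (SparkSession.builder
         .master("spark://localhost:7077")  # placeholder URL of the docker-spark master
         .appName("jupyter-driver")
         .getOrCreate())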
Spark comes with a history server; it provides a great UI with a lot of information about Spark job execution (event timeline, details of the stages, etc.). Details can be found on the Spark monitoring page.
I've modified docker-spark so that it can be run with the docker-compose up command.
With this implementation, its UI will be running at http://${YOUR_DOCKER_HOST}:18080.
To use the Spark history server, you have to tell your Spark driver where to write its event logs, as sketched below.
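A minimal sketch of the driver-side settings (the event log directory is a placeholder; it has to match the spark.history.fs.logDirectory configured on the history server side):

$ ./bin/spark-shell \
    --master spark://localhost:7077 \
    --conf spark.eventLog.enabled=true \
    --conf spark.eventLog.dir=file:///tmp/spark-events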
Findspark is a handy tool to use whenever you want to switch between Spark versions in Jupyter notebooks without having to change the SPARK_HOME environment variable.
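A minimal usage sketch (the Spark installation path is a placeholder):

import findspark
findspark.init("/opt/spark-2.4.0-bin-hadoop2.7")  # point to the Spark version to use in this notebook

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("findspark-demo").getOrCreate()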
WSJF stands for Weighted Shortest Job First. It's a technique used in the Scaled Agile Framework (SAFe) to prioritise jobs (epics, features, and capabilities) according to their value relative to the cost of performing them. Basically, it's a way of ranking a list of features in order to maximise the outcome (the value produced) with a constrained capacity to produce it. The job with the highest WSJF (value over cost) is selected first for implementation.
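For reference, SAFe expresses it as a simple ratio, with each term usually estimated on a relative (Fibonacci-like) scale:

WSJF = Cost of Delay / Job Size
Cost of Delay = User-Business Value + Time Criticality + Risk Reduction & Opportunity Enablement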