The pinball metaphor
Here is another interesting thing from the book Elastic Leadership 1 by Roy Osherove. It's more of an anecdote, but the analogy is funny.
He mentions another, quite old, book on management called Becoming a Technical Leader 2. In it, the author, Gerald Weinberg, tells a story about improving his high score at pinball. Here is how the score progressed over time.
Pinball score
It's easy to see that the progress does not follow a steady line: there are several stretches of slow, linear progress, then big jumps.
How to use multiple image tags with docker-compose?
The accepted answer was based on extends, but it can no longer be used in Compose file format 3.x. As suggested by a user, the extension fields capability added in Compose file format 3.4 can replace it to achieve the same goal: reusing a single definition to set several tags.
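As an illustration, here is a minimal sketch of what that can look like (service and image names are hypothetical; the anchor and merge keys are plain YAML, the x- prefix is what Compose 3.4 accepts as an extension field):

version: "3.4"

# Extension field holding the shared definition (image and context are placeholders)
x-base-service: &base-service
  build:
    context: .
  image: myorg/myapp:latest

services:
  app-latest:
    <<: *base-service
  app-1-0:
    <<: *base-service
    image: myorg/myapp:1.0.0   # same build definition, tagged under a second name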
Three phases
In his book Elastic Leadership 1, Roy Osherove talks about an interesting principle he calls the three team phases. Whether you are a team leader or just a team member, your team is in one of these three phases.
Survival phase (no time to learn): the team spends its time fighting fires or trying to meet deadlines. The team is struggling and has to use the quickest solution (certainly not the best one, but the most pragmatic) to get the work done as soon as possible.
Molecule is designed to aid in the development and testing of Ansible roles. Molecule provides support for testing with multiple instances, operating systems and distributions, virtualization providers, test frameworks and testing scenarios. Molecule encourages an approach that results in consistently developed roles that are well-written, easily understood and maintained.
Molecule
Installation
$ conda create -n molecule python=3.7
$ source activate molecule
$ conda install -c conda-forge ansible docker-py docker-compose molecule
# docker-py seems to be called docker on PyPI
$ pip install ansible docker docker-compose molecule

Main features
Cookiecutter to create roles from a standardized template.
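As an illustration of the day-to-day workflow (Molecule 2.x syntax with the Docker driver; the role name is hypothetical and the commands may differ in other versions):

$ molecule init role -r my-role -d docker   # scaffold a new role from the template
$ cd my-role
$ molecule test        # full sequence: lint, create, converge, verify, destroy
$ molecule converge    # only apply the role to the test instance(s), useful while developing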
This is the third article in the Spark on Kubernetes (K8S) series after:
Spark on Kubernetes First
Spark on Kubernetes Python and R bindings
This one is dedicated to client mode, a feature that has been introduced in Spark 2.4. In client mode, the driver runs locally (or on an external pod), which makes interactive use possible: it can be used to run a REPL like the Spark shell or Jupyter notebooks.
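For instance, launching the Spark shell against the cluster could look like this (API server address, image name, and driver host are placeholders; spark.driver.host is needed so the executor pods can reach the driver):

$ ./bin/spark-shell \
    --master k8s://https://my-apiserver:6443 \
    --deploy-mode client \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=myrepo/spark:2.4.0 \
    --conf spark.driver.host=my-driver-hostname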
Version 2.4 of Spark introduces Python and R bindings for Kubernetes, with two new images:
spark-py: the Spark image with Python bindings (including Python 2 and 3 executables)
spark-r: the Spark image with R bindings (including the R executable)
Databricks has published an article dedicated to the Spark 2.4 features for Kubernetes.
It’s exactly the same principle as already explained in my previous article. But this time we are using:
A different image: spark-py
Another example: local:///opt/spark/examples/src/main/python/pi.
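For reference, the submit command could look like this (API server address and image name are placeholders; the example path assumes the standard layout of the spark-py image):

$ ./bin/spark-submit \
    --master k8s://https://my-apiserver:6443 \
    --deploy-mode cluster \
    --name spark-pi-python \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=myrepo/spark-py:2.4.0 \
    local:///opt/spark/examples/src/main/python/pi.py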
Since version 2.3, Spark can run on a Kubernetes cluster. Let's see how to do it. In this example, I will use version 2.4. The prerequisites are:
Download and install (unzip) the corresponding Spark distribution.
For more information, there is a section on the Spark site dedicated to this use case.
Spark images
I will build and push the Spark images to make them available to the K8S cluster.
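The Spark distribution ships with a helper script for this; a minimal sketch, with the registry name as a placeholder:

$ cd spark-2.4.0-bin-hadoop2.7
$ ./bin/docker-image-tool.sh -r myregistry.example.com -t 2.4.0 build   # builds the JVM, Python and R images in Spark 2.4
$ ./bin/docker-image-tool.sh -r myregistry.example.com -t 2.4.0 push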
In this example, I want to show that, thanks to the Sparklyr package, local and distributed computing can be performed with the same syntax.
To do that, I will use the nycflights13 dataset (one of the datasets used in the Sparklyr demo) to check whether the number of flights per day varies with the period of the year (the month).
Spoiler: it varies, but not that much.
In one of my previous articles, I talked about running a standalone Spark cluster inside Docker containers using docker-spark. I was using it with the R Sparklyr framework.
However, if you want to use it from a Python environment in interactive mode (like in Jupyter notebooks, where the driver runs on the local machine while the workers run in the cluster), there are several steps to follow.
You need to run the same Python version on the driver and on the workers.
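As a sketch of what the notebook side can look like (the master URL and the Python version are assumptions tied to my setup):

import os
from pyspark.sql import SparkSession

# The workers must run the same Python version as the driver;
# here I assume Python 3.7 is available on both sides.
os.environ["PYSPARK_PYTHON"] = "python3.7"
os.environ["PYSPARK_DRIVER_PYTHON"] = "python3.7"

spark = (SparkSession.builder
         .master("spark://localhost:7077")  # placeholder URL of the docker-spark master
         .appName("jupyter-driver")
         .getOrCreate())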
Spark comes with a history server; it provides a great UI with a lot of information about Spark job execution (event timeline, details of the stages, etc.). Details can be found on the Spark monitoring page.
I've modified docker-spark so that it can be run with the docker-compose up command.
With this implementation, its UI will be running at http://${YOUR_DOCKER_HOST}:18080.
To use the Spark history server, you have to tell your Spark driver where to write its event logs, as sketched below.
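A minimal sketch of the driver-side settings (the event log directory is a placeholder; it has to match the spark.history.fs.logDirectory configured on the history server side):

$ ./bin/spark-shell \
    --master spark://localhost:7077 \
    --conf spark.eventLog.enabled=true \
    --conf spark.eventLog.dir=file:///tmp/spark-events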
Findspark is a handy tool to use whenever you want to switch between Spark versions in Jupyter notebooks without having to change the SPARK_HOME environment variable.
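A minimal usage sketch (the Spark installation path is a placeholder):

import findspark
findspark.init("/opt/spark-2.4.0-bin-hadoop2.7")  # point to the Spark version to use in this notebook

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("findspark-demo").getOrCreate()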
WSJF stands for Weighted Shortest Job First. It's a technique used in the Scaled Agile Framework (SAFe) to prioritise jobs (epics, features, and capabilities) according to their value relative to the cost of performing them. Basically, it's a way of ranking a list of features in order to maximise the outcome (the value produced) with a constrained capacity to produce it. The job with the highest WSJF (value over cost) is selected first for implementation.
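For reference, SAFe expresses it as a simple ratio, with each term usually estimated on a relative (Fibonacci-like) scale:

WSJF = Cost of Delay / Job Size
Cost of Delay = User-Business Value + Time Criticality + Risk Reduction & Opportunity Enablement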