Improving Python's speed by 40% when running Home Assistant

We use Alpine for most of our containers. It is the perfect distribution for containers because it is small (BusyBox based), available for a lot of CPU architectures, and has a slim package system. Alpine uses musl as its C library instead of the more commonly used glibc.

Alpine and musl are relatively young compared to their peers (15 and 9 years old) but have seen a significant development pace. Because things move so fast, many misconceptions about both persist that are based on things that are no longer true. The goal of this post is to address a couple of those and explain how we have solved them.

This blogpost is not meant as a musl vs. glibc flamewar. Each use case is different and has its own trade-offs. For example, we use glibc in our OS.

For the tests, I used the images from the Docker Python library, and the result is published to our base images. I used pyperformance for lab testing and the Home Assistant internal benchmark tools for a more real-life comparison. All tests ran inside a container on the same Docker host.

C/POSIX standard library

I often read that Python is slower when it uses musl as the default C library. This claim is not entirely correct. If the Python runtime is compiled with the same GCC and with -O3, the glibc variant is a bit faster in the lab benchmark, but in the real world the difference is insignificant. Alpine compiles Python with -Os, while most other distributions compile it with -O2. This explains the frequently reported difference between the Python interpreters. When the same compiler optimizations are used, musl-based Python runtimes have no disadvantage.

But there is a game-changer that makes the musl-based runtime more useful than the glibc-based one: the memory allocator jemalloc, a general-purpose malloc implementation that emphasizes fragmentation avoidance and scalable concurrency support. I found an interesting effect described in a blog post about Rust: some developers saw that musl becomes much faster when using jemalloc, while glibc becomes slower with it. To be fair, the benefit of jemalloc with glibc is not speed but better memory management, whereas musl gets both benefits. While the difference between plain musl and glibc can be ignored, the difference between musl + jemalloc and glibc is substantial (with GCC's built-in memory allocator optimization disabled). And yes, today's jemalloc is compatible with musl (there was a time when it was not).

Compiler

How you compile Python is also essential. There were statements from Fedora and Red Hat that disabling semantic interposition gives a big performance boost. I was not able to reproduce this on GCC 9.3.0, but I also saw no adverse side effects. I recommend disabling semantic interposition as well as the built-in allocator optimization, and linking jemalloc at build time. I also recommend using the -O3 optimization level. We never saw an issue with these aggressive optimizations on our targeted platforms. Unlike the distributions' Python interpreters, ours doesn't need to run everywhere, so we can use --enable-optimizations without any overrides and add more flags. As of today, PGO, LTO, and -O3 make Python faster, and they work on our target CPUs.
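To see which of these options a given interpreter was built with, the build configuration can be inspected from Python itself. The following is a minimal sketch using the standard `sysconfig` module. The marker strings (`--with-lto`, `-fno-semantic-interposition`, and the plain substring `jemalloc`) are assumptions about how such a build would typically be configured; they are not taken from our build scripts, and a statically linked allocator may not show up this way.

```python
import sysconfig

# Substrings to look for in the build configuration. These spellings are
# assumptions about a typical optimized build, not a definitive list.
MARKERS = {
    "O3": "-O3",
    "pgo": "--enable-optimizations",
    "lto": "--with-lto",
    "no_semantic_interposition": "-fno-semantic-interposition",
    "jemalloc": "jemalloc",
}


def build_flags_text() -> str:
    """Join the build-related config variables of the running interpreter."""
    names = ("CONFIG_ARGS", "OPT", "PY_CFLAGS", "LDFLAGS", "LIBS")
    # get_config_var returns None for variables the platform does not define.
    return " ".join(str(sysconfig.get_config_var(name) or "") for name in names)


def detect_optimizations(flags: str) -> dict:
    """Report which markers appear in the given flags string."""
    return {name: marker in flags for name, marker in MARKERS.items()}


if __name__ == "__main__":
    for name, present in detect_optimizations(build_flags_text()).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

Running this inside two different images gives a quick way to confirm that a benchmark difference comes from the build options and not from something else in the environment.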

Python packages

Alpine indeed has no manylinux compatibility because of musl. If you don't cache your builds, C extensions have to be compiled when installing packages that require them. This process takes time, just as it would if you cross-built with QEMU for different CPU architectures. You cannot get precompiled binaries from PyPI. This is not a problem for us, as the binaries provided on PyPI are mostly not optimized for our target systems.

To fix the installation times of Python packages, we created our own wheel index and a backend that compiles all needed wheels and keeps them up to date using CI agents. We pre-build over 1,000 packages for each CPU architecture, so the build time of the Dockerfile is no longer a concern.

Alpine Linux

Alpine is a great base system for containers and allows us to provide the best experience to our users. A big thanks to Alpine Linux, musl, and jemalloc, which make this all possible.

The table shows the results comparing Alpine Linux's Python runtime with our optimized build (GCC 9.3.0/musl). All tests were done using Python 3.8.3.

| Benchmark | Alpine | Optimized |
| --- | --- | --- |
| 2to3 | 924 ms | 699 ms: 1.32x faster (-24%) |
| chameleon | 37.9 ms | 25.6 ms: 1.48x faster (-33%) |
| chaos | 393 ms | 273 ms: 1.44x faster (-31%) |
| crypto_pyaes | 373 ms | 245 ms: 1.52x faster (-34%) |
| deltablue | 22.8 ms | 16.4 ms: 1.39x faster (-28%) |
| django_template | 184 ms | 145 ms: 1.27x faster (-21%) |
| dulwich_log | 157 ms | 122 ms: 1.29x faster (-22%) |
| fannkuch | 1.81 sec | 1.32 sec: 1.38x faster (-27%) |
| float | 363 ms | 263 ms: 1.38x faster (-28%) |
| genshi_text | 113 ms | 83.9 ms: 1.34x faster (-26%) |
| genshi_xml | 226 ms | 171 ms: 1.32x faster (-24%) |
| go | 816 ms | 598 ms: 1.36x faster (-27%) |
| hexiom | 36.8 ms | 24.2 ms: 1.52x faster (-34%) |
| json_dumps | 34.8 ms | 25.6 ms: 1.36x faster (-26%) |
| json_loads | 61.2 us | 47.4 us: 1.29x faster (-23%) |
| logging_format | 30.0 us | 23.5 us: 1.28x faster (-22%) |
| logging_silent | 673 ns | 486 ns: 1.39x faster (-28%) |
| logging_simple | 27.2 us | 21.3 us: 1.27x faster (-22%) |
| mako | 54.5 ms | 35.6 ms: 1.53x faster (-35%) |
| meteor_contest | 344 ms | 219 ms: 1.57x faster (-36%) |
| nbody | 526 ms | 305 ms: 1.73x faster (-42%) |
| nqueens | 368 ms | 246 ms: 1.49x faster (-33%) |
| pathlib | 64.4 ms | 45.2 ms: 1.42x faster (-30%) |
| pickle | 20.3 us | 17.1 us: 1.19x faster (-16%) |
| pickle_dict | 40.2 us | 33.6 us: 1.20x faster (-16%) |
| pickle_list | 6.77 us | 5.88 us: 1.15x faster (-13%) |
| pickle_pure_python | 1.85 ms | 1.27 ms: 1.45x faster (-31%) |
| pidigits | 274 ms | 222 ms: 1.24x faster (-19%) |
| pyflate | 2.53 sec | 1.74 sec: 1.45x faster (-31%) |
| python_startup | 14.9 ms | 12.1 ms: 1.23x faster (-19%) |
| python_startup_no_site | 9.84 ms | 8.24 ms: 1.19x faster (-16%) |
| raytrace | 1.61 sec | 1.23 sec: 1.30x faster (-23%) |
| regex_compile | 547 ms | 398 ms: 1.38x faster (-27%) |
| regex_dna | 445 ms | 484 ms: 1.09x slower (+9%) |
| regex_effbot | 10.3 ms | 9.96 ms: 1.03x faster (-3%) |
| regex_v8 | 81.8 ms | 71.6 ms: 1.14x faster (-12%) |
| richards | 265 ms | 182 ms: 1.46x faster (-31%) |
| scimark_fft | 1.31 sec | 851 ms: 1.54x faster (-35%) |
| scimark_lu | 616 ms | 384 ms: 1.61x faster (-38%) |
| scimark_monte_carlo | 390 ms | 248 ms: 1.57x faster (-36%) |
| scimark_sor | 838 ms | 571 ms: 1.47x faster (-32%) |
| scimark_sparse_mat_mult | 19.0 ms | 13.2 ms: 1.43x faster (-30%) |
| spectral_norm | 567 ms | 388 ms: 1.46x faster (-32%) |
| sqlalchemy_declarative | 364 ms | 286 ms: 1.27x faster (-21%) |
| sqlalchemy_imperative | 60.3 ms | 46.8 ms: 1.29x faster (-22%) |
| sqlite_synth | 6.88 us | 5.09 us: 1.35x faster (-26%) |
| sympy_expand | 1.39 sec | 1.05 sec: 1.32x faster (-24%) |
| sympy_integrate | 67.3 ms | 49.5 ms: 1.36x faster (-26%) |
| sympy_sum | 505 ms | 389 ms: 1.30x faster (-23%) |
| sympy_str | 945 ms | 656 ms: 1.44x faster (-31%) |
| telco | 17.9 ms | 12.5 ms: 1.44x faster (-31%) |
| tornado_http | 347 ms | 273 ms: 1.27x faster (-21%) |
| unpack_sequence | 232 ns | 212 ns: 1.09x faster (-9%) |
| unpickle | 41.6 us | 30.7 us: 1.36x faster (-26%) |
| unpickle_list | 10.5 us | 9.24 us: 1.14x faster (-12%) |
| unpickle_pure_python | 1.28 ms | 945 us: 1.36x faster (-26%) |
| xml_etree_parse | 335 ms | 292 ms: 1.15x faster (-13%) |
| xml_etree_iterparse | 281 ms | 226 ms: 1.24x faster (-20%) |
| xml_etree_generate | 330 ms | 219 ms: 1.51x faster (-34%) |
| xml_etree_process | 263 ms | 181 ms: 1.45x faster (-31%) |
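For readers who want to relate the two figures in each row, here is a small sketch of the arithmetic: the "x faster" factor is the ratio of the two timings, and the percentage is the relative change in run time. The function name `describe` is only illustrative. Because the table shows rounded timings, recomputing from them can differ from the published figure in the last digit.

```python
def describe(baseline: float, optimized: float) -> str:
    """Summarize two timings (same unit) in the style used by the table."""
    # Relative change in run time; negative means the optimized build needs less time.
    percent = (optimized - baseline) / baseline * 100
    if optimized <= baseline:
        factor, word = baseline / optimized, "faster"
    else:
        factor, word = optimized / baseline, "slower"
    return f"{factor:.2f}x {word} ({percent:+.0f}%)"


print(describe(924, 699))  # 2to3 row
print(describe(445, 484))  # regex_dna row, the one regression
```

The first call reproduces the 2to3 entry and the second the regex_dna entry.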

Lovelace: getCardSize can now be async

Ever since we introduced lazy loading cards to Lovelace, getting the card size of a lazy loaded card was hard.

We used to send out an error element before the element was loaded, which would have a getCardSize function. But that would report the wrong size. Once the element was defined, we would fire a rebuild event so the right card would be recreated.

In 0.110 we stopped doing this. We now give back the correct element, but the element constructor may not be loaded yet, so the element doesn't have getCardSize. When the constructor is loaded, the element is upgraded and we set the config. From that moment on, we can call getCardSize.

In this version, we changed the logic for getCardSize so it will wait for this. This means some cards, like stacks, will return a promise because they have to wait for their children to be defined.

If you are a custom card developer and your custom card uses getCardSize to get the size of other cards, you have to adjust it to handle these promises.

Our function to get the card size, which you could copy, now looks like this:

export const computeCardSize = (
  card: LovelaceCard | LovelaceHeaderFooter
): number | Promise<number> => {
  if (typeof card.getCardSize === "function") {
    return card.getCardSize();
  }
  if (customElements.get(card.localName)) {
    return 1;
  }
  return customElements
    .whenDefined(card.localName)
    .then(() => computeCardSize(card));
};

We start with the same check as before: if the element has a getCardSize function, we return its value, which should be a number or a Promise that resolves to a number.

If the function doesn't exist, we check whether the constructor of the element is registered. If it is, the element doesn't have a getCardSize, and we return 1 as we did before.

If the element isn't registered yet, we wait until it is and then call the same function again on the now-defined card to get its size.

Entity class names

Ever wondered when implementing entities for our entity integrations why you had to extend BinarySensorDevice and not BinarySensorEntity? Wonder no longer, as we have addressed the situation in Home Assistant 0.110 by renaming all classes that incorrectly had Device in their name. The old classes are still around but will log a warning when used.

All integrations in Home Assistant have been upgraded. Custom component authors need to do the migration themselves. You can do this while keeping backwards compatibility by using the following snippet:

try:
    from homeassistant.components.binary_sensor import BinarySensorEntity
except ImportError:
    from homeassistant.components.binary_sensor import BinarySensorDevice as BinarySensorEntity

The following classes have been renamed:

| Old Class Name | New Class Name |
| --- | --- |
| BinarySensorDevice | BinarySensorEntity |
| MediaPlayerDevice | MediaPlayerEntity |
| LockDevice | LockEntity |
| ClimateDevice | ClimateEntity |
| CoverDevice | CoverEntity |
| VacuumDevice | VacuumEntity |
| RemoteDevice | RemoteEntity |
| Light | LightEntity |
| SwitchDevice | SwitchEntity |
| WaterHeaterDevice | WaterHeaterEntity |

Custom icon sets

If you are the maintainer of a custom icon set, you might need to update it.

In Home Assistant core version 0.110 we will change the way our icons are loaded. We no longer load all the mdi icons at once, and they will not become DOM elements. This will save us almost 5000 DOM elements and will reduce loading time.

This also means we no longer use or load <ha-iconset-svg>. If your icon set relied on this element, you will have to change it.

We introduced a new API where you can register your custom icon set with an async function that we will call with the icon name as its parameter. We expect a promise that resolves to an object describing the requested icon. Your icon set can decide on a strategy for loading and caching.

The format of the API is:

window.customIconsets: {
  [iconset_name: string]: (icon_name: string) => Promise<{ path: string; viewBox?: string }>
};

path is the path of the SVG, that is, the string in the d attribute of the <path> element. viewBox is optional and defaults to 0 0 24 24.

A very simple example of this for the icon custom:icon:

async function getIcon(name) {
  return {
    path: "M13,14H11V10H13M13,18H11V16H13M1,21H23L12,2L1,21Z",
  };
}
window.customIconsets = window.customIconsets || {};
window.customIconsets["custom"] = getIcon;

Home Assistant will call the function getIcon("icon") when the icon custom:icon is set.

Instance URL helper

If you are an integration developer and have come across the problem of getting the URL of the user's Home Assistant instance, you probably know this wasn't always easy.

The main problem is that a Home Assistant instance is generally installed at home, meaning the internal and external addresses can differ, and even those can have variations (for example, if a user has a Home Assistant Cloud subscription).

Matters become worse if the integration has specific requirements for the URL; for example, it must be externally available and requires SSL.

As of Home Assistant Core 0.110, a new instance URL helper is introduced to ease that. We started out with the following flow chart to solve this issue:

Flow chart of getting a Home Assistant instance URL

As a result of this, the previously available base_url is now replaced by two new core configuration settings for the user: the internal and external URL.

From a development perspective, the use of hass.config.api.base_url is now deprecated in favor of the new get_url helper method.

For more information on using and implementing this new URL helper method, consult our documentation here.
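To make the idea concrete, here is a self-contained sketch of how a helper might choose between the internal and external URL given an integration's requirements. It is not the actual get_url implementation and does not reproduce the flow chart: the function name `pick_instance_url`, its parameters, the exception class, and the order in which candidates are tried are all assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


class NoURLAvailable(Exception):
    """Raised when no configured URL meets the requirements (hypothetical)."""


def pick_instance_url(
    internal_url: Optional[str],
    external_url: Optional[str],
    *,
    allow_internal: bool = True,
    require_ssl: bool = False,
) -> str:
    """Return the first configured URL that satisfies the requirements.

    Illustrative only: the internal URL is tried first when it is allowed,
    then the external URL.
    """
    candidates = []
    if allow_internal and internal_url:
        candidates.append(internal_url)
    if external_url:
        candidates.append(external_url)

    for url in candidates:
        # Skip plain HTTP URLs when the caller needs an encrypted connection.
        if require_ssl and urlparse(url).scheme != "https":
            continue
        return url

    raise NoURLAvailable("No configured URL matches the requested constraints")


# An integration that must be reachable from outside over SSL:
print(
    pick_instance_url(
        "http://192.168.1.10:8123",
        "https://example.com",
        allow_internal=False,
        require_ssl=True,
    )
)
```

The point of the sketch is that the integration states its requirements and handles the failure case, instead of guessing which address the user configured.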

Logos for custom integrations

Recently, Home Assistant started to support images & icons for integrations to show up in the frontend. They look amazing and really bring some color to the UI of Home Assistant.

We got a lot of questions lately on how custom integrations (also known as custom components) can add their images. As of today, that is possible!

HACS icon in the Home Assistant frontend

Created a custom integration? Want the logo & icon for your integration to show up in the Home Assistant frontend? In that case, head over to our GitHub brands repository to add yours!

PS: Did you know you can also add your custom integration to our Python wheels repository? It will make the installation of your custom integration in Home Assistant lightning fast!

Translations for custom Lovelace

If you are the author of a custom Lovelace card and use translations, please pay attention as the state translation keys have changed.

Before 0.109, state translations lived under state.<domain>.<state> or state.<domain>.<device class>.<state> for binary sensors. Starting with version 0.109, these translations are now part of the backend and so they have the key format for backend translations. We have standardized the state format to always include a device class. The device class _ is reserved as a fallback for entities without a device class.

| Old | New |
| --- | --- |
| state.<domain>.<state> | component.<domain>.state._.<state> |
| state.<domain>.<device class>.<state> | component.<domain>.state.<device class>.<state> |
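If your card builds these keys itself, the mapping can be expressed as a small function. This is a minimal sketch of the key conversion only; the function name `migrate_state_key` is hypothetical, and it assumes that domain, device class, and state names do not contain dots.

```python
def migrate_state_key(old_key: str) -> str:
    """Convert an old frontend state key to the new backend key format."""
    parts = old_key.split(".")
    if parts[0] != "state" or len(parts) not in (3, 4):
        raise ValueError(f"Not an old-style state translation key: {old_key!r}")

    if len(parts) == 3:
        # No device class in the old key: use the reserved fallback "_".
        _, domain, state = parts
        device_class = "_"
    else:
        _, domain, device_class, state = parts

    return f"component.{domain}.state.{device_class}.{state}"


print(migrate_state_key("state.light.on"))
print(migrate_state_key("state.binary_sensor.door.on"))
```

The first call yields component.light.state._.on, and the second yields component.binary_sensor.state.door.on.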

In future releases, we're planning to migrate state attribute translations to the backend too. We'll publish this on this blog when it happens.

Hassfest for custom components

Hassfest is an internal tool that we use in Home Assistant to make sure that all integrations have valid data. We've now made Hassfest able to validate any integration, including custom integrations. To make it easy to get started with this, @ludeeus has created a GitHub Action that gets you up and running in less than a minute.

To install it, follow these steps:

  1. Go to your custom component repository on GitHub

  2. Click on "Create new file"

  3. For filename, paste .github/workflows/hassfest.yaml

  4. Paste the following contents:

    name: Validate with hassfest
    on:
      push:
      pull_request:
      schedule:
        - cron: "0 0 * * *"
    jobs:
      validate:
        runs-on: "ubuntu-latest"
        steps:
          - uses: "actions/[email protected]"
          - uses: home-assistant/actions/[email protected]

GitHub will now lint all incoming PRs and commits with hassfest, and will also run it once every night to check against the latest requirements.

The Hassfest action will track the beta release channel. That way you will be notified if your integration is incompatible with newer versions of Home Assistant.

S6 Overlay for our Docker containers

Home Assistant uses a lot of different Docker containers for all kinds of purposes. Not only is Home Assistant Core available as a Docker container, but our Supervisor and all add-ons also leverage Docker.

In many situations, we need to run multiple processes in our containers, all of which need to be managed. We used to do this using simple Bash scripting, but quickly learned that we needed a proper process manager to handle this.

We decided to use the S6 Overlay init system, which is based on the excellent S6 toolset that provides process supervision and management, logging, and system initialization.

The S6 Overlay has been added to our Docker base images, which are used by every Docker image Home Assistant ships.

All containers have been updated, and changes are automatically handled by the Home Assistant Supervisor. For Home Assistant users, there is no noticeable impact.

For users of the Home Assistant Core containers on Docker, this might impact the way you run or start the container. If you run your Home Assistant Core container with an override of the Docker entry point or command, you need to adapt those. For example, some container management systems, like Portainer and Synology, automatically override those for you so you are impacted.

In those cases:

  • The entry point has changed to: /init
  • The command (CMD) has changed to: (Empty/not set)

If you override the command to start Home Assistant, the init system in the entry point will still be active in the background and will launch a second Home Assistant instance. This can lead to unexpected behavior.

Translations 2.0

We've migrated our translation scripts in the Home Assistant Core repository under a single namespace. They can now all be invoked using python3 -m script.translations.

| Old command | New command |
| --- | --- |
| script/translations_develop | python3 -m script.translations develop |
| script/translations_upload | python3 -m script.translations upload |
| script/translations_download | python3 -m script.translations download |
| script/translations_clean | python3 -m script.translations clean |

This will help us prepare for our Translations 2.0 effort, which will clean up translations and make them scale better.