Offline Wikipedia
Creating an Offline Copy of Wikipedia with Tailscaled Kiwix Quadlet
I have a new self-hosting/local-first fascination and an old laptop that was waiting to be upcycled. I might write more about that here later. For now, I’ll cram too many links into this for the curious. Please say ‘hello’ to the first, abruptly technical post on this blog.
#Inspiration
The other day I searched the IPFS ecosystem directory for more projects because I’m enjoying anytype as a notes app so much. I saw it listed there in a roundabout way. It’s still unclear to me how or if anytype depends on IPFS. In any case, it seems to me the IPFS ecosystem directory may have more projects with a local-first ethos. That’s when I noticed Kiwix listed in the directory.
Kiwix looks like it started as an offline Wikipedia reader, but it seems to support so much more.
Kiwix is an open-source software that allows you to have the whole Wikipedia at your fingertips! You can browse the content of Wikipedia, TED talks, Stack Exchange, and many other resources without an internet connection. Kiwix is available on various platforms, including Windows, macOS, Linux, Android, and iOS.
It’s unclear if that directory is maintained or if Kiwix still depends on IPFS. [1] In any case, I like the idea of self-hosting an offline copy of the web resources I appreciate. I get to cosplay as my own digital librarian! Let’s give this a try.
#Implementation
Someone else must have already figured out how to do this, so it shouldn’t take me more than an hour, two tops, right? ..right?
I have a YAML template for Tailscale-powered pods ready, so all I thought I needed was some config info for spinning up a Kiwix pod, a ZIM file, and a few symlinks. On top of that, I had to deal with the disappointment that Kiwix doesn’t appear to index ZIM files automatically..?
Ultimately, this boiled down to three things and took about 3-4 hours in the end:
- Manually creating a “library” file
- Telling Kiwix to monitor the library file for changes
- Coming to terms with the fact that I’d need to manually update the library file (one command) after I add a new ZIM file (an offline copy of a website) to the same directory. Or set up a cron job on the old laptop.. It’s doubtful I’d be updating it that often.
#Kiwix Pod
A trip over to the awesome-selfhosted repo pointed me to kiwix-serve. From there I was led to the kiwix-serve image on the GitHub container registry.
It looks like the image is built from this Dockerfile. That means I need to map to port 8080 and mount /data when I configure the pod.
Now all I need to do for this step is set up the quadlet files with this info and generate a Tailscale auth key to add the pod to the tailnet.
#kiwix.json
Placed this in the tailscale config directory. Notice the port is set to 8080 so it can proxy to the kiwix container in the pod.
{
  "TCP": {
    "443": {
      "HTTPS": true
    }
  },
  "Web": {
    "${TS_CERT_DOMAIN}:443": {
      "Handlers": {
        "/": {
          "Proxy": "http://127.0.0.1:8080"
        }
      }
    }
  },
  "AllowFunnel": {
    "${TS_CERT_DOMAIN}:443": false
  }
}
#kiwix.kube
[Kube]
ConfigMap=kiwix.kube.d/kiwix-secrets.yaml
Yaml=kiwix.kube.d/kiwix.yaml
Network=kiwix.network
[Install]
WantedBy=multi-user.target
#kiwix.network
[Network]
Label=app=kiwix
#https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html#network-units-network
#kiwix.kube.d/kiwix.yaml
Note that ghcr.io/kiwix/kiwix-serve:latest appears to be built from this Dockerfile. The kiwix-serve image is in turn derived from the ghcr.io/kiwix/kiwix-tools image, which appears to be defined in this Dockerfile.
The ENTRYPOINT for the kiwix-serve image is a script called start.sh, which makes sure the server listens on port 8080 by default. To pass our own options, the kiwix container needs to be started with a different ENTRYPOINT than the one it was built with.
The kiwix-serve reference doc shows the options we need to start this with:
kiwix-serve --library --monitorLibrary LIBRARY_FILE_PATH
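Before wiring this into the pod, the same entrypoint override can be tried with a one-off container. This is only a sketch under assumptions: the function name, the `PODMAN` override (set it to `echo` to preview the command), and the host directory argument are made up for illustration, and the relative `library_zim.xml` path relies on the container starting in /data, which I have not confirmed.

```shell
# Sketch: run kiwix-serve once with the image's ENTRYPOINT replaced.
# try_kiwix_serve and the PODMAN variable are hypothetical helpers, not part of Kiwix.
try_kiwix_serve() {
    data_dir="$1"   # host directory holding the .zim files and library_zim.xml
    "${PODMAN:-podman}" run --rm \
        --publish 8080:8080 \
        --volume "$data_dir:/data:Z" \
        --entrypoint kiwix-serve \
        ghcr.io/kiwix/kiwix-serve:latest \
        --port 8080 --library --monitorLibrary library_zim.xml
}
```

Calling `PODMAN=echo try_kiwix_serve /local/path/to/appdata/kiwix/data` prints the full command without starting anything.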
You can see this added to the command options of the kiwix container below:
apiVersion: v1
kind: Pod
metadata:
  name: kiwix
  labels:
    use: "Hosting ZIM files to Read Over HTTPS"
    io.containers.autoupdate: "registry"
spec:
  containers:
    - name: tailscale
      image: docker.io/tailscale/tailscale:latest
      env:
        - name: TS_AUTHKEY
          valueFrom:
            configMapKeyRef:
              name: kiwix-secrets
              key: ts-authkey
        - name: TS_SERVE_CONFIG
          value: /config/kiwix.json
        - name: TS_STATE_DIR
          value: /var/lib/tailscale
        - name: TS_EXTRA_ARGS
          value: --hostname=kiwix #--advertise-tags=tag:container # --advertise-exit-node
        # - name: TS_USERSPACE
        #   value: "true" # according to https://hub.docker.com/r/tailscale/tailscale this is enabled by default
      securityContext:
        #allowPrivilegeEscalation: true
        capabilities:
          #drop:
          #- CAP_MKNOD
          #- CAP_NET_RAW
          #- CAP_AUDIT_WRITE
          add:
            - CAP_NET_ADMIN
            - CAP_NET_RAW
            - CAP_SYS_MODULE
        privileged: false
      volumeMounts:
        - name: ts-config
          mountPath: /config
        - name: ts-state
          mountPath: /var/lib
        - name: tunnel
          mountPath: /dev/net/tun
    - name: kiwix
      image: ghcr.io/kiwix/kiwix-serve:latest
      command: ["kiwix-serve"]
      args: ["--port", "8080", "--library", "--monitorLibrary", "library_zim.xml"]
      ports:
        - containerPort: 8080
          hostPort: 13395
          protocol: TCP
      volumeMounts:
        - name: app-data
          mountPath: /data:Z
  restartPolicy: Always
  dnsPolicy: Default
  volumes:
    - name: ts-config
      hostPath:
        path: /local/path/to/appdata/kiwix/tailscale-config
        type: Directory
    - name: ts-state
      hostPath:
        path: /local/path/to/appdata/kiwix/tailscale-state
        type: Directory
    - name: tunnel
      hostPath:
        path: /dev/net/tun
        type: CharDevice
    - name: app-data
      hostPath:
        path: /local/path/to/appdata/kiwix/data
        type: Directory
If you want to get fancy, you can point the volume mounts at a network filesystem.
#kiwix.kube.d/kiwix-secrets.yaml
The contents of this file are obviously a secret. For now, this must be a ConfigMap called kiwix-secrets because systemd doesn’t yet support podman secrets. This is necessary so tailscale can retrieve its auth key, ts-authkey, to connect to the tailnet.
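For reference, here is a rough sketch of what generating that file could look like. The ConfigMap name and key match what kiwix.yaml references (kiwix-secrets and ts-authkey); the function name and the placeholder key value are assumptions for illustration only.

```shell
# Sketch: write a ConfigMap with the name and key that kiwix.yaml expects.
# write_secrets is a hypothetical helper; pass your own auth key as the second argument.
write_secrets() {
    out_file="$1"
    authkey="$2"
    cat > "$out_file" <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: kiwix-secrets
data:
  ts-authkey: "$authkey"
EOF
    # The file contains a credential, so restrict it to its owner.
    chmod 600 "$out_file"
}
```

Usage would be something like `write_secrets kiwix.kube.d/kiwix-secrets.yaml "tskey-auth-EXAMPLE"`, with the placeholder replaced by a key generated in the Tailscale admin console.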
#ZIM Library
The container will die without a zim file or library to start with. We need to create a library file to avoid this. To make this easier on ourselves, install kiwix-tools.
sudo apt install kiwix-tools -y
Download some zim files from the Kiwix library to add to our own library xml file. Put the files in the directory that will be mounted to the /data dir in the kiwix pod.
To create the library file, run this from that dir on the host (NOTE: it seems this command needs to be manually rerun to update the library file whenever you want to add a new zim to the library):
kiwix-manage library_zim.xml add *.zim
Now all the zim files in /data have been added to the library file.
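If rerunning that by hand gets old, the same command can be wrapped in a small function and called from a cron job. This is a sketch, not something I run yet: the function name and the `KIWIX_MANAGE` override are made up for illustration, and I have not checked how kiwix-manage treats a zim that is already listed, so treat re-adding existing files as an assumption.

```shell
# Sketch: add every .zim file in a directory to library_zim.xml in that directory.
# refresh_library and KIWIX_MANAGE are hypothetical; KIWIX_MANAGE=echo previews the calls.
refresh_library() {
    zim_dir="$1"
    library="$zim_dir/library_zim.xml"
    manage="${KIWIX_MANAGE:-kiwix-manage}"
    for zim in "$zim_dir"/*.zim; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$zim" ] || continue
        "$manage" "$library" add "$zim"
    done
}
```

For example, `refresh_library /local/path/to/appdata/kiwix/data` would update the library in the directory mounted at /data. Since kiwix-serve was started with --monitorLibrary, it should notice the changed file.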
#Deployment
Next is to put all the quadlet files in the right place.
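A minimal sketch of that copy step, assuming the four files sit in a working directory laid out the same way as the headings in this post (kiwix.kube, kiwix.network, and a kiwix.kube.d/ folder). The function name and the source directory are hypothetical.

```shell
# Sketch: copy the quadlet files into place, keeping the kiwix.kube.d/
# subdirectory that kiwix.kube points at with its relative paths.
# install_quadlets is a hypothetical helper; run it as root for /etc.
install_quadlets() {
    src_dir="$1"
    dest_dir="${2:-/etc/containers/systemd}"
    mkdir -p "$dest_dir/kiwix.kube.d"
    cp "$src_dir/kiwix.kube" "$src_dir/kiwix.network" "$dest_dir/"
    cp "$src_dir/kiwix.kube.d/kiwix.yaml" "$dest_dir/kiwix.kube.d/"
    # The ConfigMap holds the Tailscale auth key, so keep it owner-readable only.
    install -m 600 "$src_dir/kiwix.kube.d/kiwix-secrets.yaml" "$dest_dir/kiwix.kube.d/"
}
```

Passing a second argument lets you stage the files somewhere else first to check the layout.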
I’m putting the quadlet files in /etc/containers/systemd/ and running this as root for a proof of concept. I can deal with making it rootless down the line if I like it enough. Once the quadlet files are there and kiwix.json is in the tailscale config dir, reload the systemd daemon to generate the kiwix systemd service.
sudo systemctl daemon-reload
Now start kiwix:
sudo systemctl start kiwix
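To confirm it actually came up, here is a small sketch of a check against the host port. It assumes curl is installed and that the hostPort from kiwix.yaml (13395) is reachable on localhost; the function name is made up.

```shell
# Sketch: succeed only if kiwix-serve answers an HTTP request.
# kiwix_responds is a hypothetical helper; 13395 is the hostPort from kiwix.yaml.
kiwix_responds() {
    url="${1:-http://127.0.0.1:13395/}"
    curl --silent --fail --max-time 5 --output /dev/null "$url"
}
```

If `kiwix_responds` fails, `systemctl status kiwix` and `journalctl -u kiwix` are the first places I’d look.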
#And Beyond
#Create own ZIM files
There’s an openzim tool called zimit that can be used to create DIY zim files for a personal offline copy of a website.

[1] The Kiwix ipfs-portal GitHub repo has only 15 commits. Most of them are 3-4 years old, and the most recent one only touched the index.html file. It seems they depend on ZIM files rather than IPFS at this point. The Kiwix docs reference ZIM, and it looks like the IPFS association is from 2017 and was maybe only a proof of concept for IPFS.
