Offline Wikipedia

Creating an Offline Copy of Wikipedia with Tailscaled Kiwix Quadlet

I have a new self-hosting/local-first fascination and an old laptop that was waiting to be upcycled. I might write more about that here later. For now, I’ll cram too many links into this for the curious. Please say ‘hello’ to the first, abruptly technical post on this blog.

#Inspiration

The other day I searched the IPFS ecosystem directory for more projects because I’m enjoying anytype as a notes app so much. I saw anytype listed there in a roundabout way, though it’s still unclear to me how or if it depends on IPFS. In any case, it seemed to me the directory might have more projects with a local-first ethos. That’s when I noticed Kiwix listed there.
Kiwix looks like it started as an offline Wikipedia reader, but it seems to support so much more.

Kiwix is an open-source software that allows you to have the whole Wikipedia at your fingertips! You can browse the content of Wikipedia, TED talks, Stack Exchange, and many other resources without an internet connection. Kiwix is available on various platforms, including Windows, macOS, Linux, Android, and iOS.

about-kiwix

It’s unclear if that directory is maintained or if Kiwix still depends on IPFS.[1] In any case, I like the idea of self-hosting an offline copy of the web resources I appreciate. I get to cosplay as my own digital librarian! Let’s give this a try.

#Implementation

Someone else must have already figured out how to do this, so it shouldn’t take me more than an hour, two tops, right? ..right?

I have a YAML template for tailscale-powered pods ready, so all I thought I needed was some config info for spinning up a Kiwix pod, a ZIM file, and a few symlinks. On top of that, I had to deal with the disappointment that Kiwix doesn’t appear to index ZIM files automatically..?

Ultimately, this boiled down to three things and took about 3-4 hours in the end:

  1. Manually creating a “library” file
  2. Telling Kiwix to monitor the library file for changes
  3. Coming to terms with the fact that I’d need to manually update the library file (one command) after I add a new ZIM file (offline copy of a website) to the same directory. Or set up a cron job on the old laptop.. It’s doubtful I’d be updating that so much.

#Kiwix Pod

A trip over to the awesome-selfhosted repo pointed me to kiwix-serve, which led me to the kiwix-serve image on the GitHub container registry.

It looks like the image is built from this Dockerfile. That means I need to map port 8080 and mount /data when I configure the pod.
Now all I need to do for this step is set up the quadlet files with this info and generate a tailscale token to add the pod to the tailnet.
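
Before writing any quadlet files, the port and mount point can be sanity-checked with a one-off container. This is a minimal sketch, assuming the image’s default entrypoint accepts a quoted glob of ZIM files as its argument; that is an assumption here, not something verified in this post. The function name and the PODMAN override are hypothetical.

```shell
# Smoke-test the kiwix-serve image against a directory of ZIM files.
# PODMAN is a hypothetical override: PODMAN=echo previews the command
# instead of running it.
kiwix_smoke_test() (
    zim_dir="${1:?usage: kiwix_smoke_test ZIM_DIR [HOST_PORT]}"
    host_port="${2:-8080}"
    # 8080 is the container port and /data the mount point noted above.
    # The quoted '*.zim' reaches the container unexpanded; the assumption
    # is that the default entrypoint expands it inside /data.
    "${PODMAN:-podman}" run --rm \
        -p "${host_port}:8080" \
        -v "${zim_dir}:/data:Z" \
        ghcr.io/kiwix/kiwix-serve:latest '*.zim'
)
```

Running `PODMAN=echo kiwix_smoke_test /path/to/zims` prints the command without starting anything. Dropping the override runs the container in the foreground until it is stopped with Ctrl-C.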

#kiwix.json

Placed this in the tailscale config directory. Notice the port is set to 8080 so it can proxy to the kiwix container in the pod.

{
  "TCP": {
    "443": {
      "HTTPS": true
    }
  },
  "Web": {
    "${TS_CERT_DOMAIN}:443": {
      "Handlers": {
        "/": {
          "Proxy": "http://127.0.0.1:8080"
        }
      }
    }
  },
  "AllowFunnel": {
    "${TS_CERT_DOMAIN}:443": false
  }
}

#kiwix.kube

[Kube]
ConfigMap=kiwix.kube.d/kiwix-secrets.yaml
Yaml=kiwix.kube.d/kiwix.yaml
Network=kiwix.network

[Install]
WantedBy=multi-user.target

#kiwix.network

[Network]
Label=app=kiwix

#https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html#network-units-network

#kiwix.kube.d/kiwix.yaml

Note that ghcr.io/kiwix/kiwix-serve:latest is an image that appears to be based on this Dockerfile. The kiwix-serve image is in turn derived from the ghcr.io/kiwix/kiwix-tools image, which appears to be defined in this Dockerfile.

We can see the ENTRYPOINT for the kiwix-serve image is a script called start.sh, which makes sure the server listens on port 8080 by default. To pass the library options shown next, the kiwix container needs to be started with a command other than the ENTRYPOINT it was built with.

The kiwix-serve reference doc shows the options we need to start this with:

kiwix-serve --library --monitorLibrary LIBRARY_FILE_PATH

You can see this added to the command options of the kiwix container below:

apiVersion: v1
kind: Pod
metadata:
  name: kiwix
  labels:
    use: "Hosting ZIM files to Read Over HTTPS"
    io.containers.autoupdate: "registry"
spec:
  containers:
    - name: tailscale
      image: docker.io/tailscale/tailscale:latest
      env:
        - name: TS_AUTHKEY
          valueFrom:
            configMapKeyRef:
              name: kiwix-secrets
              key:  ts-authkey
        - name:  TS_SERVE_CONFIG
          value: /config/kiwix.json
        - name:  TS_STATE_DIR
          value: /var/lib/tailscale
        - name: TS_EXTRA_ARGS
          value: --hostname=kiwix #--advertise-tags=tag:container # --advertise-exit-node
        # - name:   TS_USERSPACE
        #   value:  "true" # according to https://hub.docker.com/r/tailscale/tailscale this is enabled by default
      securityContext:
        #allowPrivilegeEscalation: true
        capabilities:
          #drop:
            #- CAP_MKNOD
            #- CAP_NET_RAW
            #- CAP_AUDIT_WRITE
          add:
            - CAP_NET_ADMIN
            - CAP_NET_RAW
            - CAP_SYS_MODULE
        privileged: false    
      volumeMounts:
      - name: ts-config
        mountPath: /config
      - name: ts-state
        mountPath: /var/lib
      - name: tunnel
        mountPath: /dev/net/tun

    - name: kiwix
      image: ghcr.io/kiwix/kiwix-serve:latest
      command: ["kiwix-serve"]
      args:    ["--port", "8080", "--library", "--monitorLibrary", "library_zim.xml"]
      ports:
      - containerPort: 8080
        hostPort: 13395
        protocol: TCP
      volumeMounts:
      - name: app-data
        mountPath: /data:Z

  restartPolicy: Always
  dnsPolicy: Default
  volumes:
    - name: ts-config
      hostPath:
        path: /local/path/to/appdata/kiwix/tailscale-config
        type: Directory
    - name: ts-state 
      hostPath: 
        path: /local/path/to/appdata/kiwix/tailscale-state
        type: Directory
    - name: tunnel
      hostPath:
        path: /dev/net/tun
        type: CharDevice
    - name: app-data 
      hostPath: 
        path: /local/path/to/appdata/kiwix/data
        type: Directory

If you want to get fancy, you can point the volume mounts at a network filesystem.

#kiwix.kube.d/kiwix-secrets.yaml

The contents of this file are obviously a secret. For now, this must be a ConfigMap called kiwix-secrets because systemd doesn’t yet support podman secrets. Tailscale needs it to retrieve its auth key, ts-authkey, and connect to the tailnet.
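
The file itself isn’t shown here, but its shape follows from the configMapKeyRef in kiwix.yaml: a ConfigMap named kiwix-secrets with one key, ts-authkey. As a minimal sketch, a small shell function could write it. The function name, its arguments, and the example key value are hypothetical.

```shell
# Write the ConfigMap that kiwix.kube loads through its ConfigMap= line.
# The name (kiwix-secrets) and key (ts-authkey) match the configMapKeyRef
# in kiwix.yaml; the function name and arguments are hypothetical.
write_kiwix_secrets() (
    authkey="${1:?usage: write_kiwix_secrets AUTHKEY OUTFILE}"
    outfile="${2:?usage: write_kiwix_secrets AUTHKEY OUTFILE}"
    umask 077   # a newly created file is readable by its owner only
    cat > "$outfile" <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: kiwix-secrets
data:
  ts-authkey: "${authkey}"
EOF
)
```

For example, `write_kiwix_secrets "tskey-auth-..." kiwix.kube.d/kiwix-secrets.yaml`, with the real key pasted in place of the placeholder.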

#ZIM Library

The container will die without a ZIM file or library to start with, so we need to create a library file first. To make this easier on ourselves, install kiwix-tools.

sudo apt install kiwix-tools -y

Download some ZIM files from the Kiwix library to add to our own library XML file. Put the files in the directory that will be mounted to the /data dir in the kiwix pod.
Then create the library file by running the command below from that dir on the host. NOTE: it seems this command needs to be rerun manually whenever you want to add a new ZIM file to the library.

kiwix-manage library_zim.xml add *.zim

Now all the zim files in /data have been added to the library file.
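
Since the library file has to be refreshed by hand, the step is easy to wrap in a small function that a cron job could call later. This is a minimal sketch that uses the same `kiwix-manage LIBRARY add ZIM` form as above, one call per file. The function name and the KIWIX_MANAGE override are hypothetical.

```shell
# Add every ZIM file in a directory to the library file, one call per file.
# KIWIX_MANAGE is a hypothetical override (KIWIX_MANAGE=echo gives a dry run).
refresh_kiwix_library() (
    data_dir="${1:?usage: refresh_kiwix_library DATA_DIR [LIBRARY_FILE]}"
    library="${2:-library_zim.xml}"
    # Run from inside the data directory, as in the manual command above.
    cd "$data_dir" || exit 1
    for zim in *.zim; do
        # With no matches the glob stays literal, so skip it.
        [ -e "$zim" ] || continue
        "${KIWIX_MANAGE:-kiwix-manage}" "$library" add "$zim" || exit 1
    done
)
```

`KIWIX_MANAGE=echo refresh_kiwix_library /local/path/to/appdata/kiwix/data` lists what would be added. How kiwix-manage treats a ZIM file that is already in the library isn’t covered here, so it’s worth checking the result with `kiwix-manage library_zim.xml show` afterwards.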

#Deployment

Next, put all the quadlet files in the right place.
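
As a minimal sketch of that copy step, assuming the four quadlet files sit in one working directory laid out the way kiwix.kube references them: the function name and its arguments are hypothetical, and the default destination is the root quadlet directory used in this post.

```shell
# Copy the quadlet files from a working directory into the quadlet directory.
# File names are the ones used in this post; the function and its
# arguments are hypothetical.
install_kiwix_quadlet() (
    src="${1:?usage: install_kiwix_quadlet SRC_DIR [DEST_DIR]}"
    dest="${2:-/etc/containers/systemd}"
    mkdir -p "$dest/kiwix.kube.d" || exit 1
    cp "$src/kiwix.kube" "$src/kiwix.network" "$dest/" || exit 1
    cp "$src/kiwix.kube.d/kiwix.yaml" \
       "$src/kiwix.kube.d/kiwix-secrets.yaml" "$dest/kiwix.kube.d/" || exit 1
)
```

Writing to the default destination needs root, so in practice this would run from a root shell or a script invoked with sudo.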

Once the quadlet files are in /etc/containers/systemd/ (I’m running this as root for a proof of concept and can make it rootless later if I like it enough) and kiwix.json is in the tailscale config dir, reload the systemd daemon to generate the kiwix systemd service.

sudo systemctl daemon-reload

Now start kiwix:

sudo systemctl start kiwix
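
To confirm the pod came up, a quick check of the unit and the published port helps. This is a minimal sketch, assuming the hostPort from kiwix.yaml (13395) is reachable on the host’s loopback address. The function name and the SYSTEMCTL and CURL overrides are hypothetical.

```shell
# Check that the service is active and that kiwix-serve answers over HTTP.
# 13395 is the hostPort from kiwix.yaml; SYSTEMCTL and CURL are hypothetical
# overrides for dry runs.
check_kiwix() (
    url="${1:-http://127.0.0.1:13395/}"
    if ! "${SYSTEMCTL:-systemctl}" is-active --quiet kiwix; then
        echo "kiwix.service is not active" >&2
        exit 1
    fi
    if ! "${CURL:-curl}" --silent --fail --output /dev/null "$url"; then
        echo "no HTTP response from $url" >&2
        exit 1
    fi
    echo "kiwix is serving at $url"
)
```

If the first check fails, `sudo journalctl -u kiwix` is the place to look for the reason.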

#And Beyond

#Create own ZIM files

There’s an openzim tool called zimit that can be used to create DIY ZIM files for a personal offline copy of a website.

  1. The Kiwix ipfs-portal GitHub repo has only 15 commits, most of them 3-4 years old, and the most recent touched only the index.html file. It seems Kiwix depends on ZIM files rather than IPFS at this point. The Kiwix docs reference ZIM, and it looks like the IPFS association dates from 2017 and may have been only a proof of concept for IPFS.
