[Howto] My own mail & groupware server, part 4: Nextcloud

Running your own mail and groupware server can be challenging. I recently had to re-create my own setup from the ground and describe the steps in a blog post series. This blog post is #4 of the series and covers the integration of Nextcloud for storage.

Let’s add Nextcloud to the existing mail server. This part will focus on setting it up and configuring it in basic terms. Groupware and webmail will come in a later post! If you are new to this series, don’t forget to read part 1: what, why, how?, and all about the mail server setup itself in the second post, part 2: initial mail server setup. We also added a Git server in part 3: Git server.

Nextcloud as “cloud” server

Today’s online experience does not only cover mails and other groupware functions these days, but also the interaction with files in some online storage. Arguably, for many this is these days sometimes more important than e-mail systems.

Thus adding a service to the mail server providing a “cloud” experience around file management makes sense. The result is lacking cloud functionality in terms of high availability, but provides a rich UI, accessibility from all kinds of devices and integration in various services. It also offers the option to extend the functions further.

Nextcloud is probably the best known solution for self-hosted cloud solutions, and is also used large scale by universities, governments and companies. I also picked it because I had past experience with it and it offers some integrations and add-ons I really like and depend on.

Alternatives worth checking out are owncloud, Seafile and Pydio.

Integration into mailu setup

Nextcloud can be added to an existing mailu setup in three steps:

  1. Let Nginx know about the service
  2. Add a DB and set it up
  3. Add Nextcloud

The proxy bit is easily done by creating the file /data/mailu/overrides/nginx/nc.conf with the following content:

location /nc/ {
  add_header Front-End-Https on;
  proxy_buffering off;
  fastcgi_request_buffering off;
  proxy_pass http://nc/;
  client_max_body_size 0;
}

We also need a DB. Add this to docker-compose.yml:

  # Nextcloud

  ncpostgresql:
    image: postgres:12
    restart: always
    environment:
      POSTGRES_PASSWORD: ...
    volumes:
      - /data/ncpostgresql:/var/lib/postgresql/data

Make sure to add a proper password here! Next, we have to bring the environment down and up again to add the DB container, and then access the DB and create the right users and database with corresponding privileges:

  • Get the DB up & running: docker compose down and docker compose up
  • access DB container: sudo docker exec -it mailu_ncpostgresql_1 /bin/bash
  • become super user: su - postgres
  • add user nextcloud, add proper password here: create user nextcloud with password '...';
  • add nextcloud database: CREATE DATABASE nextcloud TEMPLATE template0 ENCODING 'UNICODE';
  • change database owner to user nextcloud: ALTER DATABASE nextcloud OWNER TO nextcloud;
  • grant all privileges to nextcloud: GRANT ALL PRIVILEGES ON DATABASE nextcloud TO nextcloud;

Now we can add the Nextcloud container itself. We will add a few environment variables to properly configure the DB access and the initial admin account. Add the following listing to the Docker Compose file:

  nc:
    image: nextcloud:apache
    restart: always
    environment:
      POSTGRES_HOST: ncpostgresql
      POSTGRES_USER: nextcloud
      POSTGRES_PASSWORD: ....
      POSTGRES_DB: nextcloud
      NEXTCLOUD_ADMIN_USER: admin
      NEXTCLOUD_ADMIN_PASSWORD: ...
      NEXTCLOUD_TRUSTED_DOMAINS: front
      REDIS_HOST: redis
    depends_on:
      - resolver
      - ncpostgresql
    dns:
      - 192.168.203.254
    volumes:
      - /data/nc/main:/var/www/html
      - /data/nc/custom_apps:/var/www/html/custom_apps
      - /data/nc/data:/var/www/html/data
      - /data/nc/config:/var/www/html/config
      - /data/nc/zzz_upload_php.ini:/usr/local/etc/php/conf.d/zzz_upload_php.ini

Nextcloud configuration

Before we launch Nextcloud, we need to configure it properly. As shown in the last line in the previous example, a specific file is needed to define the values for PHP file upload sizes. This is only needed in corner cases (browsers split up files during upload automatically these days), but can help sometimes. Create the file /data/nc/zzz_upload_php.ini:

upload_max_filesize=2G
post_max_size=2G
memory_limit=4G

Next, we need to create the configuration for the actual Nextcloud instance. Stop the Docker Compose setup, and start it up again. That generates the basic config files on the disk, and you can access/data/nc/config/config.php and adjust the following variables (others are left intact):

  'overwritewebroot' => '/nc',
  'overwritehost' => 'nc.bayz.de',
  'overwriteprotocol' => 'https',
  'trusted_domains' => 
  array (
    0 => 'lisa.bayz.de',
    1 => 'front',
    2 => 'mailu_front_1.mailu_default',
    3 => 'nc.bayz.de',
  ),

After another Docker Compose down and up, the instance should be all good! If the admin password need to be reset, access the container via sudo docker exec -it mailu_nc_1 /bin/bash and reset the password with: su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ user:resetpassword admin"

Next we can connect Nextcloud to the mailu IMAP server to use it for authentication. First we install the app “External user authentication” from the developers section. Next we add the following code to the above mentioned config.php:

  'user_backends' => array(
    array(
        'class' => 'OC_User_IMAP',
        'arguments' => array(
            'imap', 143, 'null', 'bayz.de', true, false
        ),
    ),
),

Restart the setup, and a login as user should be possible.

Sync existing files

In my case the instance was following a previous one. As part of the migration, a lot of “old” data had to be copied. The problem: copying the data for example via webdav is time consuming, does not perform and might be troublesome when the sync needs to be picked up after interruption again.

It is easier to sync direct from disc to disc with established tools like rsync. However, Nextcloud does not know that new files arrived that way and does not list them. The steps to make Nextcloud aware of those are:

  1. Log in as each user for which data should be synced so that target directories exist underneath the files/ directory
  2. Sync data with rsync or other tool of choice
  3. Correct permissions: chown -R ...:... files/
  4. Access container: sudo docker exec -it mailu_nc_1 /bin/bash
  5. Trigger file scan in Nextcloud: su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ files:scan --all"

Recurrent updater

For add-ons like the newsreader, Nextcloud needs to perform tasks on a regular base. Surprisingly enough, Nextcloud cannot easily do this on its own. The best way is to add a cron job to do that. And the best way to do that is a Systemd timer.

So first we add the service to be triggered regularly. On the host itself (not inside the container) create the file /etc/systemd/system/nextcloudcron.service:

[Unit]
Description=Nextcloud cron.php job

[Service]
ExecStart=/usr/bin/docker exec mailu_nc_1 su -s /bin/sh www-data -c "/usr/local/bin/php -f /var/www/html/cron.php"

Then, create the timer via the file /etc/systemd/system/nextcloudcron.timer:

[Unit]
Description=Run Nextcloud cron.php every 5 minutes

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min
Unit=nextcloudcron.service

[Install]
WantedBy=timers.target

Enable the timer: systemctl enable --now nextcloudcron.timer. And it is done. And is way more flexible and usable and maintainable than the old cron jobs. If you are new to timers, check their execution with sudo systemctl list-timers.

DB performance

A lot of Nextcloud’s performance depends on the performance of the DB. And DBs are all about indices. There are a few commands which can help with that – and which are recommended on the self check inside Nextcloud anyway:

access container: sudo docker exec -it mailu_nc_1 /bin/bash
add missing indices: su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ db:add-missing-indices" www-data
convert filecache: su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ db:convert-filecache-bigint" www-data
add missing columns: su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ db:add-missing-columns" www-data

Preview generator – fast image loading

The major clients used to access Nextcloud will probably be the Android client and a web browser. However, scrolling through galleries full of images is a pain: it takes ages until all the previews are loaded. Sometimes even a slide show is not possible because it all just takes too long.

This is because the images are not downloaded in real size (that would take too long), instead previews of the size required in that moment are generated live (still takes long, but not that long).

To make this all faster, one idea is to pre-generate the previews! To do so, we install the app “Preview Generator” in our instance. However, this generates a bit too many preview files, many in sizes which are hardly ever used. So we need to alter the sizes to be generated:

$ sudo docker exec -it mailu_nc_1 /bin/bash
$ su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ config:app:set previewgenerator squareSizes --value='256 1024'"
$ su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ config:app:set previewgenerator widthSizes  --value='384 2048'"
$ su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ config:app:set previewgenerator heightSizes --value='256 2048'"

Also we want to limit the preview sizes to not waste too much storage:

$ su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ config:system:set preview_max_x --value 2048"
$ su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ config:system:set preview_max_y --value 2048"
$ su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ config:system:set jpeg_quality --value 80"
$ su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ config:app:set preview jpeg_quality --value='80'"

Last but not least we run the preview generator:

su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ preview:generate-all -vvv"

Note that this can easily take hours, and thus I recommend to launch this via a tmux shell.

Of course new files will reach the system, so once in a while new previews should be generated. Use this command:

su - www-data -s /bin/bash -c "/usr/local/bin/php -f /var/www/html/occ preview:pre-generate"

This can also be added to a cron job similar to the timer above. Create the file /etc/systemd/system/nextcloudpreview.service:

[Unit]
Description=Nextcloud image preview generator job

[Service]
ExecStart=/usr/bin/docker exec mailu_nc_1 su -s /bin/sh www-data -c "/usr/local/bin/php -f /var/www/html/occ preview:pre-generate"

And add a timer similar to the above one triggering the service every 15 minutes. Create the file /etc/systemd/system/nextcloudpreview.timer:

[Unit]
Description=Run Nextcloud preview generator every 15 minutes

[Timer]
OnBootSec=15min
OnUnitActiveSec=15min
Unit=nextcloudpreview.service

[Install]
WantedBy=timers.target

Launch the timer: sudo systemctl enable --now nextcloudpreview.timer

One final word of caution: previews can take up a lot of space. Like A LOT. Expect maybe an additional 20% of storage needed for your images.

What’s next?

With Nextcloud up and running and all old data synced I was feeling good: all basic infrastructure services were running again. People could access all their stuff with only slight adjustments to their URLs.

The missing piece now was webmail and general groupware functionality. This will be covered in another post of this series.

More about that in the next post.

Image by RÜŞTÜ BOZKUŞ from Pixabay

Of debugging Ansible Tower and underlying cloud images

Recently I was experimenting with Tower’s isolated nodes feature – but somehow it did not work in my environment. Debugging told me a lot about Ansible Tower – and also why you should not trust arbitrary cloud images.

Ansible Logo

Recently I was experimenting with Tower’s isolated nodes feature – but somehow it did not work in my environment. Debugging told me a lot about Ansible Tower – and also why you should not trust arbitrary cloud images.

Background – Isolated Nodes

Ansible Tower has a nice feature called “isolated nodes”. Those are dedicated Tower instances which can manage nodes in separated environments – basically an Ansible Tower Proxy.

An Isolated Node is an Ansible Tower node that contains a small piece of software for running playbooks locally to manage a set of infrastructure. It can be deployed behind a firewall/VPC or in a remote datacenter, with only SSH access available. When a job is run that targets things managed by the isolated node, the job and its environment will be pushed to the isolated node over SSH, where it will run as normal.

Ansible Tower Feature Spotlight: Instance Groups and Isolated Nodes

Isolated nodes are especially handy when you setup your automation in security sensitive environments. Think of DMZs here, of network separation and so on.

I was fooling around with a clustered Tower installation on RHEL 7 VMs in a cloud environment when I run into trouble though.

My problem – Isolated node unavailable

Isolated nodes – like instance groups – have a status inside Tower: if things are problematic, they are marked as unavailable. And this is what happened with my instance isonode.remote.example.com running in my lab environment:

Ansible Tower showing an instance node as unavailable

I tried to turn it “off” and “on” again with the button in the control interface. It made the node available, it was even able to executed jobs – but it became quickly unavailable soon after.

Analysis

So what happened? The Tower logs showed a Python error:

# tail -f /var/log/tower/tower.log
fatal: [isonode.remote.example.com]: FAILED! => {"changed": false,
"module_stderr": "Shared connection to isonode.remote.example.com
closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n
File \"/var/lib/awx/.ansible/tmp/ansible-tmp-1552400585.04
-60203645751230/AnsiballZ_awx_capacity.py\", line 113, in <module>\r\n
_ansiballz_main()\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 105, in
_ansiballz_main\r\n    invoke_module(zipped_mod, temp_path,
ANSIBALLZ_PARAMS)\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 48, in
invoke_module\r\n    imp.load_module('__main__', mod, module, MOD_DESC)\r\n
File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 74, in
<module>\r\n  File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\",
line 60, in main\r\n  File
\"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 27, in
get_cpu_capacity\r\nAttributeError: 'module' object has no attribute
'cpu_count'\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact
error", "rc": 1}

PLAY RECAP *********************************************************************
isonode.remote.example.com : ok=0    changed=0    unreachable=0    failed=1  

Apparently a Python function was missing. If we check the code we see that indeed in line 27 of file awx_capacity.py the function psutil.cpu_count() is called:

def get_cpu_capacity():
    env_forkcpu = os.getenv('SYSTEM_TASK_FORKS_CPU', None)
    cpu = psutil.cpu_count()

Support for this function was added in version 2.0 of psutil:

2014-03-10
Enhancements
424: [Windows] installer for Python 3.X 64 bit.
427: number of logical and physical CPUs (psutil.cpu_count()).

psutil history

Note the date here: 2014-03-10 – pretty old! I check the version of the installed package, and indeed the version was pre-2.0:

$ rpm -q --queryformat '%{VERSION}\n' python-psutil
1.2.1

To be really sure and also to ensure that there was no weird function backporting, I checked the function call directly on the Tower machine:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> import psutil as module
>>> functions = inspect.getmembers(module, inspect.isfunction)
>>> functions
[('_assert_pid_not_reused', <function _assert_pid_not_reused at
0x7f9eb10a8d70>), ('_deprecated', <function deprecated at 0x7f9eb38ec320>),
('_wraps', <function wraps at 0x7f9eb414f848>), ('avail_phymem', <function
avail_phymem at 0x7f9eb0c32ed8>), ('avail_virtmem', <function avail_virtmem at
0x7f9eb0c36398>), ('cached_phymem', <function cached_phymem at
0x7f9eb10a86e0>), ('cpu_percent', <function cpu_percent at 0x7f9eb0c32320>),
('cpu_times', <function cpu_times at 0x7f9eb0c322a8>), ('cpu_times_percent',
<function cpu_times_percent at 0x7f9eb0c326e0>), ('disk_io_counters',
<function disk_io_counters at 0x7f9eb0c32938>), ('disk_partitions', <function
disk_partitions at 0x7f9eb0c328c0>), ('disk_usage', <function disk_usage at
0x7f9eb0c32848>), ('get_boot_time', <function get_boot_time at
0x7f9eb0c32a28>), ('get_pid_list', <function get_pid_list at 0x7f9eb0c4b410>),
('get_process_list', <function get_process_list at 0x7f9eb0c32c08>),
('get_users', <function get_users at 0x7f9eb0c32aa0>), ('namedtuple',
<function namedtuple at 0x7f9ebc84df50>), ('net_io_counters', <function
net_io_counters at 0x7f9eb0c329b0>), ('network_io_counters', <function
network_io_counters at 0x7f9eb0c36500>), ('phymem_buffers', <function
phymem_buffers at 0x7f9eb10a8848>), ('phymem_usage', <function phymem_usage at
0x7f9eb0c32cf8>), ('pid_exists', <function pid_exists at 0x7f9eb0c32140>),
('process_iter', <function process_iter at 0x7f9eb0c321b8>), ('swap_memory',
<function swap_memory at 0x7f9eb0c327d0>), ('test', <function test at
0x7f9eb0c32b18>), ('total_virtmem', <function total_virtmem at
0x7f9eb0c361b8>), ('used_phymem', <function used_phymem at 0x7f9eb0c36050>),
('used_virtmem', <function used_virtmem at 0x7f9eb0c362a8>), ('virtmem_usage',
<function virtmem_usage at 0x7f9eb0c32de8>), ('virtual_memory', <function
virtual_memory at 0x7f9eb0c32758>), ('wait_procs', <function wait_procs at
0x7f9eb0c32230>)]

Searching for a package origin

So how to solve this issue? My first idea was to get this working by updating the entire code part to the multiprocessor lib:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> cpu = multiprocessing.cpu_count()
>>> cpu
4

But while I was filling a bug report I wondered why RHEL shipped such an ancient library. After all, RHEL 7 was released in June 2014, and psutil had cpu_count available since early 2014! And indeed, a quick search for the package via the Red Hat package search showed a weird result: python-psutil was never part of base RHEL 7! It was only shipped as part of some very, very old OpenStack channels:

access.redhat.com package search, results for python-psutil

Newer OpenStack channels in fact come along with newer versions of python-psutil.

So how did this outdated package end up on this RHEL 7 image? Why was it never updated?

The cloud image is to blame! The package was installed on it – most likely during the creation of the image: python-psutil is needed for OpenStack Heat, so I assume that these RHEL 7 images where once created via OpenStack and then used as the default image in this demo environment.

And after the initial creation of the image the Heat packages were forgotten. In the meantime the image was updated to newer RHEL versions, snapshots were created as new defaults and so on. But since the package in question was never part of the main RHEL repos, it was never changed or removed. It just stayed there. Waiting, apparently, for me 😉

Conclusion

This issue showed me how tricky cloud images can be. Think about your own cloud images: have you really checked all all of them and verified that no package, no start up script, no configuration was changed from the Linux distribution vendor’s base setup?

With RPMs this is still manageable, you can track if packages are installed which are not present in the existing channels. But did someone install something with pip? Or any other way?

Take my case: an outdated version of a library was called instead of a much, much more recent one. If there would have been a serious security issue with the library in the meantime, I would have been exposed although my update management did not report any library to be updated.

I learned my lesson to be more critical with cloud images, checking them in more detail in the future to avoid having nasty surprises during production. And I can just recommend that you do that as well.