3 Years of ZFS

2020-03-10

3 years and change ago, I decided to set up a FreeNAS box with a ZFS mirror to store long-term data. The OS changed to Debian when I decided to run a few web apps, but I kept using ZFS for the mirror. The main draw of ZFS is that it periodically checks the data (scrubs) and can repair it as needed. After more than 3 years of running, it found its first data error: a few bad checksums. This wasn't surprising since the drive had reported uncorrectable sectors for the past year. At the same time as the ZFS checksum errors, the first reallocated sectors showed up. I went ahead and ordered a new drive since Backblaze's and Google's drive studies correlate reallocated sectors with drive failure.
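
For context, the check-and-repair loop I lean on looks roughly like this. A minimal sketch, assuming a pool named tank and a member disk at /dev/sda (both names are placeholders):

  # Walk every block in the pool and verify checksums, repairing from the mirror if needed
  zpool scrub tank
  # Per-device READ/WRITE/CKSUM error counts and scrub results
  zpool status -v tank
  # The SMART side: reallocated and uncorrectable sector counts
  smartctl -A /dev/sda | grep -iE 'reallocated|uncorrect'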

I'm a big fan of keeping things simple, so I reevaluated whether ZFS is the right choice versus the standard mdadm+lvm+ext4 setup. My use case is a mirror (RAID 1) that serves as a media server and a backup of my desktop SSD. ZFS's checksum protection did nothing for me since SMART data had already told me the drive should be monitored closely and replaced once reallocated sectors appeared or the uncorrectable sector count became worrying. ZFS snapshots were turned on, and I used them once to recover a file I had accidentally deleted. The feature was unsettling to use since it rolls back the entire filesystem to a previous point in time. You have to be sure you're returning to a state where the data you want is correct and the rest of the filesystem is acceptable. That gets a bit sketchy when you have thousands of files on a 4TB drive. It's also not the free feature many people portray it as: each snapshot holds on to changed blocks for as long as the snapshot exists, which meant I had a few hundred GB in use on top of the current data. Compression is currently sitting at 0% since the majority of my files aren't compressible. A minor pain point on Linux is that every ZFS update means recompiling the ZFS kernel modules, so updates take a good deal longer and tie up a couple of cores while they build.
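
To put a number on that snapshot overhead, the space is easy to see from the command line. A quick sketch, assuming the mirror's dataset is named tank/data (a placeholder):

  # Space pinned by each snapshot (the USED column) on top of the live data
  zfs list -r -t snapshot -o name,used,creation tank/data
  # Totals for the dataset: space held by snapshots and the compression ratio
  zfs get usedbysnapshots,compressratio tank/data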

In the end, I'm not convinced a user without some serious hardware should use ZFS. It has shiny features, but they don't seem to come into play unless you have a targeted use case or, by some very, extremely, ridiculously unlikely event, manage to hit an error that isn't caught by ECC, SMART, etc. but is caught by ZFS. You might say those features could be nice, so why not use ZFS in case I want them later? Because ZFS lags behind common filesystems in one critical area: performance. You'd be giving up everyday performance for that possibility. If I were to do it again, I would go with mdadm+lvm+ext4.
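
For reference, the simpler setup I'd build next time looks roughly like this. This is only a sketch, assumes two empty disks at /dev/sda and /dev/sdb, and will destroy whatever is on them; the volume group and volume names are arbitrary:

  # Create the RAID 1 mirror
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  # LVM on top so the space can be carved up and resized later
  pvcreate /dev/md0
  vgcreate storage /dev/md0
  lvcreate -n media -l 100%FREE storage
  # Plain ext4 on the logical volume
  mkfs.ext4 /dev/storage/media
  # Periodic consistency checking comes from md itself (distros usually schedule this monthly)
  echo check > /sys/block/md0/md/sync_action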


Website Switched To VPS

2019-12-20

I finished moving the site from NearlyFreeSpeech over to a Debian VPS at DigitalOcean today. I wanted a low-maintenance OS, so Debian with unattended-upgrades installed to automatically handle security updates was the obvious choice. The stack is still Python with Apache, but I switched the database to PostgreSQL, and mod_wsgi now runs the Python app instead of gunicorn. PostgreSQL is what I've been using in other projects, so standardizing on it made sense. All my database interactions were written with the Django ORM, so switching over was just a matter of changing a couple of settings in Django to point at PostgreSQL. mod_wsgi is easier to set up properly than gunicorn, its logging integrates with Apache, and it has great performance out of the box. Certbot takes care of keeping the Let's Encrypt cert up to date.
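
The TLS setup comes down to two commands. A rough sketch, assuming the Apache plugin and a placeholder domain:

  # Obtain the certificate and let certbot wire it into the Apache config
  certbot --apache -d example.com
  # Renewal runs from the packaged cron job/systemd timer; this just confirms it works
  certbot renew --dry-run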


Add Open Address Data To MapsMe

2019-12-01

Building maps for Maps.Me requires the same data as my previous posts on OSMAnd, so see those posts for processing the initial data. For Maps.Me the process is more complicated since the tooling is C++ and it adds its own data on top of the OSM data. The instructions are pretty good, so I'll just summarize the issues I had. Lessons learned (a rough sketch of the setup follows the list):

  • Data ids must be positive, sorted ascending by feature type, and fit in the first 48 bits
  • libsqlite3-dev is required to build the map generator
  • Use the map generator from the release branch you are targeting, i.e. release-94 for 9.4.X
  • Set debug=0 in the .ini file
  • Download the subway file once, then change the config to point to it; saves 10+ minutes per run
  • The md5 file is required
  • Street names must be very close to addr:street or addresses will be rewritten or ignored for search. See https://github.com/mapsme/omim/issues/12121
  • Coastal states built without coastlines won't render oceans at high zooms
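
A rough sketch of the setup behind a few of those bullets; the branch name comes from the notes above, the rest are standard steps and may differ for your environment:

  # The map generator needs the sqlite headers to build
  sudo apt install libsqlite3-dev
  # Build from the release branch matching the app version you run (release-94 for 9.4.X)
  git clone --recursive --branch release-94 https://github.com/mapsme/omim.git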

Add Open Address Data To OSMAND Part 2

2019-09-24

After you add your first county, things get easier since you probably only need to change the translation file to grab the column names that county used. I've been doing some manual quality assurance (QA) by opening the osm file generated by ogr2osm in JOSM and nano to spot-check that the translation works as expected. Then you'll add the new osm file to your osmium merge command. QA the output of that using osmium diff -qs newfile.osm.pbf oldfile.osm.pbf. This shows the number of objects that differ from the last time you performed the merge; the left number should roughly correspond to the number of addresses you've added, and the right number should be 0 since you're not deleting anything. My WA file is currently up to 1.33 million addresses added from 6 counties. OsmAnd Map Creator took 44 minutes on a 2C/4T 3.7 GHz CPU with a maximum of 6.3 GB of RAM allowed. You can check out the files I'm generating for WA at WA Map Releases.
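
Concretely, the merge plus the QA step look something like this; the file names are placeholders for the state extract and the per-county address files:

  # Merge the state extract with each county's translated address file
  osmium merge washington-latest.osm.pbf king.osm snohomish.osm -o wa-addresses.osm.pbf
  # Compare against the previous merge: left count ~ objects added, right should be 0
  osmium diff -qs wa-addresses.osm.pbf wa-addresses-old.osm.pbf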

Lessons Learned:

  • ids need to be unique for proper merging with Osmium; use negative numbers and set the --id option for ogr2osm to the last line number of the previous file (see the sketch below)
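
For the id handling, the ogr2osm call ends up looking something like this; the start value and file names are placeholders, and the number just needs to be past anything the previous file used:

  # ogr2osm assigns negative, decreasing ids by default; --id sets where the counter starts
  ogr2osm snohomish_addresses.shp --translation my_translation.py -o snohomish.osm --id=-500000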

More Investigation Needed:

  • How osmium merge deduplication works
    • The number of objects in the OSM files doesn't match the number of objects changed in the pbf files
  • Spokane County addresses are viewable, but not searchable
    • UPDATE: restarting the app fixes this

Add Open Address Data To OSMAND

2019-08-31

One of the biggest pains when using OSMAND for navigation is the lack of addresses in many areas. Through some research for an OpenStreetMap (OSM) import, I learned there is open address data for the counties around me that hasn't been imported due to the amount of work it takes. As a stopgap measure, I decided to find a way to load that open address data into OSMAND myself. The process was a bit of a pain to figure out because OSMAND doesn't have documentation on using non-OSM extract data to create an obf file (OSMAND's map format).

  1. Install software if needed (e.g. OGR2OSM, Osmium, OSMAND Map Creator)
  2. Get address file with open license (e.g. King County Address Shapefile)
  3. Download extract for area I want to merge address data with (e.g. WA state osm extract in OSM format)
  4. Translate address file as needed (e.g. use ogr2osm to translate to osm and expand street names; you'll have to create your own translation file since each dataset can have different column names, formats, etc.)
  5. Conflate address file with OSM extract as needed (e.g. trim the address file to limit overlap with OSM data; I did a quick-and-dirty cleanup by filtering out cities that already had imports done)
  6. Combine osm files into a pbf file with the .osm.pbf extension (e.g. osmium merge)
  7. Run OsmAnd Map Creator to create the map for OSMAND
  8. Upload obf file to OSMAND's data folder on your phone
    /mnt/sdcard/osmand or /Android/data/net.osmand/files
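
A condensed sketch of steps 4 and 6; the King County file names and the translation module are placeholders:

  # Step 4: translate the shapefile to OSM XML; the translation file maps the county's
  # columns to addr:housenumber, addr:street, etc.
  ogr2osm king_county_addresses.shp --translation king_translation.py -o king.osm
  # Step 6: merge the translated addresses into the state extract; output must end in .osm.pbf
  osmium merge washington-latest.osm.pbf king.osm -o washington-with-addresses.osm.pbf
  # Steps 7-8: feed the .osm.pbf to OsmAnd Map Creator to produce the obf, then copy it to the phone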

The end result is full address coverage for King Co. with the minor inconveniences of some duplicate addresses and addresses with a house number of 0, since I didn't do a full clean of the data.

Lessons Learned:

  • OSMAND must have addresses in the same file as other features to build the index that makes the addresses searchable. Found on a page buried in the OSMAND site: MapAddressDataStructure
  • Map file size may stay roughly the same. The WA map went from 309 MB to 308.8 MB after adding 350,000 addresses. That's correct, it got smaller after adding data.
    • UPDATE: after looking into it, osmium merge does some deduplication that may have been dropping addresses. File sizes have been steadily increasing in newer iterations of the file

Links: