Relief Web Review/Image extraction
From CrisisCommons Wiki
As part of the Relief Web Review we are prototyping ideas around mobile access with lighter bandwidth requirements. We figured out a process for converting PDFs to tilesets and presenting them with JavaScript.
>>> See demo tilesets : http://harrywood.co.uk/maps/haiti/tiledpdf/ <<<
Hosted PDF viewing solutions
Another way of thinking about the problem:
Presumably the Google Docs folk are pretty experienced with making PDFs accessible to mobile devices. Should reliefweb just share all their PDFs on Google Docs?
PDF to Image conversion
Looking at "Some PDF Image Extract" tool for this.
Solution: The best approach at the moment looks to be converting the PDF into an image file. We tested several options, of which using ImageMagick to convert the PDF into a mid-resolution (150 dpi) PNG or JPG seems to be the most workable.
Example maps:
A command like this
convert -density 600 original.pdf -scale 5000x5000 bigimage.png
...produces an image 5000 pixels along its maximum dimension. The 'density' option is the DPI resolution ImageMagick uses internally when it rasterises the PDF. There's no harm in cranking it up quite high. It does not affect the dimensions of the output image, just how good it looks.
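To see why the density setting does not change the output dimensions, here is a minimal arithmetic sketch. It assumes an A4 page (595 x 842 PDF points, an illustrative choice, not taken from the reliefweb PDFs) and simple rounding, which may differ by a pixel from what ImageMagick actually produces. The helper names are hypothetical.

```python
def raster_size(width_pt, height_pt, density):
    """Pixel size of a page rasterised at `density` DPI (a PDF point is 1/72 inch)."""
    return round(width_pt * density / 72), round(height_pt * density / 72)


def fit_within(width, height, box):
    """Scale (width, height) to fit a box x box square, keeping the aspect ratio.

    This mimics a geometry like 5000x5000, which shrinks or enlarges the
    image so that its longest side matches the box.
    """
    factor = box / max(width, height)
    return max(1, round(width * factor)), max(1, round(height * factor))


A4_PT = (595, 842)  # assumed page size in PDF points

for density in (150, 600):
    raster = raster_size(*A4_PT, density)
    final = fit_within(*raster, 5000)
    print(f"density {density}: rasterised at {raster}, scaled to {final}")
```

Whatever the density, the longest side ends up at 5000 pixels. A higher density only means the intermediate raster holds more detail before it is scaled.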
Image tiling tools
Harry Wood worked on this 6 February
Using googletilecutter seemed to work (GPL licensed). Another tool is gmaps-tiler.py, which we didn't use in the end.
The idea is to chop a massive hi-res image (output from PDF to image conversion) into tiles at multiple zoom levels, and present it using JavaScript. The obvious library for presenting would be the Google Maps API (or OpenLayers/other map library alternatives). Note: it doesn't necessarily matter if tiles are not geo-calibrated, but we should be clear that this is what we've done, and avoid any potential confusion, e.g. with bogus lat/lon permalinks.
Successfully ran googletilecutter and used Web Maps Lite and OpenLayers to display the result. The results can be seen at the link above.
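The tiling idea can be sketched as plain arithmetic. This is not how googletilecutter itself works internally (we have not checked). It is just one way to lay out a tile pyramid, assuming 256 pixel tiles, the full-size image at the highest zoom level, and each lower level at half the size. The function names and the 5000x3533 example image are illustrative.

```python
import math

TILE = 256  # tile edge in pixels; an assumption, the usual web-map tile size


def pyramid(width, height, max_zoom):
    """Yield (zoom, scaled_width, scaled_height, cols, rows) for each zoom level.

    The full-resolution image is used at max_zoom; each lower level halves it.
    """
    for zoom in range(max_zoom + 1):
        factor = 2 ** (max_zoom - zoom)
        w = math.ceil(width / factor)
        h = math.ceil(height / factor)
        yield zoom, w, h, math.ceil(w / TILE), math.ceil(h / TILE)


def tile_box(col, row, width, height):
    """Pixel box (left, upper, right, lower) of one tile, clipped to the image."""
    left, upper = col * TILE, row * TILE
    return left, upper, min(left + TILE, width), min(upper + TILE, height)


for level in pyramid(5000, 3533, max_zoom=5):
    print(level)
```

Each box returned by tile_box could be passed to an image library's crop function to write out the actual tile file. Tiles on the right and bottom edges come out smaller than 256 pixels unless the image is padded first.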
Problems and ideas for further work
- It's slow. Both Andrew and Harry have tried running the process over lots of PDFs. Harry ran it on the full set over several days! The main slowdown is running pngcrush on each tile. Andrew suggests trying advpng instead of pngcrush.
- One PDF page results in about 15 MB of tile images (on average, depending on complexity; that's with pngcrush doing its slow thing). Running it over all the Haiti PDFs we downloaded, the total size is 570 MB. Not too bad, but not insignificant either. It's worth noting that higher zoom levels use a lot more space. If we dropped zoom level 5 from the tilesets, that would cut the number of tile files to almost a quarter... but then less zooming.
- Currently limited to just the first page on multi-page PDFs (this is probably fixable with some string manipulations in the bash scripts)
- OpenLayers isn't quite configured right in the Layer definition. We're seeing tile weirdness appearing below the map at higher zoom levels.
- Problems with picking up new PDFs on reliefweb and syncing with their changes/updates. We could explore ways of webbifying the process to allow on-demand tile generation for a particular PDF, or taking in the RSS and generating tiles for new PDFs appearing on there (we already have a scraper, so maybe joining the two bits together on a server somewhere). Ultimately, hosting a duplicate of reliefweb's maps might be frowned upon or deemed counter-productive, but all of these things could be hosted by reliefweb themselves (view it as a prototype for that).
- OpenLayers relies on JavaScript to load and display the map tiles, so there are tile loading issues when accessing the site with mobile phone web browsers that do not have JavaScript. This is a serious drawback, as a majority of phones with web browsers are Symbian-based or BlackBerry-based and do not have JavaScript capability. A possible solution may be to redirect users based on their phone's capability. This article on detecting JavaScript capability of mobile web browsers might be of use.
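As a rough check on the zoom level 5 point in the list above, here is the tile count arithmetic. It assumes a full square pyramid with zoom levels numbered 0 to 5 and four times as many tiles at each level as the one before. That is an idealisation: real tilesets for non-square pages will have fewer tiles, and file sizes vary with tile content.

```python
def tiles_at(zoom):
    """Tiles in one zoom level of a full square pyramid (1, 4, 16, ...)."""
    return 4 ** zoom


def total_tiles(max_zoom):
    """Tiles summed over zoom levels 0..max_zoom."""
    return sum(tiles_at(z) for z in range(max_zoom + 1))


with_level_5 = total_tiles(5)
without_level_5 = total_tiles(4)
print(with_level_5, without_level_5, without_level_5 / with_level_5)
```

Under these assumptions the top zoom level alone accounts for about three quarters of the tile files, which is why dropping it leaves roughly a quarter.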
Any suggestions? Edit the page!

