Trigger Flexpaper Search and Jump To First Result on Page Load
I’m doing some interesting work for a client at the moment. The core feature is displaying PDFs using the Flexpaper PDF viewer.
Full text and advanced search
Outside of Flexpaper, the application features both full text search, using Thinking Sphinx, and advanced search, using Ransack.
PDF::Reader
To facilitate our search, I’m processing an uploaded PDF into text which I can store in the database. To do this I’m using the PDF::Reader gem.
pdf2json
To facilitate search within Flexpaper, I’m converting the uploaded PDF into JSON using pdf2json.
Connecting application search to Flexpaper’s internal search
I’m able to hook our own advanced and full text search results up to Flexpaper by storing each book’s pages, along with their page numbers, in our database. I can then, for example, run a Thinking Sphinx search for book pages containing the term ‘Mackenzie King’.
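To make the page lookup concrete, here is a minimal sketch in plain JavaScript. It is not Thinking Sphinx: a simple case-insensitive substring match stands in for the real search, and the pages array, its field names and the firstHitPage helper are all hypothetical stand-ins for rows stored in the database.

```javascript
// Hypothetical page records, standing in for the rows stored in the database.
const pages = [
  { pageNumber: 1, text: "Preface and acknowledgements" },
  { pageNumber: 42, text: "Mackenzie King became prime minister in 1921." },
  { pageNumber: 87, text: "Later, Mackenzie King returned to office." },
];

// Return the lowest page number whose text contains the term, or null if
// no page matches. A substring match stands in for the real search engine.
function firstHitPage(pageRecords, term) {
  const needle = term.trim().toLowerCase();
  if (!needle) return null;

  const hits = pageRecords
    .filter((page) => page.text.toLowerCase().includes(needle))
    .map((page) => page.pageNumber);

  return hits.length > 0 ? Math.min(...hits) : null;
}
```

The page number returned here is what later tells the viewer where to jump.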
When the user clicks on a search result, they are redirected to a book page with a query parameter containing their search query. We can then use that search query, along with the page number where the first hit occurs, to trigger Flexpaper’s internal search engine on page load, and then jump the user to the first hit within the book.
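As a rough sketch of that hand-off, the search term and first-hit page can travel in the query string and be read back on the book page. The parameter names q and page, the /books/7 path and both helper functions are assumptions made for illustration, not part of the application described here.

```javascript
// Hypothetical parameter names: "q" carries the search term,
// "page" carries the page number of the first hit.
function buildBookUrl(bookPath, query, firstHitPage) {
  const params = new URLSearchParams({ q: query, page: String(firstHitPage) });
  return `${bookPath}?${params.toString()}`;
}

// Read the same two values back on the book page.
// In a browser, queryString would typically be window.location.search.
function readSearchParams(queryString) {
  const params = new URLSearchParams(queryString);
  const query = (params.get("q") || "").trim();
  const page = Number.parseInt(params.get("page"), 10);

  return {
    query,
    // Fall back to page 1 when the parameter is missing or not a positive integer.
    page: Number.isInteger(page) && page > 0 ? page : 1,
  };
}

const url = buildBookUrl("/books/7", "Mackenzie King", 42);
const parsed = readSearchParams(url.split("?")[1]);
```

Falling back to page 1 keeps a plain visit to the book page, with no search parameters, working as before.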
Obstacles
I ran into some issues along the way.
Initially, I combined Flexpaper’s startAtPage configuration parameter with its searchText method and onDocumentLoaded event. For some reason this did not work as planned: Flexpaper was showing duplicate entries for the first result (and possibly others).
I eventually arrived at a workaround: I dropped the startAtPage parameter, and am now starting at the first page of the book.
I then call Flexpaper’s searchText method to bring up the search results, followed immediately by another call to Flexpaper’s goToPage method to bring the user to the page that I know contains the first hit.
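The ordering of those two calls can be sketched as below. The viewer argument is assumed to be whatever object exposes Flexpaper’s API in your page, and searchThenJump is a hypothetical helper. The method names follow this post; the exact casing of the page-jump method (gotoPage versus goToPage) is worth checking against the Flexpaper API reference for your version.

```javascript
// Sketch only: `viewer` is assumed to expose the searchText and goToPage
// methods named in this post. Verify both names against your Flexpaper version.
function searchThenJump(viewer, query, firstHitPage) {
  viewer.searchText(query);      // bring up Flexpaper's own search results
  viewer.goToPage(firstHitPage); // then move to the page known to hold the first hit
}

// A recording stub makes the call order visible without a real viewer.
const recordedCalls = [];
const stubViewer = {
  searchText: (text) => recordedCalls.push(["searchText", text]),
  goToPage: (page) => recordedCalls.push(["goToPage", page]),
};

searchThenJump(stubViewer, "Mackenzie King", 42);
```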
Another issue involved what seemed to be trouble with Flexpaper’s onDocumentLoaded event. We were getting strange results; perhaps the event fires before the document is genuinely ready, due to file load time or similar.
I was able to get around this by delaying the call to Flexpaper’s searchText method with a JavaScript setTimeout of 1500 milliseconds. I fiddled with various delays to find one that seemed workable; you may find that the delay is not necessary, or that it can be pushed up or down for best effect.
Various tools used:
- pdftk to split the PDF into single page PDFs
- pdf2json to process the PDF into JSON which can be used by Flexpaper’s internal search engine
- The PDF::Reader gem to process the PDF into text which we can store in our database for indexing by Thinking Sphinx, or for advanced searches using the Ransack gem
A Solution
Here’s an example – I’ve removed all of the Flexpaper config to highlight the salient details:
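The listing below is a minimal sketch put together from the steps described in this post, not the original code. It assumes a viewer object exposing the searchText and goToPage methods named above, and a hypothetical onLoaded hook that registers a callback for Flexpaper’s onDocumentLoaded event. The searchOnLoad function, its option names and the injectable schedule parameter are all illustrative.

```javascript
// Sketch only. Assumptions:
// - `viewer` exposes searchText and goToPage as named in this post.
// - `onLoaded` registers a callback for Flexpaper's onDocumentLoaded event
//   (how you bind that event depends on your Flexpaper setup).
// - `schedule` defaults to setTimeout and is injectable to ease testing.
function searchOnLoad({ viewer, onLoaded, query, page, delayMs = 1500, schedule = setTimeout }) {
  if (!query) return; // plain page view: nothing to search for

  onLoaded(() => {
    // The event seemed to fire before the document was really ready,
    // so wait a little before touching the viewer.
    schedule(() => {
      viewer.searchText(query); // open Flexpaper's search results
      viewer.goToPage(page);    // then jump to the page holding the first hit
    }, delayMs);
  });
}
```

The viewer is left to start at the first page of the book, with no startAtPage setting, and the 1500 millisecond default is only the value that happened to work here, so treat delayMs as something to tune.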
I’d love to hear your thoughts on Flexpaper and PDF viewers in the comments below!