PullMonkey Blog


18 Jan

Ruby PDF Reader Gem Tutorial


I've been doing a lot of work with PDFs these days, and for the most part I've been happy using poppler-utils' pdftohtml. That is great if you only care about the content and not about positioning or formatting. But if, like me, you need to know text positioning, font size, indentation, coloring, etc., then you will have to use something more.

I had given just about every other option a chance before I finally found the pdf reader gem. But when I found pdf reader, it didn't have much documentation, and it wasn't entirely clear how to get started with it or whether it would work. Well, I can tell you that it does work, and after playing with the examples a lot of it became much clearer. I learned a lot that would probably be useful for a few other people out there, hence this post.
Ok, well to get started, take a look at the GitHub repository for pdf reader. You don't need to spend too much time there, but note a few places like the examples directory and the list of callbacks.

You should probably also familiarize yourself with the PDF specification. It really came in handy when trying to figure out what arguments are being passed around and what they represent.

Let's get started -

Step 1:  Install the gem

Yah, this is a pretty easy step, but it is required 🙂
sudo gem install pdf-reader

Step 2:  Find a PDF (or PDFs) to use

It would be best to have several PDFs for you to work with since the callbacks could vary depending on the PDF.
NOTE: For these examples, I'm using a really simple PDF. On some PDFs pdf reader can take a while and seem as though it is hanging, but it is not; it is just chugging away right around line 283 of this file, reading each byte of your PDF.

Step 3: List the possible callbacks and their args for one of your PDFs

The point of this is to find out what methods we can write for pdf reader to call when it encounters the various parts of our PDF.
The BIG One that you will most likely use is show_text() or some form of it like show_text_with_positioning().
But, for now, THIS all depends on the PDF file you are using, so we need to find out what your PDF uses and go from there.

The easiest way to do this is to follow this example and just substitute "somefile.pdf" with the path to your pdf file.

Run it and you will see a long list of possible callbacks and their arguments. It is likely all squished together, so you can simply change the line of your code that says puts cb to puts cb.inspect and get a MUCH better look at everything.
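To see why that listing works, here is a gem-free sketch of what a recording receiver like pdf-reader's RegisterReceiver does: it accepts any callback and stores the callback name and its arguments. CallbackRecorder and the simulated calls below are hypothetical illustrations, not the gem's code; in real use the gem makes these calls itself while it parses your PDF.

```ruby
# Hypothetical stand-in for a recording receiver (not pdf-reader's own code).
# It accepts any method call and records it as { :name => ..., :args => [...] }.
class CallbackRecorder
  attr_reader :callbacks

  def initialize
    @callbacks = []
  end

  # Claim to respond to everything, so a parser that checks respond_to?
  # before calling a callback will still call us.
  def respond_to_missing?(name, include_private = false)
    true
  end

  # Any unknown method lands here and gets recorded.
  def method_missing(name, *args)
    @callbacks << { :name => name, :args => args }
    nil
  end
end

recorder = CallbackRecorder.new

# Simulated callbacks with made-up arguments; normally the gem sends these.
recorder.begin_text_object
recorder.set_text_font_and_size(:F1, 12)
recorder.show_text_with_positioning(["Hello", -250, "world"])
recorder.end_text_object

# inspect shows each hash with braces and separators
recorder.callbacks.each do |cb|
  puts cb.inspect
end
```

On Ruby 1.8, Hash#to_s joined keys and values with no separators, which is likely why a plain puts cb looked squished; on newer Rubies to_s and inspect give the same output.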

We will start with show_text, so grep for show_text and see what you get. For my PDF, I have mostly show_text_with_positioning.
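If you would rather stay in Ruby than reach for grep, the same filtering is a select over the recorded callbacks. The sample records below are made up and assume the { :name => ..., :args => [...] } shape that the listing prints; yours will come from your own PDF.

```ruby
# Made-up sample records in the shape the callback listing prints.
callbacks = [
  { :name => :begin_text_object, :args => [] },
  { :name => :set_text_matrix_and_text_line_matrix, :args => [1, 0, 0, 1, 72, 720] },
  { :name => :show_text_with_positioning, :args => [["Hello", -250, "world"]] },
  { :name => :show_text, :args => ["Page 1"] },
  { :name => :end_text_object, :args => [] }
]

# The Ruby equivalent of grepping the output for "show_text".
text_callbacks = callbacks.select { |cb| cb[:name].to_s =~ /show_text/ }

# Tally how often each text-showing callback appears in this PDF.
counts = Hash.new(0)
text_callbacks.each { |cb| counts[cb[:name]] += 1 }

counts.each { |name, n| puts "#{name}: #{n}" }
```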

Step 4: Do some lookups

What do the args shown for my callbacks mean, and how do we find out?
You can do this two ways: try your luck searching the PDF specification for "show text" or "show text with positioning" and see what you get, or look up the operator (token) used to represent show_text or show_text_with_positioning.
The first way is pretty obvious, so on to the second: look in the list of callbacks I had you familiarize yourself with earlier, starting on line 172. Looking through it, we find show_text and show_text_with_positioning, with Tj and TJ as their operators. Alright, now we have something to look up: "TJ". I found it on page 251 of the PDF Specification from earlier. Some of the descriptions for the operators will require rereading, but you will get the hang of it.
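Since you will do this lookup a lot, it can help to keep a small table of the operators you care about. The subset below is hand-copied from memory of the gem's operator list (an assumption; check the list in the repository for your version), and operator_for is a hypothetical helper for the reverse lookup from callback name to operator.

```ruby
# A hand-copied subset of PDF text operators and pdf-reader callback names.
TEXT_OPERATORS = {
  "BT" => :begin_text_object,
  "ET" => :end_text_object,
  "Tf" => :set_text_font_and_size,
  "Td" => :move_text_position,
  "Tm" => :set_text_matrix_and_text_line_matrix,
  "Tj" => :show_text,
  "TJ" => :show_text_with_positioning
}

# Reverse lookup: given a callback name from the listing, which operator
# should I search for in the PDF specification? Returns nil if unknown.
def operator_for(callback_name)
  TEXT_OPERATORS.key(callback_name.to_sym)
end

puts operator_for(:show_text_with_positioning)  # prints "TJ"
```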

Step 5: Use what we found

Now that we know how the show_text_with_positioning works and what args it brings in, we can write our code.
We need an instance of a receiver to pass to the PDF Reader. This is just a class that has methods like show_text() or show_text_with_positioning(). Our receiver could look something like this:
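A minimal sketch of such a receiver, assuming Tj arrives as a single string and TJ as a single array mixing strings with numeric spacing adjustments (which is what the specification describes for those operators). TextReceiver is a hypothetical name, and the calls at the bottom just simulate what the gem would send.

```ruby
# A minimal text-collecting receiver (a sketch, not the gem's own code).
# pdf-reader only calls the callbacks a receiver defines; others are skipped.
class TextReceiver
  attr_reader :strings

  def initialize
    @strings = []
  end

  # Tj: show a single string
  def show_text(string, *params)
    @strings << string
  end

  # TJ: an array of strings and numbers; the numbers are spacing adjustments
  # in thousandths of a text space unit. This sketch keeps only the strings,
  # so large negative adjustments that stand in for word spaces are lost.
  def show_text_with_positioning(array, *params)
    @strings << array.select { |item| item.is_a?(String) }.join
  end
end

# Simulated calls with made-up arguments
receiver = TextReceiver.new
receiver.show_text("Page 1")
receiver.show_text_with_positioning(["Hel", -20, "lo ", 250, "world"])
puts receiver.strings
```

With the pdf-reader releases of this post's era, the receiver was passed straight in, as in PDF::Reader.file("somefile.pdf", receiver); newer releases instead open the file with PDF::Reader.new("somefile.pdf") and call page.walk(receiver) for each page.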

Now we just need to create our receiver instance and pass our PDF file to pdf reader:

Don't forget to require the pdf reader at the top of your script like this:
require 'rubygems'
require 'pdf/reader'

Step 6: Check out the results

If we run our script, we will see all the text shown with Tj or TJ printed out.

This is just the beginning and you can pick and choose any of the callbacks from that list (list of operators) and implement just about anything.

At the beginning of this post, I mentioned that I was concerned about positioning. This meant I had to get very familiar with the text matrix operator (Tm), found on page 250 of the specification. It takes six arguments (a through f), each representing one thing or another, and it is not very well documented. From what I can gather, the first four (a through d) handle things like scale, rotation, and skew, and the last two, e and f, give the position on the page, where e is along the x axis and f along the y axis.
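The six Tm values form the matrix [a b 0; c d 0; e f 1], and a point (x, y) in text space maps to (a*x + c*y + e, b*x + d*y + f). That is why the text origin lands exactly on (e, f). A small sketch of that arithmetic; the helper name and the sample matrices are made up, and it ignores the current transformation matrix (CTM), so "page position" here means user space.

```ruby
# Map a point from text space to user space using a text matrix [a b c d e f].
def text_to_user_space(tm, x, y)
  a, b, c, d, e, f = tm
  [a * x + c * y + e, b * x + d * y + f]
end

tm = [12, 0, 0, 12, 72, 700]              # scale by 12, origin at (72, 700)
origin = text_to_user_space(tm, 0, 0)     # [72, 700]: just e and f
scale_x = Math.sqrt(tm[0]**2 + tm[1]**2)  # horizontal scale: 12.0

# A 90 degree counter-clockwise rotation: one unit along the text's x axis
# moves up the page instead of to the right.
rotated = [0, 1, -1, 0, 100, 100]
step = text_to_user_space(rotated, 1, 0)  # [100, 101]

puts origin.inspect, scale_x, step.inspect
```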

There is another text positioning operator that I saw quite often, and that is move_text_position (the Td operator, page 249 of the specification). Its x and y arguments are not absolute coordinates: they are offsets from the start of the current line, in unscaled text space units, so the text matrix (Tm) scales them. When Tm scales text space by the font size, a y of -1 moves down roughly one line, 0 stays on the same line, -2 moves down two lines, 2 moves up two lines, etc. x is the horizontal offset (indentation or spacing) in the same units, so it roughly counts character widths rather than an exact number of spaces.
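Here is a simplified sketch of how Td updates the position: the new line matrix is [1 0 0 1 tx ty] multiplied by the current one, so the offset (tx, ty) is scaled by a through d before being added to e and f. The class is hypothetical; it borrows the gem's callback names for BT, Tm and Td so it could double as a receiver, and it ignores TD, T*, leading, and the CTM. The example assumes a Tm of 12 0 0 12 72 700, which is why a y of -1 moves down 12 units.

```ruby
# Tracks the text line matrix through BT, Tm and Td (a simplified sketch).
class TextPositionTracker
  IDENTITY = [1, 0, 0, 1, 0, 0].freeze

  attr_reader :line_matrix

  def initialize
    @line_matrix = IDENTITY.dup
  end

  # BT: each text object starts from the identity matrix
  def begin_text_object
    @line_matrix = IDENTITY.dup
  end

  # Tm a b c d e f: replace the matrix outright
  def set_text_matrix_and_text_line_matrix(a, b, c, d, e, f)
    @line_matrix = [a, b, c, d, e, f]
  end

  # Td tx ty: start of the next line, offset by (tx, ty) in unscaled text
  # space units; the offset is scaled by the current matrix
  def move_text_position(tx, ty)
    a, b, c, d, e, f = @line_matrix
    @line_matrix = [a, b, c, d, tx * a + ty * c + e, tx * b + ty * d + f]
  end

  def x
    @line_matrix[4]
  end

  def y
    @line_matrix[5]
  end
end

pos = TextPositionTracker.new
pos.begin_text_object
pos.set_text_matrix_and_text_line_matrix(12, 0, 0, 12, 72, 700)
pos.move_text_position(0, -1)  # next line: y goes from 700 to 688
pos.move_text_position(2, -1)  # next line, indented: x becomes 96, y 676
puts "x=#{pos.x} y=#{pos.y}"   # prints "x=96 y=676"
```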

I hope this helps and a huge thanks goes to James Healy for his grand work on pdf reader.


06 Jan

POST OFC Graph as Image


I was asked recently (well sort of) to give an example of saving an image to the server. If you look at teethgrinder's example for this, you will see that he has made available an external interface to do just that - POST your graph as png raw data to your server for storage. This has many benefits such as saving the image for use in a PDF report or for printing, since we know at times it is a bit troublesome to print the embedded flash object.
I think the main problem people are having with this is receiving the posted image data; see the upload_image method below. Also, teethgrinder's example never really says where to make the post_image() call, so I touch on both in the code below.
Here is an example of the png that is saved when I did this for the chart in the previous example:
OFC Saved Image

Well, let's just get right in to the code.
The controller contains the same code as my last post with only a few minor changes to the index method and the addition of the upload_image method.
In the controller, I have this:

class TestItController < ApplicationController
  def index
    # note the use of open_flash_chart_object_from_hash instead of just open_flash_chart_object
    # this allows you to pass in the id of the div you want the chart to be in
    # this is useful for when we need to findSWF by this id
    @graph = open_flash_chart_object_from_hash("/test_it/chart", :div_name => "my_chart")
  end

  # added to receive the post data for the png image of the OFC graph
  def upload_image
    # use the name from the query string, falling back to a default;
    # File.basename strips any directory parts so the file stays in tmp/
    name = File.basename(params[:name] || "tmp_image.png")
    # the OFC swf posts the png as raw post data, so get to it like this
    data = request.raw_post
    File.open("#{RAILS_ROOT}/tmp/#{name}", "wb") { |f| f.write(data) } unless data.blank?
    render :nothing => true
  end

  def chart
    # same code from here - http://pullmonkey.com/2010/01/05/open-flash-chart-ii-x-axis-date-and-time/ 
    ...
  end
end
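A slightly more defensive upload_image could also check that the body really is a PNG (every PNG file starts with the same 8-byte signature) before writing it. Here is a plain-Ruby sketch of that idea; save_posted_png is a hypothetical helper, not part of Rails or the OFC plugin, and from the controller you might call it as save_posted_png("#{RAILS_ROOT}/tmp", params[:name], request.raw_post).

```ruby
require 'tmpdir'

# The 8 bytes every PNG file starts with
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper: write raw POST bytes to dir/name only if they look
# like a PNG. Returns the written path, or nil if the data was rejected.
def save_posted_png(dir, name, data)
  # drop directory parts so a name like "../../x.png" cannot escape dir
  name = File.basename(name.to_s)
  name = "tmp_image.png" if ["", ".", "..", "/"].include?(name)
  return nil if data.nil? || !data.b.start_with?(PNG_SIGNATURE)

  path = File.join(dir, name)
  File.open(path, "wb") { |f| f.write(data) }
  path
end

Dir.mktmpdir do |dir|
  fake_png = PNG_SIGNATURE + "image bytes".b
  puts save_posted_png(dir, "../../chart.png", fake_png)  # written inside dir
  p save_posted_png(dir, "chart.png", "not a png")        # nil, rejected
end
```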

So just note the use of open_flash_chart_object_from_hash() in the index method; this way we can pass in the id of the div.
In the view, I have this:

<%= javascript_include_tag 'swfobject.js' %>
<%= @graph %>
<%= save_as_image("http://localhost:3000/test_it/upload_image?name=tmp.png", :id => "my_chart") %>
<br/>
<%= button_to_function "Save Image", "post_image()" %>

Really the only difference from what we would normally have in our view is that I am using the save_as_image setup helper that was added to the open flash chart ruby on rails plugin in the last couple hours (as of this post). The save_as_image helper takes some arguments, mainly the URL to post the image data to and the id of the chart's div that we set up in the controller.


05 Jan

Open Flash Chart II – X Axis Date and Time


I was asked how to display date and time for the x axis as seen in this teethgrinder example, so here goes.

Here is the graph we are after in this example:

More Open Flash Chart II examples.

And here is the code (the controller):


class TestItController < ApplicationController
  def index
    @graph = open_flash_chart_object(550,300,"/test_it/chart")
  end
  def chart
    data1 = []
    data2 = []
    year = Time.now.year

    31.times do |i|
      x = "#{year}-1-#{i+1}".to_time.to_i
      y = (Math.sin(i+1) * 2.5) + 10

      data1 << ScatterValue.new(x,y)
      data2 << (Math.cos(i+1) * 1.9) + 4
    end

    dot = HollowDot.new
    dot.size = 3
    dot.halo_size = 2
    dot.tooltip = "#date:d M y#<br>Value: #val#"

    line = ScatterLine.new("#DB1750", 3)
    line.values = data1
    line.default_dot_style = dot

    x = XAxis.new
    x.set_range("#{year}-1-1".to_time.to_i, "#{year}-1-31".to_time.to_i)
    x.steps = 86400

    labels = XAxisLabels.new
    labels.text = "#date: l jS, M Y#"
    labels.steps = 86400
    labels.visible_steps = 2
    labels.rotate = 90

    x.labels = labels

    y = YAxis.new
    y.set_range(0,15,5)

    chart = OpenFlashChart.new
    title = Title.new(data2.size)

    chart.title = title
    chart.add_element(line)
    chart.x_axis = x
    chart.y_axis = y

    render :text => chart, :layout => false
  end
end
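The x values in this chart are plain Unix timestamps (seconds since 1970), which is why x.steps and labels.steps are 86400, the number of seconds in a day. A plain-Ruby sketch of the same values, using Time.utc instead of Rails' String#to_time (an assumption: whether to_time uses UTC or local time depends on your Rails version, and local time can shift a value by an hour around a daylight-saving change).

```ruby
year = Time.now.year

# One Unix timestamp per day of January, like the chart's x values
january = (1..31).map { |day| Time.utc(year, 1, day).to_i }

# Consecutive UTC days are exactly 86_400 seconds apart,
# matching the steps used for the x axis and its labels
gaps = january.each_cons(2).map { |a, b| b - a }.uniq
puts gaps.inspect  # prints "[86400]"

# The x axis range is simply the first and last timestamp
puts "#{january.first}..#{january.last}"
```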

And in your view (index.html.erb):


<script type="text/javascript" src="/javascripts/swfobject.js"></script>
<%= @graph %>

Good Luck!


04 Jan

Open Flash Chart II for Ruby on Rails – Lug Wyrm Charmer


A long time overdue, but I've managed to get everything updated to the new version of Teethgrinder's open flash chart.

I've also started tagging everything, so if you run into problems with any of Teethgrinder's examples, first check that you are using the latest tag (as of now, that is Lug Wyrm Charmer) - http://github.com/pullmonkey/open_flash_chart/tree/LugWyrmCharmer.

Make sure you are using the latest swf either from the plugin assets directory or from Teethgrinder's downloads.