PullMonkey Blog


18 Jan

Ruby PDF Reader Gem Tutorial


I've been doing a lot of work these days dealing with PDFs and for the most part I've been happy with using poppler-utils' pdftohtml. And that is great if you don't care about positioning or formatting and just care about the content. But if, like me, you have run across the need to know text positioning, font size, indentation, coloring, etc., then you will have to use something more.

I had given just about every other option a chance before I finally found the pdf reader gem. But when I found pdf reader, it didn't have much documentation and it wasn't entirely clear how to get started using it and if it would work. Well, I can tell you that it will work and after playing with the examples a lot of it became much clearer. I learned a lot that would probably be useful for a few other people out there, hence this post.
Ok, well to get started, take a look at the github repository for pdf reader. You don't need to spend too much time, but just note a few places like the examples directory and the list of callbacks.

You should probably familiarize yourself with this PDF specification too - found here. It really came in handy when trying to figure out what arguments are being passed around and what they represent.

Let's get started -

Step 1:  Install the gem

Yah, this is a pretty easy step, but it is required :)
sudo gem install pdf-reader

Step 2:  Find a PDF (or PDFs) to use

It would be best to have several PDFs for you to work with since the callbacks could vary depending on the PDF.
NOTE: For these examples, I'm using a really simple PDF. pdf reader could take a while on some PDFs and seem as though it is hanging, but it is not; it is just chugging away right around line 283 of this file, reading each byte of your PDF.

Step 3: List the possible callbacks and their args for one of your PDFs

The point of this is to find out what methods we can write for pdf reader to call when it encounters the various parts of our PDF.
The BIG One that you will most likely use is show_text() or some form of it like show_text_with_positioning().
But, for now, THIS all depends on the PDF file you are using, so we need to find out what your PDF uses and go from there.

The easiest way to do this is to follow this example and just substitute "somefile.pdf" with the path to your pdf file.

Run it and you will see a long list of possible callbacks and their arguments. It is likely all squished together, so you can simply change the line of your code that says

puts cb

to

puts cb.inspect

and get a MUCH better look at everything.

We will start with show_text, so grep for show_text and see what you get. For my PDF, I have mostly show_text_with_positioning.

Step 4: Do some lookups

What are the args they are showing me for my callbacks and how do we find out?
You can do this two ways: try your luck at searching the PDF specification for "show text" or "show text with positioning" and see what you get, or look up the token used to represent show_text or show_text_with_positioning.
The first way is pretty obvious, so on to the second - look in the list of callbacks I had you familiarize yourself with earlier, starting on line 172. Looking through we can find show_text and show_text_with_positioning, having Tj and TJ as their operators. Alright, now we have something to look up - "TJ". Well, I found it on page 251 of the PDF Specification from earlier. Some of the descriptions for the operators will require rereading but you will get the hang of it.

Step 5: Use what we found

Now that we know how the show_text_with_positioning works and what args it brings in, we can write our code.
We need an instance of a receiver to pass to the PDF Reader. This is just a class that has methods like show_text() or show_text_with_positioning(). Our receiver could look something like this:
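
A minimal sketch of such a receiver. The class name TextReceiver and the lines accessor are made up for illustration, and the two calls at the bottom only simulate what pdf reader would send for a Tj and a TJ operator, so the snippet runs without a PDF:

```ruby
# Hypothetical receiver: pdf reader calls a method on it for each operator
# it meets, as long as a method with the callback's name exists.
class TextReceiver
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Tj operator: a single string argument.
  def show_text(string, *_rest)
    @lines << string
  end

  # TJ operator: one array mixing strings with numeric spacing adjustments.
  # Only the strings are kept here; the numbers are dropped.
  def show_text_with_positioning(params, *_rest)
    @lines << params.select { |item| item.is_a?(String) }.join
  end
end

# Simulated callbacks, standing in for what the parser would send.
receiver = TextReceiver.new
receiver.show_text("Hello")
receiver.show_text_with_positioning(["Wor", -20, "ld"])
puts receiver.lines
```

Collecting into an array rather than printing straight away makes it easier to post-process the text later; a plain puts inside each callback works just as well.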

Now we just need to create our receiver instance and pass our PDF file to pdf reader:
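
A hedged sketch of that call. The helper name walk_pdf is made up; PDF::Reader.file is the entry point I believe the gem offered at the time of writing, and the fallback branch is my assumption about how later releases walk a page, so check it against the version you have installed:

```ruby
# Hypothetical helper: hand a PDF path and a receiver object to pdf reader.
def walk_pdf(path, receiver)
  # Loaded inside the method so the helper can be defined without the gem.
  require 'pdf/reader'

  if PDF::Reader.respond_to?(:file)
    # Older releases: parse the whole file against one receiver.
    PDF::Reader.file(path, receiver)
  else
    # Assumed API of later releases: walk each page with the receiver.
    PDF::Reader.new(path).pages.each { |page| page.walk(receiver) }
  end
end

# Usage, with your own receiver instance:
#   walk_pdf("somefile.pdf", receiver)
```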

Don't forget to require the pdf reader at the top of your script like this:

require 'rubygems'
require 'pdf/reader'

Step 6: Check out the results

If we run our script, we will see all the text that uses Tj or TJ print out.

This is just the beginning and you can pick and choose any of the callbacks from that list (list of operators) and implement just about anything.

At the beginning of this post, I mentioned that I was concerned about positioning. This means I had to get very familiar with the text matrix operator (Tm), found on page 250 of the specification. It takes six arguments (a through f) and it is not very well documented. From what I can gather, the first four (a through d) are for things like scale and rotation, and the last two (e and f) are for position on the page, where e is along the x axis and f along the y axis.

There is another text positioning operator that I saw quite often and that is move_text_position (Td operator, page 249 of the specification), which provides x and y offsets in unscaled text space units, measured from the start of the current line. In my PDF the text matrix carried the font scale, so the units worked out to lines: if y is -1, go to the next line; if y is 0, stay on the same line; -2, move down two lines; 2, move up two lines, etc. Likewise x acted as indentation or horizontal spacing, roughly the number of characters (spaces) to offset the text position by. Other PDFs can scale things differently, so run the offsets through the current text matrix before treating them as lines or characters.
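
To make the arithmetic concrete, here is a small sketch of how a Td offset combines with the six Tm values. The function name and the sample matrix are made up for illustration; the formula is the usual row-vector product of (tx, ty, 1) with the matrix [a b 0; c d 0; e f 1]:

```ruby
# Apply a Td (tx, ty) move to a text matrix given as [a, b, c, d, e, f].
# Only the translation part (e, f) changes; scale and rotation carry over.
def move_text_position(matrix, tx, ty)
  a, b, c, d, e, f = matrix
  [a, b, c, d, tx * a + ty * c + e, tx * b + ty * d + f]
end

# Sample matrix: 12x scale, origin at (72, 720).
tm = [12, 0, 0, 12, 72, 720]

# Td 0 -1: same x, one scaled unit (12 points here) further down the page.
p move_text_position(tm, 0, -1)  # [12, 0, 0, 12, 72, 708]

# Td 2 0: same line, shifted right by two scaled units.
p move_text_position(tm, 2, 0)   # [12, 0, 0, 12, 96, 720]
```

With a 12x scale in the matrix, a y of -1 lands exactly one 12 point line lower, which is why the offsets looked like line counts in my PDF.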

I hope this helps and a huge thanks goes to James Healy for his grand work on pdf reader.


06 Jan

POST OFC Graph as Image


I was asked recently (well sort of) to give an example of saving an image to the server. If you look at teethgrinder's example for this, you will see that he has made available an external interface to do just that - POST your graph as png raw data to your server for storage. This has many benefits such as saving the image for use in a PDF report or for printing, since we know at times it is a bit troublesome to print the embedded flash object.

I think the main problem people are having with this is the receiving of the image data post - see the upload_image method below. Also, teethgrinder's example never really says where to make the post_image() call. So I touch on both in the code below.

Here is an example of the png that is saved when I did this for the chart in the previous example:

OFC Saved Image


Well, let's just get right in to the code.

The controller contains the same code as my last post with only a few minor changes to the index method and the addition of the upload_image method.
In the controller, I have this:

class TestItController < ApplicationController
  def index
    # note the use of open_flash_chart_object_from_hash instead of just open_flash_chart_object
    # this allows you to pass in the id of the div you want the chart to be in
    # this is useful for when we need to findSWF by this id
    @graph = open_flash_chart_object_from_hash("/test_it/chart", :div_name => "my_chart")
  end

  # added to receive the post data for the png image of the OFC graph
  def upload_image
    # use the name from the query string if one was given, otherwise fall back to a default;
    # File.basename keeps a crafted name from escaping the tmp directory
    name = File.basename(params[:name] || "tmp_image.png")
    # the save_image method that is provided by the OFC swf file sends raw post data, so get to it like this
    data = request.raw_post
    File.open("#{RAILS_ROOT}/tmp/#{name}", "wb") { |f| f.write(data) } if data
    render :nothing => true
  end

  def chart
    # same code from here - http://pullmonkey.com/2010/01/05/open-flash-chart-ii-x-axis-date-and-time/ 
    ...
  end
end




So just note the use of open_flash_chart_object_from_hash() in the index method, this way we can pass in the id of the div.

In the view, I have this:

<%= javascript_include_tag 'swfobject.js' %>
<%= @graph %>
<%= save_as_image("http://localhost:3000/test_it/upload_image?name=tmp.png", :id => "my_chart") %>
<br/>
<%= button_to_function "Save Image", "post_image()" %>



Really the only difference from what we would normally have in our view is that I am using the save_as_image setup method that was added to the open flash chart ruby on rails plugin in the last couple hours (as of this post). The save_as_image method takes some arguments, mainly the url to post the image data to and the id of the chart we set up in the controller.



04 Jan

Open Flash Chart II for Ruby on Rails - Lug Wyrm Charmer


A long time overdue, but I've managed to get everything updated to the new version of Teethgrinder's open flash chart.

I've also started tagging everything, so if you notice any problems trying to do anything from Teethgrinder's examples, then first check that you are using the latest (as of now, that is Lug Wyrm Charmer) - http://github.com/pullmonkey/open_flash_chart/tree/LugWyrmCharmer.

Make sure you are using the latest swf either from the plugin assets directory or from Teethgrinder's downloads.


30 Dec

Using Tumblr as a CMS


Thought you all might like this - http://blog.skizmo.com/post/308406755/use-tumblr-as-your-cms

It is something we sort of dreamed up and it works great as a partial CMS - very much like SimpleCMS where you can specify what exactly on the page needs to be managed by a CMS. This allows you to mix your CMS static content with your dynamic content.


14 Sep

THINning it out


Been having problems with swap space and memory on my slicehost servers.  And it is all apache's and mongrel's fault.  That used to be the cool combination and now it is an ugly, sluggish beast.  Just recently, I switched to nginx (to replace apache) and thin (to replace mongrel).  So far so good, major speed improvements and definitely memory consumption improvements.

I started out by switching everything over to nginx while keeping the mongrels alive, that was actually pretty easy.  Information was available everywhere.

Thinning everything via capistrano took a while, that wasn't as well documented.  Thin was documented, capistrano was documented, but easy solutions as to how to combine the two were difficult to find.

Here's the solution I was able to come up with -

Capistrano

My config for using mongrel used to look something like this -

set :stages, %w(staging production)
set :default_stage, "production"

require "capistrano/ext/multistage"
require "mongrel_cluster/recipes"

set :application, "myapplication.com"
set :user, "appuser"

set :repository, "http://svn.myapplication.com/myapp/trunk"
set :deploy_to, "/var/www/#{application}"

role :app, application
role :web, application
role :db,  application, :primary => true

set :runner, user
set :keep_releases, 3
set(:mongrel_conf) { "#{current_path}/config/mongrel_cluster.yml" }

deploy.task :after_update_code, :roles => [:web] do
  desc "Copying the right mongrel cluster config for the current stage environment."
  run "cp -f #{release_path}/config/mongrel_#{stage}.yml #{release_path}/config/mongrel_cluster.yml"
end

... <other things like symlinks>

Now that we are moving from mongrel to thin, no need for two lines in particular, one being the line that requires mongrel_cluster recipes and the other that sets the mongrel_cluster yaml config path.  A third line changes from mongrel_cluster.yml to thin_cluster.yml.  You get something like this:

set :stages, %w(staging production)
set :default_stage, "production"

require "capistrano/ext/multistage"

set :application, "myapplication.com"
set :user, "appuser"

set :repository, "http://svn.myapplication.com/myapp/trunk"
set :deploy_to, "/var/www/#{application}"

role :app, application
role :web, application
role :db,  application, :primary => true

set :runner, user
set :keep_releases, 3

deploy.task :after_update_code, :roles => [:web] do
  desc "Copying the right thin cluster config for the current stage environment."
  run "cp -f #{release_path}/config/thin_#{stage}.yml #{release_path}/config/thin_cluster.yml"
end
... <other things like symlinks>

Now we need to implement what mongrel recipes was doing for us, start, stop and restart but in terms of thin (added this to the bottom of my deploy.rb):

namespace :deploy do
  desc "Restart the Thin processes on the app server."
  task :restart do
    run "thin restart -C #{release_path}/config/thin_cluster.yml"
  end

  desc "Start the Thin processes on the app server."
  task :start do
    run "thin start -C #{release_path}/config/thin_cluster.yml"
  end

  desc "Stop the Thin processes on the app server."
  task :stop do
    run "thin stop -C #{release_path}/config/thin_cluster.yml"
  end
end

Here's what my thin_cluster.yml looks like:

---
log: log/thin.log
address: 127.0.0.1
port: 9000
chdir: /var/www/myapp.com/current
environment: production
pid: tmp/pids/thin.pid
user: www-user
group: www-data
servers: 3

That's it and it has worked out nicely so far.
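
One thing worth spelling out for the nginx side: with servers: 3 and port: 9000, thin starts one process per consecutive port, so nginx needs three upstream entries. A quick sketch that derives them from the config (the YAML is inlined here and trimmed to the relevant keys so the snippet stands alone):

```ruby
require 'yaml'

# Trimmed copy of the cluster settings that matter for the upstream list.
config = YAML.load(<<~YML)
  address: 127.0.0.1
  port: 9000
  servers: 3
YML

address    = config["address"]
first_port = config["port"]

# One backend per server, on consecutive ports starting at the base port.
backends = (0...config["servers"]).map { |i| "#{address}:#{first_port + i}" }
puts backends
```

Those three address:port pairs are what go into the upstream block of the nginx config.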


30 Apr

Open Flash Chart II - fully automated


Just as an attention grabber - we are going after this example in this article:

Keeping up

Ok, seeing that the php versions of open flash chart and the open flash chart swf files continually change along with the API (not saying this is a bad thing), I wanted to come up with an even more abstract solution. The goal is to not have to worry when the swf file is released with the latest set of graphs or changes its API. I simply don't want to worry about this method or that method, or this class or that class.

Feedback

This article will sort of act as a tutorial for those interested in metaprogramming and as a set of instructions for those looking to experiment with the latest version of the OFC II Rails Plugin that I am currently toying with. I would like to hear feedback, but just remember that phase 1 of this release will be very basic, meaning none of the ajaxy stuff. It will come, just not yet.

Let's see what we can get away with

I am already using method_missing() for pretty much everything in the OFC II Rails Plugin that is being used now. But every time new classes are added, I have to sit down and basically convert the php class to ruby - just plain tedious, not really what I had planned when I started all this. Ok, so method_missing() was great, but let me introduce (or possibly reintroduce) you to const_missing(), basically method_missing() but instead of methods, we can create classes or modules or other objects on the fly. This will definitely help when the php version gets a new class. Instead of getting hounded to update the rails version to be 100% like the php version, everything will just work, no updates to code required. Well, we hope ! So check this out:



Here is what we did with method_missing():

module OFC
  class Base
    def method_missing(method_name, *args, &blk)
      case method_name.to_s
      when /^(.*)=$/   # i.e., if it is something like x_legend=
        # if the user wants to set an instance variable then let them
        # only the first arg (args[0]) is used since it is a set method
        self.instance_variable_set("@#{$1}", args[0])
      when /^set_(.*)/
        # backwards compatible ... the user can still use the same set_y_legend methods if they want
        self.instance_variable_set("@#{$1}", args[0])
      else
        # check for the variable itself so that attributes set to nil or false still read back
        if self.instance_variable_defined?("@#{method_name}")
          self.instance_variable_get("@#{method_name}")
        else
          # if the method/attribute is missing and it is not a set method then hmmmm better let the user know
          super
        end
      end
    end
  end
end

This just basically allows me to do this:

  class Foo < OFC::Base
  end

  foo = Foo.new

  foo.some_random_attribute = "Hello"  #=> "Hello"
  foo.some_random_attribute  #=> "Hello"
  foo.some_random_undefined_attribute  #=> Method Missing error (calls super)

  # to be like php, for easier conversion
  foo.set_some_random_attribute("Good Bye")  #=> "Good Bye"
  foo.some_random_attribute  #=> "Good Bye"

Along the same lines, I have created an initialize method that takes any argument hash of variable/value pairs and calls variable=() which is handled by method missing as we saw above:

  class Foo < OFC::Base
  end

  foo = Foo.new(:x_axis => 5, :min => 10, :max => 90, :steps => 5, :elements => ["one", "two"])
  
  foo.x_axis #=> 5
  foo.min #=> 10

Ok, so on to const_missing() and what we can do with that:

  def OFC.const_missing(const)
    klass = Class.new OFC::Base
    # define the constant on OFC itself (self here) so that OFC::Foo is found directly next time
    const_set const, klass
    return klass
  end

This says that any undefined (missing) constant of OFC should be defined as a new class that inherits from OFC::Base.
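
If you want to try the idea in isolation, here is a self-contained sketch. ChartSketch is a made-up module name, not part of the plugin; the point to notice is that the new class is stored on the module itself, so the second lookup finds the constant without going through const_missing again:

```ruby
module ChartSketch
  class Base; end

  # Ruby calls this when ChartSketch::Something is referenced but undefined.
  # const_set stores the new class under that name and returns it.
  def self.const_missing(const)
    const_set(const, Class.new(Base))
  end
end

line = ChartSketch::Line                      # triggers const_missing once
puts line.superclass                          # ChartSketch::Base
puts line.equal?(ChartSketch::Line)           # true, same class object both times
puts ChartSketch.constants.include?(:Line)    # true
```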

So when we say OFC::Foo and that has not been defined, we will get back the equivalent of class OFC::Foo < OFC::Base;end; which will give us the initialize() method and method_missing() method from above. Let's see how this works:

  line = OFC::Line.new(:values => [1,2,3,nil,nil,5,6,7])
  line.values #=> [1,2,3,nil,nil,5,6,7]
  line.some_random_variable = "Hello" #=> "Hello"
  line.some_random_variable #=> "Hello"

  stacked_bar_chart = OFC::BarStack.new
  stacked_bar_chart.values = []
  stacked_bar_chart.values << [2,3,4]
  stacked_bar_chart.values << [5, {"val" => 5, "colour" => "#ff0000"}]
  stacked_bar_chart.keys = [{ "colour" => "#C4D318", "text" => "Kiting", "font-size" => 13 } ...]

So it all sort of came together right there. I've shown you all the code that comes with the Rails Open Flash Chart plugin now. No more defining individual classes, no more trying to keep up with the never ending php version, and no more late nights converting php to ruby (!). About dang time.



Ok, but this is just the beginning, nothing has been set in stone, so like I said, give me your feedback, what works for you and what does not. And, hopefully, I will have solutions for you or you for me.


Example with new version (test version)

I am using rails 2.3.2, but I don't think it will matter what version you are using.

Create your new rails project

# create a new rails project 
> pullmonkey$ rails testing_it
#<Bunch of stuff is created ....>
> pullmonkey$ cd testing_it/

Install the plugin from the test branch

Note the -r test in this next step. The new version (test version) I am playing with is under the test branch and -r says what branch to pull from.

Also, you can use git:// instead of http:// below, but depending on your firewall restrictions http:// will probably work out best for you.

> pullmonkey$ ./script/plugin install http://github.com/pullmonkey/open_flash_chart.git -r test
# <Bunch more stuff ...>

Create a controller to play in

> pullmonkey$ ./script/generate controller test_it
# <And more stuff >

Get our assets

# first we will get swfobject.js
> pullmonkey$ cp vendor/plugins/open_flash_chart/assets/javascripts/swfobject.js public/javascripts/
# next the open flash chart swf (GET whatever is the latest version), right now that is here: http://teethgrinder.co.uk/open-flash-chart-2/open-flash-chart.swf
> pullmonkey$ cd public/
> pullmonkey$ wget http://teethgrinder.co.uk/open-flash-chart-2/open-flash-chart.swf
> pullmonkey$ cd ..

Edit our controller

Notice here that I just include one of the many examples from the plugin's examples directory. Definitely more to follow.

One thing you will notice about the examples, is that the php code is in the comments, so you can see how I would convert from the php examples to ruby. Please feel free to add your own examples, just fork the project.

> pullmonkey$ vi app/controllers/test_it_controller.rb
# mine looks like this:
class TestItController < ApplicationController
  include OFC::Examples::AreaHollow

  def index
    @graph = open_flash_chart_object(600,300, "/test_it/area_hollow")
  end
end

Edit our view

> pullmonkey$ vi app/views/test_it/index.html.erb
# mine looks like this:
<%= javascript_include_tag 'swfobject' %>
<%= @graph %>

Start 'er up

> pullmonkey$ ./script/server

# browse to the test_it index
http://localhost:3000/test_it

Our example


30 Apr

Spreadsheet Gem - data may have been lost


I've been using the spreadsheet gem lately for a couple projects I am working on to modify existing spreadsheets. I have quite often stumbled upon this error when opening modified spreadsheets in excel:


File error: data may have been lost

Like most microsoft errors, it was useless and the spreadsheet came up just fine. But that error was just so annoying, especially since other spreadsheet applications (open office, excel on mac) opened the file without any problems. So after quite a bit of hacking and digging around, I finally tried setting the encoding, which defaults to UTF-8. Well it just so happens that the spreadsheet being modified was encoded with UTF-16LE.



So part one of my solution became this:

Spreadsheet.client_encoding = 'UTF-16LE'

Then doing a little more digging I decided that this would be a better long-term solution:

book = Spreadsheet.open spreadsheet_file
Spreadsheet.client_encoding = book.encoding

Well, hopefully it wasn't just me and someone will be able to save a bit of time with this.



17 Apr

Can you read this?


Got an email today, I have seen it before and I am sure it has been going around for years. This time, I thought that I would do an exercise and create a plugin that duplicates what I found in this email. See for yourself.



Here is the email I got:






What I gathered was that the only important letters are the first and last letter of each word, those have to be in the right order. So the rest of the letters can be in any random order. That is what I did - I created a plugin and put it out on github. You can install it like this:

./script/plugin install http://github.com/pullmonkey/can_you_read_this.git

And use it like this:

# in your views
<%= can_you_read_this("hello, can you read this?") %>
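
For the curious, the scrambling rule fits in a few lines of plain Ruby. This is a sketch of the idea rather than the plugin's actual source; scramble_word and scramble_text are names made up for this example, and the random generator is passed in so the result can be made repeatable:

```ruby
# Keep the first and last letter, shuffle everything in between.
# Words shorter than four letters have nothing to shuffle.
def scramble_word(word, rng = Random.new)
  return word if word.length < 4
  word[0] + word[1..-2].chars.shuffle(random: rng).join + word[-1]
end

# Scramble each run of letters and leave spaces and punctuation alone.
def scramble_text(text, rng = Random.new)
  text.gsub(/[A-Za-z]+/) { |word| scramble_word(word, rng) }
end

puts scramble_text("hello, can you read this?")
```

The output changes on every run; pass Random.new(some_seed) as the second argument if you need the same scramble each time.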

Have fun.


12 Jan

Open Flash Chart II - OFC Object Creators


Thought I would try and make things a little more flexible. In doing so, two new OFC Object creators came to life. You all may recall the very basic:

open_flash_chart_object(600,300,'/test_it/graph_test')


And maybe not, well either way, I am going to describe its functionality here plus the functionality of the two new object creators.

open_flash_chart_object()

Usage:

This method returns only the graph html:

@graph = open_flash_chart_object(....)

Arguments

  • width (required)
  • height (required)
  • url (required)
  • use_swfobject (optional and defaults to true)
  • base (optional and defaults to "/")
  • swf_file_name (optional and defaults to "open-flash-chart.swf")

open_flash_chart_object_and_div_name()

Usage:

This method will return, not only the html for the graph but also the div_name for use with javascript manipulation:

@graph, @div_name = open_flash_chart_object_and_div_name(...)

Arguments

  • width (required)
  • height (required)
  • url (required)
  • use_swfobject (optional and defaults to true)
  • base (optional and defaults to "/")
  • swf_file_name (optional and defaults to "open-flash-chart.swf")

open_flash_chart_object_from_hash()

Usage:

This method will return the graph html, but gives you absolute control over quite a few things, most importantly div_name.

@graph = open_flash_chart_object_from_hash(...)

Additional Usage:

@graph = open_flash_chart_object_from_hash("/test_it/graph_code", :div_name => 'my_div_name', :width => 600)
@graph = open_flash_chart_object_from_hash("/test_it/graph_code", :base => '/projects', :height => 600)

Arguments

  • url (required)
  • options (optional)
    • div_name (defaults to "flash_content_[random string]")
    • base (defaults to "/")
    • swf_file_name (defaults to "open-flash-chart.swf")
    • width (defaults to 550)
    • height (defaults to 300)
    • protocol (defaults to "http")
    • obj_id (defaults to "chart_[random string]")
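
To make the defaults concrete, here is a sketch of how an options hash like this typically gets merged. It is an illustration of the list above, not the plugin's source; chart_options_sketch is a made-up name and the random string is just a stand-in:

```ruby
require 'securerandom'

# Hypothetical helper: caller options win over the documented defaults.
def chart_options_sketch(url, options = {})
  token = SecureRandom.hex(4)  # stand-in for "[random string]"
  defaults = {
    :div_name      => "flash_content_#{token}",
    :base          => "/",
    :swf_file_name => "open-flash-chart.swf",
    :width         => 550,
    :height        => 300,
    :protocol      => "http",
    :obj_id        => "chart_#{token}"
  }
  defaults.merge(options).merge(:url => url)
end

opts = chart_options_sketch("/test_it/graph_code", :div_name => "my_div_name", :width => 600)
puts opts[:div_name]  # my_div_name
puts opts[:width]     # 600
puts opts[:height]    # 300
```

Anything you leave out of the hash falls back to its default, which is why the two usage lines above can each set a different subset of options.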



Well, there it is, good luck and have fun.



08 Jan

Open Flash Chart II - Bar Graphs with on-click


Building on line graph clicking, thanks to the support of a few other people (mentioned throughout the article) we now have bar graph clicking as well. The only down side (if you want to call it that) is that it is experimental in the sense that the open flash chart swf object had to be updated, and the update is not part of the official OFC release (at least not at the time of this writing). No big deal though, just be aware. It is however part of the OFC rails plugin release.

Big thanks goes to Eric for his work on the action script for the bar clicking open-flash-chart swf file - see this forum entry for more details.

Obvious thanks also goes to monk.e.boy.


Ok, so two things to note for this to work:

  1. Pull the latest from github and make sure to get Eric's swf file (under the assets directory - open-flash-chart-bar-clicking.swf ) and place it under RAILS_ROOT/public
  2. The call to open_flash_chart_object() has changed to accept an optional parameter for the swf file name. I am leaving the original for use as open-flash-chart.swf (which is the default for the swf_file_name param) and added Eric's as open-flash-chart-bar-clicking.swf. See the example below for usage.



The changes that were made can be found here.




Here is the graph we are after in this example (click the bars to see what happens):






More Open Flash Chart II examples.




And here is the code (the controller):

class TestItController < ApplicationController
  def index
    @graph = open_flash_chart_object(600,300,"/test_it/graph_code", true, "/", "open-flash-chart-bar-clicking.swf")
  end

  def graph_code
    title = Title.new("Bar on-click Example")
    bar = BarGlass.new
    # NOTE ... the next two lines are if you want each bar to have a different response when clicked
    bar_values = (1..9).to_a.map{|x| bv = BarValue.new(x); bv.on_click = "alert('hello, my value is #{x}')"; bv}
    bar.set_values(bar_values)
    # if you want a more generic response across all bars, then the following lines would do:
    # bar.on_click = "alert('hello there')"
    # bar.set_values((1..9).to_a)
    chart = OpenFlashChart.new
    chart.set_title(title)
    chart.add_element(bar)
    render :text => chart.to_s
  end
end



And in your view (index.html.erb):

<script type="text/javascript" src="/javascripts/swfobject.js"></script>
<%= @graph %>





Good Luck!