Dark Moon Velvet

A simple Ruby script

Posted on: April 19, 2009

Nothing special, just a Ruby script.

We start with something like this:

# is script executed?
if __FILE__ == $0

end

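This is our main program (sort of). __FILE__ holds the name of the current source file and $0 holds the name of the script Ruby was started with, so comparing the two tells us whether this file was executed directly or merely included (with require) by another script.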
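To see why the check is useful, here is a minimal sketch of a file that works both as a library and as a script. The file name greet.rb and the greet method are made up for illustration; $PROGRAM_NAME is simply Ruby's more readable alias for $0.

```ruby
# greet.rb (hypothetical example file): usable as a library and as a script.

def greet(name)
  "Hello, #{name}!"
end

# __FILE__ is this source file; $PROGRAM_NAME (alias of $0) is the script
# Ruby was started with. They are equal only when this file is run directly,
# not when another file loads it with require_relative "greet".
if __FILE__ == $PROGRAM_NAME
  puts greet(ARGV.first || "world")
end
```

Running ruby greet.rb Bob prints Hello, Bob!, while a file that does require_relative "greet" only gets the greet method and nothing is printed.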

Next, I want to read the files passed as arguments, process each of them in some way, and write back a processed file, so:

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
  end
end

Each command-line argument is now available, one at a time, as arg.

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
    begin

    rescue StandardError => e
      puts "Error: #{e.to_s}"
    end
  end
end

We’re going to read from some files, so we might as well add some exception handling (nothing fancy).
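A minimal sketch of what putting begin/rescue inside the loop buys us: a file that cannot be read only fails its own iteration, and the loop carries on with the next one. The sketch rescues StandardError (which is also what a bare rescue catches) rather than Exception, so that Ctrl-C, raised as Interrupt, still stops the script. The helper name process_all, its return value, and the upcase "processing" are hypothetical stand-ins.

```ruby
require "tmpdir"

# Hypothetical helper: process every path, recording failures instead of
# stopping at the first one.
def process_all(paths)
  results = {}
  errors = []
  paths.each do |path|
    begin
      results[path] = File.read(path).upcase   # stand-in for the real processing
    rescue StandardError => e
      # a missing file raises Errno::ENOENT, a subclass of StandardError
      errors << "Error: #{e.message}"
    end
  end
  [results, errors]
end

Dir.mktmpdir do |dir|
  good = File.join(dir, "a.txt")
  File.write(good, "hello")
  results, errors = process_all([File.join(dir, "missing.txt"), good])
  puts results[good]   # => HELLO (processed despite the earlier failure)
  puts errors.size     # => 1
end
```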

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
    begin
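      # expand_path also copes with absolute paths and subdirectories
      input = File.expand_path(arg)
      output = File.join(File.dirname(input), "se-" + File.basename(input))
      puts "Processing file: #{input}"
      html = HtmScarab.new(input)
      File.open(output, "w") do |file|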
        file.puts html.eval
      end
    rescue StandardError => e
      puts "Error: #{e.to_s}"
    end
  end
end

We read the file, process it and then write the result back with an “se-” prefix. Because we pass a block to File.open, Ruby closes the file for us when the block ends (good ol’ Ruby).
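A quick check of that claim: with the block form, File.open closes the file as soon as the block exits, even if an exception is raised inside it. output_name below is a hypothetical helper that puts the se- prefix on the file name rather than on the whole path, so pages/index.html becomes pages/se-index.html; the temporary directory is just a throwaway location for the demo.

```ruby
require "tmpdir"

# Hypothetical helper: "se-" goes in front of the file name, next to the input.
def output_name(input)
  File.join(File.dirname(input), "se-" + File.basename(input))
end

Dir.mktmpdir do |dir|
  out = output_name(File.join(dir, "page.html"))
  handle = nil
  File.open(out, "w") do |file|
    handle = file
    file.puts "<p>hello</p>"
  end
  # The block has returned, so Ruby has already closed the file.
  puts handle.closed?   # => true
  puts File.read(out)   # => <p>hello</p>
end
```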

Above the main block we need to create the business end of the script, so we add the following class:

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

end

We have a constructor that accepts a file name and reads the file into an instance variable called @html, which matches how we used HtmScarab.new in the main part of the script. The main part also calls eval on the object, so that is what we need to write next, starting with a helper:

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

  def clean
    regex = /\n|\r/mi
    @html.gsub! regex, ' '
    regex = /\s\s*/mi
    @html.gsub! regex, ' '
  end

end

We added a clean method that turns newlines into spaces and collapses runs of whitespace into a single space, as they might get in the way.
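For example, here is the same two-step clean applied to a plain string, pulled out into a standalone method (clean_text is a hypothetical name). The m and i flags are dropped because neither pattern uses . or letters, and \s\s* is equivalent to \s+: any run of whitespace becomes one space.

```ruby
# Standalone version of HtmScarab#clean (hypothetical helper name).
def clean_text(html)
  text = html.dup              # leave the caller's string untouched
  text.gsub!(/\n|\r/, ' ')     # newlines and carriage returns become spaces
  text.gsub!(/\s\s*/, ' ')     # any run of whitespace becomes a single space
  text
end

sample = "<p>\r\n  Hello,\t\tworld\n</p>"
p clean_text(sample)   # => "<p> Hello, world </p>"
```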

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

  def clean
    regex = /\n|\r/mi
    @html.gsub! regex, ' '
    regex = /\s\s*/mi
    @html.gsub! regex, ' '
  end

  # the heart of the operation!
  def eval
    if (@html)
      list = []
      # the following are not semantic or are unnecessary:
      list << Pair.new(/<head.*<\/head>/mi, "")
      list << Pair.new(/\s*class=\".*?\"/mi, "")
      list << Pair.new(/<\/?(div|span).*?>/mi, "")
      list << Pair.new(/<script.*?<\/script>/mi, "")
      list << Pair.new(/<style.*?<\/style>/mi, "")
      list << Pair.new(/<\?xml-stylesheet.*?\?>/mi, "")
      list << Pair.new(/<!--.*?-->/mi, "")

      # apply each substitution in turn
      list.each do |pair|
        @html.gsub! pair.regex, pair.value
      end

      clean
      # return
      @html
    end
  end

end

And of course we do some quick regex seek and destroy. It may not be great, but it gets the job done… well, not quite: I invented the Pair class as I went along because it was convenient, so now it is time to create it with everything we need:

class Pair
  attr_accessor :regex, :value

  def initialize regex, value
    @regex = regex
    @value = value
  end

end
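To get a feel for what eval does, here is a sketch that runs a few of its patterns over a small snippet. It swaps the hand-written Pair for Ruby's built-in Struct (Rule and strip_non_semantic are hypothetical names); Struct.new(:regex, :value) generates the same positional constructor and regex/value accessors. Note the non-greedy .*? in the patterns: with the m flag, a greedy .* runs from the first opening marker to the last closing one and swallows everything in between, as the last two lines show.

```ruby
# Rule plays the role of Pair: Struct.new generates initialize(regex, value)
# plus regex/value accessors.
Rule = Struct.new(:regex, :value)

RULES = [
  Rule.new(/<\/?(div|span).*?>/mi, ""),
  Rule.new(/<script.*?<\/script>/mi, ""),
  Rule.new(/<!--.*?-->/mi, "")
]

def strip_non_semantic(html)
  out = html.dup
  RULES.each { |rule| out.gsub!(rule.regex, rule.value) }
  out
end

snippet = '<div id="x"><p>One</p><!-- note --><span>Two</span></div>' \
          '<script>alert(1)</script><p>Three</p>'
p strip_non_semantic(snippet)   # => "<p>One</p>Two<p>Three</p>"

# Greedy vs. non-greedy on two comments:
comments = "<!-- a -->keep<!-- b -->"
p comments.gsub(/<!--.*-->/m, "")    # => ""
p comments.gsub(/<!--.*?-->/m, "")   # => "keep"
```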

The point

You might be tempted to add more methods and so on to either the Pair or the HtmScarab class. Don’t! It’s a waste of time and effort, even if the classes look incomplete as they are; overengineering (anything) only makes it unnecessarily complicated and, eventually, harder to understand. A lot of programmers will occasionally use their “God-given foresight” to create all sorts of extra functions for the future. The consequence is classes with all sorts of useless dangling bits nobody ever needs.

The incremental way I created the script in the example above is not possible for every program, but do try to at least sketch out a prototype and build the application from its functionality inward, rather than conceiving features up front and presuming they will be usable and useful.

In Ruby, adding methods before they are needed is even more pointless than in other languages, because classes are open: you can add a method later, at the point where you actually need it. Suppose we want to reuse an object of our HtmScarab class; we would need to add an extra method. It goes something like this:

class HtmScarab
  def set value
    @html = value
  end
end

So I reopened the class, simply by writing class HtmScarab ... end again anywhere in my code, and added the new method I needed. It’s simple, clean and, in a way, efficient.
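A self-contained sketch of the same idea, using a hypothetical Counter class in place of HtmScarab: the class is defined once, an object is created, and only later is the class reopened to add a method. Because Ruby looks methods up on the class at call time, the object created before the reopening can use the new method too.

```ruby
# A tiny stand-in class (hypothetical), defined once...
class Counter
  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

c = Counter.new
c.increment

# ...and reopened later, only once we actually need another method.
class Counter
  def reset value
    @count = value
  end
end

c.reset 10            # the existing object picks up the new method
puts c.increment      # => 11
```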
