Dark Moon Velvet

Nothing special, just a Ruby script.

We start with something such as this:

# is script executed?
if __FILE__ == $0

end

This is our main program (sort of). We check whether the script was executed directly or loaded from another file by comparing __FILE__ (the current file) with $0 (the file that was initially executed).

Now what I want to do is read the files passed as arguments, process them in some way, and write back a processed file for each, so:
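
As a rough sketch of what that comparison amounts to, the helper below (run_directly? is a made-up name, not part of Ruby) expands both paths before comparing them, which also covers the case where the same file is spelled two different ways:

```ruby
# Hypothetical helper: true when `file` is the script that was launched.
# Inside a script you would call run_directly?(__FILE__, $0).
def run_directly?(file, program_name)
  File.expand_path(file) == File.expand_path(program_name)
end

puts run_directly?("scarab.rb", "./scarab.rb")   # same file, different spelling
puts run_directly?("lib/scarab.rb", "main.rb")   # loaded from another script
```

The plain __FILE__ == $0 test is enough for a small script like this one; the expanded form only matters when the two paths can differ in spelling.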

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
  end
end

Each argument is now available as arg.

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
    begin

    rescue Exception => e
      puts "Error: #{e.to_s}"
    end
  end
end

We’re going to read from some files, so we might as well add some exception handling (nothing fancy).
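
One caveat, as a sketch rather than a rule: rescuing Exception also traps things like Interrupt (Ctrl-C) and SystemExit, which sit outside StandardError in Ruby's hierarchy. If the goal is only to survive a missing or unreadable file, rescuing StandardError is usually enough. The process_each helper below is a made-up name for illustration:

```ruby
# Hypothetical helper: run the block for every argument and turn
# ordinary errors into a message instead of aborting the whole run.
def process_each(args)
  args.map do |arg|
    begin
      yield arg
    rescue StandardError => e
      "Error: #{e.message}"
    end
  end
end

results = process_each(["a.html", "missing.html"]) do |name|
  raise Errno::ENOENT, name if name.start_with?("missing")
  "Processed #{name}"
end
puts results

# Neither of these is a StandardError, so the rescue above lets them through.
puts Interrupt.ancestors.include?(StandardError)
puts SystemExit.ancestors.include?(StandardError)
```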

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
    begin
      path = Dir.getwd
      puts "Processing file: #{path + "/" + arg}"
      html = HtmScarab.new(path + "/" + arg)
      File.open(path + "/" + "se-" + arg, "w") do |file|
        file.puts html.eval
      end
    rescue Exception => e
      puts "Error: #{e.to_s}"
    end
  end
end

We read the file, process it and then write it back with a “se-” prefix. The file is closed for us at the end of the File.open block (good ol’ Ruby).
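
Two details of that step can be checked in isolation. The sketch below uses two made-up helpers: output_path_for puts the “se-” prefix on the file name only, so that an argument containing a directory (pages/index.html) still yields a valid path, and write_processed returns the File object so we can confirm that the block form of File.open really closed it:

```ruby
require "tmpdir"

# Hypothetical helper: prefix the file name, not the whole argument.
def output_path_for(input_path)
  File.join(File.dirname(input_path), "se-" + File.basename(input_path))
end

# Hypothetical helper: write the content and hand back the File object.
# File.open with a block returns the block's value and closes the file.
def write_processed(path, content)
  File.open(path, "w") do |file|
    file.puts content
    file
  end
end

Dir.mktmpdir do |dir|
  target = output_path_for(File.join(dir, "index.html"))
  handle = write_processed(target, "<p>processed</p>")
  puts File.basename(target)
  puts handle.closed?
end
```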

Above this code we need to create the business end of the script. So we add the following class:

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

end

We have a constructor that accepts a file path and reads the file into an instance variable called @html. That is how the main part of the script uses the class, so that is what we had to write. Next, the class grows a second method:

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

  def clean
    regex = /\n|\r/mi
    @html.gsub! regex, ' '
    regex = /\s\s*/mi
    @html.gsub! regex, ' '
  end

end

We add a cleaning method that collapses newlines and runs of whitespace into single spaces, as they might get in the way.
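
Since \s already matches \n and \r, the two substitutions can in principle be folded into one. A minimal sketch, written as a standalone function with a made-up name rather than as the method above:

```ruby
# Hypothetical standalone version of clean: every run of whitespace
# (spaces, tabs, \r, \n) becomes a single space.
def collapse_whitespace(text)
  text.gsub(/\s+/, " ")
end

puts collapse_whitespace("<p>\r\n   Hello\n\n  world</p>")
```

One thing to keep in mind with the in-place variant: gsub! returns nil when nothing was replaced, so its return value should not be chained or returned as the result.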

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

  def clean
    regex = /\n|\r/mi
    @html.gsub! regex, ' '
    regex = /\s\s*/mi
    @html.gsub! regex, ' '
  end

  # the heart of the operation!
  def eval
    if (@html)
      list = []
      # the following are not semantic or are unnecessary;
      # every pattern is lazy (.*?) so it stops at the first closing delimiter
      list << Pair.new(/<head\b.*?<\/head>/mi, "")
      list << Pair.new(/\s*class=\".*?\"/mi, "")
      list << Pair.new(/<\/?(div|span).*?>/mi, "")
      list << Pair.new(/<script.*?<\/script>/mi, "")
      list << Pair.new(/<style.*?<\/style>/mi, "")
      list << Pair.new(/<\?xml-stylesheet.*?\?>/mi, "")
      list << Pair.new(/<!--.*?-->/mi, "")

      # what was I doing?
      list.each do |pair|
        @html.gsub! pair.regex, pair.value
      end

      clean
      # return
      @html
    end
  end

end

And of course we do some quick regex seek and destroy. It may not be great but it gets the job done… well, not quite. I just invented the class Pair as I went along, because it was convenient, so it is time to create it with everything we need:

class Pair
  attr_accessor :regex, :value

  def initialize regex, value
    @regex = regex
    @value = value
  end

end
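
To see the pieces work together without touching the file system, here is a small sketch. Rule is a hypothetical Struct stand-in for Pair and strip_markup is a made-up name; the patterns are a subset of the ones above. The last line shows why the lazy .*? matters once a tag occurs more than once:

```ruby
# Hypothetical stand-in for Pair: same two fields, built with Struct.
Rule = Struct.new(:regex, :value)

RULES = [
  Rule.new(/<!--.*?-->/m, ""),
  Rule.new(/<script.*?<\/script>/mi, ""),
  Rule.new(/<\/?(div|span).*?>/mi, ""),
  Rule.new(/\s*class=".*?"/mi, "")
].freeze

# Apply every rule in order and return the stripped copy.
def strip_markup(html, rules = RULES)
  rules.reduce(html) { |text, rule| text.gsub(rule.regex, rule.value) }
end

sample = '<div class="box"><script>a()</script><p class="x">Hi</p><script>b()</script></div>'
puts strip_markup(sample)

# Greedy variant for comparison: .* runs to the LAST </script>,
# so the paragraph between the two scripts is lost as well.
puts sample.gsub(/<script.*<\/script>/mi, "")
```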

The point

You might be tempted to add more methods and so on to either the Pair or the HtmScarab class. Don’t! It’s a waste of time and effort, even if the classes look incomplete as they are; overengineering (anything) will only make it unnecessarily complicated and eventually harder to understand. A lot of programmers will occasionally use their “god given foresight” to create all sorts of extra functions for the future. The consequence is classes with all sorts of useless dangling bits nobody ever needs.

The incremental way I created the script in the example above is not possible for every program, but do try to at least sketch a prototype and build the application from the functionality inward, rather than conceiving it up front and presuming usability and usefulness.

In Ruby, adding methods before they are needed is even more pointless than in other languages. Suppose we want to reuse an object of our HtmScarab class; we would need an extra method. It goes something like this:

class HtmScarab
  def set value
    @html = value
  end
end

So, I reopened the class by writing class HtmScarab / end anywhere in my code, then added the new method I need. It’s simple, clean and, in a way, efficient.
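
A self-contained sketch of the same idea, using a made-up Scarab class that takes the markup directly so that no file is needed. Note that an object created before the class was reopened picks up the new method as well:

```ruby
# Hypothetical miniature of HtmScarab that takes the markup directly.
class Scarab
  def initialize(html)
    @html = html
  end

  def html
    @html
  end
end

scarab = Scarab.new("<p>first</p>")

# Later, anywhere in the code: reopen the class and add what is needed.
class Scarab
  def set(value)
    @html = value
  end
end

scarab.set("<p>second</p>")   # the existing object has the new method
puts scarab.html
```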

Now I’m sure we’ve all heard this once or twice before:

Add comments to your code, as often as you can.

Unfortunately this innocent, well-meant advice goes wrong in practice. Not because it is inapplicable, but because people seem to be a little clueless about how to actually apply it.

Let us start with this mess:

package com.wordpress.sixmoon;

public class txNrm {

public static double proc (double[] in) {
double re = 0;
for (int i = 0; i < in.length; i++) {
re += Math.abs(in[i]);
}
return re;
}

public static double proc (double[][] in) {
double re = proc(in[0]);
double c;
for (int i = 1; i < in.length; i++) {
c = proc(in[i]);
if (re < c) re = c;
}
return re;
}

}

First of all, can this pile of junk get better if we add comments to it? The answer is no. Computer code, for the most part, is designed to be human readable up to a point. Do not fall under the illusion that adding more text to unreadable code is going to make it more readable.

The wrong way, on the right street

Nevertheless some people try, often beginners, and the result often looks a lot like this:

package com.wordpress.sixmoon;

public class txNrm {

/** method that accepts as input a vector */
public static double proc (double[] in) {
double re = 0; // return
// loop through all values
for (int i = 0; i < in.length; i++) {
re += Math.abs(in[i]);
}
return re;
}

/** method that accepts as input a matrix */
public static double proc (double[][] in) {
double re = proc(in[0]); // initialize
double c; /* swap variable */
for (int i = 1; i < in.length; i++) {
c = proc(in[i]);
if (re < c) re = c; // new maximum
}
return re;
}

}

The simple lessons to learn: comments should not say what individual code snippets do, but what they are for (we can all see what they do). And keep code clean to avoid having to write many comments in the first place, or any at all.

The alternative way

The following example cuts comments down to the essential bits. Thinking of code comments the way you would think of comments on a blog post or article often gives the best results.

Keep in mind that there is no fixed formula; what your comments should be useful for varies with what you are working on (in the following example, the mathematical context makes mathematical hints important).

package com.wordpress.sixmoon;

public class TaxicabNorm {
    // Mathematical class for calculating “Taxicab norm”
    // Norm is also known as “Manhattan norm”

    public static double calculate (double[] A) {
        double norm = 0.0;
        for (int i = 0; i < A.length; i++) {
            norm += Math.abs(A[i]);
        }
        return norm;
    }

    public static double calculate (double[][] A) {
        double norm = calculate(A[0]);
        double cache;
        for (int i = 1; i < A.length; i++) {
            cache = calculate(A[i]);
            if (norm < cache) norm = cache;
        }
        return norm;
    }
}

As the code above shows, placing good comments makes spotting (obvious) errors easier.
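
For comparison with the Ruby script from the first post, the same calculation can be sketched in Ruby. The method names are made up; the second one is named for what the matrix overload actually computes, the largest taxicab norm among the rows:

```ruby
# Taxicab (Manhattan) norm of a vector: the sum of the absolute values.
def taxicab_norm(vector)
  vector.sum { |x| x.abs }
end

# Largest taxicab norm among the rows of a matrix (an array of rows).
def max_row_taxicab_norm(matrix)
  matrix.map { |row| taxicab_norm(row) }.max
end

puts taxicab_norm([3, -4])
puts max_row_taxicab_norm([[1, -2], [3, -4]])
```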

End note

Remember, clear code speaks for itself; keep comments to just that: (literally) comments. And by the way, do keep them in nice paragraph-like blocks. It makes them so much easier to read and keeps the code clean as well.