Dark Moon Velvet

Archive for the ‘Programming’ Category

Nothing special, just a ruby script.

We start with something such as this:

# is script executed?
if __FILE__ == $0

end

This is our main program (sort of). We basically check if the script was executed or included by checking if the current file is equal to the initially executed file.
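As a minimal sketch of why the guard is useful (the main method and its message are made up for illustration, they are not part of the script built here): anything defined outside the guard can be reused by a file that requires this one, while the guarded part only runs when the file is executed directly.

```ruby
# Hypothetical helper; it lives outside the guard so other files can reuse it.
def main(args)
  "would process #{args.length} file(s)"
end

# Runs only when this file is the one handed to the ruby interpreter,
# not when it is pulled in with require or load.
if __FILE__ == $0
  puts main(ARGV)
end
```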

Now what I want to do is read some files passed as arguments and process them in some way then write back a processed file, so:

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
  end
end

Each argument is now available as arg.

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
    begin

    rescue Exception => e
      puts "Error: #{e.to_s}"
    end
  end
end

We’re going to read from some files, might as well add some exception handling (nothing fancy).
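A small sketch of what the per-argument begin/rescue buys us, under a couple of assumptions: read_all is a hypothetical helper, and it rescues StandardError instead of Exception (a swapped-in choice, since Exception also catches things like Ctrl+C interrupts). One unreadable file is reported and the loop carries on with the rest.

```ruby
# Hypothetical helper: try to read every path, collect failures, keep going.
def read_all(paths)
  contents = {}
  errors = []
  paths.each do |path|
    begin
      contents[path] = File.read(path)
    rescue StandardError => e
      # One bad file should not stop the remaining ones.
      errors << "Error: #{e.message}"
    end
  end
  [contents, errors]
end

contents, errors = read_all(["no-such-file.html"])
puts errors
```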

# is script executed?
if __FILE__ == $0
  ARGV.each do |arg|
    begin
      path = Dir.getwd
      puts "Processing file: #{path + "/" + arg}"
      html = HtmScarab.new(path + "/" + arg)
      File.open(path + "/" + "se-" + arg, "w") do |file|
        file.puts html.eval
      end
    rescue Exception => e
      puts "Error: #{e.to_s}"
    end
  end
end

We read the file, process it and then write it back with a “se-” prefix. The file is closed for us at the end of the File.open ... end block (good ol’ ruby).
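To see the automatic closing for yourself, here is a rough sketch; write_with_block and the temporary directory are assumptions made for the demonstration, not part of the script.

```ruby
require "tmpdir"

# Hypothetical helper: write text to path and hand back the File object
# so it can be inspected after the block has finished.
def write_with_block(path, text)
  handle = nil
  File.open(path, "w") do |file|
    handle = file
    file.puts text
  end
  handle
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "se-example.html")
  handle = write_with_block(path, "<p>hello</p>")
  puts handle.closed?   # the block form of File.open closed it for us
  puts File.read(path)
end
```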

Above this code we need to create the business end of the script. So we add the following class:

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

end
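The gets loop in the constructor is fine as it is. For comparison only, here is a sketch showing that File.read returns the same string in one call; both method names and the temporary file are made up for the comparison.

```ruby
require "tempfile"

# Same approach as the constructor: append line after line.
def read_with_loop(path)
  html = ""
  File.open(path, "r") do |file|
    while (line = file.gets)
      html += line
    end
  end
  html
end

# The whole file as a single string in one call.
def read_in_one_call(path)
  File.read(path)
end

Tempfile.create("scarab") do |tmp|
  tmp.write("<p>one</p>\n<p>two</p>\n")
  tmp.flush
  puts read_with_loop(tmp.path) == read_in_one_call(tmp.path)
end
```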

We have a constructor that accepts a file path and reads the file into a field called @html. That matches how we used the class in the main part of the script, so next we add the rest of what the main part expects:

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

  def clean
    regex = /\n|\r/mi
    @html.gsub! regex, ' '
    regex = /\s\s*/mi
    @html.gsub! regex, ' '
  end

end

We add a cleaning function that collapses newlines and runs of whitespace into single spaces, as they might get in the way.
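A standalone sketch of the same cleaning step, assuming a hypothetical collapse_whitespace helper. Since \s already matches \n and \r, a single pattern gives the same result as the two passes.

```ruby
# Hypothetical standalone version of the cleaning step.
def collapse_whitespace(text)
  # \s covers spaces, tabs, \n and \r, so one pass is enough.
  text.gsub(/\s+/, " ")
end

puts collapse_whitespace("<p>one\r\n   two</p>\n\n<p>three</p>")
```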

class HtmScarab
  # class for converting from html to "semantic sugar",
  # essentially the eval method of this class will remove
  # non semantic html elements

  def initialize html_file
    @html = ""
    File.open html_file, 'r' do |file|
      while line = file.gets
        @html += line
      end
    end
  end

  def clean
    regex = /\n|\r/mi
    @html.gsub! regex, ' '
    regex = /\s\s*/mi
    @html.gsub! regex, ' '
  end

  # the heart of the operation!
  def eval
    if (@html)
      list = []
      # the following are not semantic or are unnecessary:
      list << Pair.new(/<head.*<\/head>/mi, "")
      list << Pair.new(/\s*class=\".*?\"/mi, "")
      list << Pair.new(/<\/?(div|span).*?>/mi, "")
      list << Pair.new(/<script.*?<\/script>/mi, "")
      list << Pair.new(/<style.*?<\/style>/mi, "")
      list << Pair.new(/<\?xml-stylesheet.*?\?>/mi, "")
      list << Pair.new(/<!--.*?-->/mi, "")

      # what was I doing?
      list.each do |pair|
        @html.gsub! pair.regex, pair.value
      end

      clean
      # return
      @html
    end
  end

end

And of course we do some quick regex seek and destroy. It may not be great but it gets the job done… well, not quite: I invented the class Pair as I went along because it was convenient, so it is time to create it with everything we need:

class Pair
  attr_accessor :regex, :value

  def initialize regex, value
    @regex = regex
    @value = value
  end

end
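As an aside, Ruby's built-in Struct can stand in for such a two-field class. The sketch below is only an illustration: RulePair is a made-up name (chosen so it does not clash with Pair) and the sample html string is invented. It also shows the seek and destroy loop on a tiny input.

```ruby
# Struct gives us the constructor and both accessors in one line.
RulePair = Struct.new(:regex, :value)

rules = [
  RulePair.new(/<!--.*?-->/m, ""),
  RulePair.new(/<\/?(div|span).*?>/mi, "")
]

html = "<div id=\"x\"><p>Hi <span>there</span></p><!-- note --></div>"
rules.each { |rule| html = html.gsub(rule.regex, rule.value) }
puts html
```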

The point

You might be tempted to add more methods and so on to either the Pair or HtmScarab class. Don’t! It’s a waste of time and effort, even if they look incomplete as they are; overengineering (anything) will only make it unnecessarily complicated and eventually harder to understand. A lot of programmers will occasionally use their “god given foresight” to create all sorts of extra functions for the future. The consequence is classes with all sorts of useless dangling bits nobody ever needs.

The incremental way I created the script in the example above is not possible for every program; but do try to at least sketch up a prototype application and thus create the application starting from the functionality inward, rather than conceiving and presuming usability and usefulness.

In the case of ruby, adding methods before they are needed is even more pointless than in other languages. Suppose we want to reuse an object of our HtmScarab class; we would need to add an extra method. It goes something like this:

class HtmScarab
  def set value
    @html = value
  end
end

So, I opened the class by writing class HtmScarab / end anywhere in my code, then added the new method I need. It’s simple, clean and in a way efficient.
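Here is the same trick as a self-contained sketch, using a made-up Greeter class so it can be run on its own:

```ruby
# First definition, somewhere early in the code (hypothetical class).
class Greeter
  def initialize(name)
    @name = name
  end

  def greet
    "Hello, #{@name}"
  end
end

# Later on we reopen the same class and add only what we now need.
class Greeter
  def rename(name)
    @name = name
  end
end

greeter = Greeter.new("moon")
greeter.rename("velvet")
puts greeter.greet
```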

What are the first words you hear when tags such as <b>, <i>, <small>, <big> come up in discussion? The most common I keep hearing lately are: they are not semantic, use the <strong> and <em> tags instead. But are they really not semantic?

The purpose of HTML

As I see it an HTML document is (or should be) two very simple things, in no particular order:

  • readable by machines, so we can form an aggregate so as to make use of the information distributed. The most obvious example here being the common web search engine, with Google as the main candidate even though there are other older ones.
  • readable by human beings, since if all we do is turn thought to 1s and 0s, the sum of our efforts may have zero value.

So from this we can assert a document needs to have both human semantics and machine semantics.

What we have

Surprisingly the original html is pretty well designed for this. Most tags will accomplish both tasks, by encapsulating both purpose and meaning in the same envelope. Take the humble h1, h2, h3 tags. A machine will understand them as the start of a section and also use them to subsequently determine the stacking (nesting) of sub-sections and a human will perceive them as titles.

But not all tags bridge the two kinds of understanding. Consider the anchor tag: it offers machine-only semantic sense, since as human beings we cannot magically parse text/url data; at best we might guess based on the words contained within. We also have tags which only carry human semantics: i, b, big and small, to mention just the prime candidates, are there so humans can understand html.

They are not just style! This so called “style” has existed and been understood long before the web even had the foundations to stand on its own two feet. It is obvious from the state of the web today and its common misuses that human expression cannot be captured with generic terminology and encapsulated into some box-like text. Not in the near future at least. That is why it is often necessary to hint at the meaning behind the words rather than to butcher it by slicing it up.

Example

Let’s take something which color coding is supposedly good at keeping us away from. Not the best example, but let us say I have the following:

I had a car crash, the driver will pay for this.

What exactly am I saying here: did I have a car crash, or did one of my drivers crash my car? I could emphasize the text with either an <em> or an <i>, but here’s the catch: in the context of the rest of the content I supposedly have, it would make absolutely no sense to use an <em>. It may carry both machine and human meaning, but in this document I just want to avoid the confusion, and the rest of the content has nothing to do with either of the two possible meanings of the incident, so adding machine semantics would be an error and would only skew the meaning. Just think of a document that has a lot of catchy phrases like that; if the topic is not the phrases themselves but how they are conceived, does it make sense to add an <em> or <strong> to every clarification of them? After all, the words they contain have no meaning in the given context.

The awful truth

Presented with the above some may press the following question: can you not place it in a <span> and style it with CSS? And the answer is: how would that then be separation of semantics and presentation?

Presentation should be something that goes on top of semantics in a document; it should not be something that guarantees semantics in a document. These are not just empty words because someone wants it to be so. Let’s think for a moment: what exactly guarantees that the screen of the client device will be at least 800 x 600, or that the client device supports even first-edition CSS, let alone the exotic properties you used? Nothing guarantees this.

With current technology even the boundary of what the interpreter can be is vague; forget about exotic stuff such as a screen reader, it may be something as simple as an aggregation service such as RSS, or a reader integrated into the network where you reside; think of WordPress.com and posts on this blog for example. You can subscribe to them and WordPress will show them in a sort of blog surfing listing. I don’t think anyone is under the misguided assumption that the listing has any of the originating blog’s CSS styling, so much for any span-styled semantics.

So, even if you stick to <em> and <strong>, consider whether having your site look like a big blob of text once the divs and spans (the true presentation markup) are stripped out is desirable and worth it.

Now I’m sure we’ve all heard this once or twice before:

Add comments to your code, as often as you can.

Unfortunately this innocent good advice is bad in practice. Well, not bad as in inapplicable; rather, people seem to be a little clueless as to how to actually apply it in practice.

Let us start with this mess:

package com.wordpress.sixmoon;

public class txNrm {

    public static double proc (double[] in) {
        double re = 0;
        for (int i = 0; i < in.length; i++) {
            re += Math.abs(in[i]);
        }
        return re;
    }

    public static double proc (double[][] in) {
        double re = proc(in[0]);
        double c;
        for (int i = 1; i < in.length; i++) {
            c = proc(in[i]);
            if (re < c) re = c;
        }
        return re;
    }
}

First of all, can this pile of junk get better if we add comments to it? The answer is no. Computer code, for the most part, is designed to be human readable up to a point. Do not fall under the illusion that adding more to unreadable code is going to make it more readable.

The wrong way, on the right street

Nevertheless some people try, often beginners and often a lot like this:

package com.wordpress.sixmoon;

public class txNrm {

    /** method that accepts as input a vector */
    public static double proc (double[] in) {
        double re = 0; // return
        // loop through all values
        for (int i = 0; i < in.length; i++) {
            re += Math.abs(in[i]);
        }
        return re;
    }

    /** method that accepts as input a matrix */
    public static double proc (double[][] in) {
        double re = proc(in[0]); // initialize
        double c; /* swap variable */
        for (int i = 1; i < in.length; i++) {
            c = proc(in[i]);
            if (re < c) re = c; // new maximum
        }
        return re;
    }
}

The simple lessons to learn: commenting is not about what individual code snippets do, it’s about what they are for (we can all see what they do). And keep code clean to avoid having to write comments in the first place, or at all.

The alternative way

The following example minimizes comments to the essential bits. Thinking of comments the way you would think of commenting on a blog, article etc. often gives the best results.

Keep in mind that there is no fixed formula; what your comments should be useful for will vary depending on what you are working on (in the following, the mathematical context makes mathematical hints important).

package com.wordpress.sixmoon;

public class TaxicabNorm {
    // Mathematical class for calculating the "Taxicab norm".
    // The norm is also known as the "Manhattan norm".

    public static double calculate (double[] A) {
        double norm = 0.0;
        for (int i = 0; i < A.length; i++) {
            norm += Math.abs(A[i]);
        }
        return norm;
    }

    public static double calculate (double[][] A) {
        double norm = calculate(A[0]);
        double cache;
        for (int i = 1; i < A.length; i++) {
            cache = calculate(A[i]);
            if (norm < cache) norm = cache;
        }
        return norm;
    }
}

As the code above shows, placing good comments makes spotting (obvious) errors easier.

End note

Remember, clear code speaks for itself, keep comments to just that, (lit.) comments. And by the way, do keep them in nice paragraph like blocks, it makes them so much easier to read, and keeps code clean as well.

I’m sure by now everyone knows that a tag is a word starting with a letter enclosed within “<” (less than) and “>” (greater than), and how it is highly recommended we should close them so as to avoid confusion, blah blah. But semantics are not going to write themselves just by knowing that, and I find many people do not actually know what the heck it is they are writing.

Normally we start small, but that’s so boring, so here’s a full page:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="index.css" ?>

<!DOCTYPE 
   html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
>

<html xmlns="http://www.w3.org/1999/xhtml">

   <head> 
      <meta http-equiv="Content-Type" 
               content="text/html;charset=ISO-8859-1" />
      <link rel="stylesheet" type="text/css" 
               href="index.css" />
      
      <title>Untitled</title>
      
      <style type="text/css">
      /** page specific style **/        
      </style>
      
      <meta name="description" content="Lorem ipsum." />
      <meta name="keywords" content="lorem, ipsum" />
      <meta name="author" content="velvet" />
      
      <meta name="distribution" content="global" />
      
      <link rel="copyright" href="#" />
      <link rel="help" href="#" />
   </head>
   
   <body>
      <h1>My Blog</h1>
      <h2>Lorem ipsum 2009</h2>
      <p>Lorem ipsum dolor sit amet, [...] </p>
      <p>Nulla facilisi. Vivamus erat neque, [...] </p>
      <p>Vivamus semper convallis enim. [...]</p>
      <h3>Comments</h3>
      <p>Vestibulum dignissim placerat magna.</p>
      <p>Cras hendrerit, dolor at semper rhoncus, 
      est odio sodales ligula, ut ante.</p>

      <h2>Lorem Ipsum 2008</h2>
      <p>Lorem ipsum dolor sit amet, [...] </p>

...
      
      <script type="text/javascript" src="index.js">
      </script>   
   </body>
   
</html>

I’ll explain each line starting from the top.

XML and DTD

<?xml version="1.0" encoding="ISO-8859-1"?>

Because I am writing XHTML (i.e. “eXtensible HTML”) my page is (to some extent) an XML document, so it is only natural I treat it as such.

The line is a standard (I say this because it is easily overridden) declaration of the document as XML; in our case I’m saying:

This is an XML document using the 1.0 specification, and using the character encoding ISO-8859-1.

Now, I did say “it is easily overridden” and you would be interested to know that all major browsers will not care much about you writing it. Instead, they will determine what your document is (this includes all types of files) by the MIME type the server specifies for your document when it is sent. However, should your document be saved to disk, the browser no longer has this convenience and will look at the above line.

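Why do you need it: XML 1.0 lets you omit this line, but a parser then assumes the document is encoded as UTF-8 or UTF-16. A document saved as ISO-8859-1, like this one, therefore needs the declaration (or an encoding supplied by the server), otherwise parsers may throw an error on the first non-ASCII character.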
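To make the translation of that first line concrete, here is a rough sketch that pulls the version and encoding out of the declaration. It is not a real XML parser: the xml_declaration helper and its regex are assumptions made for illustration, and it only understands double-quoted values.

```ruby
# Hypothetical helper: returns the version and encoding named in the XML
# declaration, or nil when the text does not start with one.
def xml_declaration(source)
  match = source.match(/\A<\?xml\s+version="([^"]+)"(?:\s+encoding="([^"]+)")?/)
  return nil unless match
  { version: match[1], encoding: match[2] }
end

p xml_declaration('<?xml version="1.0" encoding="ISO-8859-1"?>')
p xml_declaration("<html></html>")
```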

<?xml-stylesheet type="text/css" href="index.css" ?>

This line specifies the CSS stylesheet using XML syntax (I specify it below in HTML too, but no harm here). Translation:

Style this content with the stylesheet written in “index.css” (located in the current folder). The style sheet has the MIME type text/css.

Why do you need it: Devices that understand very purist xhtml syntax may like it.

<!DOCTYPE 
   html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
>

This is a doctype (Document Type Declaration). It tells the browser which tags go inside which tags, which attributes are valid for each tag, and so on and so forth. And it is very important, as I shall explain below.

First the basics: a doctype declaration starts with <!DOCTYPE and ends with >. I won’t go into detail about how to write one, but I will explain what the code snippet we have does.

In the above doctype declaration we have linked the public (as in known by default by browsers) declaration of, in our case, an XHTML strict document to the html tag (the root of our document). By linking it in, we have also declared all elements enclosed by it as abiding by said doctype specification.

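The extra URI within quotes points to a raw copy of the DTD (you can go there to see all the code). In SGML-based HTML it is optional, since providing just the public identifier is sufficient, so you may see the declaration shortened as below; an XML parser, however, expects the URI after the public identifier, so keep it in XHTML documents: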

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">

Why do you need it: Modern browsers have to work with both old and new. So what happens when they see a page? Should they run it through the gauntlet of code fixing when processing, or trust that you were competent enough to write it correctly? Obviously that’s a hard decision, so they use what’s referred to as a doctype switch. Depending on which doctype you chose they will run more or less code fixing. This affects both consistency while designing (some CSS may not work well, if at all, should you have an incorrect doctype) and end-user performance. You can see a very simple behaviour chart created by Opera.

Just to be clear, DTDs were inherited from SGML, the ancestor of HTML, and XML supports them as well; XML’s newer alternative to a DTD is a Schema. Since both are interchangeable as far as function goes, and since browsers understand DTDs better than anything else (and we just need to specify one, not write one), it’s better to use DTDs for HTML pages.

Moving on to actual HTML…

<html xmlns="http://www.w3.org/1999/xhtml">

Start HTML markup, using XML namespace (xmlns): “http://www.w3.org/1999/xhtml”.

Do not be confused by XML namespaces tending to look like URLs; that is just because it’s easy to be unique that way. If the standards (set by the W3C) had chosen a URI (Uniform Resource Identifier) format such as that used in Java, org.w3.1999.xhtml, we would be writing that.

Why do you need it: It identifies all elements, including the html element in which it resides, as belonging to XHTML. This does not immediately become useful, and with today’s “standards in writing pages” and rendering by browsers (particularly desktop oriented ones) it is not very useful for most developers, but should you move to inserting other XML documents inside, it becomes useful. Consider something like this:

...
<html 
   xmlns="http://www.w3.org/1999/xhtml"
   xmlns:blog="org.example.blog.something.standard"
>
   <blog:pagetitle>My blog</blog:pagetitle>
   <blog:title>Post 1</blog:title>
   <p> ... </p>
   <img src=" ... " alt=" ... " />
   <p> ... </p>
   <p> ... </p>
...
   <blog:title>Post 2</blog:title>
   <p> ... </p>
...
</html>

Ok moving on to the html <head> section or “do not print this stuff on the page” / “meta-data” section.

<meta http-equiv="Content-Type"  
            content="text/html;charset=ISO-8859-1" />

I declare the content of this document as being text/html written with the character set defined by ISO-8859-1.

Why do you need it: This is the standard HTML declaration for content type. This declaration should appear in an HTML document; however, since the move to XML this declaration has become somewhat redundant and there probably will not be any issue removing it. Remember that once it is placed in the document, if the browser detects that the settings here conflict with what it has been told, it will go back to the top and restart processing with the charset mentioned in this meta tag (some of them will), so place it at the very top of the head element to avoid useless processing.

<link rel="stylesheet" type="text/css"  
               href="index.css" />  

An HTML declaration for the stylesheet; everything here reads the same as the XML one. If anyone is wondering why it’s called a generic “index.css”, it’s because it’s highly recommended to merge all your style sheets into one to avoid delaying page load with too many HTTP requests to the server. I suggest you avoid separating stylesheets per media type and instead use the @media CSS rule, as the gain from separating is little to nonexistent.

<title>Untitled</title>  
  
<style type="text/css">  
/** page specific style **/  
</style>  
  
<meta name="description" content="Lorem ipsum." />  
<meta name="keywords" content="lorem, ipsum" />  
<meta name="author" content="velvet" />  
  
<meta name="distribution" content="global" /> 

These are all very simple pieces of metadata which do pretty much what they say. I suggest reading more into SEO to find out what they do, as well as what you should and should not be doing with them.

<link rel="copyright" href="#" />  
<link rel="help" href="#" />

With these I am linking to documents which have a relationship with the current document. I’ve inserted them as an example; links to such documents are not necessary here, and you might already be linking to them inside the <body> block. Note that the relationship is not random.

Why would you use such things: some programs make use of such metadata to improve the user interface.

Moving on to the body, I’ll start with the end…

<script type="text/javascript" src="index.js">  
</script>  

Why do I have all my javascript at the bottom of the page? The answer is simple: to avoid it loading before the content. Let’s say I have a huge script and some content it is applied to; the content in question is also perfectly viewable/usable without the script, so why waste time waiting for the script? It doesn’t make sense, so we place the script as the last node in the body and thus it is loaded last. This also avoids possible errors where javascript DOM alterations are not applied to nodes which were not loaded at the time of the script’s execution (in certain incompetent browsers).

Semantics in HTML

Moving to content,

Good example

<h1>My Blog</h1>  
<h2>Lorem ipsum 2009</h2>  
<p>Lorem ipsum dolor sit amet, [...] </p>  
<p>Nulla facilisi. Vivamus erat neque, [...] </p>  
<p>Vivamus semper convallis enim. [...]</p>  
<h3>Comments</h3>  
<p>Vestibulum dignissim placerat magna.</p>  
<p>Cras hendrerit, dolor at semper rhoncus,  
est odio sodales ligula, ut ante.</p>  
  
<h2>Lorem Ipsum 2008</h2>  
<p>Lorem ipsum dolor sit amet, [...] </p>

It may not look it to some, but that is how every proper semantic XHTML webpage should look once stripped to the bone of any spans, classes, divs and other presentation markup. To show how the above code works, let’s consider the following bad example, ever so common in forum software:

Bad example

<div id="header">
   <img src="header.jpg" alt="My blog" />
</div>  

<h4>Lorem ipsum 2009</h4>  
<div class="content">
Lorem ipsum dolor sit amet, [...] <br /> <br />
Nulla facilisi. Vivamus erat neque, [...] <br /> <br />
Vivamus semper convallis enim. [...] <br /> <br />
</div>
<em><strong>Comments</strong></em>
<div>Vestibulum dignissim placerat magna.</div>  
<div>Cras hendrerit, dolor at semper rhoncus,  
est odio sodales ligula, ut ante.</div>  
  
<h4>Lorem Ipsum 2008</h4>  
<div class="content">
Lorem ipsum dolor sit amet, [...] 
</div>

Just by comparing the two it becomes evident something is horribly wrong. But let’s drill through it to show exactly what is wrong and how.

First things first, the site’s name/branding. In the good example, the title is placed in the once-per-page <h1> tag, giving it maximum importance and naming the entire document; placing more than one <h1> tag would semantically mean more than one document. In the bad example the title of the page is placed as merely the alt of an image; semantically and from an SEO perspective it might as well not have been placed at all. Remember that the <title> in the <head> is (and, SEO-wise, should be) the title of the current page, not of the site; but it should not be a stand-in for a title in the page itself, since it is metadata, not page content.

Moving on to the next error. If you look at the titles of the posts, you’ll notice how the bad example has an <h4>. Ever since HTML first came to be, every hobbyist tutorial site out there has labeled h1, h2, h3 etc. as headers with different degrees of importance, and subsequently the general populace (and more hobbyists) continued the tradition of ranking content based on their bias and giving it an h label from 1 to 6. This is complete semantic nonsense, and just to get things straight:

You are not helping crawlers and the web in any way by “ranking headers”!

Take the following example:

<h4> ... </h4>
<p> ... </p>
<p> ... </p>
<h1> ... </h1>
<p> ... </p>
<h3> ... </h3>
<p> ... </p>

Can you tell in which order that data should be semantically ordered? No, and neither can the web.

Headers are like nested lists: you always start with an <h1> (the “importance” is where you decide to start with it), you always use an <h2> for sub-content and another <h1> if it’s adjacent content. Once you have used an <h2> you would use another <h2> for content of similar importance or an <h3> for sub-content to that. And so on and so forth:

<h1> ... </h1>
   <h2> ... </h2>
   <h2> ... </h2>
      <h3> ... </h3>
   <h2> ... </h2>
   <h2> ... </h2>
      <h3> ... </h3>
      <h3> ... </h3>
      <h3> ... </h3>
      <h3> ... </h3>
   <h2> ... </h2>

Now your entire document makes sense, every section defined by a header can be compared logically with any other; and thus subsequently data in that section as well. Compared to the complete randomness in the earlier example it is a huge improvement.
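The nesting rule can be checked mechanically. The following is only a sketch under stated assumptions: heading_levels and well_nested? are hypothetical helpers, and the regex scan is a shortcut that would not replace a real HTML parser.

```ruby
# Hypothetical helper: the heading levels in document order, e.g. [1, 2, 2, 3].
def heading_levels(html)
  html.scan(/<h([1-6])\b/i).flatten.map(&:to_i)
end

# Hypothetical checker for the nesting rule described above.
def well_nested?(html)
  levels = heading_levels(html)
  return true if levels.empty?
  return false unless levels.first == 1
  # A heading may go one level deeper than the previous one, never more.
  levels.each_cons(2).all? { |previous, current| current <= previous + 1 }
end

random = "<h4>a</h4><p>..</p><h1>b</h1><p>..</p><h3>c</h3>"
nested = "<h1>a</h1><h2>b</h2><h2>c</h2><h3>d</h3><h2>e</h2>"
puts well_nested?(random)
puts well_nested?(nested)
```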

Moving on to the difference in writing content, let’s look at the good and the bad side by side (good first, then bad):

<p>Lorem ipsum dolor sit amet, [...] </p>  
<p>Nulla facilisi. Vivamus erat neque, [...] </p>  
<p>Vivamus semper convallis enim. [...]</p>  
<div class="content">  
Lorem ipsum dolor sit amet, [...] <br /> <br />  
Nulla facilisi. Vivamus erat neque, [...] <br /> <br />  
Vivamus semper convallis enim. [...] <br /> <br />  
</div>  

In case you’re wondering, the “[…]” means nothing special. It is the typographic way of saying “inserted content”, with the inserted content delimited by square brackets (in our case an ellipsis for “more”).

What is a <p>? I’ll tell you what it is not: a <p> is not a block of text with an empty line at the end; it is an “idea”, a block delimiter for a message. You do not write <p>’s just because they look like paragraphs; they have semantic value!

What is a break? A break delimits line data in html elements where it makes sense, such as the <address> element; think of phone, street, city etc., uncountable data. The <address> element is used for the author (of the page) information inside the page content; it was made in “simpler times”, hence address. Don’t misuse it by placing countless addresses of other people in it; it makes no sense if they are not the authors of the page where <address> is placed.

So now knowing that, how much sense does it make to insert two consecutive breaks (there is no real semantic use where you would use two!) instead of paragraphs? To put it simply, what is happening here is that three ideas are turned into one marvelous blob of text, through a hack to the semantic markup, with god knows what meaning; as much as this could mean a paragraph it could also mean a quote or anything else (preformatted sample computer code, anyone?), since the enclosure is not a clear semantic delimiter but a div, which is used to group semantic markup but has no semantic meaning itself.
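If you are stuck with a pile of such double-break content, the repair can be sketched in a few lines. The breaks_to_paragraphs helper is hypothetical, and it assumes that every run of two or more breaks was meant as a paragraph boundary, which is a guess about the author's intent rather than a certainty.

```ruby
# Hypothetical converter: treats every run of two or more <br /> tags as a
# paragraph boundary and wraps the pieces in <p> elements.
def breaks_to_paragraphs(text)
  text.split(/(?:\s*<br\s*\/?>\s*){2,}/i)
      .map(&:strip)
      .reject(&:empty?)
      .map { |chunk| "<p>#{chunk}</p>" }
      .join("\n")
end

blob = "Lorem ipsum dolor sit amet. <br /> <br />\n" \
       "Nulla facilisi. <br /> <br />\n" \
       "Vivamus semper convallis enim. <br /> <br />"
puts breaks_to_paragraphs(blob)
```

A single break is left alone, since on its own it may be legitimate line data.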

On to the last piece of semantic disaster; consider the following, again the good and the bad example side by side (good first, then bad):

<h3>Comments</h3>  
<p>Vestibulum dignissim placerat magna.</p>  
<p>Cras hendrerit, dolor at semper rhoncus,  
est odio sodales ligula, ut ante.</p>
<em><strong>Comments</strong></em>  
<div>Vestibulum dignissim placerat magna.</div>  
<div>Cras hendrerit, dolor at semper rhoncus,  
est odio sodales ligula, ut ante.</div>

I already talked about headers and paragraphs and their importance above, but let’s look at what is happening here with the alternative “emphasized” comment title. First of all, even though it may seem correct (since we’re going to presume here that there is an enclosing block) to place those inline nodes there, do not do it! Blocks should follow blocks, and inline elements should most certainly only be inside blocks, not adjacent to them. Secondly, placing that double emphasis is quite simply useless; there is no such thing as “more emphasized”, even though you want it to be so, so avoid double emphasizing something unless it’s a special case where you’re emphasizing part of something which is already emphasized.

The rest of the problem is almost too obvious to write down: the comments and the post content are being merged, since the emphasized text between them is nothing more than a mere paragraph; in many situations this merger is not desired.

Semantically speaking, in certain situations it is fair use to, let’s say, “over emphasize” a sentence as a visual cue to the reader and to avoid placing a title. This can subsequently be made to look like a title while semantically acting as an “anchor”.

Tip

Try to start from the semantics outwards. Not from:

<div class="garbage code navigation"> ... </div>
<div class="garbage code header"> ... </div>
<div class="garbage code footer"> ... </div>

That is all (for now)

Do not worry you shall forget it soon enough.

Requirements

First, you need ruby!

Open your command-line, first thing to do is update gems. On unix systems you can prefix the following with sudo if you do not have the permissions.

gem update --system

Then install rails by typing:

gem install rails --include-dependencies

Creating a RoR project

Done? Create a folder somewhere, in the rest of the post I’m going to use D:/Test/2009-04/. To create a rails project you simply need to type rails followed by the name of the project, for example to create a project darkmoon configured for a mysql database:

cd d:\Test\2009-04\
rails -d mysql darkmoon

To test some basic functionality, run the WEBrick server:

cd darkmoon/
ruby script/server

On a unix box the last line should read ./script/server

Go to http://localhost:3000/, if you see the rails welcome page all is well.

Database

Your database configuration is located in config/database.yml. The default values are usually what you need, unless you run some exotic configuration.

You should make sure your database driver is available; if it is not, you will need to install it, for example:

cd d:\Test\2009-04\darkmoon\
gem install mysql

To setup the database:

cd d:\Test\2009-04\darkmoon\
rake db:create:all

Short summary

  • Apache’s <Directory /> needs to read Options Indexes FollowSymLinks ExecCGI
  • Ruby cgi scripts need to have #!C:/path/to/ruby/bin/ruby.exe on Windows or #!/usr/bin/ruby for unix systems.

From the basics

Install wamp, just because it's fast, simple, easy and stupid. Done? OK, start it! You should see a little notification icon in your taskbar (near the clock). Click it! You should now see a set of menus; these are all shortcuts to whichever little thing you shall ever need (well, almost).

Now, to run a cgi script you will need to set up apache for it. Simply click Wamp, go to Apache, then httpd.conf. This will probably open it in notepad, if you do not have anything better set up. Press Ctrl+F and search for the following: <Directory. Found it? If not, go to the top and click on the first line (notepad search is pretty stupid). If you are not following this tutorial using Wamp then this is probably the place you should edit, but since this is wamp it's just slightly more... Wamp sets up its own special directory structure, so search again for <Directory and you should find something like (depending on where you installed wamp). These settings override the non specific ones, so when using wamp, or if you have something similar set up, edit here.

The alterations are relatively simple: where it says Options, make sure it reads Options Indexes FollowSymLinks ExecCGI (recent Apache versions refuse to start if options with and without a + sign are mixed on one line).

Search for AddHandler cgi-script .cgi (it should be commented out; uncomment it by removing the hash sign in front), then add an extra .rb at the end so it reads AddHandler cgi-script .cgi .rb

All done.

Optionally, you can search for DirectoryIndex and add index.rb to the list of files so that apache can auto execute them.

Now for some ruby scripting

Depending on the system you are using the first line will differ slightly, but it means more or less the same thing ("execute the script using this"). For mac/unix systems type something like #!/usr/bin/ruby, while on windows it is: #!C:/path/to/ruby/bin/ruby.exe. Next type in your code, here's a sample snippet:

puts "Content-Type: text/html"
puts
puts "<html><body>"
puts "<p>Congratulations on completing the intro.</p>"
puts "</body></html>"

This was meant for ruby, but replace the ruby references with Python etc. and you get the same effect.
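What the snippet prints follows a simple shape: header lines, one empty line, then the body. As a rough sketch (cgi_response is a hypothetical helper, not something Apache or ruby provides), the same output can be built as a single string:

```ruby
# Hypothetical helper: header line, one empty line, then the body.
# A CGI script only has to print this to standard output.
def cgi_response(body, content_type = "text/html")
  "Content-Type: #{content_type}\n\n#{body}"
end

print cgi_response("<html><body><p>Hello from CGI.</p></body></html>")
```

Forgetting the empty line after the header is the usual reason a script like this ends in a server error instead of a page.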
