I’m developing a small console application in C# to convert .gde files – the file format of the wonderful outliner “The Guide” – to .chm (Microsoft Compiled HTML Help).

The first step is to convert a .gde file to XML. This can be done with gdeutil, a tool included with The Guide. However, gdeutil.exe does not create valid XML files: the character ‘&’ in node titles is not escaped to ‘&amp;’.

So I had to incorporate an XML preprocessing step in my tool, in which unescaped characters are replaced by their XML entities. Otherwise, the document cannot be parsed by the .NET XML parser (or most other parsers).
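
To make the problem concrete, here is a minimal sketch of how the .NET parser reacts to a bare ampersand. The class name, the helper name IsWellFormed and the sample markup are made up for illustration; the markup is not actual gdeutil output.

```csharp
using System;
using System.Xml;

public static class BareAmpersandDemo
{
    // Hypothetical helper: returns true if the text loads as an XML document.
    public static bool IsWellFormed(string xmlText)
    {
        try
        {
            new XmlDocument().LoadXml(xmlText);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects '&' when it does not start a valid reference.
            return false;
        }
    }

    public static void Main()
    {
        // Illustrative node markup only (an assumption, not real gdeutil output).
        Console.WriteLine(IsWellFormed("<node title=\"Tips & Tricks\"/>"));
        Console.WriteLine(IsWellFormed("<node title=\"Tips &amp; Tricks\"/>"));
    }
}
```

The first call should report a parse failure and the second a success, which is why the preprocessing step has to run before the document is handed to the parser.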

This is the method I created for this purpose:

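// Requires: using System; using System.Text;

/// <summary>
/// Inserts '&amp;' for each '&' character in XML text that does not start an entity.
/// </summary>
/// <param name="xmlText">The XML text to preprocess.</param>
/// <returns>The text with bare ampersands escaped.</returns>
public static String PreProcess(String xmlText)
{
    if (String.IsNullOrEmpty(xmlText))
        return xmlText;

    bool ampersand = false;

    StringBuilder output = new StringBuilder();
    // Holds the characters read after a pending '&'
    StringBuilder buffer = new StringBuilder();
    for (int i = 0; i < xmlText.Length; i++)
    {
        char c = xmlText[i];
        if (ampersand && c == ';' && buffer.Length > 0)
        {
            // Turns out to be an entity; don't change the output
            output.Append('&').Append(buffer.ToString()).Append(c);
            buffer.Clear();
            ampersand = false;
        }
        else if (ampersand && (Char.IsLetterOrDigit(c) || c == '#'))
        {
            // Still a candidate entity name (or numeric reference such as &#38;)
            buffer.Append(c);
        }
        else
        {
            if (ampersand)
            {
                // Turns out not to be an entity; escape the pending '&'
                // and keep the characters that followed it
                output.Append("&amp;").Append(buffer.ToString());
                buffer.Clear();
                ampersand = false;
            }
            if (c == '&')
                ampersand = true;   // Maybe this is the start of an entity
            else
                output.Append(c);
        }
    }
    // The input ended in the middle of a candidate entity
    if (ampersand)
        output.Append("&amp;").Append(buffer.ToString());
    return output.ToString();
}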

Note that this is not a way of escaping entities in a text string (that’s what HttpUtility.HtmlEncode is for), but a method of escaping characters in a complete XML document that includes tags. The trick is to ignore already escaped characters; otherwise a simple search for ‘&’ and replace with ‘&amp;’ would suffice.
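
The same idea (leave existing entities alone, escape every other ampersand) can also be sketched with a regular expression and a negative lookahead. This is an alternative technique, not the method used in my tool. The class and method names are made up for illustration, and the pattern only recognizes names made of ASCII letters and digits plus decimal and hexadecimal references, which is narrower than what the XML specification allows.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpersandEscaper
{
    // Matches '&' unless it starts a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference that ends with ';'.
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Hypothetical helper: escapes only the ampersands that are not entities.
    public static string EscapeBareAmpersands(string xmlText)
    {
        if (String.IsNullOrEmpty(xmlText))
            return xmlText;
        return BareAmpersand.Replace(xmlText, "&amp;");
    }

    public static void Main()
    {
        string raw = "<node title=\"Tips & Tricks &amp; more\"/>";
        Console.WriteLine(EscapeBareAmpersands(raw));
        // prints: <node title="Tips &amp; Tricks &amp; more"/>
    }
}
```

Like the character loop, this sketch knows nothing about CDATA sections or comments, so it shares the same limitations.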

I’m aware that this is not a fail-safe method. Nevertheless, I’m confident that this method is robust enough for use with XML files produced by gdeutil.


A simple captcha’s effect

Last July I implemented a simple captcha on this blog. Even though this type of captcha can be easily passed by an automated system (I wanted to keep comment posting as easy as possible for humans), the effect on received comment spam is very satisfactory as the image below shows.


Akismet stats screenshot.

Until this weekend, my last C++ program (more like a hello world “application”) dated from a few years ago. This weekend, I needed to write a small utility that had to be very lightweight and fast. So I thought, why not try to write it in C++? I must say, it was a bit of a challenge for a C# .NET programmer, but I succeeded. The biggest challenge, though, turned out to be understanding strings in Visual C++.

As a C# programmer – used to simply using the “String” class whenever text is involved – I was utterly confused to find many different constructs for strings in (Visual) C++: LPTSTR, LPCSTR, CString, TCHAR[], std::string… just to name a few. String types seem to live on different islands and in different villages. There is a Unicode island and an ANSI island. On both islands there are “standard” and Microsoft villages.

It is very difficult to get a clear overview of the Visual C++ string landscape, even by Googling. What I found were mainly forum threads with confusing answers, and MSDN articles shedding light only on Microsoft variants like LPTSTR.
But I found a lighthouse: a guide that clearly explains the why and what of Visual C++ strings:

Unraveling Strings in Visual C++.

I hope this will help other .NET programmers who find themselves lost in a string of C++ islands.

CodeIgniter query log hook

Whenever I use an ORM, such as CodeIgniter’s Active Record implementation, I find it very important to see the actual queries generated, mainly because the danger of using an ORM is not knowing what happens in the background, which can introduce hard-to-find bugs and performance problems.

Below you’ll find a CodeIgniter hook that logs all database queries to a simple text file. I found this code useful in my first CodeIgniter project (since it’s from my first CI project, I think many revisions will follow, but you’ll get the idea).

/* config/hooks.php */
$hook['post_system'][] = array(
        'class' => 'QueryLogHook',
        'function' => 'log_queries',
        'filename' => 'QueryLogHook.php',
        'filepath' => 'hooks'
);

/* application/hooks/QueryLogHook.php */
class QueryLogHook {

    function log_queries() {   
        $CI =& get_instance();
        $times = $CI->db->query_times;
        $dbs    = array();
        $output = NULL;    
        $queries = $CI->db->queries;

        if (count($queries) == 0)
        {
            $output .= "no queries\n";
        }
        else
        {
            foreach ($queries as $key=>$query)
            {
                // Log each query together with its own execution time
                $took = round(doubleval($times[$key]), 3);
                $output .= $query . "\n";
                $output .= "===[took:{$took}]\n\n";
            }
        }

        $CI->load->helper('file');
        if ( ! write_file(APPPATH  . "/logs/queries.log.txt", $output, 'a+'))
        {
             log_message('debug','Unable to write the query log file');
        }  
    }

}

If I had a “programming quote of the week” on this blog, this would be the one for this week:

It’s harder to read code than to write it.

Very true indeed. Why?

We’re programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We’re not excited by incremental renovation: tinkering, improving, planting flower beds.

Read more:

Link dump June 2011

  • Using MySQL Spatial with Java JPA – While accessing a MySQL table which contained a ‘Point’ datatype, I could not find a way to convert the binary array returned by the ADO.NET MySQL data connector to a WKB blob. Finally I found the answer in this post.
  • SubSonic source – inspiration for my own ORM
  • Ben onderweg – I don’t know why, I don’t know from whom; but I stumbled upon this by chance.
