Making Sense of Programming Insanity: August 2016


Download Source: Visual Studio 2008 Project

Introduction

Usually there is a better way to grab text from the screen, such as SendMessage with WM_GETTEXT or an appropriate API/SDK. But for applications that draw their own text with rendering systems such as GDI or DirectX/OpenGL, some form of OCR is needed to grab the text.
That’s where a simple OCR program in C# comes in handy.
This Assisted OCR engine combines initial human input with character processing to provide an automated screen OCR process.
For starters, we need to know some basics about pixels and text recognition:

Pixel Basics

There are several ways that an application can render text to pixels:

SingleBitPerPixel: each pixel is either on or off. See “Figure A”
AntiAliased: text appears smoother because neighboring pixels near curves and angles are turned slightly on
ClearType: neighboring pixels are colored red, green or blue, because LCD screens build each pixel from RGB sub pixels, usually ordered left to right, that combine into the colors we see.
This works because our eyes are more sensitive to brightness than to color, so the eye sees this text as if the screen had triple the horizontal DPI
In this mode an l could be shifted left 1/3 of a pixel, and its top pixel would be rendered across two pixels as ##B RG# rather than as a single RGB pixel. Black text is rendered as an absence of RGB, so technically this is in reverse. See “Figure B”

Figure A – Clear Type Off (SingleBitPerPixel)

Figure B – Clear Type On (ClearType)

Recognizing Text

A good first step to recognizing text is getting the size and position of text lines:

There are many ways to do this. One good way is to do a horizontal scan pixel by pixel (or a more optimized approach, as in the code) and look for pixels that contrast with the major background color. Then skip down a few pixels and do the same horizontal scan. Once you have a scanned grid, you can do further vertical scanning to determine the height of the text line and create a rectangle containing the entire area of the line:
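The scanning idea can be sketched roughly as follows. This is a simplified illustration, not the code from the project: it assumes the captured area is already a float[,] of darkness values indexed [x, y] (0 lightest, 1 darkest), it checks every row instead of skipping a few pixels at a time, and the names LineFinder and FindLineBands are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class LineFinder
{
    // Returns one { top, bottom } pair (inclusive row indexes) for every
    // run of consecutive rows that contains at least one dark pixel.
    public static List<int[]> FindLineBands(float[,] pixmap, float threshold)
    {
        var bands = new List<int[]>();
        int width = pixmap.GetLength(0);
        int height = pixmap.GetLength(1);
        int top = -1; // -1 means "not currently inside a text line"

        for (int y = 0; y < height; y++)
        {
            // Horizontal scan: stop at the first pixel that contrasts with the background.
            bool hasInk = false;
            for (int x = 0; x < width && !hasInk; x++)
                if (pixmap[x, y] > threshold) hasInk = true;

            if (hasInk && top < 0)
            {
                top = y; // first row of a new line
            }
            else if (!hasInk && top >= 0)
            {
                bands.Add(new[] { top, y - 1 }); // the previous row closed the line
                top = -1;
            }
        }

        // A line that touches the bottom edge is still open here.
        if (top >= 0) bands.Add(new[] { top, height - 1 });
        return bands;
    }
}
```

Each band gives the top and bottom of one text line. A similar scan over the columns inside a band would give the left and right edges of the rectangle.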

Now that we know where the text is and have an approximate size, we can start looking at individual characters.
This should be simple, as each character is separated by background pixels, but there are exceptions due to kerning and specific fonts where characters actually touch or even overlap each other.

Notice the fi
In this step of finding the characters we simply ignore this and treat fi or other kerned characters as a single entity. These will be parsed out later using known characters.

Now that we have found each character, it is best to remove the ‘antialiasing’ and just read the pixels with the most contrast. I put them into a List of Points, ordered from left to right, to simplify later OCR logic. I also store the height of each character and boolean values indicating whether it is tall, short or below the baseline.
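As a rough sketch of that step, the snippet below thresholds a character's pixmap and collects the surviving pixels column by column. Several things here are assumptions for illustration only: the small Pt struct stands in for System.Drawing.Point so the example has no dependencies, the xHeightRow and baselineRow parameters are assumed to be known for the line, and the exact meaning of the tall, short and below-baseline flags is a guess at a reasonable definition rather than the project's own.

```csharp
using System;
using System.Collections.Generic;

public struct Pt
{
    public int X, Y;
    public Pt(int x, int y) { X = x; Y = y; }
}

public class GlyphInfo
{
    public List<Pt> Points = new List<Pt>(); // dark pixels, ordered left to right
    public int Height;          // rows between the topmost and bottommost dark pixel
    public bool IsTall;         // rises above the x-height row (like l, T, f)
    public bool BelowBaseline;  // drops under the baseline row (like g, p, y)
    public bool IsShort;        // neither of the above (like o, a, s)
}

public static class GlyphReader
{
    public static GlyphInfo Read(float[,] map, float threshold, int xHeightRow, int baselineRow)
    {
        var g = new GlyphInfo();
        int top = int.MaxValue, bottom = -1;

        for (int x = 0; x < map.GetLength(0); x++)       // columns, left to right
            for (int y = 0; y < map.GetLength(1); y++)
            {
                if (map[x, y] <= threshold) continue;     // drop faint antialiasing pixels
                g.Points.Add(new Pt(x, y));
                if (y < top) top = y;
                if (y > bottom) bottom = y;
            }

        if (bottom >= 0) // at least one dark pixel was found
        {
            g.Height = bottom - top + 1;
            g.IsTall = top < xHeightRow;
            g.BelowBaseline = bottom > baselineRow;
            g.IsShort = !g.IsTall && !g.BelowBaseline;
        }
        return g;
    }
}
```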

Recognizing Individual Characters

Now that we know all of the characters in an individual line, it is time to recognize them as C# Strings:
Recognition is made faster by using the height and boolean values, as well as the number of pixels, to skip dissimilar characters. Once a character passes the similarity test, it is put through a pixel by pixel comparison to verify the match.
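A minimal sketch of this two-stage comparison might look like the following. The KnownGlyph class, its fields and the GlyphMatcher helper are hypothetical names chosen for this example, and characters are represented as bool[,] bitmaps (true for a dark pixel) rather than the project's own types.

```csharp
using System;
using System.Collections.Generic;

public class KnownGlyph
{
    public string Text;
    public bool[,] Pixels;      // [x, y], true = dark pixel
    public bool IsTall;
    public bool BelowBaseline;

    public int PixelCount()
    {
        int n = 0;
        foreach (bool on in Pixels) if (on) n++;
        return n;
    }
}

public static class GlyphMatcher
{
    // Stage 1: cheap checks that rule out most candidates without touching every pixel pair.
    public static bool LooksSimilar(KnownGlyph a, KnownGlyph b)
    {
        return a.Pixels.GetLength(0) == b.Pixels.GetLength(0)
            && a.Pixels.GetLength(1) == b.Pixels.GetLength(1)
            && a.IsTall == b.IsTall
            && a.BelowBaseline == b.BelowBaseline
            && a.PixelCount() == b.PixelCount();
    }

    // Stage 2: exact comparison. Callers must make sure both bitmaps have the same size.
    public static bool SamePixels(bool[,] a, bool[,] b)
    {
        for (int x = 0; x < a.GetLength(0); x++)
            for (int y = 0; y < a.GetLength(1); y++)
                if (a[x, y] != b[x, y]) return false;
        return true;
    }

    // Returns the text of the first known glyph that passes both stages, or null.
    public static string Recognize(KnownGlyph candidate, IEnumerable<KnownGlyph> known)
    {
        foreach (var k in known)
            if (LooksSimilar(candidate, k) && SamePixels(candidate.Pixels, k.Pixels))
                return k.Text;
        return null;
    }
}
```

The pixel count check is what keeps this fast: two characters with the same size and flags but different pixel counts can never match, so the full comparison is skipped.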
If there is no match, the character is put through a secondary process where it is matched against known characters from left to right:

Known Characters

“f” matches a known character, then we are left with “i”
We have successfully split this into f and i.
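The left to right splitting can be sketched as a small recursive search. This is an illustration under simplifying assumptions: bitmaps are bool[,] arrays indexed [x, y], every known character has the same height as the merged one, and the touching characters occupy separate columns (real kerned pairs can share a column, which this version does not handle). GlyphSplitter and its methods are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class GlyphSplitter
{
    // Tries to tile the merged bitmap, starting at column startX, with known glyphs.
    // Returns the recognized text, or null if no sequence of known glyphs fits.
    public static string Split(bool[,] merged, int startX, IDictionary<string, bool[,]> known)
    {
        int width = merged.GetLength(0);
        if (startX == width) return ""; // everything has been consumed

        foreach (var entry in known)
        {
            bool[,] g = entry.Value;
            int w = g.GetLength(0);
            if (w == 0 || startX + w > width) continue;           // does not fit
            if (g.GetLength(1) != merged.GetLength(1)) continue;   // different height
            if (!MatchesAt(merged, startX, g)) continue;

            // This glyph matches the left edge, so try to explain what is left over.
            string rest = Split(merged, startX + w, known);
            if (rest != null) return entry.Key + rest;
        }
        return null;
    }

    private static bool MatchesAt(bool[,] merged, int startX, bool[,] g)
    {
        for (int x = 0; x < g.GetLength(0); x++)
            for (int y = 0; y < g.GetLength(1); y++)
                if (merged[startX + x, y] != g[x, y]) return false;
        return true;
    }
}
```

Called with a merged "fi" bitmap and a dictionary that contains "f" and "i", Split(merged, 0, known) would return "fi"; if some part of the bitmap matches nothing, the result is null and the character stays unknown.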
Of course this just matches characters to each other; we still don’t have useful Strings of text. This is where the Assisted process comes to life. A few samples of text can contain most of the characters that will ever be used. A human can then look at a generated list of unknown character symbols and enter the correct String value for each one. Once this is done for a character, it won’t need to be done again.
So a user will see a list:

And they will type “Agofdrsti”. Once they have typed all 52 letters and a few symbols, they are ready for complete automation.
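One way to hold the human-entered answers is a dictionary keyed by a signature of each character's pixels. The sketch below is only an illustration of that idea; GlyphDatabase, Signature, Learn, Lookup and Unknown are hypothetical names, and characters are again assumed to be bool[,] bitmaps.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public class GlyphDatabase
{
    private readonly Dictionary<string, string> labels = new Dictionary<string, string>();

    // Builds a key such as "2x3:111100" from the bitmap size and its pixels.
    public static string Signature(bool[,] pixels)
    {
        var sb = new StringBuilder();
        sb.Append(pixels.GetLength(0)).Append('x').Append(pixels.GetLength(1)).Append(':');
        foreach (bool on in pixels) sb.Append(on ? '1' : '0');
        return sb.ToString();
    }

    // Returns the text a human entered for this glyph, or null if it is still unknown.
    public string Lookup(bool[,] pixels)
    {
        string text;
        return labels.TryGetValue(Signature(pixels), out text) ? text : null;
    }

    // Stores the answer typed by the user so it never has to be asked again.
    public void Learn(bool[,] pixels, string text)
    {
        labels[Signature(pixels)] = text;
    }

    // Distinct glyphs that still need a human answer, in order of first appearance.
    public List<bool[,]> Unknown(IEnumerable<bool[,]> glyphs)
    {
        var seen = new HashSet<string>();
        var result = new List<bool[,]>();
        foreach (var g in glyphs)
        {
            string sig = Signature(g);
            if (!labels.ContainsKey(sig) && seen.Add(sig)) result.Add(g);
        }
        return result;
    }
}
```

The list returned by Unknown is what the user would be shown; each typed answer goes through Learn, and later runs only need Lookup.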
This process could be automated further by writing code to generate every possible character of every candidate font at the font size used and comparing against that. With one hundred fonts, a decent computer should be able to do this within a few seconds. Also, many letters are easily recognized by detecting simple shapes, such as o, l, T, i, L, N etc. A smarter approach is very possible, and all the preprocessing we have done will make it much easier.

Code:

The entire Visual Studio project is available for download; however, I will only go over key portions of the code in this text.
Let us assume that pixel values have been transformed into grayscale floats, 0 being the lightest and 1 being the darkest. We have also broken the text into regions like this:

TextRegion Definition

using System.Collections.Generic; // List<T>
using System.Drawing;             // Rectangle

internal class TextRegionInternal : TextRegion
{
    public float[,] Pixmap;
    public List<CharDef> Characters = new List<CharDef>();
    public TextRegionInternal(Rectangle bounds) : base()
    {
        Bounds = bounds;
        Pixmap = new float[bounds.Width+1, bounds.Height+1];
    }
}

public class TextRegion
{
    public Rectangle Bounds { get; set; }
    public string Text { get; set; }
    public string CharacterSet { get; set; }
}

So now we have a List of TextRegions:

List<TextRegion> Regions;

Each TextRegionInternal in the list contains a float[,] Pixmap, a matrix of darkness values (0 lightest, 1 darkest) for each pixel in the region.
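For reference, a pixmap like this could be filled from RGB values along the following lines. The luma weights used here (0.299, 0.587, 0.114) are one common choice and not necessarily what the project uses, and the Grayscale class and its method names are made up for this sketch.

```csharp
using System;

public static class Grayscale
{
    // r, g and b are 0..255. Returns 0 for white (lightest) up to 1 for black (darkest).
    public static float Darkness(int r, int g, int b)
    {
        float luma = (0.299f * r + 0.587f * g + 0.114f * b) / 255f;
        return 1f - luma;
    }

    // argb[x, y] holds 0xAARRGGBB values, the layout returned by Color.ToArgb().
    public static float[,] ToPixmap(int[,] argb)
    {
        int w = argb.GetLength(0), h = argb.GetLength(1);
        var pixmap = new float[w, h];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
            {
                int c = argb[x, y];
                // Mask each channel out of the packed value; alpha is ignored.
                pixmap[x, y] = Darkness((c >> 16) & 0xFF, (c >> 8) & 0xFF, c & 0xFF);
            }
        return pixmap;
    }
}
```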

Character Parser

foreach (var item in Regions.OfType<TextRegionInternal>())
{
    int left = 0;
    int right = 0;

    bool wasEmpty = true;
    bool wasSpace = false;
    int space = 0;
    for (int tx = 0; tx <= item.Bounds.Width; tx++)
    {
        int cc = 0;
        int nxt = 0;
        for (int ty = 0; ty <= item.Bounds.Height; ty++)
        {
            if (item.Pixmap[tx, ty] > .3)
            {
                if (tx < item.Bounds.Width)
                {//check straight and diagonal if there are any connecting pixels
                    if (item.Pixmap[tx + 1, ty] > .3) nxt++;
                    if (ty > 0 && item.Pixmap[tx + 1, ty - 1] > .3) nxt++; // ty > 0 so the diagonal into row 0 is checked too
                    if (ty < item.Bounds.Height && item.Pixmap[tx + 1, ty + 1] > .3) nxt++;
                }
                cc++;
            }
        }

        if (cc != 0 && wasSpace)
        {
            wasSpace = false;                       
            item.Characters.Add(new CharDef() { Text = " " });
        }

        if (cc != 0 && wasEmpty)
        {
            left = tx;
            wasEmpty = false;
        }

        if (nxt == 0 && cc != 0)// if end of character
        {
            space = 0; //reset space counter
            right = tx;

            int char_width = right - left + 1; // +1 so the last inked column (right) is included
            int char_height = item.Bounds.Height - 1;
            float[,] map = new float[char_width, char_height];

            for (int xc = 0; xc < char_width; xc++)
                for (int yc = 0; yc < char_height; yc++)
                    map[xc, yc] = item.Pixmap[xc + left, yc];

            item.Characters.Add(new CharDef() { Pixmap = map });
            wasEmpty = true;
        }

        if (cc == 0)
        {
            space++; // no character found, start incrementing the space counter
            wasEmpty = true;
        }
        if (space > (item.Bounds.Height / 4)) //height / 4 is a decent estimate of a space
        {
            space = -item.Bounds.Width;
            wasSpace = true;//significant spacing found, set flags
        }

    }
    item.Text = item.Characters.Count.ToString();
}

After that code we have something like this:

Now all we do is compare the characters with each other to find the ones that match, then compare them against our existing database of characters. We now have an Assisted OCR system.


Applications:

General Application Automation
Get current song information from internet radio applications
Make BOTs for chat programs

Simple OCR being used to keep track of Pandora songs (And More =D)

Advantages:

High Accuracy
Fast Performance

Limitations:

Requires high level of contrast and known font colors
Possible obscuring of text by other windows
ClearType reduces accuracy significantly (Turn it off). It would not be too difficult to add full ClearType capability: font pixels would simply need to be read as triplets, and red, green and blue deviations from the background/font color would be converted to pixels being on or off
Requires GridFit fonts for best accuracy (most programs render using GridFit for crisper results)
Requires human pre-processing of each character for each font/size one time (an automated processing system could be created but could reduce accuracy and decrease performance)
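The ClearType idea mentioned in the limitations could start from something like the sketch below, which turns one RGB pixel into three on/off sub pixels by comparing each channel with the background color. This is not part of the project; the ClearTypeReader name, the minDeviation parameter and the assumption that sub pixels run red, green, blue from left to right are all illustrative.

```csharp
using System;

public static class ClearTypeReader
{
    // Converts one RGB pixel into three on/off sub pixels (left to right: R, G, B).
    // A sub pixel counts as "on" (part of the text) when its channel differs from
    // the background channel by more than minDeviation.
    public static bool[] ToSubPixels(int r, int g, int b,
                                     int bgR, int bgG, int bgB,
                                     int minDeviation)
    {
        return new[]
        {
            Math.Abs(r - bgR) > minDeviation,
            Math.Abs(g - bgG) > minDeviation,
            Math.Abs(b - bgB) > minDeviation
        };
    }
}
```

Running this over every pixel in a row would give a bitmap three times as wide, which could then go through the same character matching as single bit per pixel text.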


Monday, August 1, 2016

Assisted Screen OCR (Screen Scraper)