Blocking pornographic, gambling and other illegal sites may become much easier for parents and law enforcement agencies, thanks to European computer scientists who have developed a way to "read" web addresses overlaid on images or on a still from a video.
Internet marketers of all shades might add a website address, a URL, to a graphic or photo that might then be found through an image search engine. But clicking on such images may come with the risk of visiting illegal websites.
Internet search companies and other service providers are already involved in various initiatives to identify and block illegal material on the internet. This new approach to URL extraction from images could be added to their arsenal of techniques for detecting such content, and could also be useful in criminal investigations surrounding it.
Nikolay Neshov from the Technical University of Sofia, Bulgaria, and colleagues have developed a computer algorithm that can detect the presence of text overlaid on an image or a still from a video, extract the text and convert it into an active URL for accessing or blocking a website.
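The last step, turning extracted text into a usable web address, can be illustrated with a minimal Python sketch. It is not the team's algorithm, which the article does not describe in detail. It assumes the text has already been read from the image, and the pattern `URL_PATTERN` and the function `extract_urls` are hypothetical names chosen for illustration.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not taken from the research):
# optional scheme, optional "www.", a dotted host name ending in an
# alphabetic top-level domain, and an optional path.
URL_PATTERN = re.compile(
    r"(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}(?:/[^\s]*)?",
    re.IGNORECASE,
)


def extract_urls(recognized_text: str) -> List[str]:
    """Return URL-like strings found in text already read from an image."""
    urls = []
    for match in URL_PATTERN.finditer(recognized_text):
        # Drop punctuation that commonly trails an address in running text.
        candidate = match.group(0).rstrip(".,;:!?)")
        # Add a scheme so the result can be used as an active link.
        if not candidate.lower().startswith(("http://", "https://")):
            candidate = "http://" + candidate
        urls.append(candidate)
    return urls


if __name__ == "__main__":
    print(extract_urls("Visit www.example.com today"))
```

A blocking tool could then compare each returned address against a list of prohibited sites. A real system would need a stricter pattern and some tolerance for misread characters.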
The conventional method of reading text in images, optical character recognition (OCR), does not work well with text overlaid on images. The new approach uses an identification extraction technique that finds the anomalies that are present in an image when text has been overlaid on it.
The team has successfully tested their algorithm on thousands of images with overlaid URLs. They were able to identify 619 URLs from a random selection of 1000 test images at a rate of three per second using their approach.
Conventional OCR was faster but only found 83 URLs in the same 1000 images. The findings were detailed in the International Journal of Reasoning-based Intelligent Systems.