I have completely rewritten the code using a model-view-mediator design built on object-builder functions. It is posted at http://ogbuzz.com/search-engine.html
Since this work was made possible by using Ubuntu, I decided to release all the new code as open source. Have fun.
JavaScript search engine design
I am programming a JavaScript business-card search engine where double-clicking any word on the page triggers an instant search for that word. You can see the prototype at www.artimap.com.
Monday, May 2, 2011
Monday, December 6, 2010
The 2011 model
While working on the code for the search engine, I kept hitting limits where Internet Explorer would not work at all. Search was done by having the JavaScript read the HTML. The problem is that some browser plugins, notably Skype, modify the HTML on the fly, and the search engine needed a very strict HTML representation and could not operate on modified HTML.
So came the idea to recode everything from scratch. In November I started thinking about using JSON exclusively for generating the HTML. With the new version, the engine reads exclusively from the JSON, so it is insensitive to changes made to the HTML by third-party scripts such as ads or browser plugins.
I also wanted to make search more powerful. Before, if you searched for "bio" you would find "biological" and "biochemistry" but not "agrobiology". If the search term had accents like é, ô, or ï, there were no matches either. To fix that I needed to write a very fast-access index of the words. That took two weeks to figure out. One of the first things to understand is that building an index for a mini search engine is very different from building one for giant search engines like Google. The big players have to deal with an enormous amount of loosely typed or untyped data (every web page is different). By contrast, a small search engine using JSON as a database relies on very structured data, so accessing the data is very different.
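To make the two search improvements concrete, here is a minimal sketch, not the actual artimap.com index. It assumes a modern JavaScript runtime (`String.prototype.normalize`), and the names `foldAccents`, `buildIndex`, `search` and the card shape `{ id, text }` are all hypothetical. Words are stored accent-folded, and a term matches anywhere inside a word, so "bio" also finds "agrobiology".

```javascript
// Fold accents and case so "é" and "E" compare equal (assumes ES2015+ normalize()).
function foldAccents(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// Map each folded word to the set of card ids that contain it.
// The card shape { id, text } is an assumption made for this sketch.
function buildIndex(cards) {
  const index = new Map();
  for (const card of cards) {
    const words = foldAccents(card.text).split(/[^a-z0-9]+/).filter(Boolean);
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(card.id);
    }
  }
  return index;
}

// Return the sorted ids of cards having any word that contains the folded term.
function search(index, term) {
  const needle = foldAccents(term);
  if (!needle) return [];
  const hits = new Set();
  for (const [word, ids] of index) {
    if (word.includes(needle)) ids.forEach(id => hits.add(id));
  }
  return [...hits].sort((a, b) => a - b);
}
```

Scanning every distinct word is only reasonable because a page of business cards holds a small, structured vocabulary; a large engine would need a different structure.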
Another problem that is not obvious is multilingual search and result highlighting with regular expressions. Regular expressions, when you can find a good one, are a very powerful way of modifying text or HTML, but the regular expression needs the exact search terms, so I also had to send it the accented search terms for result highlighting. This took me a few days to get right.
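One way to handle the highlighting problem is to build the regular expression so that each plain letter also accepts its accented forms. This is a sketch under assumptions, not the code used on the site: `ACCENT_CLASSES`, `accentPattern` and `highlight` are hypothetical names, the accent table covers only common French letters, and the `<b>` wrapper is an arbitrary choice.

```javascript
// Accented variants accepted for each base letter (French-oriented, not exhaustive).
const ACCENT_CLASSES = { a: 'aàâä', c: 'cç', e: 'eéèêë', i: 'iîï', o: 'oôö', u: 'uùûü' };

// Build a case-insensitive pattern that matches the term with or without accents.
function accentPattern(term) {
  const folded = term.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
  let source = '';
  for (const ch of folded) {
    if (ACCENT_CLASSES[ch]) {
      source += '[' + ACCENT_CLASSES[ch] + ']';
    } else {
      // Escape characters that have a special meaning in a regular expression.
      source += ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(source, 'gi');
}

// Wrap every match in <b>...</b>, keeping the original accents of the text.
function highlight(text, term) {
  if (!term) return text;
  return text.normalize('NFC').replace(accentPattern(term), '<b>$&</b>');
}
```

Because the replacement uses `$&`, the highlighted text keeps the spelling found in the page rather than the spelling the visitor typed.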
Now it is fast even in all versions of Internet Explorer.
Wednesday, August 11, 2010
The prototype should be online this month
The "Image Zoomer" has been reworked a little: there is a small timeout on both the mouseOver and the mouseOut events. It now also downloads a bigger, higher-quality image when zooming, so it is more than a simple zoom that would just resize the same image (low quality).
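The timeout idea can be sketched as follows. This is only an illustration under assumptions: `createHoverZoom`, its callbacks, the 200 ms default and the injectable `timers` object are hypothetical and do not come from the Image Zoomer itself. Each mouse event cancels the pending action and schedules a new one, so a pointer that merely crosses a card never triggers a zoom.

```javascript
// Returns handlers that delay zooming, so quick pointer passes do nothing.
// "timers" can be replaced by a fake in tests; by default it wraps the global timers.
function createHoverZoom(zoomIn, zoomOut, delayMs = 200, timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: id => clearTimeout(id)
}) {
  let pending = null;

  function schedule(action) {
    // Cancel whatever was waiting, then wait again before acting.
    if (pending !== null) timers.clear(pending);
    pending = timers.set(() => {
      pending = null;
      action();
    }, delayMs);
  }

  return {
    onMouseOver: () => schedule(zoomIn),
    onMouseOut: () => schedule(zoomOut)
  };
}
```

In a page, the two handlers would be attached to the image's mouseover and mouseout events, and `zoomIn` is where the bigger image could be requested.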
All business cards now have rounded tops and bottoms. That simple design change makes an enormous difference, so much so that I wonder why I did not try it earlier. It's so much better.
Nearly all the code is now generated from JSON data, but the generators still need more work to make them fully foolproof for production. There is also a trick to make the search engine work: all words (with the possible exception of common stopwords) must be enclosed in HTML tags, like <i>word</i>. This should be implemented today.
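The word-wrapping trick could look like the sketch below. It is an assumption-laden illustration: `STOPWORDS`, `escapeHtml` and `wrapWords` are hypothetical names, the stopword list is a tiny placeholder, and splitting on whitespace is the simplest possible tokenizer.

```javascript
// Placeholder stopword list; a real one would be longer and per language.
const STOPWORDS = new Set(['the', 'a', 'an', 'of', 'and']);

// Escape the characters that would otherwise be read as markup.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Wrap every non-stopword in <i>...</i>, leaving whitespace untouched.
function wrapWords(text) {
  return text
    .split(/(\s+)/) // the capturing group keeps the whitespace tokens
    .map(token => {
      if (!token.trim() || STOPWORDS.has(token.toLowerCase())) return escapeHtml(token);
      return '<i>' + escapeHtml(token) + '</i>';
    })
    .join('');
}
```

Each `<i>` element then gives a double-click handler a precise word boundary to read from.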
The new HTML structure is very strict. This has the advantage of making precise indexing easier, so much so that it can be done with basic JavaScript, with no need for big libraries like jQuery or other behemoths (they are good, but so big that I would like to lessen their use to get an even faster page load). It will be possible to index each piece of information in the business card more precisely, subject by subject, so searches would compare addresses with addresses, summaries with summaries, and so on. For now each search is global: a search on the summary also searches the addresses, the title, and the phone number at the same time. That will be implemented later, since the current search method is quite good and there is more need for making pages for cities other than Montreal.
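As a rough idea of what subject-by-subject search could mean once it is implemented, here is a hypothetical sketch. The function name `searchField` and the card fields (`title`, `address`, `summary`) are assumptions for illustration only.

```javascript
// Search a single named field of each card instead of the whole card.
function searchField(cards, field, term) {
  const needle = term.toLowerCase();
  return cards.filter(card =>
    typeof card[field] === 'string' && card[field].toLowerCase().includes(needle)
  );
}

// Example data with assumed field names.
const sampleCards = [
  { title: 'Park Bakery', address: '12 Main Street', summary: 'Bread and pastries' },
  { title: 'Main Garage', address: '40 Park Avenue', summary: 'Car repairs' }
];

// Only the address field is compared, so the bakery's title is ignored.
const parkAddresses = searchField(sampleCards, 'address', 'park');
```

Here `parkAddresses` contains only the garage, whereas a global search for "park" would return both cards.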
Monday, August 9, 2010
New Zoom javascript design
It could have been done with simple CSS code, but you get a smoother user experience using a JavaScript timeout to start the automatic zoomer.
Plus, the new background of each business card (vCard or hCard) is a simple GIF that should work better in obsolete, old, or buggy web browsers. That is fine with me because the new design is visually much simpler. The HTML is also new and very simple, and is completely generated from a JSON database that is very easy to write and read. For that I have written three JavaScript functions that write HTML from JSON. The main idea was to find a way to represent an HTML structure in a function, so that it would be very easy to modify and so that I could write something like:
htmlTag('html',
    htmlTag('h1', data1),
    htmlTag('p', data2),
    htmlTag('div', data3)
)
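The post does not show how `htmlTag` is written, so the following is only a guess at the simplest function that supports the nested call style above. It assumes every argument after the tag name is already an HTML string, and that rest parameters are available (a modern runtime).

```javascript
// Wrap the concatenated children in an opening and closing tag.
// Children are assumed to be HTML strings, typically produced by nested htmlTag calls.
function htmlTag(name, ...children) {
  return '<' + name + '>' + children.join('') + '</' + name + '>';
}

// Nesting calls mirrors the nesting of the generated markup.
const card = htmlTag('div',
  htmlTag('h1', 'Title'),
  htmlTag('p', 'Summary')
);
```

With this sketch, `card` is the string `<div><h1>Title</h1><p>Summary</p></div>`. Data coming from JSON would need to be escaped before being passed in.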
Thursday, January 28, 2010
Monday, January 18, 2010
Backstage programming
I have been busy all last week designing a web form to help me write the HTML code for the artimap.com search engine. This new page uses JavaScript to translate what I write into the complex HTML needed by the artimap.com search engine. Each phrase is analysed and every word is enclosed in <i> HTML tags. These <i> tags are essential for the JavaScript engine.
Now I will add some PHP to build a JSON database, plus an image "loader".
Very soon artimap.com will have a lot of new information flowing in continually, easily and fast!
Monday, December 28, 2009