Search Engine
This project was built for an information and data retrieval course at Concordia University. Its main features and specifications:
- Crawler and query processor built in Python
- Index covers the first 10,000 pages crawled from https://concordia.ca
- Returns the 15 most relevant results after scoring
- Uses the BM25 ranking algorithm
- Crawler uses SPIMI to construct the inverted index
- Records each term's frequency per document, along with the document ID, in the index
- Crawler built from scratch using requests and urlparse libraries
- The Code
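
To illustrate how an index that stores per-document term frequencies feeds BM25 scoring, here is a minimal sketch. It is not the project's actual code: the function names `build_index` and `bm25_search`, the whitespace tokenizer, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for illustration. The index is also built in memory in a single pass, without the block writing and merging that a full SPIMI implementation performs.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an in-memory inverted index: term -> {doc_id: term_frequency}.

    Simplified single-pass construction (assumption): no on-disk blocks
    or merge step, unlike a full SPIMI implementation.
    """
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_search(query, index, doc_len, k1=1.2, b=0.75, top_k=15):
    """Score documents against the query with BM25 and return the top_k."""
    n_docs = len(doc_len)
    if n_docs == 0:
        return []
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        df = len(postings)  # number of documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer documents are penalized via b.
            norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy corpus, not real crawl data.
    docs = {
        1: "concordia university admissions",
        2: "concordia library hours",
        3: "parking permits",
    }
    index, doc_len = build_index(docs)
    for doc_id, score in bm25_search("library hours", index, doc_len):
        print(doc_id, round(score, 3))
```

Storing the term frequency next to each document ID means the query step never has to re-read page text: everything BM25 needs (term frequency, document frequency, and document length) comes from the index and a small length table.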