KnowledgeMap is a project that attempts to answer the following questions:
- What do I know? Gain an objective awareness of the subjects in which I have deeper knowledge. This would make it possible to identify my own areas of expertise. E.g. do I know more about History or Science? Do I know more about Biology or Physics?
- What don’t I know? Gain awareness of the subjects in which I have very little knowledge. This would help me discover further subjects to explore. E.g. «Baseball in the US» is a large subject with tons of data and trivia of which I’m completely unaware (and intend to stay that way).
- Do I know more about a subject than some other particular person? Objectively compare the knowledge of two people using a randomly generated quiz.
- Which books or sources can expand my knowledge? Whenever I read a book, I need it to be neither too trivial (I already know most of its content) nor too technical (I lack the background to understand large portions of it). By generating the KnowledgeMap of a given book and overlaying it on my own KnowledgeMap, it would be possible to determine whether the book fits within my knowledge boundaries and can help expand them without my losing interest midway.
The name «KnowledgeMap» borrows the metaphor of a cartographic map. If we represent all the different areas of knowledge as a two-dimensional map, there will be shadowy unknown areas (fog of war) representing «ignorance» and bright zones representing «knowledge».
The first problem arises immediately: what is «all knowledge»? As a simplification, we take it to be Wikipedia.
The second problem follows: knowledge is NOT two-dimensional, but multidimensional! There are many ways to classify knowledge, and the same content can belong to several disjoint categories at the same time. Therefore we make another simplification:
- We take the category «Articles» as the top category, and everything else follows a hierarchy downwards from there.
- We only take the shortest path through Wikipedia categories from a given article up to that top category «Articles».
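The shortest-path rule can be illustrated with a small breadth-first search. In the project this is computed inside Neo4j; the following is only a minimal Python sketch of the idea, in which the toy `PARENTS` dictionary, its category names and the function `shortest_path_to_root` are hypothetical and not taken from real Wikipedia data. Because BFS visits nodes in order of distance, the first time it reaches «Articles» it has found a shortest path, and the visited set keeps it from looping forever if the category graph contains cycles.

```python
from collections import deque

# Hypothetical toy category graph: node -> list of its parent categories.
# Real data would come from the Wikipedia category links in the database.
PARENTS = {
    "Photosynthesis": ["Plant physiology", "Biochemistry"],
    "Plant physiology": ["Botany"],
    "Biochemistry": ["Biology", "Chemistry"],
    "Botany": ["Biology"],
    "Biology": ["Science"],
    "Chemistry": ["Science"],
    "Science": ["Articles"],
}

def shortest_path_to_root(article, parents, root="Articles"):
    """Breadth-first search upwards; returns [article, ..., root] or None."""
    queue = deque([article])
    previous = {article: None}  # node -> the child it was reached from
    while queue:
        node = queue.popleft()
        if node == root:
            # Walk back from the root to the article, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for parent in parents.get(node, []):
            if parent not in previous:  # skip already visited nodes (cycles)
                previous[parent] = node
                queue.append(parent)
    return None  # the root is unreachable from this article

print(shortest_path_to_root("Photosynthesis", PARENTS))
# ['Photosynthesis', 'Biochemistry', 'Biology', 'Science', 'Articles']
```

In this toy graph `Photosynthesis` reaches «Articles» both through Botany and through Biochemistry; only the shorter chain via Biochemistry is kept, which is exactly how the simplification discards the other classifications.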
In brief, these are the main ideas:
- We take a Wikipedia dump and load it into a graph database.
- We generate quiz questions from Wikipedia articles.
- The user answers those questions, each with either a positive or a negative result.
- Parent categories (along the shortest path to the Wikipedia category «Articles») inherit those results.
- The system generates a hierarchical heat-map visualization, with white areas representing known categories and black areas representing unknown categories.
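The inheritance and shading steps above can be sketched as follows. This is a hypothetical illustration rather than the project's code: it assumes each answer is recorded as a `(page, correct)` pair, that the shortest path from each page to «Articles» is already known, and that a node's brightness is simply the fraction of correct answers credited to it. The actual scoring rule is not described here, and `inherit_results`, `grey_level` and the example paths are made up.

```python
from collections import defaultdict

def inherit_results(answers, paths):
    """Credit every answer to its page and to each category on its shortest path.

    answers: iterable of (page, correct) pairs, where correct is True or False.
    paths:   page -> [page, parent, ..., "Articles"] (shortest path to the root).
    Returns {node: (correct_answers, total_answers)}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for page, is_correct in answers:
        for node in paths[page]:
            total[node] += 1
            correct[node] += int(is_correct)
    return {node: (correct[node], total[node]) for node in total}

def grey_level(correct, total):
    """Map a score to 0..255: 255 is white (known), 0 is black (unknown)."""
    # A node with no answers at all (total == 0) stays black: fog of war.
    return round(255 * correct / total) if total else 0

# Usage with two hypothetical answers and their (made-up) shortest paths.
paths = {
    "Photosynthesis": ["Photosynthesis", "Biochemistry", "Biology", "Science", "Articles"],
    "Inertia": ["Inertia", "Classical mechanics", "Physics", "Science", "Articles"],
}
scores = inherit_results([("Photosynthesis", True), ("Inertia", False)], paths)
print(scores["Science"], grey_level(*scores["Science"]))  # (1, 2) 128
print(scores["Physics"], grey_level(*scores["Physics"]))  # (0, 1) 0
```

Keeping raw counts instead of a precomputed ratio means new answers can be added incrementally, and categories high in the hierarchy (such as «Science») naturally end up grey when they mix known and unknown subjects.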
Questions are generated by removing one of the Wikipedia links from an article, showing a few sentences around that link for context, and asking the user to fill in the gap. The quiz interface looks like this:
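Behind the interface, the gap-filling step can be sketched in Python, assuming the article is available as raw wikitext where links are written `[[Target]]` or `[[Target|shown text]]`. The function `make_question`, its `context` parameter and the naive sentence splitter are hypothetical choices for illustration. The blanked link's target title is returned as the expected answer, since that is the Wikipedia page the question is about.

```python
import random
import re

# Wiki links look like [[Target]] or [[Target|shown text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def strip_links(text):
    """Replace every [[Target|label]] link with its visible text."""
    return LINK.sub(lambda m: m.group(2) or m.group(1), text)

def make_question(wikitext, context=1, rng=random):
    """Blank out one random link and keep `context` sentences on each side.

    Returns (question_text, answer) or None if the text has no links.
    """
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", wikitext.strip())
    candidates = [(i, m) for i, s in enumerate(sentences) for m in LINK.finditer(s)]
    if not candidates:
        return None
    i, match = rng.choice(candidates)
    answer = match.group(1)  # the linked page title
    gapped = sentences[i][:match.start()] + "_____" + sentences[i][match.end():]
    window = sentences[max(0, i - context):i] + [gapped] + sentences[i + 1:i + 1 + context]
    # Remaining links in the context are shown as plain text.
    return " ".join(strip_links(s) for s in window), answer
```

A real implementation would need a sturdier sentence splitter, since abbreviations such as «U.S.» would break this regular expression.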
A KnowledgeMap looks like this:
An interactive demo visualization is available here:
There is also another visualization of the individual pages about which questions were asked:
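The link between the user and the known (or unknown) categories is computed in Neo4j with a Cypher query such as this one:

```cypher
// For pages the user has answered about, find the shortest In_Category chain up to «Articles»
MATCH (u:User)-[:Knows]->(n:Page) WHERE id(u) = 193773
MATCH (a:Category {title: 'Articles'}), path = shortestPath((a)<-[:In_Category*]-(n))
RETURN path, u
LIMIT 1 // remove to get one path per known page
```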
To sum up: with this approach we can, for now, answer the first question above («What do I know?»), but not the others yet. As the old saying goes, at least now I know that I know nothing.
The complete source code and build instructions are available on my GitHub.