Connect the dots! (Unleashing some Zeitgeist Framework sweetness)

There are 3 types of relationships between items:

  1. Content: Let's say two documents both contain "Hello World"; then there is a probability that they are about the same topic. This can be detected using indexers and keyword extractors like Tracker Indexer.
  2. Metadata: Tags, MIME types, etc. are all data that describe the uniqueness of a URI. Items that share the same tags or "Artist name", or even live in the same folder, are related through these attributes. This area is very well covered by Tracker as well as Organise FW's "Path Projection".
  3. Context: Some things, such as websites and videos, have content that is hard to compare. One can extract their metadata, but what if it shows no relationship? Still, the contexts in which they were used are related somehow. Let's assume I visited Will Smith's profile on IMDB (y) and then started watching Prince of Bel Air (x). There are also many instances where I watched PoBA (x) at work, for example while editing work.py (z). To determine which of my activities lead to watching PoBA (x), I need something that can look at PoBA as a single node and find the URI activities that led to using it, as well as the URI activities that were initiated by using it. The intensity of a relationship can be measured by the number of times these activities happened in a given time period. We call this set of nodes (URIs) around PoBA its activity neighborhood.
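To make "intensity" a bit more concrete, here is a minimal sketch that counts how often one URI was used directly before another inside a time window. It is not the Zeitgeist API: the (timestamp, uri) event tuples, the function name transition_counts and the example URIs are all hypothetical stand-ins for what a real Zeitgeist query would return.

```python
from collections import Counter
from datetime import datetime

def transition_counts(events, start, end):
    """Count how often one URI was used directly before another.

    `events` is an iterable of (timestamp, uri) tuples (hypothetical format);
    only events with start <= timestamp < end are taken into account.
    """
    window = sorted((t, uri) for t, uri in events if start <= t < end)
    counts = Counter()
    for (_, prev), (_, nxt) in zip(window, window[1:]):
        if prev != nxt:  # consecutive use of the same URI is not a relationship
            counts[(prev, nxt)] += 1
    return counts

# Hypothetical history: IMDB page -> PoBA one evening, work.py -> PoBA next morning.
events = [
    (datetime(2010, 1, 1, 20, 0), "http://imdb.com/will-smith"),
    (datetime(2010, 1, 1, 20, 5), "file:///videos/poba.avi"),
    (datetime(2010, 1, 2, 9, 0), "file:///home/me/work.py"),
    (datetime(2010, 1, 2, 9, 30), "file:///videos/poba.avi"),
]
two_days = transition_counts(events, datetime(2010, 1, 1), datetime(2010, 1, 3))
one_day = transition_counts(events, datetime(2010, 1, 2), datetime(2010, 1, 3))
```

Narrowing the window to the second day keeps only the work.py to PoBA transition, which is one simple way the "given time period" could change which relationships look strong.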

The rest of this post will cover point 3.

BTW the following images are actually plotted by a Zeitgeist framework extension developed by Alexander Gabriel and me in an attempt to implement related items for Gnome Shell and Gnome Zeitgeist.

Activity neighborhoods can be of any degree. By degree I mean the number of intermediate nodes (URIs) through which another URI may lead to the activity x, or be led to from it. The neighborhood N0(x) is the set of URIs whose usage directly led to the usage of x, together with the URIs whose usage was directly initiated by the usage of x.

  10:01 edited y
  10:04 opened x
  10:05 watched z
  10:06 edited x

The neighborhood N0(x) = { -:[y, z], +:[z] }. This shows that z was used once before and once after x, while y was only used before x, which makes z more relevant to x for now.

"-" stands for incoming and "+" stands for outgoing.

Likewise, N0(z) = { -:[x], +:[x] }: here we can see the relationship over 0 intermediate nodes. -N0(z) are the incoming URIs and +N0(z) are the outgoing URIs.
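The N0 lookup on the timeline can be sketched in a few lines. This is a hypothetical helper, not the extension's code: it assumes the history is already available as a chronological list of URIs, one entry per event.

```python
def n0(sequence, x):
    """Degree-0 activity neighborhood of x.

    `sequence` is a chronological list of URIs, one entry per event.
    Returns {"-": incoming, "+": outgoing}: the distinct URIs used directly
    before x and directly after x, in order of first appearance.
    """
    incoming, outgoing = [], []
    for i, uri in enumerate(sequence):
        if uri != x:
            continue
        if i > 0:
            before = sequence[i - 1]
            if before != x and before not in incoming:
                incoming.append(before)
        if i + 1 < len(sequence):
            after = sequence[i + 1]
            if after != x and after not in outgoing:
                outgoing.append(after)
    return {"-": incoming, "+": outgoing}

# 10:01 edited y, 10:04 opened x, 10:05 watched z, 10:06 edited x
timeline = ["y", "x", "z", "x"]
print(n0(timeline, "x"))  # {'-': ['y', 'z'], '+': ['z']}
print(n0(timeline, "z"))  # {'-': ['x'], '+': ['x']}
```

The sketch keeps only distinct URIs; the number of times each transition happened (the intensity) would have to be tracked separately, for example with a counter per arrow.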

But what if we ask for N1(z)? With this we want all items that

  • either lead directly to z, or lead to items that lead directly to z;
  • or were initiated directly by the usage of z, or by the usage of items that z initiated.

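It's very easy: expand each direction of N0(z) once more, using the N0 of every item already found, in the same direction.

  -N1(z) = -N0(z) ∪ ( -N0(e) for every e in -N0(z) ), excluding z itself
  +N1(z) = +N0(z) ∪ ( +N0(e) for every e in +N0(z) ), excluding z itself

Here -N0(z) = [x] and -N0(x) = [y, z], so -N1(z) = [x, y]; +N0(z) = [x] and +N0(x) = [z], so +N1(z) = [x]:

  N1(z) = { -:[x, y], +:[x] }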
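The expansion rule generalizes to any degree by repeating it. The sketch below is one possible reading of that rule, not the extension's implementation; neighborhood is a hypothetical helper and the history is again assumed to be a chronological list of URIs.

```python
def n0(sequence, x):
    """Degree-0 neighborhood: URIs used directly before ("-") or after ("+") x."""
    result = {"-": [], "+": []}
    for i, uri in enumerate(sequence):
        if uri != x:
            continue
        neighbors = (("-", sequence[i - 1] if i > 0 else None),
                     ("+", sequence[i + 1] if i + 1 < len(sequence) else None))
        for sign, other in neighbors:
            if other is not None and other != x and other not in result[sign]:
                result[sign].append(other)
    return result

def neighborhood(sequence, x, degree):
    """Degree-k neighborhood of x (hypothetical helper).

    Starting from N0(x), each round adds the N0 of every item already found,
    following the same direction: incoming items of incoming items, and
    outgoing items of outgoing items. x itself is never added.
    """
    result = n0(sequence, x)
    for _ in range(degree):
        for sign in ("-", "+"):
            for e in list(result[sign]):  # snapshot: new items wait for the next round
                for u in n0(sequence, e)[sign]:
                    if u != x and u not in result[sign]:
                        result[sign].append(u)
    return result

timeline = ["y", "x", "z", "x"]
n1_z = neighborhood(timeline, "z", 1)  # {'-': ['x', 'y'], '+': ['x']}
```

Each round can only add items that are one more hop away, so after `degree` rounds every item reaches x through at most `degree` intermediate nodes.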

So, enough theory: let's connect the dots. The following images are plots of trees and graphs mapping my current Zeitgeist DB.

The nodes in the plots represent URIs, and the arrows represent absolute events on URIs that happened before or after the node they are attached to.
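I can't say from here how the extension draws its plots, so take this only as an illustration of the node and arrow model: a hypothetical to_dot helper that turns the degree-0 neighborhood of a URI into a Graphviz DOT digraph, labelling each arrow with how often that transition occurred.

```python
from collections import Counter

def to_dot(sequence, center):
    """Render the degree-0 neighborhood of `center` as a Graphviz DOT digraph.

    `sequence` is a chronological list of URIs (an assumed input format).
    Each arrow is labelled with the number of times the transition occurred.
    URIs containing double quotes would need escaping; that is not handled here.
    """
    edges = Counter()
    for prev, nxt in zip(sequence, sequence[1:]):
        if prev != nxt and center in (prev, nxt):
            edges[(prev, nxt)] += 1
    lines = ["digraph neighborhood {"]
    for (src, dst), n in sorted(edges.items()):
        lines.append(f'  "{src}" -> "{dst}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot(["y", "x", "z", "x"], "x")
print(dot)
```

Piping the output through Graphviz's `dot` tool would give a small picture of x with one arrow from y, one arrow to z, and one arrow back from z.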

So first, let's demo N0("Bad Boys"); here "Bad Boys" is represented by "0":

[Image: badboysD1, plot of N0("Bad Boys")]

This shows us that items 1-10 were used after 0, while items 11-12 were used before 0. However, 2, 7, 8 and 10 were also used before 0.

Now for N1("Bad Boys"):

[Image: badboysD2, plot of N1("Bad Boys")]

Here you can see that everything reaches "0" via at most one intermediate node, and only 12 of them lead directly to "0".

The node numbers in the two plots don't match, since the traversal differs between N0 and N1. However, if you take a good look, the number of arrows going directly into and out of "0" is still the same: 10 outgoing and 12 incoming!

Currently the arrows have low weights because the history on my current netbook is less than a week old, and we only cover open/save events. Tomorrow we will implement window focus events to make things more dynamic. Once we can weight the arrows by the duration of the events or by the lifetime of the items, we can filter and prioritize the related nodes.
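Here is a rough sketch of what duration-based weighting could look like once focus events exist. Everything in it is an assumption: the (start_seconds, end_seconds, uri) event format, the rule "weight of prev -> next = how long next stayed in use right after prev", and the helper names weighted_arrows and prioritize.

```python
from collections import defaultdict

def weighted_arrows(events):
    """Weight each arrow prev -> nxt by the seconds nxt stayed in use right after prev.

    `events` is a chronological list of (start_seconds, end_seconds, uri)
    tuples, e.g. derived from window-focus events (an assumption).
    """
    weights = defaultdict(float)
    for (_, _, prev), (start, end, nxt) in zip(events, events[1:]):
        if prev != nxt:
            weights[(prev, nxt)] += end - start
    return dict(weights)

def prioritize(weights, min_weight):
    """Drop arrows lighter than `min_weight` and sort the rest heaviest first."""
    kept = [(edge, w) for edge, w in weights.items() if w >= min_weight]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# A made-up focus history (seconds since some start time).
events = [(0, 60, "y"), (60, 1860, "x"), (1860, 1890, "z"), (1890, 2490, "x")]
weights = weighted_arrows(events)       # y->x: 1800, x->z: 30, z->x: 600
ranked = prioritize(weights, min_weight=300)
```

With a 300-second threshold the brief glance at z drops out, while the long stretches on x after y and after z survive and are ranked by weight.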

Right now, all items in a small cycle with highly weighted arrows are considered definite relationships!
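That rule can be sketched as a bounded cycle search: keep only the arrows at or above some weight threshold, then collect every URI that lies on a short cycle back to the node in question. The threshold, the maximum cycle length and the name definite_relations are hypothetical choices for illustration.

```python
def definite_relations(weights, center, min_weight, max_len=3):
    """Return URIs on a cycle through `center` of at most `max_len` arrows,
    where every arrow on the cycle has weight >= min_weight.

    `weights` maps (src, dst) arrows to numeric weights (an assumed format).
    """
    strong = {}
    for (src, dst), w in weights.items():
        if w >= min_weight:
            strong.setdefault(src, []).append(dst)
    related = set()

    def walk(node, path):
        # `path` starts at center; closing back to center uses len(path) arrows.
        for nxt in strong.get(node, []):
            if nxt == center:
                related.update(path[1:])  # every node on the cycle except center
            elif nxt not in path and len(path) < max_len:
                walk(nxt, path + [nxt])

    walk(center, [center])
    return related

weights = {("x", "z"): 30, ("z", "x"): 600, ("y", "x"): 1800,
           ("x", "w"): 900, ("w", "x"): 400}
print(definite_relations(weights, "x", min_weight=300))  # {'w'}
```

Here z is not reported because its x -> z arrow is too light, and y is not reported because nothing leads from x back to y, so y is only a one-way predecessor rather than part of a cycle.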

Now for a more badass plot of today's activities :)

[Image: graph, plot of today's activities]