Roistr API
This section is to help people use Roistr's semantic relevance engine from the command line. The API currently has no limits on analysis and we'll keep it that way as long as we can. We ask that you don't submit more than 100 documents in one go, so that everyone else can have a crack at it too.
We made the API so that companies can assess our relevance engine better. The lack of limits means you can try it with real-world materials. The example code here is in Python because it's very easy to use, but any language that can read and write JSON should be suitable.
We are in the process of writing a Python script that does most of the heavy lifting, which can be downloaded here.
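If you have more than 100 documents, one way to stay within the request above is to split them into smaller submissions. The following is only a sketch written for Python 3: the helper name batches and the placeholder documents are made up for illustration and are not part of Roistr's script.

```python
from itertools import islice


def batches(documents, size=100):
    """Yield successive dictionaries holding at most `size` documents each."""
    items = iter(documents.items())
    while True:
        chunk = dict(islice(items, size))
        if not chunk:
            return
        yield chunk


# 250 placeholder documents, keyed by made-up references
docs = {"doc%d" % i: "... text of document %d ..." % i for i in range(250)}

# Each chunk could then be submitted as its own request
sizes = [len(chunk) for chunk in batches(docs)]
print(sizes)  # [100, 100, 50]
```

Each chunk keeps your own references as keys, so results from separate submissions can be merged back together afterwards.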
Planned categorisation
Unplanned categorisations
Compare all
Planned categorisation
Status: Working
Also known as best matches, this allows you to submit what we call a category description and test a whole bunch of other documents against it to see which ones have the most semantically similar content.
Case examples
You have a bunch of adverts and your customers have to submit their Twitter name to use your site. You want to know which of 4 products / services is best suited to each customer. In this case, extract the 20 most recent tweets the customer made and put them into a single document. This is your category description, and the descriptions of the 4 products / services are the documents.
Code
The process is first to encode the category descriptions and the documents into JSON objects. Python's dictionaries convert very well to these.
Each document has its own unique reference. The reference is your choice, which allows you to preserve your company's reference scheme. For simplicity's sake, category descriptions will be referenced with numbers and documents with letters.
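One thing worth knowing before you pick references: JSON object keys are always strings, so a numeric reference such as 1 comes back as '1' after a round trip. The short sketch below (written for Python 3, using only the standard json module) shows the effect with placeholder texts.

```python
import json

cats = {1: "...job description in here"}
docs = {"a": "...resume number 1...", "b": "...resume number 2..."}

# Encode each dictionary as a JSON object
jcats = json.dumps(cats)
jdocs = json.dumps(docs)
print(jcats)  # {"1": "...job description in here"}

# Decoding gives the numeric key back as a string
decoded = json.loads(jcats)
print(list(decoded.keys()))  # ['1']
```

If your own reference scheme uses numbers, you may want to convert the keys back with int() once the results arrive.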
The first example will use the Python script linked to above.
>>> import python_roistr
# document references are strings, so they need quotes
>>> cats = { 1: "...job description in here" }
>>> docs = { 'a': "...resume number 1...", 'b': "...resume number 2...", 'c': "...resume number 3..." }
>>> results = python_roistr.best_matches(docs, cats)
And that's it! We talk about how to interpret the results below.
And now to do it the long way. The script does exactly what is shown below; it just saves you a lot of typing.
>>> import json, urllib, urllib2
>>> cats = { 1: "... most recent 20 tweets ...", 2: "... most recent 20 tweets ..." }
>>> docs = { 'a': "a ground-breaking action-thriller movie with Film Star as the good guy who gets the bad guys", 'b': "... (description of product / service 2) ...", 'c': "... (description of product / service 3) ...", 'd': "... (description of product / service 4) ..."}
# encode each into json
>>> jcats = json.JSONEncoder().encode(cats)
>>> jdocs = json.JSONEncoder().encode(docs)
# package the JSON strings into a dictionary
>>> vars = { 'docs': jdocs, 'cats': jcats }
# encode using urllib
>>> url_vars = urllib.urlencode(vars)
# send to Roistr!
>>> message = urllib2.urlopen('http://roistr.com/api-planned_categorisations', url_vars)
# wait a few moments and then get the results
>>> encoded_results = message.read()
# convert back from json
>>> results = json.loads(encoded_results)
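The session above is Python 2 (urllib.urlencode and urllib2 no longer exist in Python 3). For readers on Python 3, the same steps might be sketched as follows. The URL and the 'docs' / 'cats' field names are taken from the session above; the function names are made up for illustration, and we are assuming the response body is UTF-8 encoded JSON.

```python
import json
import urllib.parse
import urllib.request

# URL as given in the Python 2 session above
API_URL = "http://roistr.com/api-planned_categorisations"


def build_request_body(docs, cats):
    """JSON-encode both dictionaries and form-encode them as POST data."""
    fields = {"docs": json.dumps(docs), "cats": json.dumps(cats)}
    return urllib.parse.urlencode(fields).encode("utf-8")


def planned_categorisation(docs, cats, url=API_URL):
    """Send documents and categories, then return the decoded results.

    Assumes the service replies with UTF-8 encoded JSON.
    """
    body = build_request_body(docs, cats)
    # Passing data makes urlopen issue a POST request
    with urllib.request.urlopen(url, body) as message:
        return json.loads(message.read().decode("utf-8"))
```

Calling results = planned_categorisation(docs, cats) would then leave you with the same kind of dictionary as the Python 2 session.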
And the variable 'results' now holds your results! It might look something like this:
{ u'1': { u'a': { u'sim': 0.54354768, u'rank': 1.0 }, u'b': { u'sim': 0.3843278321, u'rank': 2.0 }, u'c': { u'sim': 0.02312332, u'rank': 4.0 }, u'd': { u'sim': 0.12354665, u'rank': 3.0 } },
  u'2': { u'a': { u'sim': 0.44354768, u'rank': 3.0 }, u'b': { u'sim': 0.7843278321, u'rank': 2.0 }, u'c': { u'sim': 0.004342332, u'rank': 4.0 }, u'd': { u'sim': 0.92354665, u'rank': 1.0 } } }
This looks quite confusing so it's best to break it up.
We submitted 2 category descriptions, so the top level of the dictionary contains 2 key-value pairs (keys of '1' and '2'), one for each of them.
Each of these describes the semantic similarity of the 4 products / services to one of the 2 category descriptions. Each document that was compared to that category has its own dictionary. This dictionary contains two values: 'sim', the similarity of the document to the category (0.0 for entirely dissimilar and 1.0 for exactly the same); and 'rank', its rank among the documents (1.0 for the most similar).
So for the above example, we can see that documents 'a', 'b', 'c' and 'd' were compared against category description 1. Document 'a' showed a similarity of 0.54354768 and was the highest rank (the most similar). Document 'b' had a similarity of 0.3843278321 and was rank 2; document 'c' had a similarity of 0.02312332 and was rank 4; and document 'd' showed a similarity of 0.12354665 and was rank 3. Therefore document 'a' is the most similar and should be the most relevant to this customer.
For the second customer (category description '2'), product / service 'd' is the most relevant.
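Reading the ranks by eye gets tedious with more than a handful of documents, so here is a small sketch (Python 3) that orders the documents for each category by their 'rank' value. The function name rank_documents is made up for illustration; the figures are the ones discussed above.

```python
def rank_documents(results):
    """Map each category reference to its document references, best match first."""
    ordered = {}
    for cat_ref, doc_scores in results.items():
        # A lower 'rank' value means a more similar document
        ordered[cat_ref] = sorted(doc_scores, key=lambda ref: doc_scores[ref]["rank"])
    return ordered


results = {
    "1": {
        "a": {"sim": 0.54354768, "rank": 1.0},
        "b": {"sim": 0.3843278321, "rank": 2.0},
        "c": {"sim": 0.02312332, "rank": 4.0},
        "d": {"sim": 0.12354665, "rank": 3.0},
    },
    "2": {
        "a": {"sim": 0.44354768, "rank": 3.0},
        "b": {"sim": 0.7843278321, "rank": 2.0},
        "c": {"sim": 0.004342332, "rank": 4.0},
        "d": {"sim": 0.92354665, "rank": 1.0},
    },
}

ranking = rank_documents(results)
print(ranking["1"])     # ['a', 'b', 'd', 'c']
print(ranking["2"][0])  # d
```

The first entry of each list is the product / service you would show to that customer.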
Unplanned categorisations
Status: Not yet working
If you have a bunch of documents and you want them to be sorted into categories, we can do it! All you need to do is suggest a suitable number of categories and Roistr's semantic relevance engine will work out how best to cluster your documents based upon their content.
Case examples
You have a lot of qualitative data about customers: again, it could be their Twitter, Facebook or forum posts; complaints; pretty much anything in plain English. However, you want them sorted meaningfully.
Awaiting API to be set up
Compare all
Status: Not yet working
This is where you have a bunch of documents and you want to figure out how each relates to every other document. Even with just 50 documents, that's a lot of comparisons for a human to handle! (1225 distinct pairs, or 1275 if you also count each document against itself, in case you're curious)
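The number of comparisons grows quickly with the number of documents: n documents give n(n-1)/2 distinct pairs. The sketch below (Python 3, standard library only) checks the arithmetic for 50 documents; the function name and document references are placeholders.

```python
from itertools import combinations


def pair_count(n):
    """Number of distinct unordered pairs among n documents."""
    return n * (n - 1) // 2


doc_refs = ["doc%d" % i for i in range(50)]

# Enumerate every pair of different documents exactly once
pairs = list(combinations(doc_refs, 2))
print(len(pairs), pair_count(50))  # 1225 1225

# Counting each document against itself as well adds another 50
print(pair_count(50) + 50)  # 1275
```

At 500 documents the same formula gives 124750 pairs, which is why this is a job for a machine rather than a card sort.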
Case example
An information architect is given the content for a large website. There might be hundreds of items, far too many for a card sort. What she can do is to submit them to Roistr, and we'll work out the semantic similarity of each document to every other document. Results will be returned as a similarity matrix along with a graphic dendrogram.
Awaiting API to be set up



