Saturday, August 29, 2015

Presenting WikiDataDotNet - Client API for WikiData

WikiData

WikiData is one of those things that sets the mind boggling at the possibilities of the internet. It's a project, started by the Wikimedia Foundation, to collect structured data on everything. If you are doing anything related to machine learning, it is the best source of data I have found so far.

It aims to contain an item for everything and, for each item, a collection of statements describing aspects of it and its relationships to other items. Everything makes more sense with an example, so here is its record for the item Italy, which can be requested from the API with the same wbgetentities call used for the lookups later in this post.

This returns JSON with sections like:

       "id": "Q38",
       "labels": {
         "en": {
           "language": "en",
           "value": "Italy"
         },

Here we see the id of the item, in this case Q38, which is used for looking Italy up. Then labels contains the name of Italy in each language. Further down there is also an aliases section containing alternative names for Italy in every language.
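To make that structure concrete, here is a minimal C# sketch (hypothetical helper code, not part of WikiDataDotNet) that builds a wbgetentities URL for an id and pulls one label out of the response with System.Text.Json. It assumes the response wraps each entity as {"entities": {"Q38": {...}}}, which is how wbgetentities returns results, and it appends format=json (my addition) so the API returns plain JSON rather than its HTML-formatted view.

```csharp
using System;
using System.Text.Json;

// Hypothetical helpers for illustration; not part of WikiDataDotNet.
public static class WikiDataEntitySketch
{
    // Same URL pattern as the P36/Q220 lookups in this post, plus format=json.
    public static string EntityUrl(string id) =>
        "https://www.wikidata.org/w/api.php?action=wbgetentities&ids=" +
        Uri.EscapeDataString(id) + "&format=json";

    // Returns entities.<id>.labels.<language>.value, or null if there is no label in that language.
    public static string GetLabel(string responseJson, string id, string language)
    {
        using JsonDocument doc = JsonDocument.Parse(responseJson);
        JsonElement entity = doc.RootElement.GetProperty("entities").GetProperty(id);
        if (entity.TryGetProperty("labels", out JsonElement labels) &&
            labels.TryGetProperty(language, out JsonElement label))
        {
            // GetString copies the value out, so it stays valid after doc is disposed.
            return label.GetProperty("value").GetString();
        }
        return null;
    }
}
```

Fetching the JSON is left out so the parsing can be tried offline; any HTTP client (for example HttpClient.GetStringAsync) can download the URL first.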

Further down we get to the really interesting stuff, claims.

          "P36": [
            {
              "mainsnak": {
                "snaktype": "value",
                "property": "P36",
                "datavalue": {
                  "value": {
                    "entity-type": "item",
                    "numeric-id": 220
                  },
                  "type": "wikibase-entityid"
                },
                "datatype": "wikibase-item"
              },
              "type": "statement",
              "qualifiers": {
                "P580": [

These are a series of statements about different aspects of the item. For example, the P36 claim above states what the capital of Italy is. The property P36 is itself an entity in the API, so it can also be looked up like so: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=P36

mainsnak is the main assertion associated with this claim (a Snak in WikiData is any basic assertion that can be made about an item). These all have a value and a type. In this claim about Italy's capital, the value is a reference to another item, which can again be looked up from WikiData by putting a Q in front of the numeric id. You may have already worked out what the entity here is: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q220
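Following a claim by hand like this is easy to automate. The sketch below is again hypothetical helper code rather than WikiDataDotNet's API: given an entity's JSON object (for example entities.Q38 from a wbgetentities response), it walks claims[propertyId], keeps the main snaks whose value is an item reference, and turns each numeric-id into a Q id. It skips snaks whose snaktype is not "value", since "somevalue" and "novalue" snaks carry no datavalue.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for illustration; not part of WikiDataDotNet.
public static class WikiDataClaimSketch
{
    // Returns the Q ids referenced by the main snaks of claims[propertyId].
    public static List<string> ItemValues(JsonElement entity, string propertyId)
    {
        var ids = new List<string>();
        if (!entity.TryGetProperty("claims", out JsonElement claims) ||
            !claims.TryGetProperty(propertyId, out JsonElement statements))
        {
            return ids; // the entity has no claims for this property
        }

        foreach (JsonElement statement in statements.EnumerateArray())
        {
            JsonElement mainsnak = statement.GetProperty("mainsnak");

            // Only "value" snaks have a datavalue to read.
            if (mainsnak.GetProperty("snaktype").GetString() != "value") continue;

            JsonElement dataValue = mainsnak.GetProperty("datavalue");

            // Only item references carry a numeric-id that maps to a Q id.
            if (dataValue.GetProperty("type").GetString() != "wikibase-entityid") continue;

            long numericId = dataValue.GetProperty("value").GetProperty("numeric-id").GetInt64();
            ids.Add("Q" + numericId);
        }
        return ids;
    }
}
```

For the P36 fragment shown above this yields Q220, which can be passed straight back into a wbgetentities lookup.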

Other claims on Italy include location, who it shares a border with, public holidays, provinces, basic form of government, head of state, population (across history), head of government; the list is endless (no wait, actually it's 64 entries long).


Presenting WikiDataDotNet

I've been working on a project that needed to query against WikiData from .NET. The only existing .NET API for this I could find is Wikibase.NET, which is for writing wiki bots. It hasn't been updated in a while and, unfortunately, a quick test reveals it no longer works. At a future date I may fix it up, but in the meantime I've created this quick, query-only API: WikiDataDotNet.

Usage

It currently provides the ability to request entities:
F#
 let italy = WikiDataDotNet.Request.request_entity "Q38"   
C#
 var italy = WikiDataDotNet.Request.request_entity("Q38");  

and do a text search against WikiData:
F#
 let search_result = WikiDataDotNet.Request.search "en" "Headquarters of the U.N"
C#
 var searchResult = WikiDataDotNet.Request.search("en", "Headquarters of the U.N");  

That's it for functionality so far. My next plans are to make it easier to look up Claims against items and to cache Claims. Some kind of LINQ-style querying interface would also be nice.
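For comparison with the search call above, here is roughly what a text search looks like at the HTTP level, using WikiData's public wbsearchentities action. This is a hypothetical sketch of the underlying web API, not a description of how WikiDataDotNet is implemented; it just shows that the request needs the same two inputs as the C# example (a language and the search text) and where the matching ids sit in the response.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helpers for illustration; not part of WikiDataDotNet.
public static class WikiDataSearchSketch
{
    // Builds a wbsearchentities request. "language" selects which labels and aliases
    // the text is matched against; format=json asks for plain JSON output.
    public static string SearchUrl(string language, string text) =>
        "https://www.wikidata.org/w/api.php?action=wbsearchentities" +
        "&search=" + Uri.EscapeDataString(text) +
        "&language=" + Uri.EscapeDataString(language) +
        "&format=json";

    // Reads the ids of the matches from the response's "search" array,
    // in the order the API returns them.
    public static List<string> ResultIds(string responseJson)
    {
        using JsonDocument doc = JsonDocument.Parse(responseJson);
        var ids = new List<string>();
        if (doc.RootElement.TryGetProperty("search", out JsonElement results))
        {
            foreach (JsonElement result in results.EnumerateArray())
                ids.Add(result.GetProperty("id").GetString());
        }
        return ids;
    }
}
```

Each returned id can then be fed into an entity lookup such as request_entity.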

Tuesday, August 18, 2015

How do I get my team to start unit testing

A team lead recently asked me (this genuinely happened, this isn't just a rhetorical trick), "How do I get my team to start unit testing?" Which sounds like a great title for a blog post...

In my opinion, the task of getting a team to write unit tests is really the task of getting each programmer to believe it is in their best interests to write unit tests. There are plenty of tools, such as SonarQube, that give technical feedback on unit test coverage, but without the team buying in they won't achieve much. It is very easy to do unit testing badly, and there is little benefit in doing so. So, like a good salesman, you need to sell them on why they will benefit from taking the extra time to unit test their already perfectly acceptable code (as they see it; if they don't believe the code they are currently writing is acceptable, then there are other problems).

There are many reasons a person should unit test. Some reasons are noble and good, to do with doing the best job you can for your company and your fellow professionals. But that doesn't work for everyone, so for those less nobly inclined there are also selfish reasons that are still valid.

The noble reasons:

  • Next-level success: First-level success: someone reports a bug, you fix it. Next-level success: someone reports a bug, you fix it, and you write tests that mean no one in the future can reintroduce this bug without a test failing. Unit testing allows you to future-proof your code.
  • Quick feedback: One of the biggest factors in your ability to improve at any activity is your feedback loop. If you want to get good at chess and you are playing against a good player, they can tell you "that move was bad" immediately after you make a bad move. Otherwise you may have to play the rest of the game, and then lose a number of similar games, before you work out that it was that particular type of move that was the mistake. Unit testing gives you much quicker feedback. When you make a change to an application, you don't have to run it, set up the scenario by hand, then check for correct behavior across multiple different scenarios. With unit testing you can get feedback on many scenarios across the application in under 10 seconds.
  • Encourages good design: Plenty of articles have been written on this subject by better writers than me. Good application design goes hand in hand with designing for unit testing: separation of concerns, the single responsibility principle, dependency injection, etc.

The less than noble reasons:

  • Plausible deniability: If something goes wrong anywhere near their code, they can point to the unit tests and say "well, I know my code works, it must be someone else's problem." This has happened to me. I was asked to write some code that displayed how many business days old a certain item was. When it started displaying -1 days old in prod, I could take the inputs, put them into my unit test and show that my code was correct (the problem turned out to be that we were being sent items from the future due to an incorrect date conversion further upstream).
  • Your future employability: Nowadays unit testing is so widespread that you will be asked about it in most interviews. You may not care so much about how you do in this job, but don't you want to be applying best practice so you can get that shiny new job in the future?
  • Holding on to requirements: User A asks for a feature to work in a particular way. You make the change and put it into prod, then User B comes to you to complain: he had asked for the feature to work in a different way, and now it doesn't work for him. Unit tests can remove a lot of these kinds of problems, because you can mark each test with who requested the functionality it covers. Unit tests are a great way to capture requirements permanently and to raise these kinds of conflicting requests earlier.
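As a small illustration of the regression-test, plausible-deniability and requirements points, here is a hypothetical C# version of a business-days calculation with a few tests. It is my sketch, not the code from that project: it ignores public holidays, counts weekdays after the item date up to and including today, and returns a negative number for an item dated in the future, which is exactly the kind of -1 case a test can pin down. To stay self-contained it uses a tiny Check helper instead of a test framework; with NUnit or xUnit the test methods would carry [Test] or [Fact] attributes instead.

```csharp
using System;

// Hypothetical example code, not the original project's implementation.
public static class BusinessDays
{
    // Counts weekdays after itemDate up to and including today (holidays ignored).
    // An item dated in the future gives a negative result.
    public static int BusinessDaysOld(DateTime itemDate, DateTime today)
    {
        DateTime start = itemDate.Date, end = today.Date;
        int sign = 1;
        if (start > end)
        {
            (start, end) = (end, start);
            sign = -1;
        }

        int days = 0;
        for (DateTime d = start.AddDays(1); d <= end; d = d.AddDays(1))
        {
            if (d.DayOfWeek != DayOfWeek.Saturday && d.DayOfWeek != DayOfWeek.Sunday)
                days++;
        }
        return sign * days;
    }
}

public static class BusinessDaysTests
{
    // Requested by: User A (hypothetical). Weekends must not count towards an item's age.
    public static void FridayItemIsOneBusinessDayOldOnMonday() =>
        Check(BusinessDays.BusinessDaysOld(new DateTime(2015, 8, 14), new DateTime(2015, 8, 17)) == 1);

    // Regression test for a prod report: an item dated after "today" shows -1,
    // which documents that the calculation itself behaves as designed.
    public static void ItemFromTheFutureIsNegative() =>
        Check(BusinessDays.BusinessDaysOld(new DateTime(2015, 8, 18), new DateTime(2015, 8, 17)) == -1);

    public static void SameDayIsZero() =>
        Check(BusinessDays.BusinessDaysOld(new DateTime(2015, 8, 17), new DateTime(2015, 8, 17)) == 0);

    private static void Check(bool condition)
    {
        if (!condition) throw new Exception("Test failed");
    }

    public static void RunAll()
    {
        FridayItemIsOneBusinessDayOldOnMonday();
        ItemFromTheFutureIsNegative();
        SameDayIsZero();
    }
}
```

When a -1 shows up in prod, the same inputs can be dropped into a new test like ItemFromTheFutureIsNegative to show whether the calculation or its inputs are at fault.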

So now the team is fully behind the plan and raring to go. Well, probably not immediately. In teams I've been involved with, it has taken a good few months of pushing these points, including making sure that unit test coverage percentages are reviewed, committed code is reviewed, and unit tests are always required as part of that review. People need to see the benefit of increased testing, and this may take time and energy. But over time it will happen if you're persistent.