Friday, November 13, 2015

State of JSON UDF functions

When I was leaving Oracle I wrote a blog post saying what I stop working on JSON UDF project. However I still receive bug reports privately and publicly.

Although I feel good seeing my project used I still don't want to restart work on it. After all idea of it was mostly preparing interface for native JSON functions, available in 5.7. Since 5.7 is GA now I strictly recommend to upgrade for everyone who can do it.

If you need JSON functions native ones have big advantage: they are much faster than UDF. This is not only because algorithms used (I never tested mine for speed btw), but also because 5.7 has functional indexes (you can use them with any other functions) and native JSON data type. There are also many new improvements in Performance Schema which make troubleshooting as easy as it never was before, although this topic is not related to JSON.

If you cannot use 5.7 for some reason and want to explore the fact what JSON UDF functions can work with any version you still can download them from Oracle website. I don't know why Oracle removed public link: probably to avoid mix-up with native functions, but download link still works.

Regarding to bugs I don't mind helping to fix them, but I don't want to do it myself solely. I have other toys to play with now. Functions were distributed under GPL license, so everyone can make GitHub project, based on them. I suggest to take version 0.4.0 I could eventually contribute to such a project, but I don't want to lead it myself and don't want to contribute until I have luxury of working with version control system.

Here is the list of current known bugs with JSON UDFs:
  1. "Array indexes are ok only for an array in last position", reported here, fix for JSON_EXTRACT is here , another fix is for JSON_SEARCH. One case still not fixed
  2. "select json_extract('{"f":["foo"],"b":["bar"]}',"$.b[0]"); returns NULL", posted here
  3. SET @doc2 := '[{"id":1001,"cnt":4},{"id":1002,"cnt":7},{"id":1003,"cnt":5}]';        
    SELECT @doc2, json_search( @doc2, '1003') AS doc_find, json_extract( @doc2, '$[0].id' ) AS doc_find2;

    /*
    @doc2                                                          docfind    json_extract( @doc2, '$[0].id' ) 
    -------------------------------------------------------------  -------  ----------------------------------
    [{"id":1001,"cnt":4},{"id":1002,"cnt":7},{"id":1003,"cnt":5}]  $[0].id  1001                             

    */
    Reported privately

  4. Hi, I saw there was still a bug with json_search and arrays. Will this patch help you?

    https://gist.github.com/xplinscott/10a8e3e82c39c6c07c09

    > select json_search('[[1,2,3],[4,5,6]]', '4');
    outputs:
    $[1][0]


    It seems to fix most things for me, except a case of empty arrays:

    > select json_search('[[],[4,5,6]]', '4');
    outputs:
    $[0][0]
    Reported here

I also received a private question about difference between JSON UDFs and new native functions. I think now answer should be clear, but just in case. UDF functions was a project, devoted to prepare interface for native functions. Idea was to make them pluggable, let users try, so final native version do not have issues which XML functions had. This aim succeed. Blog post by Roland Bauman confirms it. Another confirmation of this is existence of functions JSON_DEPTH and JSON_LENGTH: ideas for these functions were offered by community, firstly implemented in UDFs and now exist in natives. Code base of functions is different, they share user interface only. I did not check, but native functions should pass all tests which existed in test suite for UDFs. And, of course, many other tests.

Besides speed issues UDF interface has other limits. For example, it is not possible to output proper error message after initialization step passed. It is not possible to pass something like "ERROR 3141 (22032): Invalid JSON text in argument 1 to function cast_as_json: "Missing a name for object member." at position 1 in '{]'." into MySQL command prompt or application. I created logging into error log to avoid this limitation, but this workaround is hardly useful in production environment.

I can only repeat myself: if you need JSON functions you should upgrade to 5.7 and use native ones. Only use of UDFs now is for those users who have to stay with 5.6 or older.