Friday, November 13, 2015

State of JSON UDF functions

When I was leaving Oracle I wrote a blog post saying what I stop working on JSON UDF project. However I still receive bug reports privately and publicly.

Although I feel good seeing my project used I still don't want to restart work on it. After all idea of it was mostly preparing interface for native JSON functions, available in 5.7. Since 5.7 is GA now I strictly recommend to upgrade for everyone who can do it.

If you need JSON functions native ones have big advantage: they are much faster than UDF. This is not only because algorithms used (I never tested mine for speed btw), but also because 5.7 has functional indexes (you can use them with any other functions) and native JSON data type. There are also many new improvements in Performance Schema which make troubleshooting as easy as it never was before, although this topic is not related to JSON.

If you cannot use 5.7 for some reason and want to explore the fact what JSON UDF functions can work with any version you still can download them from Oracle website. I don't know why Oracle removed public link: probably to avoid mix-up with native functions, but download link still works.

Regarding to bugs I don't mind helping to fix them, but I don't want to do it myself solely. I have other toys to play with now. Functions were distributed under GPL license, so everyone can make GitHub project, based on them. I suggest to take version 0.4.0 I could eventually contribute to such a project, but I don't want to lead it myself and don't want to contribute until I have luxury of working with version control system.

Here is the list of current known bugs with JSON UDFs:
  1. "Array indexes are ok only for an array in last position", reported here, fix for JSON_EXTRACT is here , another fix is for JSON_SEARCH. One case still not fixed
  2. "select json_extract('{"f":["foo"],"b":["bar"]}',"$.b[0]"); returns NULL", posted here
  3. SET @doc2 := '[{"id":1001,"cnt":4},{"id":1002,"cnt":7},{"id":1003,"cnt":5}]';        
    SELECT @doc2, json_search( @doc2, '1003') AS doc_find, json_extract( @doc2, '$[0].id' ) AS doc_find2;

    /*
    @doc2                                                          docfind    json_extract( @doc2, '$[0].id' ) 
    -------------------------------------------------------------  -------  ----------------------------------
    [{"id":1001,"cnt":4},{"id":1002,"cnt":7},{"id":1003,"cnt":5}]  $[0].id  1001                             

    */
    Reported privately

  4. Hi, I saw there was still a bug with json_search and arrays. Will this patch help you?

    https://gist.github.com/xplinscott/10a8e3e82c39c6c07c09

    > select json_search('[[1,2,3],[4,5,6]]', '4');
    outputs:
    $[1][0]


    It seems to fix most things for me, except a case of empty arrays:

    > select json_search('[[],[4,5,6]]', '4');
    outputs:
    $[0][0]
    Reported here

I also received a private question about difference between JSON UDFs and new native functions. I think now answer should be clear, but just in case. UDF functions was a project, devoted to prepare interface for native functions. Idea was to make them pluggable, let users try, so final native version do not have issues which XML functions had. This aim succeed. Blog post by Roland Bauman confirms it. Another confirmation of this is existence of functions JSON_DEPTH and JSON_LENGTH: ideas for these functions were offered by community, firstly implemented in UDFs and now exist in natives. Code base of functions is different, they share user interface only. I did not check, but native functions should pass all tests which existed in test suite for UDFs. And, of course, many other tests.

Besides speed issues UDF interface has other limits. For example, it is not possible to output proper error message after initialization step passed. It is not possible to pass something like "ERROR 3141 (22032): Invalid JSON text in argument 1 to function cast_as_json: "Missing a name for object member." at position 1 in '{]'." into MySQL command prompt or application. I created logging into error log to avoid this limitation, but this workaround is hardly useful in production environment.

I can only repeat myself: if you need JSON functions you should upgrade to 5.7 and use native ones. Only use of UDFs now is for those users who have to stay with 5.6 or older.

Friday, August 14, 2015

Ease of use or consistency

I am working on New features in Performance Schema 5.7 in action tutorial for Percona Live Amsterdam for quite a time already. Probably since version 5.7.3 when instrumentation for metadata locks were introduced and which I presented as a teaser in my combined "General MySQL Troubleshooting" and "Troubleshooting MySQL Performance" seminar I did in South Korea for Oracle University (for 5.6 that time).

In version 5.7.6 instrumentation for variables and status variables were introduced. It supports session, global and user variables. I was very happy to see this addition, especially because originally there was not any duplication in session_variables and global_variables tables. You could simply run query like SELECT * FROM session_status WHERE variable_value>0;  to see list of all status variables which were changed during session. Amazing, isn't it?

But in version 5.7.8 it was fixed. Release notes contain:

When the Performance Schema session variable tables produced output,
they included no rows for global-only variables and thus did not fully
reflect all variable values in effect for the current session. This has
been corrected so that each table has a row for each session variable,
and a row for each global variable that has no session counterpart. This
change applies to the session_variables and session_status tables.

And yes, now session_status table contains such status as innodb_buffer_pool_pages_flushed and session_variables contain innodb_buffer_pool_size as well as all other global-only status and variables which make no sense in session content:

mysql> select * from session_status where variable_name='innodb_buffer_pool_pages_flushed';
+----------------------------------+----------------+
| VARIABLE_NAME                    | VARIABLE_VALUE |
+----------------------------------+----------------+
| Innodb_buffer_pool_pages_flushed | 45             |
+----------------------------------+----------------+
1 row in set (0.00 sec)


mysql> select * from session_variables where variable_name='innodb_buffer_pool_size';
+-------------------------+----------------+
| VARIABLE_NAME           | VARIABLE_VALUE |
+-------------------------+----------------+
| innodb_buffer_pool_size | 25165824       |
+-------------------------+----------------+
1 row in set, 1 warning (0.00 sec)


Honestly I don't know what to think about this change. I am so unhappy what I created bug report, but I am not sure if it will be fixed and even makes sense.

Of course there is workaround for my one-liner to see all status changes in a session:

select ss.variable_name, ss.variable_value from session_status ss left join global_status gs using(variable_name) where ss.variable_value != gs.variable_value or gs.variable_value is null and ss.variable_value>0;

It is simply longer and harder to write. Probably good idea for sys schema to have views with only-session and only-global status and variables.

What do you think?

Wednesday, April 1, 2015

Scripts which I use to automatically build MySQL servers and run tests

Few months ago, when MySQL Engineering Team moved MySQL Server sources to GitHub I found it would be waste of time to manually copy all scripts which I use to regularly and automatically build and test all versions, needed for verifying bug reports. I run these scripts on 3 machines at least. So I started my own GitHub project, called mysql-scripts

Now this project contains four scripts. First one is build.sh which  I use for building. By default it checks out MySQL Server sources, builds them in directory $HOME/src/mysql-VERSION, then installs to $HOME/build/mysql-VERSION But all configuration is settable. For example, now I regularly build Percona Server with single command `build.sh -g percona-server -p "percona_server-" -b 5.1 -b 5.5 -b 5.6` Before using this script you should clone either MySQL or Percona server and checkout each of branches at least once. You also need to create all src/server-name-VERSION directories.

Second script is do_test.sh which I use to run MTR test cases automatically on any number of MySQL or Percona server versions.

Third is ar_test.sh which archives tests.

scp_test.sh copies tests to remote machine.

I wrote about do_test.sh, ar_test.sh and scp_test.sh already They changed a bit after that blog post, but not significantly and functionality remains same.

Thursday, March 12, 2015

Questions for old posts at https://blogs.oracle.com/svetasmirnova/

Tomorrow will be my last day at Oracle. Next day I would not have access to my working blog at https://blogs.oracle.com/svetasmirnova/ But content will stay.

However Oracle blogs require manual managing of comments and I simply don't expect my colleagues will have time to do it in timely manner. Therefore if you have questions about my old blog posts just write them here together with the URL of the post.