Thursday, February 21, 2019

MySQL 8.0 allows unprivileged user access to its data if table mysql.user does not exists

Yesterday my Percona colleague Ceri Williams found a nasty bug in MySQL 8.0. Its twin for Percona Server reported at PS-5431.

He was installing MySQL Server 8.0 having not supported option in his configuration file. Thus initialization failed, but, surprisingly, the subsequent restart was successful and he was able to create, modify and delete tables in his MySQL installation. In other words: he got full access to the database. But he did not create any user account yet!

This new instance of MySQL Server did not have privilege tables, particularly mysql.user, and silently started with --skip-grant-option.

At the first glance starting just initialized MySQL Server with a non-safe option should not cause any harm, because datadir has no data yet. But this is what the experienced user expects! Newbies may miss the fact that their server is now accessible worldwide and start inserting data.

This is bad by itself, but we experienced more and found that the same behavior may happen if someone removes table mysql.user, then let MySQL Server read its privilege tables. E.g., performs a restart.

To demonstrate this behavior I created a table which holds credit card information and inserted data into it:

mysql> create table test.very_important_table(
    -> id int not null primary key,
    -> credit_card_num char(16),
    -> credit_card_owner varchar(256),
    -> credit_card_expire_month char(2),
    -> credit_card_expire_year char(2),
    -> credit_card_cvv char(3))
    -> engine=innodb;
Query OK, 0 rows affected (3.80 sec)

mysql> insert into test.very_important_table values(1, '1234123412341234', 'Sveta Smirnova', '02', '20', '123');
Query OK, 1 row affected (0.04 sec)

mysql> select current_user();
+----------------+
| current_user() |
+----------------+
| root@localhost |
+----------------+
1 row in set (0.00 sec)


After that, I removed the table mysql.user (which I did not expect I will be allowed to do):

mysql> drop table mysql.user;
Query OK, 0 rows affected (0.08 sec)

Then I restarted the server and logged in as user foo and was able to access data in the table with credit card information:

$ mysql -ufoo test
...
mysql> select current_user();
+-----------------------------------+
| current_user()                    |
+-----------------------------------+
| skip-grants user@skip-grants host |
+-----------------------------------+
1 row in set (0.00 sec)

mysql> select * from very_important_table;
+----+------------------+-------------------+--------------------------+-------------------------+-----------------+
| id | credit_card_num  | credit_card_owner | credit_card_expire_month | credit_card_expire_year | credit_card_cvv |
+----+------------------+-------------------+--------------------------+-------------------------+-----------------+
|  1 | 1234123412341234 | Sveta Smirnova    | 02                       | 20                      | 123             |
+----+------------------+-------------------+--------------------------+-------------------------+-----------------+
1 row in set (0.00 sec)

A bug report was considered a feature request in the first turn and now still in the discussion on how serious it is. There are a few reasons why you may not take this flaw seriously.

First, there is no harm when someone has access to the empty MySQL instance. Allowing to connect to such an instance may help DBA to solve the issue, for example, by running the mysql_upgrade command. Thought it does not restore mysql.user table anyway:

mysql> show tables from mysql like 'user';
Empty set (0.00 sec)

mysql> \q
Bye
$ mysql_upgrade
Checking if update is needed.
Checking server version.
Error occurred: Query against mysql.user table failed when checking the mysql.session.

$ mysql

mysql> show tables from mysql like 'user';
Empty set (0.00 sec)

Second, it is not expected that someone would drop table mysql.user manually. It is also not harmful for SQL injection attack if you limit access to your application user: one has to have a privilege which allows updating mysql database to perform the drop. It is good practice to do not grant such privileges for application users.

Third, while it is possible that mysql tablespace may be corrupted on the file system level, it is very unlikely that it will damage only data, belonging to the privilege tables. There are much higher chances that MySQL Server won't be able to start at all after such a corruption happens.

Fourth, to perform such an attack on the file system level, one needs to have access to the file system and can do more significant harm than the behavior reported.

These are the reasons why this behavior may be not considered bad security flow.

However, for new users of MySQL, it is possible to make such a mistake and silently open access to their database for everyone.

Saturday, October 7, 2017

Stepicon 2017 and thoughts on MySQL MOOCs

I periodically do MySQL Troubleshooting tutorials. Every time I run them I have a dilemma on how to make tutorials more interactive. I tried:
  1. Just talk. But troubleshooting is about action, not about sitting and listening 
  2. Put tasks on my own website which required a lot of coding, not related to the talk and tasks themselves.
  3. Put tasks on VirtualBox image. Many attendees do not want to use flash drive or download big image from Internet.
  4. Put tasks into Amazon EC2 instance. This is practically good option, but expensive.
  5. Use machines, provided by tutorial organizers (conference or training center). This is, probably, the best option, but few do it.
I did not like any of these options as final solution. That's why for my last two tutorials I tried Stepik platform. This is how "Troubleshooting MySQL Performance" course was born. Course is in Russian language.

This is not really standard online course which usually include video lectures and tasks. This is just set of tasks which follow my tutorial. I test for what I teach, therefore it is hard to use this course without attending tutorial. However it is something worked and I was pretty satisfied watching how attendees complete tasks online. Good method to measure effectiveness of the tutorial and adjust it on fly.

As usual I found few issues with tasks modules and reported them. Stepik answered very quickly and offered professional video studio for making complete online course.

The offer implies a lot of work for me, but also very attractive, because:
  1. This is a chance to make really great MySQL MOOC course, open for everyone
  2. Professional video would help to perfect course content.
  3. I am free to choose any topic, so course can be for beginners as well as for advanced users.
  4. Current situation with MySQL users is many of them still live in year 2003. I was considering to quit speaking when one of my friends, working with data in really big company, told me in year 2016 what he hears about partitioning support in MySQL for the first time. He attended many conferences with MySQL talks. This shows we are promoting new nice features for audience which already very interested in MySQL. But others quit using MySQL to try competing technologies without knowledge of what MySQL can do.
  5. Stepik has very nice learning concept: short videos each less than 10 minutes and many tests. I like it when I learn.
  6. Test modules, while not perfect for MySQL Troubleshooting course, are very handy to learn new things. I am certainly interested in using them for tutorials.
  7. Stepik can help with student's homework reviews.
However there are things which prevent me from starting this project:
  1. Time to create the course. I have to work and I should have really strong motivation if want to spend a lot of free time, doing this course.
  2. Quality. No matter which experience I have I need help from my honest colleagues who will review full course and tell me what I am doing wrong and what I miss. Or maybe even work on content together.
  3. Time to maintain the course. It is common with MOOCs what students leave comments and expect help with their homeworks. Therefore making a course does not stop when you create final video or homework. Instead, if course becomes popular, it will never end.
So this is how I was invited to Stepicon 2017: event for people who teach on this platform. It was interesting to find out who else use Stepik and what is their motivation. Here is what I found interesting.
  1. Universities, teachers, large and middle-size companies are main content creators at Stepik.
  2. Companies use platform to teach their own employees, popularize products and seek for new hires.
  3. There are closed and open courses.
  4. Some do courses, similarly as I did: offline content and Stepik only for tasks. But I was the only one who did it as tutorial add-on. Most common case is when such tasks are either companion for closed course, accessible inside a company or for a fee, or for offline course in University.
  5. Stepik supports LSM and used inside internal learning platforms as well as together with other MOOC platforms, such as Coursera.
  6. JetBrains demonstrated its own learning tool: PyCharm Edu which is integrated with Stepik.
I am still in doubt if it makes sense for me as individual to do something further than assistance for tutorial, but would be happy if you comment and let me know what you think.

Friday, November 13, 2015

State of JSON UDF functions

When I was leaving Oracle I wrote a blog post saying what I stop working on JSON UDF project. However I still receive bug reports privately and publicly.

Although I feel good seeing my project used I still don't want to restart work on it. After all idea of it was mostly preparing interface for native JSON functions, available in 5.7. Since 5.7 is GA now I strictly recommend to upgrade for everyone who can do it.

If you need JSON functions native ones have big advantage: they are much faster than UDF. This is not only because algorithms used (I never tested mine for speed btw), but also because 5.7 has functional indexes (you can use them with any other functions) and native JSON data type. There are also many new improvements in Performance Schema which make troubleshooting as easy as it never was before, although this topic is not related to JSON.

If you cannot use 5.7 for some reason and want to explore the fact what JSON UDF functions can work with any version you still can download them from Oracle website. I don't know why Oracle removed public link: probably to avoid mix-up with native functions, but download link still works.

Regarding to bugs I don't mind helping to fix them, but I don't want to do it myself solely. I have other toys to play with now. Functions were distributed under GPL license, so everyone can make GitHub project, based on them. I suggest to take version 0.4.0 I could eventually contribute to such a project, but I don't want to lead it myself and don't want to contribute until I have luxury of working with version control system.

Here is the list of current known bugs with JSON UDFs:
  1. "Array indexes are ok only for an array in last position", reported here, fix for JSON_EXTRACT is here , another fix is for JSON_SEARCH. One case still not fixed
  2. "select json_extract('{"f":["foo"],"b":["bar"]}',"$.b[0]"); returns NULL", posted here
  3. SET @doc2 := '[{"id":1001,"cnt":4},{"id":1002,"cnt":7},{"id":1003,"cnt":5}]';        
    SELECT @doc2, json_search( @doc2, '1003') AS doc_find, json_extract( @doc2, '$[0].id' ) AS doc_find2;

    /*
    @doc2                                                          docfind    json_extract( @doc2, '$[0].id' ) 
    -------------------------------------------------------------  -------  ----------------------------------
    [{"id":1001,"cnt":4},{"id":1002,"cnt":7},{"id":1003,"cnt":5}]  $[0].id  1001                             

    */
    Reported privately

  4. Hi, I saw there was still a bug with json_search and arrays. Will this patch help you?

    https://gist.github.com/xplinscott/10a8e3e82c39c6c07c09

    > select json_search('[[1,2,3],[4,5,6]]', '4');
    outputs:
    $[1][0]


    It seems to fix most things for me, except a case of empty arrays:

    > select json_search('[[],[4,5,6]]', '4');
    outputs:
    $[0][0]
    Reported here

I also received a private question about difference between JSON UDFs and new native functions. I think now answer should be clear, but just in case. UDF functions was a project, devoted to prepare interface for native functions. Idea was to make them pluggable, let users try, so final native version do not have issues which XML functions had. This aim succeed. Blog post by Roland Bauman confirms it. Another confirmation of this is existence of functions JSON_DEPTH and JSON_LENGTH: ideas for these functions were offered by community, firstly implemented in UDFs and now exist in natives. Code base of functions is different, they share user interface only. I did not check, but native functions should pass all tests which existed in test suite for UDFs. And, of course, many other tests.

Besides speed issues UDF interface has other limits. For example, it is not possible to output proper error message after initialization step passed. It is not possible to pass something like "ERROR 3141 (22032): Invalid JSON text in argument 1 to function cast_as_json: "Missing a name for object member." at position 1 in '{]'." into MySQL command prompt or application. I created logging into error log to avoid this limitation, but this workaround is hardly useful in production environment.

I can only repeat myself: if you need JSON functions you should upgrade to 5.7 and use native ones. Only use of UDFs now is for those users who have to stay with 5.6 or older.

Friday, August 14, 2015

Ease of use or consistency

I am working on New features in Performance Schema 5.7 in action tutorial for Percona Live Amsterdam for quite a time already. Probably since version 5.7.3 when instrumentation for metadata locks were introduced and which I presented as a teaser in my combined "General MySQL Troubleshooting" and "Troubleshooting MySQL Performance" seminar I did in South Korea for Oracle University (for 5.6 that time).

In version 5.7.6 instrumentation for variables and status variables were introduced. It supports session, global and user variables. I was very happy to see this addition, especially because originally there was not any duplication in session_variables and global_variables tables. You could simply run query like SELECT * FROM session_status WHERE variable_value>0;  to see list of all status variables which were changed during session. Amazing, isn't it?

But in version 5.7.8 it was fixed. Release notes contain:

When the Performance Schema session variable tables produced output,
they included no rows for global-only variables and thus did not fully
reflect all variable values in effect for the current session. This has
been corrected so that each table has a row for each session variable,
and a row for each global variable that has no session counterpart. This
change applies to the session_variables and session_status tables.

And yes, now session_status table contains such status as innodb_buffer_pool_pages_flushed and session_variables contain innodb_buffer_pool_size as well as all other global-only status and variables which make no sense in session content:

mysql> select * from session_status where variable_name='innodb_buffer_pool_pages_flushed';
+----------------------------------+----------------+
| VARIABLE_NAME                    | VARIABLE_VALUE |
+----------------------------------+----------------+
| Innodb_buffer_pool_pages_flushed | 45             |
+----------------------------------+----------------+
1 row in set (0.00 sec)


mysql> select * from session_variables where variable_name='innodb_buffer_pool_size';
+-------------------------+----------------+
| VARIABLE_NAME           | VARIABLE_VALUE |
+-------------------------+----------------+
| innodb_buffer_pool_size | 25165824       |
+-------------------------+----------------+
1 row in set, 1 warning (0.00 sec)


Honestly I don't know what to think about this change. I am so unhappy what I created bug report, but I am not sure if it will be fixed and even makes sense.

Of course there is workaround for my one-liner to see all status changes in a session:

select ss.variable_name, ss.variable_value from session_status ss left join global_status gs using(variable_name) where ss.variable_value != gs.variable_value or gs.variable_value is null and ss.variable_value>0;

It is simply longer and harder to write. Probably good idea for sys schema to have views with only-session and only-global status and variables.

What do you think?

Wednesday, April 1, 2015

Scripts which I use to automatically build MySQL servers and run tests

Few months ago, when MySQL Engineering Team moved MySQL Server sources to GitHub I found it would be waste of time to manually copy all scripts which I use to regularly and automatically build and test all versions, needed for verifying bug reports. I run these scripts on 3 machines at least. So I started my own GitHub project, called mysql-scripts

Now this project contains four scripts. First one is build.sh which  I use for building. By default it checks out MySQL Server sources, builds them in directory $HOME/src/mysql-VERSION, then installs to $HOME/build/mysql-VERSION But all configuration is settable. For example, now I regularly build Percona Server with single command `build.sh -g percona-server -p "percona_server-" -b 5.1 -b 5.5 -b 5.6` Before using this script you should clone either MySQL or Percona server and checkout each of branches at least once. You also need to create all src/server-name-VERSION directories.

Second script is do_test.sh which I use to run MTR test cases automatically on any number of MySQL or Percona server versions.

Third is ar_test.sh which archives tests.

scp_test.sh copies tests to remote machine.

I wrote about do_test.sh, ar_test.sh and scp_test.sh already They changed a bit after that blog post, but not significantly and functionality remains same.

Thursday, March 12, 2015

Questions for old posts at https://blogs.oracle.com/svetasmirnova/

Tomorrow will be my last day at Oracle. Next day I would not have access to my working blog at https://blogs.oracle.com/svetasmirnova/ But content will stay.

However Oracle blogs require manual managing of comments and I simply don't expect my colleagues will have time to do it in timely manner. Therefore if you have questions about my old blog posts just write them here together with the URL of the post.