MySQL and JDBC batch inserts

I was doing some benchmarks on MySQL, both locally and on Google Cloud SQL, and one performance issue struck me.

Batch inserts are as slow as individual inserts

What it means, basically, is that whether you insert 10,000 rows one at a time or all 10,000 in a batch, it takes about the same time: you pay a server round-trip for each row. How uncool is that? I’ve never seen a database behave this way. I’m already not a big SQL fan, and this doesn’t help me become one.
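To make the problem concrete, here is a minimal sketch of the kind of batch insert I was benchmarking, using the plain JDBC batch API. The table, columns and credentials are made-up placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/testdb"; // placeholder host/db
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO items (name, price) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            for (int i = 0; i < 10_000; i++) {
                ps.setString(1, "item-" + i);
                ps.setInt(2, i);
                ps.addBatch();
            }
            // Without rewriteBatchedStatements, Connector/J sends one INSERT
            // per row here: 10,000 server round-trips for one "batch".
            ps.executeBatch();
            conn.commit();
        }
    }
}
```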

It turns out there’s an easy fix that is not activated by default in Connector/J, the MySQL JDBC driver. Add rewriteBatchedStatements=true to the connection properties and there you go: the batch is executed in batch mode, with a single round-trip to the server. That’s a huge improvement.
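For reference, here is a sketch of the two usual ways to set the flag; the host, database and credentials are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class RewriteDemo {
    // Option 1: flag directly on the JDBC URL.
    static Connection viaUrl() throws SQLException {
        return DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb?rewriteBatchedStatements=true",
                "user", "password");
    }

    // Option 2: flag via connection properties.
    static Connection viaProperties() throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "user");
        props.setProperty("password", "password");
        props.setProperty("rewriteBatchedStatements", "true");
        return DriverManager.getConnection("jdbc:mysql://localhost:3306/testdb", props);
    }
}
```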

Why on Earth is it not the default?

I don’t have a good answer to this question. I’ve got a bad answer, though: when values are sent in a batch, the whole rewritten statement must fit within the max_allowed_packet limit. If it doesn’t fit, the query fails. So it’s safer to make each row fit on its own than to try to fit all the rows at once.
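If you want to know where that ceiling is, you can ask the server for it. A minimal sketch, assuming an open JDBC connection:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class PacketLimit {
    // Ask the server for its max_allowed_packet so the client can size
    // its batches to stay under the limit.
    static long maxAllowedPacket(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT @@max_allowed_packet")) {
            rs.next();
            return rs.getLong(1); // limit in bytes
        }
    }
}
```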

A good implementation would activate this behaviour by default, get rid of max_allowed_packet altogether, and use as many server round-trips as needed (one per N rows, not one per row) to send all the data.
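Until the driver does that for you, the chunking is easy enough to do client-side. Here is a sketch, reusing the hypothetical items table from above; the chunk size is a made-up number to tune against your row size and the server’s max_allowed_packet:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ChunkedBatchInsert {
    // Flush the batch every CHUNK rows so each rewritten INSERT stays
    // comfortably under max_allowed_packet: one round-trip per chunk.
    static final int CHUNK = 1_000; // hypothetical; tune for your workload

    static void insertAll(Connection conn, int totalRows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO items (name, price) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            for (int i = 0; i < totalRows; i++) {
                ps.setString(1, "item-" + i);
                ps.setInt(2, i);
                ps.addBatch();
                if ((i + 1) % CHUNK == 0) {
                    ps.executeBatch(); // one round-trip per CHUNK rows
                }
            }
            ps.executeBatch(); // flush the remaining rows
            conn.commit();
        }
    }
}
```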

Well, at least I solved my problem…

Yes and no. Yes, I fixed the counterintuitive batch insert behaviour. No, because every round trip to the server is still very long, especially on the Cloud Platform, whether I use Google Cloud SQL or my own MySQL hosted on a Compute Engine instance.

Maybe I need to search for some more counterintuitive magic properties!

5 thoughts on “MySQL and JDBC batch inserts”

  1. Did you try combining your inserts with “multi-values”, I mean one SQL statement for all the inserts instead of one batch of 10K inserts?
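What this comment calls “multi-values” amounts to building one multi-row INSERT by hand: a single statement, hence a single round-trip, without relying on rewriteBatchedStatements (as far as I can tell, this is also what the driver’s rewrite produces for plain INSERTs). A sketch, with the same hypothetical table as above:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class MultiValuesInsert {
    // Build one INSERT ... VALUES (?,?),(?,?),... statement by hand.
    // Note: the whole statement is still bound by max_allowed_packet.
    static void insert(Connection conn, String[] names, int[] prices) throws SQLException {
        StringBuilder sql = new StringBuilder("INSERT INTO items (name, price) VALUES ");
        for (int i = 0; i < names.length; i++) {
            sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
        }
        try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
            for (int i = 0; i < names.length; i++) {
                ps.setString(2 * i + 1, names[i]);
                ps.setInt(2 * i + 2, prices[i]);
            }
            ps.executeUpdate(); // one statement, one round-trip
        }
    }
}
```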

  2. In my experience, if you’re using Hibernate and MySQL, you’ll get the best performance with:
    – ?rewriteBatchedStatements=true in the MySQL JDBC URL
    – hibernate.jdbc.batch_size set to something different from 0 (the default). Don’t overdo it: 100 might be sufficient.
    – staying away from the identity generator for your ids, which effectively disables batching too! Consider using a table generator instead, with a reasonably large prefetch of ids depending on your INSERT workload (see the sketch below this thread).

    1. Oops. If all you have is a hammer, everything looks like a nail. You did not mention JPA or any ORM; you are probably working directly with the JDBC API, so you are fine with rewriteBatchedStatements alone. Sorry for the noise.
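A sketch of the settings from the Hibernate comment above, with a made-up entity; the generator name, allocation size and the configuration keys shown in the comments are illustrative:

```java
import javax.persistence.*; // jakarta.persistence on newer stacks

// Hypothetical entity: a table generator with a prefetch of ids keeps
// Hibernate able to batch INSERTs, unlike GenerationType.IDENTITY.
@Entity
public class Item {
    @Id
    @TableGenerator(name = "item_ids", allocationSize = 100) // prefetch 100 ids
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "item_ids")
    private Long id;

    private String name;

    // Matching configuration, e.g. in persistence.xml:
    //   javax.persistence.jdbc.url = jdbc:mysql://host/db?rewriteBatchedStatements=true
    //   hibernate.jdbc.batch_size  = 100
}
```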

Comments are closed.