A site for solving at least some of your technical problems...
A site for solving at least some of your technical problems...
One of our computers was setup to delete rows from a database. Anything older than 90 days should be deleted. Only the script to run that DELETE had the incorrect IP address and we weren't watching the database closely so it grew to over 107 million rows and on a small VPS that filled up the hard drive.
At that point, trying to run the DELETE command would generate this error:
ERROR 3019 (HY000) at line 1: Undo Log error: No more space left over in system tablespace for allocating UNDO log pages. Please add new data file to the tablespace or check if filesystem is full or enable auto-extension for the tablespace
Looking around, I could see that the UNDO log pages are used for several reasons such as ROLLBACK and CURSOR consistencies—i.e. a CURSOR blocks the data it reads at the time the CURSOR command is emitted; changes after that do not show the the CURSOR user. For that reason the data needs to be kept until the CURSOR goes away.
What I had to do is change my command line to limit the number of deletions in one go. Until then, absolutely nothing would work otherwise. This is simple with SQL you just have to add a LIMIT order.
DELETE FROM <table-name> WHERE <expr> LIMIT 1000000;
I tried with different sizes and 1,000,000 worked 99% of the time. Less and it wouldtake forever and more would fail pretty often. Remember, when it fails, the DELETE rolls back so in other words nothing gets deleted. So it's important to choose the right size.
Of course, to delete 100+ million rows, I would have to repeat that DELETE over 100 times, so I wrote a little shell script with a couple of for loops:
for i in 1 2 3 4 5 6 7 8 9 0 do for j in 1 2 3 4 5 6 7 8 9 0 do MYSQL_PWD=top-secret mysql -h 127.0.0.1 \ --user=alexis <database-name> \ -e 'DELETE FROM <table-name> WHERE <expr> LIMIT 1000000;' done done
It still took forever, but at least I did not have to run the command manually that many times. My database went from 107 million to just a little under 18 million. So I just deleted around 90 million rows of data!
Now, the main result which I found surprising was the fact that my harddrive was still 100% full. The fact is that you need to then compact the database table to reclaim the freespace. In other words, the DELETE command only marks a row as being deleted, it does not actually get rid of it from the database table file until later. some of that space may get reused, though.
To reclaim the space you want to run the following command:
OPTIMIZE TABLE <table-name>;
If your hard drive was 100% full and was being used live, you may want to run it after you deleted a few million rows and not once you're done. That should get you space quickly. But know that it may have to move a lot of data if you have a really large table and that can take forever too. (I do not know how this database files are being optimized. In Cassandra, they write a new file with the data that was not deleted—so they skip the tombstone—when a full cleanup is run. I don't think they would do that with MySQL. They are more likely to rewrite the whole file by moving things around, a form of defragmentation.)
Only with a full hard drive that command failed for me because... it wanted to create a new file and obviously... there was no more space to do anything about it.
So to salvage the data, I first did a dump as in:
mysqldump <database-name> <table-name> >/backup/data.sql
Then I tried to do a TRUNCATE which you would think would work, right, it just deletes all the data of the table. Well... nope! Even the TRUNCATE failed with an error.
mysql> TRUNCATE log; ERROR 1030 (HY000): Got error 168 from storage engine
So even that was not enough to fix the problem. Now that I had the data in a file, though, I also try a DROP of the table. So I tried that and finally, I got 78% of my hard drive back!
Then I could just restore the data and voilà I deleted the extra data and saved my data (part of it, at least).
mysql --user snaplog --password <database-name> < /backup/data.sql
The restore can take some time as well.
In my case that used 2% of the drive so final result was a saving of 76%. With time the table will use some more, but it should hover to a similar size. The DELETE runs once a day so it should remains contained now. I reduced the amount of data so in my case it will probably take another 10% of my hard drive space once full. The extra 2% is because we DELETE after a day.
Recent Posts on The Linux Page: