XtremIO Gotcha

17th Sept update: I’ve been contacted directly by EMC, and they say they’ll work through this with us. Sounds promising. There’s been a lot of coverage on the topic. I’ve provided additional links at the end of the post. The funniest comment I’ve read so far is “More like Xtrem uh-oh”.

16th Sept update: here’s an official response from Chad Sakac http://virtualgeek.typepad.com/virtual_geek/2014/09/on-disruptive-upgrades.html

While at VMworld last month, I was networking with attendees when I mentioned that my current employer had purchased an XtremIO half X-Brick for a VDI project.

One of the guys told me about an issue with the array that I didn’t believe, so I went to verify it with other sources. The ‘rumour’ was that upgrading from the current 2.4.x firmware to version 3.0 would wipe the data, requiring a complete backup and restore. My first thought was “that’s crazy”.
On returning from VMworld, I passed the news onto my employer, for them to follow up with the integrator and EMC. 
The news came back that in fact it was true. I was stunned.

OMG!! (funny image removed as requested by EMC)

To clarify, firmware 3.0 has NOT been released, but is scheduled for release at the end of September or the beginning of October. The upgrade process will require all data to be moved off the array, as the array will be wiped during the upgrade. The new firmware includes performance benefits and inline compression.

We're a customer with limited funds, and this is the only array for a VDI project in a business that runs 24/7, so having to wipe the array has a massive impact.

The integrator has offered a loan device to do the upgrade once firmware 3.0 is available, but if the project has gone live, the loan device will need to be equivalent in performance to the XtremIO X-Brick. Now that we know, we can plan accordingly. This is the intent of the post.

My opinion is that in 2014, if we need any disruption to update or expand a production storage array, we're doing it wrong.

I'm not sure if they will continue development of the 2.x branch, but you'd hope they would support it for the next 3 years or so if you don't have another array to migrate data to so you can perform an upgrade. I'd hate for the next support call to say "Please upgrade to the latest firmware" before they even begin to troubleshoot an issue.

As firmware 3.0 has NOT been released, perhaps the upgrade process will change from what I've been told by the time it goes GA. I'd like to hope so. We'll have to wait until the firmware has officially been released to know for sure.

I still find it hard to believe, so I'd be Xtremly happy to have EMC correct me. If you have XtremIO units, chase it up with your official channels to confirm the upgrade process.

This also highlights the benefits of networking with other users in your field. That's where you'll hear about real user experiences, although you should always confirm them with the vendor. The devil is in the details.

Updated: For more info, the topic has been reported and debated in the comments on the following websites:

Plenty of other vendors have reached out, stepping on EMC so they can be the successor. If we're interested, we'll contact you.

Comments

Comment by Anonymous on 2014-09-14 00:45:07 +0000

You should talk to the guys from Nutanix and see if they can help you with a sweet replacement deal.

Comment by Anonymous on 2014-09-14 07:36:01 +0000

There are plenty of options out there. The simple fact is EMC should make good. How? I don't know. You should double check for other gotchas though. And don't pay for anything like services from the integrator that sold you the lemon. Good luck.

Comment by Anonymous on 2014-09-14 12:50:51 +0000

This will be the second, and possibly even the third, time a full data evacuation will have been required with XtremIO firmware upgrades. Keeping all of that metadata in memory has consequences – it increases coupling. It also highlights how pressured EMC has been to push XtremIO early in its history (which had no code written until 2009, and no customers until late 2012).

Comment by Anonymous on 2014-09-14 12:55:22 +0000

Here is a great November 2013 YouTube video from EMC, promising "disruptive upgrades are never required" with XtremIO: [ https://www.youtube.com/watch?v=-uTkO758Wxw ] (XtremIO High Availability Deep Dive). Eventually every product, after several years, requires some type of disruption, but seriously… this wasn't even a year ago, and the previous upgrade in 2013 was already a full evacuation.

Comment by Anonymous on 2014-09-14 12:58:31 +0000

Note: I just realized the above comments did not include my identity from Twitter. The 10:50pm and 10:55pm Sep 14 comments are from myself, Mark Kulacz (https://twitter.com/markkulacz). I am an employee of NetApp Corp, and my comments do not reflect the opinion of my employer.

Comment by Anonymous on 2014-09-14 16:04:06 +0000

I just want to thank you for bringing this issue to light. I first heard about it some months back, and was under the impression that it was, at least at that time, NDA/futures stuff that couldn't be commented on publicly.

I'm hoping EMC really steps up here and, if the GA code will still require full evacuation, is free and loose with the loaner hardware to get customers through this awkward situation. The big concern here is that I think we've heard "this is the last disruptive upgrade" a couple times now.

Comment by Aditya on 2014-09-14 16:20:11 +0000

In this day and age having to wipe data off to do an upgrade is totally unacceptable. Especially when you have mission critical workloads like you do.

Storage should cater to the applications, not the other way around. Manageability of the system should be a key tenet, and it's something they clearly dropped the ball on here. Hopefully they help you out to make this less painful.

Take a look at cohodata (www.cohodata.com) as they have an interesting solution that alleviates the issues you are dealing with.

Comment by Anonymous on 2014-09-14 21:25:10 +0000

Word is that the XtremIO de-dupe block size is being doubled from 4k to 8k (i.e. fixed block sizes) with XIOS 3.0, to reduce the total amount of metadata required. While the new compression capability will be great for e.g. database workloads, if your environment is VDI, you may actually lose a significant amount of your current de-dupe efficiency. It's clear that 3.0 is a major re-write, so I would treat it as essentially "version zero" code and be very cautious about adopting it too quickly. If you consider:
– the regular need for complete, fully destructive/disruptive updates,
– the inability of different EFD sizes/types to co-exist in a cluster (how's that going to work over time?)
– a complete re-architecture of the block size
– the inability to grow an X-Brick to a cluster without a disruption when adding IFB fabrics
– the lack of many basic storage array functions (e.g. replication, QoS)
it's clear that XtremIO is far from being a finished platform. I also wonder if eMLC was actually a viable choice medium-long term? Maybe Betamax re-visited?

Comment by Anonymous on 2014-09-15 15:27:17 +0000

If you want non-disruptive upgrades today, scale-out, thin provisioning, data compression, data de-duplication, full API integration, and no SPOF for true 24/7/365 running, see http://www.solidfire.com

Comment by rugby01ful on 2014-09-15 19:50:25 +0000

Coming from the leader in De-Dup technology (DataDomain), when you double the de-dup block size – you will decrease the duplication rate massively! I hope EMC is promising that you will see the same disk savings with compression and DeDup. If not – I would start asking for my money back or free storage upgrades like Pure. Sounds like 3.0 is not worth it!

Comment by Chad Sakac on 2014-09-15 23:27:50 +0000

Disclosure – EMCer here (was trying to post earlier)

I've done a detailed blog post with architectural reasons, details, forward-looking comments, and more here: http://virtualgeek.typepad.com/virtual_geek/2014/09/on-disruptive-upgrades.html

It's always fun to see the competitive vendors all pile on – particularly when several are going through big disruptive migrations of their own 🙂 This is indeed something that plagues ALL (yes, ALL) persistence architectures. If you disagree, read my post, and then disagree.

To all the happy XtremIO customers, we stand by you (and will help you through the upgrade along with our partners), this upgrade adds new stuff, but if you don't want the new stuff, 2.4 is completely supported.

I welcome any and all feedback on the post, but if people want additional info, it's there openly, transparently, and publicly.

Comment by Anonymous on 2014-09-22 00:40:42 +0000

It's still being marketed that way (check their website), even knowing that this is not true. I appreciate a guy like Chad trying to be honest and forthright, but all the while EMC marketing is taking away any shred of credibility he may try to muster. I am wondering if this would be considered false advertising at this point? It is an insult to those companies who deliver enterprise class firmware upgrades, and a better architecture.

From their website:

Uptime & Non-Disruptive Upgrades: XtremIO eliminates the need for planned downtime by providing non-disruptive software and firmware upgrades to ensure 7×24 continuous operations.

Comment by Anonymous on 2014-11-11 18:50:09 +0000

EMC was amazing in helping us transition to 3.0 from 2.4.
They sent another full brick XtremIO unit to temporarily hold the data from the 2.4 production unit. Once the data was migrated between units, the upgrade to 3.0 on the production unit went super smooth. We transitioned the data back to the production unit, removed the EMC demo unit and voila. Happy and running on firmware 3.0. Thanks EMC!!!

Comment by Andrew Dauncey on 2014-11-11 18:53:59 +0000

That's great to hear EMC looked after you, and the process wasn't that bad.

Thanks for the feedback.

Comment by Anonymous on 2015-08-09 10:19:17 +0000

Your video went private…do you still have it for viewing?

Comment by Anonymous on 2016-02-04 20:36:55 +0000

Funny pic was good fun https://web.archive.org/web/20140926121548/http://www.theoddangryshot.com/2014/09/xtremio-gotcha.html