Friday, December 26, 2014

Upcoming Anime in 2015

January 8, the first episode of the second season (12 episodes) of Tokyo Ghoul (Pierrot) airs in Japan.  While it's nowhere near as good as Parasyte - the maxim, it's still good fun.  Parasyte (Madhouse) will go on for another 12 episodes.  I like it a lot.
January 9, Death Parade (12 episodes), also produced by Madhouse, makes its debut.  It looks really promising.  To my great surprise I grew quite fond of JoJo's Bizarre Adventure: Stardust Crusaders (David Production) last year.  The Egypt Arc also happens to start on January 9.  So nice of them to use Walk Like an Egyptian by The Bangles as the ending theme.
One of the biggest disappointments of last year, Aldnoah.Zero, returns January 10.  It might be interesting to see how they will try to reanimate this cold turkey...

Saturday, December 20, 2014

Christmas beer tasting het Anker

Christmas beer night @ brewery het Anker (Mechelen) presented by Luc De Raedemaeker (bierinhuis.be)





Christmas beers are special beers brewed around this time.  
It was a very interesting evening.  Luc is a great host.  The tasting covered things like the ingredients of beer and their effect on the product, a brief history of beer, the basics of beer tasting and how it differs from wine tasting, and of course the practical tasting of this selection.

Beers presented by Luc:

Dobbel Palm (5,7%)


This one was served as an introduction / "warm-up".

St-Feuillien Christmas (9%)



For me the best beer in this selection.  You should really try this one!


Delirium Christmas (10%)


N’Ice Chouffe (10%)


Sint-Bernardus X-mas (10,5%)



Gouden Carolus Christmas (10,5%)


A nice ending to the Ankeravonden of 2014.  I'm curious what 2015 will have in store for us :-)


Thursday, December 18, 2014

Wednesday, December 17, 2014

Westvleteren, Ploegsteert and Westhoek in Belgium

I often make visits to the Westhoek region to resupply my stash of Westvleteren.




It's the ideal moment to visit this beautiful region and its war memorials.  In WW1 this area was drenched in blood.  Almost 450 000 people lost their lives in the trenches during the first industrialized conflict.  Weapons like poison gas and tanks were used for the first time.
There are lots of interesting things to see, like the Menin Gate in Ypres and the Tyne Cot Cemetery in Zonnebeke.
This time we went a little bit further, to Ploegsteert.  Technically it's not Westhoek anymore, as it is situated in the Walloon part of the country.

In 1990 the English band The Farm released the song 'All Together Now' about the extraordinary events that took place at Christmas 1914 in this little town.

People from both sides agreed on a 'truce' and played a football match in no man's land.  There's now a memorial.  They even constructed some trenches next to the monument.








  

Next to it is a cemetery.  Such a waste of young lives.

In Ploegsteert we also visited the Hyde Park Memorial.





A few kilometres from this memorial (back in Flanders) we also stopped at the Island of Ireland Peace Park.  Definitely worth a visit.


Friday, December 12, 2014

1 year Ebola crisis roundup

With the exception of the Time Person of the Year coverage, it's awfully quiet on the Ebola crisis.  This may give a false sense of relief, as if we sent in the cavalry and now everything over there is contained and under control.

This is simply not true.

Emile Ouamouno died on Dec. 6, 2013 in Guinea.  The 2-year-old child is believed to be the first casualty of the ongoing epidemic (the so-called patient zero).

The WHO issued a situation report on Dec. 10, so this can serve as a roundup for the first year.

http://www.who.int/csr/disease/ebola/situation-reports/en/

I read somewhere the WHO had set a goal to isolate 100 percent of Ebola cases by Jan. 1.  This goal won't be met.

The WHO also states it will take several months before the outbreak is under control.  According to Belgian biostatisticians the peak of the epidemic is nowhere near.  The most worrisome country is Sierra Leone, where reported new cases rise exponentially.

So why did the media stop covering this epidemic like they used to?  Probably because there weren't any new cases in Europe or the U.S.

Meanwhile the tests of the Canadian-developed vaccine in Switzerland have been put on hold after unexpected side effects in patients.  On the positive side: there were reports that the U.S. vaccine appears to be safe and triggers an immune response.  Obviously a vaccine would turn the odds in our favour.

Thursday, December 11, 2014

Inside Out (Pixar): original?

Everyone always finds Pixar films original, but I know where they got their inspiration for Inside Out.

Here is the trailer of Inside Out.
https://www.youtube.com/watch?v=qFDQ7eA4394&feature=youtu.be


Everything You Always Wanted to Know About Sex * But Were Afraid to Ask (1972)

https://www.youtube.com/watch?v=dh4LikiGBrQ

Alles kan beter (1997) 
https://www.youtube.com/watch?v=Esc_wFjqHhM

Problem with VSS, hidden partitions, drivers, Windows 2008 R2 and VMware

This morning we noticed one of our web services had ceased to work.

I looked at the server, and it was extremely unresponsive.  Yet according to vCenter, the load and I/O on the server were nothing out of the ordinary.  There was no problematic I/O on our filer either, and there was plenty of free space on the disks.

The explorer process froze in the console session and Windows offered to end it after the timeout.  I grew tired of waiting and tried to shut down the server.  The server froze again during shutdown and after waiting for 15 minutes, I just turned the virtual machine off.
I booted it again and wanted to find out what caused the virtual machine to behave like this.

The web application didn't spit out any clues.

Then I checked more general OS stuff and warnings.
I noticed some interesting messages:

  • "Reset to device, \Device\RaidPort0, was issued" has appeared every minute or so since 22.00 yesterday (not coincidentally when our backup starts).
  • Event ID 57, NTFS warning: "The system failed to flush data to the transaction log. Corruption may occur."  This one has been appearing on the machine for ages.  I remembered I checked this a year or so ago, and wanted to write about it as a follow-up to this article.  The issue is related to the System Reserved Partition Windows creates.  This partition holds the Boot Manager code and the Boot Configuration Database.  It is also required by the BitLocker Drive Encryption feature.  It also causes VSS to misbehave.


Relevant VMware KB's are: 


We updated the LSI driver to 1.32.01 and will now wait for the daily backup to see if the issue is resolved.  Strange that the issue pops up now: the server has been running for more than 2 years.

Wednesday, December 10, 2014

TURLA components for Linux discovered

People from Kaspersky discovered two Turla components designed for Linux.
For people who don't know Turla, a short introduction:   

Turla is a so-called APT [= advanced persistent threat].  APT was the buzzword of the security industry this year.

APTs are cyberattacks in which an attacker gains unauthorized network access and wants to stay undetected for obvious reasons (data theft, sabotage, spying).  Some of these APTs are aimed at a specific target (e.g. Proximus, some government agency, ...).  This is obviously pretty worrisome.

Turla was primarily targeted at the governments and embassies of a number of former Eastern Bloc countries but was found on computers in more than 45 countries. The attacks are probably still ongoing as we speak. 

During a browser session of a user, the attackers could use a backdoor to do almost anything with the infected computer (e.g. copy sensitive files, install other malware, ...).  The communication was masked as web requests from the browser.  Turla remained undetected for almost 4 years and is considered state-sponsored.  Several clues point to Russia.

The way victims were infected is also peculiar: the attackers used two strategies to infect the victim with another piece of malware, Trojan.Wipbot:

  • spear phishing emails: the attacker forges mails so they appear to come from what the victim considers a trusted source (someone in the company, a trusted partner, ...)
  • a watering hole attack: the attacker infects a website which the victim is known to visit.  Via a zero-day exploit in Java, Flash or Internet Explorer, the malware is delivered to the victim.  The malware was only delivered to certain IP ranges, once again to avoid unnecessary detection.  According to Symantec, at least 84 (!!) legitimate websites (including government websites) were serving this exploit in 2012.

Wipbot probably had to verify whether the machine was a possible target.  It was then used to download Turla.  You can read more in-depth info here.  Turla itself had a plug-in mechanism to install extras when necessary.

This almost reads like a thriller, don't you think?

Now Kaspersky discovered Linux users are also targeted.  The exploits they use are not known at the moment.  Obviously no root privileges are necessary to transfer your files to Moscow.

APTs are detected e.g. by analysing all traffic using big data techniques to find patterns, or by sandboxing suspicious traffic.  Sandboxing [= first checking the possible threat in a secure container] can be tricky as some malware has sandbox-detecting algorithms.

Using zero-day exploits in your "projects" isn't that hard: companies like Vupen sell these.

More reading about this Linux threat:
http://arstechnica.com/security/2014/12/powerful-highly-stealthy-linux-trojan-may-have-infected-victims-for-years/

Tuesday, December 9, 2014

TEDx presentation Mikko Hypponen

Great presentation by Mikko Hypponen (F-Secure) about the current state of the Internet.

My first introduction to the Internet was in 1993.  The dad of my former girlfriend could actually afford it.  I was hooked immediately, such a wondrous place!  It has changed so much since then, and you could say it turned against us.   On the other hand,  we all together (users, companies, governments, ...) are the Internet.  It's up to all of us to make it a better place, and perhaps make the band stop once in a while.

https://www.youtube.com/watch?v=QKe-aO44R7k


Monday, December 8, 2014

NetApp troubles (continued)

A follow-up to my blog entry of a few days ago.

In December, VMware released vSphere 5.1 U3.  Of course, the first thing I checked was compatibility with our FAS2040.  It is supported!

So it seems NetApp still wants to certify this hardware for some updates.

So I've asked NetApp again:  why is there no 5.5 support?  Is this some kind of policy or did the filer fail some tests?

The answer could be very interesting for future hardware acquisitions...

U2 tickets ordering

9 am this morning: a lot of fans click on the button "Buy your tickets".  They hope to purchase tickets for the U2 gigs in Antwerp on 13-14 October next year.  A lot of them will leave their PC frustrated within the hour.

As an avid concert lover, I already have a lot of experience (and indeed experienced a lot of frustration) with these online reservation and purchase systems.  I had a wife and 3 computers "at my disposal", took a few hours of leave and it was still a close call!  I had a page open with atomic clock time info, so I'm pretty sure I pushed the button at exactly 9.00 am.

The service that handled the U2 sale, teleticketservice.com, already did one very good thing: they separated ordering and payment.  You have a few hours to relax and finalize payment.

A lot of things can go wrong in the payment procedure:

  • a user can accidentally press the wrong button out of sheer excitement when he finally gets on the purchase page (yep, happened to me once)
  • a user's credit card or the credit card reader can fail repeatedly (yep, also happened to me)
  • but the most common cause by far is the timeouts caused by saturation of the service.

Another important source of sorrow is Tomorrowland (here some touting from Paylogic).  They suggest everything went like a breeze in 2012, but I and a lot of other people were kicked out during the payment procedure.  In 2013 and 2014 it was much worse.  You get a message that you are in a queue and have to wait.  Refreshing means going back to the end of the queue.  Then nothing happens...

Today a lot of people complained that some required select boxes malfunctioned.  Indeed, I experienced the problem myself.  I think this problem was Chrome-related.  I immediately retried with Firefox and luckily made it back to the ordering page.  The values of the select box were now selectable, so I could go on with the ordering procedure.

As I understand it, using this stuff costs you more than €7 per reservation.

Feel free to contribute some tips...  Definitely check out http://www.bandsintown.com/ to track your favorite artists.

Friday, December 5, 2014

Alignment issues

Before the last update of our NetApp filer, Proximus asked us to check the alignment of the VMs.
Having read this great blog entry by Duncan Epping, I already knew something about the problem.
It can have a huge impact on your performance so don't ignore it!

In this specific case, I looked up some NetApp documentation concerning the problem.  TR-3747 also explains the problem and solutions very well from a NetApp perspective.   

Alignment problems vary in complexity depending on how you set up your VMware system.  If you use iSCSI you have to check the virtual disks, VMFS and the underlying storage array.  Obviously NFS does not need the VMFS layer, so that's a bit easier.

The best thing is to make sure your virtual machine is properly aligned right after you create it.   You can put these changes in a template for easy deployment.   If the virtual machine is already in production it can be a lot harder/more time consuming.  TR-3747 covers this.
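
To make the idea of "properly aligned" a bit more tangible, here is a minimal Python sketch of the arithmetic.  It is not taken from TR-3747: it simply assumes 512-byte sectors and a 4 KiB storage block, and checks whether a partition's starting offset is a multiple of that block size.  The start sectors 63 and 2048 are used as typical examples of an older and a newer OS.

```python
SECTOR_SIZE = 512      # bytes per logical sector (assumption)
STORAGE_BLOCK = 4096   # bytes per block on the storage side (assumption: 4 KiB)


def is_aligned(start_sector, sector_size=SECTOR_SIZE, block=STORAGE_BLOCK):
    """Return True when the partition's starting byte offset is a multiple of the block size."""
    return (start_sector * sector_size) % block == 0


def describe(start_sector):
    """Return a one-line summary for a partition starting at the given sector."""
    offset = start_sector * SECTOR_SIZE
    state = "aligned" if is_aligned(start_sector) else "misaligned"
    return f"start sector {start_sector} -> byte offset {offset}: {state}"


if __name__ == "__main__":
    # 63 is the classic start sector of older partitioning tools,
    # 2048 (1 MiB) is what newer ones tend to use.
    for sector in (63, 2048):
        print(describe(sector))
```

A partition that starts at sector 63 begins in the middle of a 4 KiB block, so every guest block straddles two storage blocks.  That is where the performance hit comes from.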

Basically these alignment problems are quickly becoming a thing of the past, for these reasons:
  • The OS on virtual machines is aware of the problem.  Companies like Microsoft and Red Hat know about it, and Windows 7, 8, 2008, 2008 R2, 2012, RHEL/CentOS 6, Debian 6, Ubuntu 10, 11, 12 and SUSE 11 onwards are not affected.  With an older OS like RHEL/CentOS 5 (or Windows 2003) you do have a problem, though.
  • VMFS has become a lot smarter.  If you did your datastore configuration from vSphere 4 or later and don't use any legacy datastores, there are no alignment problems either.

NetApp makes the distinction between functionally aligned and actually aligned.
To make a virtual machine functionally aligned, it is moved to a special datastore which compensates for the misalignment.  The virtual machine itself is left untouched.  Virtual machines with the same offset can be moved to the same datastore.  To make a virtual machine actually aligned you have to bring it offline and use a tool such as VMware vCenter Converter, UberAlign or mbralign (NetApp-specific) to align it.

NetApp has built a tool into Virtual Storage Console (VSC) to detect misalignments and fix them online (for people who have Storage vMotion).

This is how it works:

You scan your datastores.  Your VMs are then categorized into folders:
  • Aligned > Functionally Aligned and Aligned > Actually Aligned folders: the virtual machine is functionally (I/O only) or actually aligned.  No steps required.
  • Misaligned > Online Migration folder: the misaligned virtual machine can be aligned using the online feature.
  • Offline Migration folder: the virtual machine cannot use the online alignment (migration) feature.
  • Misaligned > Other: the VM cannot be aligned for some reason.  Most common cause: the VM has multiple disks spanning multiple datastores.  But other causes exist.

In the latest version of VSC, mbralign isn't supported anymore, so NetApp probably thinks functional alignment is the preferred method.

Thursday, December 4, 2014

The quality of VMware backups

Most agentless VMware backup solutions rely on VMware Tools to  quiesce a virtual machine before the actual backup starts.   This process is invoked by taking a temporary snapshot with the quiesce option enabled.   

This step ensures data integrity.  A snapshot without this important step is only crash-consistent.  This means: all files that were open will still exist, but are not guaranteed to be free of incomplete I/O operations or data corruption.

Even so-called storage-based snapshots take a VMware snapshot first to trigger VMware Tools.

If something goes wrong in this step, you get inconsistent backups.  Changing to another backup solution/storage vendor will not necessarily solve this problem if the problem is in the quiesce mechanism of the virtual machine.

VMware offers 3 mechanisms to quiesce:  
  • the sync driver
  • the vmsync module
  • Microsoft's Volume Shadow Copy (VSS) service.

The Sync Driver

Targeted VMs: older Windows versions that don't have VSS

The sync driver holds incoming I/O writes while it flushes all dirty data to disk, thus making file systems consistent after a while.  If a lot of I/O is taking place on the virtual machine and the quiescing takes a long time, this can cause serious problems for the application causing the I/O.  It is advised to put so-called pre-freeze and post-thaw scripts in place to gracefully shut down / pause / do whatever to this application to reduce the I/O during the snapshot.

  • For Windows guests on ESXi 5.5 these scripts reside in c:\windows and have the fixed names pre-freeze-script.bat and post-thaw-script.bat.
  • For Linux the files are /usr/sbin/pre-freeze-script and /usr/sbin/post-thaw-script.  An interesting Linux command to run in these files is fsfreeze.
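
As an illustration of what such a Linux script could do, here is a rough Python sketch of a helper around fsfreeze.  Everything in it is an assumption on my side: the mount point /var/lib/appdata is hypothetical, and so is the idea of calling one helper with "freeze" from the pre-freeze script and with "thaw" from the post-thaw script.  Whether an explicit fsfreeze is needed at all depends on your setup, and freezing the file system the script itself lives on is asking for trouble.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Hypothetical data volume of the application; adjust to your own setup.
MOUNT_POINTS = ["/var/lib/appdata"]

FLAGS = {"freeze": "--freeze", "thaw": "--unfreeze"}


def fsfreeze_command(action, mount_point):
    """Build the fsfreeze command line for the action 'freeze' or 'thaw'."""
    if action not in FLAGS:
        raise ValueError(f"unknown action: {action!r}")
    return ["fsfreeze", FLAGS[action], mount_point]


def run(action, mount_points=MOUNT_POINTS, runner=subprocess.check_call):
    """Freeze mount points in the given order, thaw them in reverse order."""
    ordered = list(mount_points) if action == "freeze" else list(reversed(mount_points))
    for mount_point in ordered:
        runner(fsfreeze_command(action, mount_point))


if __name__ == "__main__":
    # Only act when called explicitly with 'freeze' or 'thaw'.
    if len(sys.argv) == 2 and sys.argv[1] in FLAGS:
        run(sys.argv[1])
    else:
        print("usage: freeze_helper.py freeze|thaw")
```

The runner parameter is only there so the command lines can be inspected without actually freezing anything, which is handy for a dry run.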

The vmsync module

Targeted VMs: Linux systems

This module needs to be activated explicitly during the installation of VMware Tools in Linux, and it is considered experimental.  Read: NOT SUPPORTED.

Microsoft's Volume Shadow Copy (VSS) service

This is what's probably used on your Windows VMs right now.

Targeted VMs: all recent Windows versions from Windows 2003 on, but the implementation varies by version:
  • Vista and 7: file-system-consistent quiescing
  • 2003: application-consistent quiescing
  • 2008, 2008 R2, 2012: application-consistent quiescing, but only if certain conditions are met.

These conditions are (taken straight from the docs):
  • Virtual machine must be running on ESXi 4.1 or later.
  • The UUID attribute must be enabled. It is enabled by default for virtual machines since 4.1
  • The virtual machine must use SCSI disks only and have as many free SCSI slots as the number of disks. Application-consistent quiescing is not supported for virtual machines with IDE disks.
  • The virtual machine must not use dynamic disks.
When some ignorant helpdesk guy tells you to change the enableUUID attribute to False to get rid of Windows errors, you don't meet the conditions anymore and your backups will be downgraded to crash-consistent.

So what happens in the background?
With the installation of VMware Tools, VMware provides the guest OS with a VSS requestor.  VMware Tools is responsible for initiating the VSS snapshot process as the VSS requestor, but the VSS mechanism itself is designed and provided by Microsoft.  Note that Windows Server Backup and the Bacula Windows client are also requestors.  The VSS requestor sets up the overall configuration for the backup operation, including whether the snapshot should be performed in component mode or not, whether to take a snapshot with a bootable system state, and whether the snapshot should be for a full copy or differential backup.
The VSS provider is the component that takes care of keeping the shadow copies.  Microsoft provides one with Windows, and this is the one used by VMware.  Other VSS providers exist (from storage arrays etc.).  To check if you are using the Microsoft VSS provider, use vssadmin list providers in the virtual machine.
Writers are application-specific software that ensures application data is ready for shadow copy creation (e.g. an Active Directory writer, an Exchange writer, ...).  To list all possible writers and their state on the virtual machine, use vssadmin list writers.  Usually, when a writer is available, an application is considered VSS-aware.
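
If you have to check many machines, reading that writer list by eye gets old quickly.  Below is a small Python sketch that picks out the writers that are not in a stable state or that report an error.  It assumes the English-language layout of the vssadmin list writers output (lines starting with "Writer name:", "State:" and "Last error:"); the sample text, including the writer names and the error, is made up for illustration.

```python
import re

# Made-up sample in the layout that 'vssadmin list writers' prints on English Windows.
SAMPLE = """\
Writer name: 'Example Writer A'
   Writer Id: {00000000-0000-0000-0000-000000000001}
   State: [1] Stable
   Last error: No error

Writer name: 'Example Writer B'
   Writer Id: {00000000-0000-0000-0000-000000000002}
   State: [9] Failed
   Last error: Timed out
"""


def unhealthy_writers(vssadmin_output):
    """Return (name, state, last_error) for writers that are not stable or report an error."""
    problems = []
    name = state = None
    for raw_line in vssadmin_output.splitlines():
        line = raw_line.strip()
        match = re.match(r"Writer name: '(.*)'", line)
        if match:
            # A new writer block starts here.
            name, state = match.group(1), None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            last_error = line.split(":", 1)[1].strip()
            if state is None or "Stable" not in state or last_error != "No error":
                problems.append((name, state, last_error))
            name = None
    return problems


if __name__ == "__main__":
    for writer in unhealthy_writers(SAMPLE):
        print(writer)
```

On a real machine you would feed it the captured output of the command instead of SAMPLE.  An empty result only means the writers look healthy right now, so it does not replace checking the event log after a backup.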
Depending on the OS, VMware Tools initiates VSS quiescing using one of these contexts:
  • VSS_CTX_BACKUP context (this is the standard backup context of VSS) for application quiescing capable guests with backup state set to select components, backup bootable system state with backup type VSS_BT_COPY and no partial file support.  Files on disk will be copied to a backup medium regardless of the state of each file's backup history, and the backup history will not be updated.  All writers and all components are involved by default.  A mechanism to exclude certain writers exists.
  • VSS_CTX_FILE_SHARE_BACKUP context for file system quiescing capable guests.  There is no writer involvement.
Currently there is no way to control any of these parameters.

By now you understand that VSS brings together and orchestrates technologies from different parties (storage systems, OS, backup tools, VSS-aware software...).  So a lot can go wrong with VSS!  And all the Linux friends need to be very quiet as there is no comparable system in Linux.
This TechNet article explains VSS more in depth.

VMware or your backup tool will tell you the snapshot was created successfully, but imho the only way to really tell is to check Event Viewer on the virtual machine.  I usually make a custom filter to quickly find these messages.  Look for things like timeouts, errors and warnings, and google them.  The quality of your backup could depend on it!

Next time I'll present a specific case.

Wednesday, December 3, 2014

Cool site: Serendipity

https://www.spotify.com/us/arts/serendipity/

System maintenance - WMIC

As patching technology like Shavlik or Secunia is quite expensive and as we are hitting budget constraints,  a responsible sysadmin has to go "the extra mile" to keep his network safe.

Although we have a tool that detects vulnerable applications on the clients and has a reliable way to install updates, we don't have the actual patches.

This means we have to "roll our own patches".   This turns out to be not that difficult except for the usual crap like Adobe software (Flash, Acrobat Reader) where you have to apply for a distribution license and of course Java.

Not only does Java keep nagging users about updates they don't have the necessary privileges to install (why don't they just use Windows Update...), but when you installed an update it also kept the previous version.  Oracle woke up from this stupidity, and from 8u20 on the installation leaves you (the PRIVILEGED user, of course...) with some kind of option to delete older versions.  But as we use unattended installations to update Java, this doesn't work for us either.

Now I found a neat trick with WMI to do this uninstalling.
In a batch script you can query WMI for applications (well, not all, as it turns out...) and carry out operations on the results by using call...

Just open a command prompt and type wmic product to see the list of installed software (this can take a while).

To see all Java installations on your machine, type wmic product where "name like '%%Java%%'" (the doubled %% is the batch-file escape for the single % wildcard).

OK, you also might want to search on %%J2SE%%

Uninstall all those insecure versions of Java 7 with

wmic product where "name like '%%Java 7%%'" call uninstall /nointeractive 

(do the query first without the call part to be sure...),  OK my script is a bit more complex but with a little bit of googling you can find some examples :-)
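
For those who prefer to wrap this in a script, here is a minimal Python sketch of the same "query first, uninstall later" idea.  It is not my actual .cmd file: the function names are made up, and the get name,version part of the query is my own addition to keep the output short.  Because the commands are not passed through a batch file, a single % is used as wildcard.  By default the script only prints the command lines, and it executes the uninstall only on Windows when --apply is given.

```python
import subprocess
import sys


def wmic_query(pattern):
    """Build the read-only query listing products whose name matches the LIKE pattern."""
    return ["wmic", "product", "where", f"name like '{pattern}'", "get", "name,version"]


def wmic_uninstall(pattern):
    """Build the command that uninstalls every product matching the LIKE pattern."""
    return ["wmic", "product", "where", f"name like '{pattern}'",
            "call", "uninstall", "/nointeractive"]


if __name__ == "__main__":
    pattern = "%Java 7%"
    # Dry run by default: show what would be executed.
    print("query:    ", " ".join(wmic_query(pattern)))
    print("uninstall:", " ".join(wmic_uninstall(pattern)))
    if "--apply" in sys.argv[1:] and sys.platform == "win32":
        subprocess.call(wmic_uninstall(pattern))
```

Keeping the query and the uninstall as two separate commands makes it easy to review the list of matches before anything gets removed.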

I distribute the .cmd file with the tool and gone are those pesky Java installations.

Such a very useful command, how un-Microsoft-like, I thought...  Turns out it only lists programs installed with MSI packages.  Oh well... my bad.

NetApp troubles

I've been bugged by a problem for more than 2 weeks now.

In May 2012 we had Belgacom (now Proximus) build us a small VMware vSphere 5.0 setup with 3 hosts.

We chose NetApp as our shared storage vendor and bought a FAS2040.  A few months later we added an extra shelf.  For a small organisation (130 people) this was quite the investment, but hey some peace of mind has its price.

Since then we never had noteworthy issues, so we were happy customers.  Until a few weeks ago...

As a policy we try to follow new VMware releases (as we pay for  them) for new functionality (supported OSes).  Almost a year ago (January 2014), we did an upgrade from vSphere 5.1 to 5.5.   We did check NetApp's IMT tool and found out this was possible.  The upgraded system worked flawlessly (only a minor issue with logging and VSC).  We also did an ONTAP upgrade [= the OS on the NetApp device] a few weeks ago.

Now we wanted to update to vSphere 5.5U2.
In my preparations I started to read the VMware release notes.
One of the first steps is to check hardware compatibility in the VMware HCL [= hardware compatibility list].

http://www.vmware.com/resources/compatibility/search.php

To my surprise our FAS wasn't supported anymore for 5.5U2.   Moreover I discovered we were already running an unsupported configuration.   Last supported VMware vSphere version for our filer is 5.1U2 and we were already at 5.5 ...  I (and Proximus for that matter) didn't check the HCL in January assuming that a system costing thousands of Euros still would be supported.

Now every VMware architect will acknowledge the importance of this HCL in designing a configuration as this has implications on the level of support you get from VMware.

Proximus immediately did what they had to do and created a case at NetApp.

We got a reply that NetApp will support our configuration and they work closely together with VMware.  But that was not the question.  Our question was: will VMware support our configuration   as it used to be supported?  NetApp then pointed Proximus to VMware support "as we [=NetApp] has no control of their [=VMware] HCL" (sic).

So Proximus did just that:  VMware confirmed we were running an unsupported configuration and pointed out what NetApp should do to "make it back into the list":  send a request to VMware to (re)certify.  This means VMware will check its performance and compatibility.

The thing is we can't easily go back to the latest supported configuration (5.1U2) because this isn't supported with the new ONTAP version.

The EOA [= end of availability] date of the FAS2040 is Nov. 2, 2012, with software support until Oct. 31, 2017.  Does EOA also mean the end of VMware certification for NetApp?

Bottom line: we had the system for only a year and a half when a minor update of VMware was released that isn't supported anymore.  I understand they won't support the next major version of VMware, you can't support stuff forever.

Be careful with this stuff: check and double-check the HCL with every upgrade and update.  You have been warned!  Also check (if possible) whether the proposed components of your system are about to be replaced by the vendor.

I will post new findings.

For the sake of completeness: the history of VMware 5 releases
https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html