Tuesday, July 30, 2013

Passed EX436 Clustering and Storage Management last Sunday.

For the sake of personal achievement, I have been planning to get my RHCA since 2011, and hopefully it can be done before the end of 2013 if everything goes smoothly.

Last Sunday I finished EX436, my 2nd RHCA exam out of a series of 5. The score was 268 out of 300, which is good enough for me. The exam focuses on clustering and storage management, an area I have some experience in but would definitely love to improve. The exam itself was interesting, challenging and fun. Like other well-known RHCA bloggers out there, I would say the exam is not really tough as long as you are well prepared on the topics.

For those who would like to take this exam, the best bet is certainly to join the training class provided by Red Hat :-) . But if you want to save some bucks, you have to work on your own, and a good start is to revisit the Course Outline here (link). I basically stuck with the official Red Hat guides on clustering, GFS/GFS2, Multipath, Fencing, LVM and CLVM (all can be found here) and kept practicing in my own lab. Based on the latest outline, you may also want to look at XFS and Gluster (I am not sure where to find resources for them on the Red Hat site, though). I didn't practice XFS and Gluster in my own lab because I was following the previous Course Outline, which didn't include them at all (!!!), so it was a big surprise when I saw those questions in the exam. However, thanks to my job duties I did have a little exposure to them, and luckily that helped me survive the exam.
 
People may be interested to know the distro or exact version in question, but due to the NDA I can't say which version it is here :-) . What I can say is that clustering and GFS/GFS2 don't differ much between RHEL 5 and RHEL 6, at least from an exam perspective. My lab was based on RHEL 5 and I didn't subscribe to any Red Hat subscription service (I copied all the RPMs off the ISO and created my own repository tree, as well as yum repository configuration files, so the lab machines could fetch the required packages).
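
For anyone building a similar lab, the yum repository configuration file mentioned above could look roughly like the output of this sketch. The repo id "lab-cluster" and the directory /srv/repo/Cluster are made-up names for illustration, not the ones I actually used, and gpgcheck is switched off only because this is a throwaway lab.

```shell
#!/bin/sh
# Sketch only: print a yum .repo stanza for a package tree copied off the
# install ISO. The repo id and directory passed in are hypothetical.
make_local_repo() {
    repo_id=$1
    repo_dir=$2
    cat <<EOF
[$repo_id]
name=Local $repo_id packages
baseurl=file://$repo_dir
enabled=1
gpgcheck=0
EOF
}

# Example: print a stanza that could be saved under /etc/yum.repos.d/
make_local_repo lab-cluster /srv/repo/Cluster
```

The directory given as baseurl needs a repodata/ subdirectory. If the copied tree doesn't have one, running createrepo on the directory should generate it.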

My other advice is to stay calm during the exam. During my exam I got stuck on a particular task and was getting a bit nervous, which led me to mistakenly reboot my host (!!) in the middle of the exam. Those who have already taken the RHCSA/RHCE exams may know that your exam systems are VMs that sit on a physical machine dedicated to you. So in my situation, rebooting the host suspended all my VMs, and at that point I didn't know what would happen. The worst case would have been a re-image of all the exam VMs, and I would have had to rebuild everything in the remaining 2 hours of the 4-hour exam. Luckily all my VMs were still there after the host reboot and the only impact was 15 minutes of downtime on my exam environment (a million thanks to the examiner who helped recover my environment).

My upcoming exams are:

EX442    Red Hat Enterprise System Monitoring and Performance Tuning Expertise Exam
EX333    Red Hat Enterprise Security: Network Services Expertise Exam
EX401    Red Hat Enterprise Deployment and Systems Management Expertise Exam

Hopefully I will be taking EX442 in September if everything goes smoothly. EX442 is well known among RHCA candidates for its complexity, so I am looking forward to giving it a try.

Wednesday, July 24, 2013

A quick and dirty munin plugin to count the number of VMs running on a RHEL/CentOS based KVM host.

Recently I was configuring munin to monitor some QEMU/KVM hosts, which are generic RHEL servers (note: not RHEV) running libvirtd and QEMU/KVM.

So here is a plugin that I created. It is quick and dirty but should work as expected. Just copy and paste the plugin file into the /etc/munin/plugins/ directory and make sure it is executable (755, ideally), and you should be good.

Here is the content of the file.

[root@localhost plugins]# cat /etc/munin/plugins/vm_count
#!/bin/sh

case $1 in
   config)
        cat <<'EOM'
graph_title Number of VMs
graph_vlabel VMcount
vmcount.label VMcount
vmcount.graph_category Vserver
EOM
        exit 0;;
esac

# Count running qemu-kvm processes. The quoted [/] pattern keeps grep
# from matching its own command line.
i=$(ps auxww | grep -c '[/]usr/libexec/qemu-kvm')
echo "vmcount.value $i"


And here is how it works.

# You should be able to execute it directly from the system shell. In this example I had 17 VMs running on the host.

[root@localhost plugins]# pwd
/etc/munin/plugins

[root@localhost plugins]# ./vm_count
vmcount.value 17


# Alternatively, you can test it with munin-run. This is what the script outputs when munin loads it.

[root@localhost plugins]# munin-run vm_count
vmcount.value 17





# And here are the parameters of this plugin.

[root@localhost plugins]# munin-run vm_count  config
graph_title Number of VMs
graph_vlabel VMcount
vmcount.label VMcount
vmcount.graph_category Vserver
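
If you want to see why the grep pattern in the plugin is written with [/], here is a small sketch that runs the same counting logic against canned input instead of a live ps. The function name count_vms and the sample process lines are made up for the demo; the qemu-kvm path is the RHEL/CentOS location used in the plugin above.

```shell
#!/bin/sh
# Sketch only: count_vms reads ps-style lines on stdin and counts those that
# contain the qemu-kvm binary path. The regex [/]usr/... matches the text
# "/usr/...", but grep's own command line would contain the literal "[/]usr",
# which the regex does not match, so grep never counts itself.
count_vms() {
    grep -c '[/]usr/libexec/qemu-kvm'
}

# Canned input standing in for "ps auxww" output (hypothetical processes).
printf '%s\n' \
    'qemu 2301 /usr/libexec/qemu-kvm -name lab1' \
    'qemu 2310 /usr/libexec/qemu-kvm -name lab2' \
    'root  999 /usr/sbin/libvirtd --daemon' \
    'root 3100 grep [/]usr/libexec/qemu-kvm' | count_vms
```

With the four sample lines above, only the two qemu-kvm lines are counted; the libvirtd line and the grep line are ignored.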

Monday, July 22, 2013

Resource [Host:N] is unreachable: Host N: Unable to start instance due to Template systemvm-kvm-3.0.0 has not been completely downloaded to zone N

Because of my job duties I have to deal with Citrix CloudStack day by day. Recently we were deploying a new advanced zone, and for some reason we saw errors like this while deploying our first VM instance.


2013-07-22 22:09:27,625 WARN  [api.commands.DeployVMCmd] (Job-Executor-50:job-534828) Exception:
com.cloud.exception.AgentUnavailableException: Resource [Host:N] is unreachable: Host N: Unable to start instance due to Template systemvm-kvm-3.0.0 has not been completely downloaded to zone N

................
Caused by: com.cloud.utils.exception.CloudRuntimeException: Template systemvm-kvm-3.0.0 has not been completely downloaded to zone N
................
2013-07-22 22:09:27,626 WARN  [cloud.api.ApiDispatcher] (Job-Executor-50:job-534828) class com.cloud.api.ServerApiException : Resource [Host:N] is unreachable: Host N: Unable to start instance due to Template systemvm-kvm-3.0.0 has not been completely downloaded to zone N


So, basically, what CloudStack does is:
1. Check whether there is a valid systemvm template (in this case systemvm-kvm-3.0.0) deployed to the zone.
2. If things work as they should, you can find the installed/downloaded template in the tables cloud.vm_template, cloud.template_zone_ref and cloud.template_host_ref. Likewise, if you scan through the template list in the Web GUI, you should see the template marked as downloaded.

In my case, the template was not downloaded as it should have been (or at least not marked as downloaded at the DB layer), and if you look at the table cloud.template_host_ref, something is clearly abnormal.

mysql> select * from  template_host_ref where id=11111\G
*************************** 1. row ***************************
            id: 11111
       host_id: *masked*
   template_id: *masked*
       created: 2013-07-18 17:50:43
  last_updated: 2013-07-22 20:04:52
        job_id: 75a75e55-5280-4ba5-b823-cadbcbe2cc7a
  download_pct: 0
          size: 0
 physical_size: 0
download_state: DOWNLOAD_ERROR
     error_str: No route to host

    local_path: /mnt/SecStorage/04ab8f0b-c4e0-34a4-80b3-457c433acde3/template/tmpl/2/1686/dnld6951269530983090325tmp_
  install_path: NULL
           url: http://download.cloud.com/templates/acton/acton-systemvm-02062012.qcow2.bz2
     destroyed: 0
       is_copy: 0


So, basically, the problems are: 1) download_pct is 0 (it should be 100 if the download succeeded), 2) download_state is DOWNLOAD_ERROR (it should be DOWNLOADED if the download succeeded) and 3) error_str is "No route to host".

In my case, the template installation procedure was not completed, at least at the DB layer (though I had run the cloud-install-sys-tmplt script per the official installation guide).

So I double-checked the secondary storage to make sure the template file was completely downloaded (IMPORTANT!!! If the file is not there, go through the installation guide and re-run the cloud-install-sys-tmplt script) and hacked the DB by updating the cloud.template_host_ref table. (Replace "N" with the correct account id and template id respectively.)

mysql> update template_host_ref set download_pct=100, download_state='DOWNLOADED', error_str=NULL, local_path='template/tmpl/N/N' where id=11111\G
*************************** 1. row ***************************
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0


Now CloudStack can launch VMs as it should.
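
Since the DB hack is only safe when the template is really on secondary storage, the check I did by hand before the update could be scripted roughly like this. This is a sketch under assumptions: the function name template_ready is made up, the default path is a placeholder you must replace, and I am assuming a completed install leaves a non-empty .qcow2 image plus a template.properties file in the template directory.

```shell
#!/bin/sh
# Sketch only: succeed if the given template directory holds a non-empty
# template.properties file and at least one non-empty .qcow2 image.
# The expected file layout is an assumption, not taken from CloudStack docs.
template_ready() {
    dir=$1
    [ -s "$dir/template.properties" ] || return 1
    for f in "$dir"/*.qcow2; do
        # If the glob matches nothing it stays literal and -s fails.
        [ -s "$f" ] && return 0
    done
    return 1
}

# Placeholder path: pass the real template directory as the first argument.
TMPL_DIR=${1:-/path/to/secondary-storage/template/tmpl/N/N}
if template_ready "$TMPL_DIR"; then
    echo "template files found in $TMPL_DIR"
else
    echo "template files missing in $TMPL_DIR, re-run cloud-install-sys-tmplt first"
fi
```

Only when the check reports the files as present would I go on to touch cloud.template_host_ref.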

Saturday, July 20, 2013

EX436: Add iptables rule to allow fence_xvmd

Assuming you want to make use of fence_xvmd to do VM fencing and you have iptables enabled, you may see issues when fence_xvm (client) sends a request to fence_xvmd (server).

Here is an example of fence_xvmd (server side) running on dom0 with the multicast address 225.0.0.12 (the default if option "-a" is not given when you start fence_xvmd).

[root@dom0 images]# fence_xvmd -L -X -fd -I eth0
-- args @ 0x7fff92fb76d0 --
  args->addr = 225.0.0.12
  args->domain = (null)
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 2
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->ifindex = 5
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 259
  args->debug = 1
-- end args --
My Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
iscsitgt                 743affaf-eae7-6e40-0d1d-e3a3bb1b1eaf 00001 00002
lab1                     20a6e8b6-26a6-a700-b656-63b72b0a407e 00001 00002
lab2                     cb3f49a8-9841-d917-50ab-97425d900da4 00001 00002
Storing iscsitgt
Storing lab1
Storing lab2



Now assume you fence the VM "lab1" from the VM "lab2" with fence_xvm (the client); you will see something like this. Apparently the fence_xvm request doesn't reach fence_xvmd (the fence server), and the client keeps reporting "Waiting for connection from XVM host daemon."

[root@lab2 ~]# fence_xvm -ddd -H lab1
Debugging threshold is now 3
-- args @ 0x7fffebce4540 --
  args->addr = 225.0.0.12
  args->domain = lab1
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 2
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->ifindex = 0
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 0
  args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0x7fffebce34f0 (4096 max size)
Actual key length = 4096 bytesSending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 192.168.0.202
Sending to 225.0.0.12 via 192.168.0.199
Sending to 225.0.0.12 via 10.0.0.202
Sending to 225.0.0.12 via 172.16.0.202
Sending to 225.0.0.12 via 172.16.1.202
Waiting for connection from XVM host daemon.
Sending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 192.168.0.202
Sending to 225.0.0.12 via 192.168.0.199
Sending to 225.0.0.12 via 10.0.0.202
Sending to 225.0.0.12 via 172.16.0.202
Sending to 225.0.0.12 via 172.16.1.202
Waiting for connection from XVM host daemon.


In my scenario, the Xen host has iptables enabled, and on the fence_xvmd side no fence request was coming in either. It looks like the fence request was filtered.

To let the fence request in via multicast, we can add the rule below.

# iptables -I INPUT -d 225.0.0.12 -p udp -m udp --dport 1229 -j ACCEPT

This assumes fence_xvmd listens on the default address (225.0.0.12) and port (UDP 1229).
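
If your fence_xvmd runs on a non-default multicast address or port, the rule has to follow. Here is a small sketch that only prints the matching rule so you can review it before running it as root. The function name fence_xvmd_rule and the second example address are made up; the defaults are the ones shown in the debug output above.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the iptables rule for a given
# multicast address and UDP port. Defaults match the fence_xvmd output above.
fence_xvmd_rule() {
    addr=${1:-225.0.0.12}
    port=${2:-1229}
    echo "iptables -I INPUT -d $addr -p udp -m udp --dport $port -j ACCEPT"
}

# Default address and port
fence_xvmd_rule
# Hypothetical non-default multicast address
fence_xvmd_rule 225.0.0.13 1229
```

On RHEL, "service iptables save" should write the running rules to /etc/sysconfig/iptables so the rule survives a reboot.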

Once the rule is added, you can retry fencing and you should see something similar to this.

[root@lab2 ~]# fence_xvm -ddd -H lab1
Debugging threshold is now 3
-- args @ 0x7fffb74cc2a0 --
  args->addr = 225.0.0.12
  args->domain = lab1
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 2
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->ifindex = 0
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 0
  args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0x7fffb74cb250 (4096 max size)
Actual key length = 4096 bytesSending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 192.168.0.202
Sending to 225.0.0.12 via 192.168.0.199
Sending to 225.0.0.12 via 10.0.0.202
Sending to 225.0.0.12 via 172.16.0.202
Sending to 225.0.0.12 via 172.16.1.202
Waiting for connection from XVM host daemon.
Issuing TCP challenge
Responding to TCP challenge
TCP Exchange + Authentication done...
Waiting for return value from XVM host
Remote: Operation was successful