Canonical Voices

Posts tagged with 'industry'

Louis

I have seen this setup documented a few places, but not for Ubuntu so here it goes.

I have used this many time to verify or diagnose Device Mapper Multipath (DM-MPIO) since it is rather easy to fail a path by switching off one of the network interfaces. Nowaday, I use two KVM virtual machines with two NIC each.

Those steps have been tested on Ubuntu 12.04 (Precise) and Ubuntu 14.04 (Trusty). The DM-MPIO section is mostly a cut and paste of the Ubuntu Server Guide

The virtual machine that will act as the iSCSI target provider is called PreciseS-iscsitarget. The VM that will connect to the target is called PreciseS-iscsi. Each one is configured with two network interfaces (NIC) that get their IP addresses from DHCP. Here is an example of the network configuration file :

$ cat /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
#
auto eth1
iface eth1 inet dhcp

The second NIC resolves to the same hostname with a « 2 » appended (i.e. PreciseS-iscsitarget2 and PreciseS-iscsi2)

Setting up the iSCSI Target VM

This is done by installing the following packages :

$ sudo apt-get install iscsitarget iscsitarget-dkms

Edit /etc/default/iscsitarget and change the following line to enable the service :

ISCSITARGET_ENABLE=true

We now proceed to create an iSCSI target (aka disk). This is done by creating a 50 Gb sparse file that will act as our disk :

$ sudo dd if=/dev/zero of=/home/ubuntu/iscsi_disk.img count=0 obs=1 seek=50G

This container is used in the definition of the iSCSI target. Edit the file /etc/iet/ietd.conf. At the bottom, add :

Target iqn.2014-09.PreciseS-iscsitarget:storage.sys0
        Lun 0 Path=/home/ubuntu/iscsi_disk.img,Type=fileio,ScsiId=lun0,ScsiSN=lun0

The iSCSI target service must be restarted for the new target to be accessible

$ sudo service iscsitarget restart


Setting up the iSCSI initiator

To be able to access the iSCSI target, only one package is required :

$ sudo apt-get install open-iscsi

Edit /etc/iscsi/iscsid.conf changing the following:

node.startup = automatic

This will ensure that the iSCSI targets that we discover are enabled automatically upon reboot.

Now we will proceed to discover and connect to the device that we setup in the previous section

$ sudo iscsiadm -m discovery -t st -p PreciseS-iscsitarget
$ sudo iscsiadm -m node --login
$ dmesg | tail
[   68.461405] iscsid (1458): /proc/1458/oom_adj is deprecated, please use /proc/1458/oom_score_adj instead.
[  189.989399] scsi2 : iSCSI Initiator over TCP/IP
[  190.245529] scsi 2:0:0:0: Direct-Access     IET      VIRTUAL-DISK     0    PQ: 0 ANSI: 4
[  190.245785] sd 2:0:0:0: Attached scsi generic sg0 type 0
[  190.249413] sd 2:0:0:0: [sda] 104857600 512-byte logical blocks: (53.6 GB/50.0 GiB)
[  190.250487] sd 2:0:0:0: [sda] Write Protect is off
[  190.250495] sd 2:0:0:0: [sda] Mode Sense: 77 00 00 08
[  190.251998] sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  190.257341]  sda: unknown partition table
[  190.258535] sd 2:0:0:0: [sda] Attached SCSI disk

We can see in the dmesg output that the new device /dev/sda has been discovered. Format the new disk & create a file system. Then verify that everything is correct by mounting and unmounting the new file system.

$ fdisk /dev/sda
n
p
1
<ret>
<ret>
w
$  mkfs -t ext4 /dev/sda1
$ mount /dev/sda1 /mnt
$ umount /mnt

 

Setting up DM-MPIO

Since each of our virtual machines have been configured with two network interfaces, it is possible to reach the iSCSI target through the second interface :

$ iscsiadm -m discovery -t st -p
192.168.1.193:3260,1 iqn.2014-09.PreciseS-iscsitarget:storage.sys0
192.168.1.43:3260,1 iqn.2014-09.PreciseS-iscsitarget:storage.sys0
$ iscsiadm -m node -T iqn.2014-09.PreciseS-iscsitarget:storage.sys0 --login

Now that we have two paths toward our iSCSI target, we can proceed to setup DM-MPIO.

First of all, a /etc/multipath.conf file must exist.  Then we install the needed package :

$ sudo -s
# cat << EOF > /etc/multipath.conf
defaults {
        user_friendly_names yes
}
EOF
# exit
$ sudo apt-get -y install multipath-tools

Two paths to the iSCSI device created previously need to exist for the multipath device to be seen.

# multipath -ll
mpath0 (149455400000000006c756e30000000000000000000000000) dm-2 IET,VIRTUAL-DISK
size=50G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 4:0:0:0 sda 8:0   active ready  running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 5:0:0:0 sdb 8:16  active ready  running

The two paths are indeed visible. We can move forward and verify that the partition table created previously is accessible :

$ sudo fdisk -l /dev/mapper/mpath0

Disk /dev/mapper/mpath0: 53.7 GB, 53687091200 bytes
64 heads, 32 sectors/track, 51200 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0e5e5db1

              Device Boot      Start         End      Blocks   Id  System
/dev/mapper/mpath0p1            2048   104857599    52427776   83  Linux

All that is remaining is to add an entry to the /etc/fstab file so the file system that we created is mounted automatically at boot.  Notice the _netdev entry : this is required otherwise the iSCSI device will not be mounted.

$ sudo -s
# cat << EOF >> /etc/fstab
/dev/mapper/mpath0-part1        /mnt    ext4    defaults,_netdev        0 0
EOF
exit
$ sudo mount -a
$ df /mnt
Filesystem               1K-blocks   Used Available Use% Mounted on
/dev/mapper/mpath0-part1  51605116 184136  48799592   1% /mnt

Read more
caribou

Recently, I have realized a major difference in how customer support is done on Ubuntu.

As you know, Canonical provides official customer support for Ubuntu both on server and desktop. This is the work I do : provice customer with the best level of support on the Ubuntu distribution.  This is also what I was doing on my previous job, but for the Red Hat Enterprise Linux and SuSE Linux Enterprise Server distributions.

The major difference that I recently realized is that, unlike my previous work with RHEL & SLES, the result of my work is now available to the whole Ubuntu community, not just to the customers that may for our support.

Here is an example. Recently one of our customer identified a bug with vm-builder in a very specific case.  The work that I did on this bug resulted in a patch that I submitted to the developers who accepted its inclusion in the code. In my previous life, this fix would have been made available only to customers paying a subscription to the vendors through their official update or service pack services.

With Ubuntu, through Launchpad and the regular community activity, this fix will become available to the whole community through the standard -updates channel of our public archives.

This is true for the vast majority of the fixes that are provided to our customers. As a matter of fact, the public archives are almost the only channel that we have to provide fixes to our customers, hence making them available to the whole Ubuntu community at the same time.  This is different behavior and something that makes me a bit prouder of the work I’m doing.

Read more
caribou

I’m not a big fan^c^c^c user of Twitter, but this morning, there was a retweet of an interesting blog post that I want to share here :

Remote worker vs distributed team

Being a remote worker myself working in a somewhat distributed team, I can very well relate to what the author is talking about. I thought that it was worth sharing in a different way.

Read more
caribou

Profession : support engineer

A while back, Mark talked about some of the professions in the computer industry. Unfortunately I can’t seem trackback those articles.  Instead of waiting for him to talk about the on I am part of, I thought of doing it myself.

In a train back from Paris, where I have spent the day doing an intensive session of troubleshooting for one of our customers, I am reminiscing about the work I have achieved today, the good decision I took which helped us identify a potential cause for  the suspend/resume issue that we were investigating.

First of all, this time I was not the one with the knowledge : Colin King was.  I was there to assist and help in identifying why the laptop was not resuming from suspend. Since the customer plans to deploy 2000 of this specific model, it is better to identify and fix those kind of issues beforehand.

So we went in, him in somewhere in England, me in Paris, both in a constant IRC chat, shooting ideas back and forth.  Well mostly him sending ideas and me shooting back the results and observations, things that I was seeing, important or not, trivial or what I thought was important.  And in those situations where you are part of a chain and not necessarily the one with the most knowledge in the issue being investigated, it is very important to avoid judging on the importance of the information that  you provide.

Too often, the dreaded « oh, I thought that this wasn’t important so I didn’t bother to mention » spell just brings tons of bad magic on the investigation efforts.  I learned about it the hard way, being too often at the other end of the chain, trying to make sense of an investigation with the help of a more junior support colleague or a customer who, too many times, has more important things to do.

This time, one of the most important piece of information came about this way :

(12:08:29) caribou: it’s about lunch time, I may step out to grab a sandwich in the meantime
(12:12:08) cking: sure
(12:12:16) caribou: fyi, it doesn’t completely powers off
(12:12:30) cking: AH
(12:12:45) cking: this means it’s definitely a problem with the power management on the southbridge
(12:12:51) caribou: after the « shutdown », the screen stays on with [28. ....] Power down.
(12:13:19) cking: if it can’t S5 then we can do a S3 either, that’s the same power management register on the southbridge

By simply indicating that, after invoking the ‘shutdown’ command, the laptop did not power off but stayed powered on in some limbo, I brought to Colin’s attention, a situation that for me was trivial, but for him was very important.  I did not know that the suspend/resume functionality and the power off functionality use the same mechanism, the same power management register and that if one did not work, the other couldn’t work either !

From that point on, we were able to target the whole power management subsystem.  After a few more tests on the operating system side, cking became convinced that it was a hardware issue. One easy way to find out is to swap the operating system and see if the problem persists. So here I go, installing Windows 7 on this laptop, installing the graphic driver to get the suspend functionality working to finally be able to test the same functionality on Windows 7.

One other important thing highlighted by cking was to make sure that Windows 7 was also using the ACPI functionalities and not the old APM, which would have rendered our test useless.  A few minutes of searching the web with the support engineer’s best friend : the web search engine pointed to many articles on Microsoft’s website talking about ACPI & Win7. That was sufficient to confirm that it was also using ACPI.

So I went ahead and actioned the « Suspend » functionality on Win7.  The laptop’s screen went blank, but the power button kept blinking with a few more LEDs turned on, which was the main symptom on Ubuntu as well.  This and the fact that shutting down Windows 7 left the laptop in the same kind of limbo’s half powered down state, achieved to convince us that we were facing a hardware problem and not some bug of our operating system.  The customer still have to find out if this is systematic on that model of Sandybridge laptop or this is only happening on this single unit. But now, they know were to look at, especcially  since their back up plan was to deploy those laptop on Windows 7 instead of Ubuntu.

I wanted to write about today’s session because I thing that it shows the kind of investigating work that is expected from a support engineer.  Most of the time, we are not there to fix bugs, but to clearly identify them, to find ways to reproduce them as much as possible.  Our job is to provide the data that will be necessary to identify the flaw and fix it.  Sometimes, the first action is to find workarounds so the user can continue his work while we look for a more definitive solution.  Some of us will even go further and suggest fixes to the developpers or at least help them in fixing those bugs.

A support engineer needs to be investigative, to keep an open mind, to avoid the mistake of being convinced that he already knows the answer before starting to look at the data.  He must be able to ask for help, to be humble about his knowledge and respect the knowledge of others.  And often, it is best to take a step back and look at a problem from different angles.  Sometimes the solution is easier to find from a few steps back.

I have been doing this job for almost fifteen years and I still  get a kick at identifying bugs, in helping users with their issues, in trying to make their computing experience as easy as it can be.  I know that many of my colleagues feel the same about it.  I also think that a support engineer is not just some sysadmin who doesn’t have the skills required to become a developer.  I’ve seen a few too many devs digging in the source code, looking for answers to a problem, when simply opening the log files and searching for known issues was enough to identify and fix the problem.

And don’t get me wrong, I admire the skills of the developers and I’m still hoping to get more and more involved in works similar to theirs, but I also have a great respect all those support engineers that go after the next problem and find a fix for it.

Read more