Today I ran into a problem where the backup of a Cisco ISE VM deployment (primary/secondary) stopped working because the /opt partition had run out of disk space. There seems to be a threshold (>70% utilization) on the /opt partition that prevents ISE from taking backups. It doesn't matter whether you back up to an FTP server or use the internal ISE storage: the ISE server always creates a local archive file on the /opt disk first, before uploading it to an external repository.
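By the way, if you just want to check how close a node is to that threshold, the /opt utilization is visible from the regular admin CLI without any root access (the full output of this command appears further down in this post):

HKVISE-01/alpiq# show disk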
The only way to fix this was to open a TAC case.
Disclaimer:
- Manipulating the Linux filesystem of a Cisco ISE is not recommended by Cisco and you risk losing support.
- The alternative solution worked in my case without side effects, but I clearly do not recommend it as a permanent workaround.
The situation:
- The customer runs an ISE VM deployment consisting of two VMs (ISE-VM-Small, 600 GB) on version 2.1.0.474 (patch 8).
- The config backup job stopped working a couple of weeks ago because the /opt partition on the primary VM was almost full (84%). The secondary VM was still fine at 60%.
- Since there had been "a lot" of changes since the last backup, reinstalling the ISE VMs from an old backup was not an option.
Show tech/show disk from the primary ISE VM:
*****************************************
Checking Disk Space...
*****************************************
df -h output...
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 15G 2.8G 11G 21% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 177M 7.6G 3% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sda6 1.9G 6.3M 1.8G 1% /tmp
/dev/sda1 477M 106M 342M 24% /boot
/dev/sda7 173G 134G 30G 82% /opt
/dev/sda3 93M 1.6M 85M 2% /storedconfig
tmpfs 1.6G 0 1.6G 0% /run/user/0
tmpfs 1.6G 0 1.6G 0% /run/user/440
tmpfs 1.6G 0 1.6G 0% /run/user/301
tmpfs 1.6G 0 1.6G 0% /run/user/304
tmpfs 1.6G 0 1.6G 0% /run/user/303
tmpfs 1.6G 0 1.6G 0% /run/user/322
After doing some research, deleting every log file, purging every database and cleaning up every old patch file in the filesystem, I was able to lower the /opt utilization to 74%, which was still not enough to get the backup job working again.
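For the record, most of that cleanup can be done from the regular admin CLI. ISE 2.x has the application configure ise menu, which among other things offers purging of M&T operational data and refreshing of database statistics; the exact option wording and numbering differ between releases, so treat the following as a rough pointer rather than an exact recipe:

HKVISE-01/alpiq# application configure ise
(then pick the menu options for purging M&T operational data / refreshing database statistics)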
So the next thought was to expand the disk space. Since we are talking about a virtual machine, this should be pretty easy, huh?
Unfortunately not. During my research I realized that there is no supported way to access the underlying filesystem without TAC. There might be a way by booting the VM from a Linux live CD and mounting the filesystem, but that is clearly out of scope when you are working on a live system in a production environment.
So I ended up opening a TAC case.
The TAC engineer responded quickly and we arranged a WebEx session so that she could dig into the issue at root shell level. She sent me two files, which I had to put on a repository and which she then used to activate the root shell.
These are the files:
-rw-r--r--@ 1 samuelheinrich staff 5.1K Jul 19 10:58 Jul-RootKey-appbundle-1.0-x86_64.tar.gz
-rw-r--r--@ 1 samuelheinrich staff  10K Jul 19 10:58 RootPatch-appbundle-1.4.SSA_NOT_FOR_RELEASE.x86_64.tar.gz
Since she connected to the ISE VM from my PC, I was able to capture the root shell login process.
Applying the Root Patch:
HKVISE-02/alpiq# application install RootPatch-appbundle-1.4.SSA_NOT_FOR_RELEASE.x86_64.tar.gz hkvbackup-01
Save the current ADE-OS running configuration? (yes/no) [yes] ? yes
Generating configuration...
Saved the ADE-OS running configuration to startup successfully
Getting bundle to local machine...
Unbundling Application Package...
Verifying Application Signature...
% Bundle signature could not be verified

HKVISE-02/alpiq# application install Jul-RootKey-appbundle-1.0-x86_64.tar.gz hkvbackup-01
Save the current ADE-OS running configuration? (yes/no) [yes] ? yes
Generating configuration...
Saved the ADE-OS running configuration to startup successfully
Getting bundle to local machine...
Unbundling Application Package...
Verifying Application Signature...
Initiating Application Install...
% Notice: This RootKey bundle is used to install public keys that are
% needed to verify the image signature of NOT_FOR_RELEASE
% software, such as the RootPatch. Once the NOT_FOR_RELEASE
% software is installed the RootKey bundle is no longer needed.
% Therefore the RootKey bundle is temporary and will remove
% itself in 4 hours. Please install RootPatch or whatever
% NOT_FOR_RELEASE bundle needed within the next 4 hours.
Application successfully installed

HKVISE-02/alpiq# application install RootPatch-appbundle-1.4.SSA_NOT_FOR_RELEASE.x86_64.tar.gz hkvbackup-01
Save the current ADE-OS running configuration? (yes/no) [yes] ? yes
Generating configuration...
Saved the ADE-OS running configuration to startup successfully
Getting bundle to local machine...
Unbundling Application Package...
Verifying Application Signature...
Initiating Application Install...
Application successfully installed
HKVISE-02/alpiq#
Accessing the root shell:
HKVISE-01/alpiq# root_en
Password :
Password Again :
Root patch enabled
HKVISE-01/alpiq# root
Enter root patch password :
Starting root bash shell ...
ade #
I assume you can set whatever password you want for root shell access once the patch is applied. It does not seem to be a hardcoded password: with root_en you set a password, which you then use later to log in to the root shell.
So can I use those files from now on to access the root shell on any deployment, forever?
I don't think so. My best guess is that there is some sort of rotation built into those patches, as the filename hints that this one only works for the current month:
Jul-RootKey-appbundle-1.0-x86_64.tar.gz
After the TAC engineer had access to the root shell, she checked what was using up the disk space, which eventually brought her to the conclusion that the partition was filled with "normal" database data and that no further space could be gained by purging or deleting.
Thanks for your time over WebEx, below is the summary:

- We accessed ISE root; the largest directory filling up the partition was "oracle", but no files were erasable, as per the output below:

ade # du -sch *
11G CSCOcpm
1.7M ORCLfmap
1.4G TimesTen
24K backup
408M irf
4.0K ise-support-bundle
1.2G localdisk
16K lost+found
103G oracle
58M pbis
7.7G storeddata
13M system
598M xgrid
125G total
ade #

- As per the output below, it seems that you have increased the disk space without applying a re-image of the node, which will cause the change to not take place as it should:

ade # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 15G 2.1G 12G 16% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 8.8M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sda6 1.9G 6.3M 1.8G 1% /tmp
/dev/sda1 477M 106M 342M 24% /boot
/dev/sda3 93M 1.6M 85M 2% /storedconfig
/dev/sda7 173G 125G 39G 77% /opt
tmpfs 1.6G 0 1.6G 0% /run/user/440
tmpfs 1.6G 0 1.6G 0% /run/user/301
tmpfs 1.6G 0 1.6G 0% /run/user/321
tmpfs 1.6G 0 1.6G 0% /run/user/0
tmpfs 1.6G 0 1.6G 0% /run/user/304
tmpfs 1.6G 0 1.6G 0% /run/user/303
tmpfs 1.6G 0 1.6G 0% /run/user/322
ade #

- If you look at the highlighted /dev/sda7 line above, you can see that its size is very small compared to the actual disk size (600 GB).

- Please check the document below, which says that you need to re-image in case you change the disk space:
https://www.cisco.com/c/en/us/td/docs/security/ise/2-4/install_guide/b_ise_InstallationGuide24/b_ise_InstallationGuide24_chapter_01.html#ID-1417-00000074
"Please note that if you increase the disk size of your virtual machine after initial installation, then you must perform a fresh installation of Cisco ISE on your virtual machine to properly detect and utilize the full disk allocation".
The important detail she discovered was that the VM disk size is actually 600 GB, which is normal for an ISE-VM-Small deployment.
Under normal circumstances the /opt partition (/dev/sda7) should be allocated around 90% of the available disk space, which would come to roughly 550 GB. In this case, however, the /opt partition is only 173 GB in total.
So how could this happen?
Well, honestly I don't know, since this deployment was installed by another company years ago. My best guess is that they either:
- installed the 200 GB "poc/lab" version of the ISE OVA and later increased the VM disk to 600 GB, or
- installed the 600 GB ISE-VM-Small image but reduced the disk to 200 GB before the first boot.
TAC provided me with the following action plan to resolve this issue:
Below are the steps which we need to do now:
1. Save the output of the command "show udi" from both nodes.
2. Export all certificates from both nodes.
3. Promote the secondary node to primary.
4. Take a configuration and operational backup from the node.
5. Remove the secondary (old primary) from the deployment.
6. Re-image the removed node.
7. Install the patch.
8. Add back the certificates.
9. Add the node back to the deployment.
10. Promote the node to primary.
11. Remove the secondary from the deployment.
12. Re-image the node.
13. Install the patch.
14. Add back the certificates.
15. Add the node to the deployment.
I'm okay with this action plan: I did not install these VMs initially, they were clearly messed up by someone else, and as a wireless professional I generally follow Cisco's advice, especially since this is an unsupported deployment. BUT…
Swapping primary and secondary roles just to get a backup from a VM that obviously has enough disk space at the VM level, it just isn't assigned to the filesystem?
That sounds like an unnecessary step to me. Why not simply expand the filesystem, take a fresh backup from the primary VM and then move on with the action plan by re-imaging the two VMs, agreed?
Alternative Workaround:
Here is how you can do it in 3 simple steps:
- Check if there is free, unallocated space on the VM disk
- Use fdisk to edit the partition table
- Enlarge the filesystem (see the quick recap right after this list)
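For the impatient, here is the whole sequence at a glance. This is only a rough outline assuming the same layout as in this deployment (disk /dev/sda, /opt on logical partition /dev/sda7 inside extended partition /dev/sda4); the sector numbers used further below are specific to this VM and will differ on yours:

parted /dev/sda print free   # 1) is there unallocated space at the end of the disk?
fdisk /dev/sda               # 2) delete extended partition 4, recreate 4/5/6/7 with a bigger 7, set 5 back to type 82, write (w)
reload                       # reboot from the ADE-OS admin CLI so the kernel re-reads the new table
resize2fs -p /dev/sda7       # 3) grow the ext4 filesystem on /opt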
1 Check if there is free unallocated space on the VM disk:
ade # parted /dev/sda print free
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 644GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
32.3kB 1049kB 1016kB Free Space
1 1049kB 525MB 524MB primary ext4 boot
2 525MB 16.3GB 15.7GB primary ext4
3 16.3GB 16.4GB 105MB primary ext4
4 16.4GB 215GB 198GB extended
5 16.4GB 24.7GB 8389MB logical linux-swap(v1)
6 24.7GB 26.8GB 2097MB logical ext4
26.8GB 26.8GB 623kB Free Space
7 26.8GB 215GB 188GB logical ext4
215GB 644GB 429GB Free Space
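The 429 GB of free space at the end of the disk is exactly what we are after. If you want a second opinion before touching the partition table, lsblk (part of util-linux, the same package the fdisk used below comes from) shows the same thing from a different angle; this is purely an optional cross-check:

ade # lsblk /dev/sda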
2 Use fdisk to edit the partition table:
ade # fdisk /dev/sda
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p

Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f2e67

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 1026047 512000 83 Linux
/dev/sda2 1026048 31746047 15360000 83 Linux
/dev/sda3 31746048 31950847 102400 83 Linux
/dev/sda4 31950848 419430399 193739776 5 Extended
/dev/sda5 31952896 48336895 8192000 82 Linux swap / Solaris
/dev/sda6 48338944 52434943 2048000 83 Linux
/dev/sda7 52436992 419430399 183496704 83 Linux
Take note of the start and end sectors of partitions 4, 5, 6 and 7.
Those partitions need to be resized. For this you first delete partition number 4: partitions 5, 6 and 7 are logical partitions inside this extended partition, so they are removed along with it and will be recreated afterwards.
Command (m for help): d
Partition number (1-7, default 7): 4
Partition 4 is deleted
Then you need to recreate the partitions.
First, recreate the extended partition, using the same start sector as before and the whole remaining disk space:
Command (m for help): n
Partition type:
   p   primary (3 primary, 0 extended, 1 free)
   e   extended
Select (default e): e
Selected partition 4
First sector (31950848-1258291199, default 31950848): 31950848
Last sector, +sectors or +size{K,M,G} (31950848-1258291199, default 1258291199): 1258291199
Partition 4 of type Extended and of size 584.8 GiB is set

Check the partition table after recreating the extended partition; it should now look like this:

Command (m for help): p

Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f2e67

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 1026047 512000 83 Linux
/dev/sda2 1026048 31746047 15360000 83 Linux
/dev/sda3 31746048 31950847 102400 83 Linux
/dev/sda4 31950848 1258291199 613170176 5 Extended
Then move on and recreate the other partitions exactly as they were before, using the same start and end sectors:
Command (m for help): n
All primary partitions are in use
Adding logical partition 5
First sector (31952896-1258291199, default 31952896): 31952896
Last sector, +sectors or +size{K,M,G} (31952896-1258291199, default 1258291199): 48336895
Partition 5 of type Linux and of size 7.8 GiB is set

Command (m for help): n
All primary partitions are in use
Adding logical partition 6
First sector (48338944-1258291199, default 48338944): 48338944
Last sector, +sectors or +size{K,M,G} (48338944-1258291199, default 1258291199): 52434943
Partition 6 of type Linux and of size 2 GiB is set
For the last partition (/opt), now use the maximum size, i.e. accept the default last sector:
Command (m for help): n
All primary partitions are in use
Adding logical partition 7
First sector (52436992-1258291199, default 52436992): 52436992
Last sector, +sectors or +size{K,M,G} (52436992-1258291199, default 1258291199): 1258291199
Partition 7 of type Linux and of size 575 GiB is set
The partition table should then look like this:
Command (m for help): p

Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f2e67

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 1026047 512000 83 Linux
/dev/sda2 1026048 31746047 15360000 83 Linux
/dev/sda3 31746048 31950847 102400 83 Linux
/dev/sda4 31950848 1258291199 613170176 5 Extended
/dev/sda5 31952896 48336895 8192000 83 Linux
/dev/sda6 48338944 52434943 2048000 83 Linux
/dev/sda7 52436992 1258291199 602927104 83 Linux
Note that partition 5 needs to be set back to type "Linux swap" again; you can do this with:
Command (m for help): t
Partition number (1-7, default 7): 5
Hex code (type L to list all codes): 82
Changed type of partition 'Linux' to 'Linux swap / Solaris'
The final partition table should look like this:
Command (m for help): p

Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f2e67

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 1026047 512000 83 Linux
/dev/sda2 1026048 31746047 15360000 83 Linux
/dev/sda3 31746048 31950847 102400 83 Linux
/dev/sda4 31950848 1258291199 613170176 5 Extended
/dev/sda5 31952896 48336895 8192000 82 Linux swap / Solaris
/dev/sda6 48338944 52434943 2048000 83 Linux
/dev/sda7 52436992 1258291199 602927104 83 Linux
Now save the table!
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
You have new mail in /var/mail/root
ade #
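In theory you could try to make the kernel re-read the table without a reboot, for example with partprobe (which ships with parted, so it should be present in the root shell). In practice that is unlikely to work here because /dev/sda7 is mounted on /opt, so the reload in the next step is the reliable way:

ade # partprobe /dev/sda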
The appliance then needs a reload so that the kernel picks up the new partition table:
ade # exit
exit
HKVISE-02/alpiq# reload
Save the current ADE-OS running configuration? (yes/no) [yes] ? yes
Generating configuration...
Saved the ADE-OS running configuration to startup successfully
Continue with reboot? [y/n] y
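Before resizing the filesystem, it does not hurt to verify that the kernel now sees the enlarged partition 7 and that swap came back up after the reboot. A quick sanity check from the root shell, assuming the standard tools are in place:

ade # parted /dev/sda print
ade # swapon -s
ade # free -m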
3 Enlarge the filesystem
After the reload there is one more thing to do in the root shell: enlarge the filesystem itself:
ade # resize2fs -p /dev/sda7
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/sda7 is mounted on /opt; on-line resizing required
old_desc_blocks = 22, new_desc_blocks = 72
The filesystem on /dev/sda7 is now 150731776 blocks long.
You should now be able to see the new disk space on the /opt partition:
ade # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 15G 2.1G 12G 16% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 8.8M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sda1 477M 106M 342M 24% /boot
/dev/sda6 1.9G 6.2M 1.8G 1% /tmp
/dev/sda3 93M 1.6M 85M 2% /storedconfig
/dev/sda7 566G 98G 444G 18% /opt
tmpfs 1.6G 0 1.6G 0% /run/user/440
tmpfs 1.6G 0 1.6G 0% /run/user/301
tmpfs 1.6G 0 1.6G 0% /run/user/0
tmpfs 1.6G 0 1.6G 0% /run/user/304
tmpfs 1.6G 0 1.6G 0% /run/user/303
tmpfs 1.6G 0 1.6G 0% /run/user/322
ade # exit
exit
HKVISE-02/alpiq# show disk
Internal filesystems:
/ : 16% used ( 2159940 of 14987616)
/dev : 0% used ( 0 of 8124480)
/dev/shm : 0% used ( 0 of 8134176)
/run : 1% used ( 8932 of 8134176)
/sys/fs/cgroup : 0% used ( 0 of 8134176)
/boot : 24% used ( 108052 of 487634)
/tmp : 1% used ( 6344 of 1983056)
/storedconfig : 2% used ( 1584 of 95054)
/opt : 18% used ( 101895372 of 593333176)
/run/user/440 : 0% used ( 0 of 1626836)
/run/user/301 : 0% used ( 0 of 1626836)
/run/user/0 : 0% used ( 0 of 1626836)
/run/user/304 : 0% used ( 0 of 1626836)
/run/user/303 : 0% used ( 0 of 1626836)
/run/user/322 : 0% used ( 0 of 1626836)

all internal filesystems have sufficient free space
The workaround worked for this deployment without any issues: ISE is still running normally and is able to take backups again, and getting to a fresh backup this way took far less time than the full re-image procedure would have.
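To verify the whole thing end to end, you can also trigger an on-demand configuration backup from the admin CLI. The general syntax below is the standard ISE 2.x backup command; the repository name and encryption key are placeholders from my side:

HKVISE-01/alpiq# backup Config-Backup repository FTP-Repo ise-config encryption-key plain MyBackupKey123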
Maybe this would also be a valid approach in scenarios where you have to increase the disk space permanently without re-imaging.