Increase Cisco ISE VM disk space from the root shell without re-imaging the VM

Today I ran into a problem where the backup of a Cisco ISE VM (primary/standby deployment) had stopped working because the /opt partition had run out of disk space. There seems to be a threshold (>70% usage) on the /opt partition above which ISE refuses to take backups. It doesn’t matter whether you back up to an FTP server or to the internal ISE storage, since ISE always creates a local archive file on the /opt disk first, before uploading it to an external server.
The only way to fix this was to open a TAC case.

Disclaimer:

  • Manipulating the Linux filesystem of a Cisco ISE is not recommended by Cisco, and you risk losing support.
  • The alternative solution worked in my case without side effects, but I clearly do not recommend it as a permanent workaround.

The situation:

  • The customer runs an ISE VM deployment consisting of two VMs (ISE-VM-Small, 600G) running code version 2.1.0.474 (patch 8).
  • The config backup job stopped working a couple of weeks ago because the /opt disk was too full (84%) on the primary VM. The secondary VM was still fine (60%).
  • Since there had been “a lot” of changes since the last backup, reinstalling the ISE VMs from an old backup was not an option.

Show tech/show disk from the primary ISE VM:

*****************************************
Checking Disk Space...
*****************************************
df -h output... 
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 15G 2.8G 11G 21% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 177M 7.6G 3% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sda6 1.9G 6.3M 1.8G 1% /tmp
/dev/sda1 477M 106M 342M 24% /boot
/dev/sda7 173G 134G 30G 82% /opt
/dev/sda3 93M 1.6M 85M 2% /storedconfig
tmpfs 1.6G 0 1.6G 0% /run/user/0
tmpfs 1.6G 0 1.6G 0% /run/user/440
tmpfs 1.6G 0 1.6G 0% /run/user/301
tmpfs 1.6G 0 1.6G 0% /run/user/304
tmpfs 1.6G 0 1.6G 0% /run/user/303
tmpfs 1.6G 0 1.6G 0% /run/user/322
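
The check that matters here can be sketched in a few lines of Python. This is only an illustration: the 70% limit is the threshold observed in this post, not a documented value, and the names `BACKUP_THRESHOLD_PERCENT` and `usage_by_mount` are made up for the example.

```python
# Threshold above which backups stopped working, as observed in this post.
# This is an assumption taken from the text, not a documented Cisco value.
BACKUP_THRESHOLD_PERCENT = 70

# A few lines copied from the "df -h" output above.
DF_OUTPUT = """\
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 15G 2.8G 11G 21% /
/dev/sda6 1.9G 6.3M 1.8G 1% /tmp
/dev/sda1 477M 106M 342M 24% /boot
/dev/sda7 173G 134G 30G 82% /opt
"""


def usage_by_mount(df_text):
    """Return {mount point: Use% as int} parsed from "df -h" style text."""
    usage = {}
    for line in df_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore blank or wrapped lines
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


usage = usage_by_mount(DF_OUTPUT)
opt_over_threshold = usage["/opt"] > BACKUP_THRESHOLD_PERCENT
print(f"/opt is at {usage['/opt']}%, over threshold: {opt_over_threshold}")
```

Pasting the full output of a node into `DF_OUTPUT` works the same way, as long as every data line has the six columns shown above.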

After doing some research, deleting every log file, purging every database and cleaning up every patch file in the filesystem, I was able to bring the usage down to 74%, which was still not enough to get the backup job working again.

So the next thought was to expand the disk space. Since we are talking about a virtual machine, this should be pretty easy, huh?

Unfortunately not. During my research I realized that there is no way to access the underlying filesystem without TAC. There might be a way by booting the VM from a Linux CD and mounting the filesystem, but that is clearly out of scope when you are working on a live system in a production environment.

So I ended up opening a TAC case.

The TAC engineer responded quickly and we arranged a Webex session so that she was able to dig into the issue at root shell level. She sent me two files, which I had to place on a repository and which she then used to activate the root shell.

These are the files:

-rw-r--r--@ 1 samuelheinrich staff 5.1K Jul 19 10:58 Jul-RootKey-appbundle-1.0-x86_64.tar.gz
-rw-r--r--@ 1 samuelheinrich staff 10K Jul 19 10:58 RootPatch-appbundle-1.4.SSA_NOT_FOR_RELEASE.x86_64.tar.gz

Since she connected to the ISE VM through my PC, I was able to capture the root shell login process.

Applying the Root Patch:

HKVISE-02/alpiq# application install RootPatch-appbundle-1.4.SSA_NOT_FOR_RELEASE.x86_64.tar.gz hkvbackup-01
Save the current ADE-OS running configuration? (yes/no) [yes] ? yes
Generating configuration...
Saved the ADE-OS running configuration to startup successfully
Getting bundle to local machine...
Unbundling Application Package...
Verifying Application Signature...
% Bundle signature could not be verified
HKVISE-02/alpiq# application install Jul-RootKey-appbundle-1.0-x86_64.tar.gz hkvbackup-01
Save the current ADE-OS running configuration? (yes/no) [yes] ? yes
Generating configuration...
Saved the ADE-OS running configuration to startup successfully
Getting bundle to local machine...
Unbundling Application Package...
Verifying Application Signature...
Initiating Application Install...
% Notice: This RootKey bundle is used to install public keys that are
% needed to verify the image signature of NOT_FOR_RELEASE
% software, such as the RootPatch. Once the NOT_FOR_RELEASE
% software is installed the RootKey bundle is no longer needed. 
% Therefore the RootKey bundle is temporary and will remove 
% itself in 4 hours. Please install RootPatch or whatever 
% NOT_FOR_RELEASE bundle needed within the next 4 hours.
Application successfully installed
HKVISE-02/alpiq# application install RootPatch-appbundle-1.4.SSA_NOT_FOR_RELEASE.x86_64.tar.gz hkvbackup-01
Save the current ADE-OS running configuration? (yes/no) [yes] ? yes
Generating configuration...
Saved the ADE-OS running configuration to startup successfully
Getting bundle to local machine...
Unbundling Application Package...
Verifying Application Signature...
Initiating Application Install...
Application successfully installed
HKVISE-02/alpiq# 
HKVISE-02/alpiq#

Accessing the root shell:

 
HKVISE-01/alpiq#
HKVISE-01/alpiq#
HKVISE-01/alpiq# root_en
Password : 
Password Again :
Root patch enabled

HKVISE-01/alpiq# root
Enter root patch password : 
Starting root bash shell ... 
ade #

I assume you can set whatever password you want for the root shell after you have applied the patch. It does not seem to be a hardcoded password; you actually set the password yourself and later use it to log in to the root shell.

So can I use those files from now on to access the root shell on any deployment forever?

I don’t think so. My best guess is that these patches are rotated in some way, as there is a hint in the filename which would indicate that it only works this month:

Jul-RootKey-appbundle-1.0-x86_64.tar.gz 

After the TAC engineer had access to the root shell, she checked what was using the disk space, which eventually brought her to the conclusion that the partition was filled with “normal” database data and that there was no way to gain any more disk space by purging or deleting.

Thanks for your time over WebEx, below is the summary :
 
-          We accessed ISE root, the largest partition filled up was “oracle”, but no files were erasable as per the output below :
 
ade # du -sch *
11G     CSCOcpm
1.7M    ORCLfmap
1.4G    TimesTen
24K     backup
408M    irf
4.0K    ise-support-bundle
1.2G    localdisk
16K     lost+found
103G    oracle
58M     pbis
7.7G    storeddata
13M     system
598M    xgrid
125G    total
ade #
-          As per the output below, it seems that you have increased the disk space without applying a re-image of the node, which will cause the change to not take place as it should :
ade # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  2.1G   12G  16% /
devtmpfs        7.8G     0  7.8G   0% /dev
tmpfs           7.8G     0  7.8G   0% /dev/shm
tmpfs           7.8G  8.8M  7.8G   1% /run
tmpfs           7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/sda6       1.9G  6.3M  1.8G   1% /tmp
/dev/sda1       477M  106M  342M  24% /boot
/dev/sda3        93M  1.6M   85M   2% /storedconfig
/dev/sda7       173G  125G   39G  77% /opt
tmpfs           1.6G     0  1.6G   0% /run/user/440
tmpfs           1.6G     0  1.6G   0% /run/user/301
tmpfs           1.6G     0  1.6G   0% /run/user/321
tmpfs           1.6G     0  1.6G   0% /run/user/0
tmpfs           1.6G     0  1.6G   0% /run/user/304
tmpfs           1.6G     0  1.6G   0% /run/user/303
tmpfs           1.6G     0  1.6G   0% /run/user/322
ade #
 
-          If you see the highlighted line above (/dev/sda7, 173G), then you can notice that the size is very small compared to the actual size (600GB).
-          Please check the document below which says that you need to re-image in case you change the disk space :
https://www.cisco.com/c/en/us/td/docs/security/ise/2-4/install_guide/b_ise_InstallationGuide24/b_ise_InstallationGuide24_chapter_01.html#ID-1417-00000074
 
“Please note that if you increase the disk size of your virtual machine after initial installation, then you must perform a fresh installation of Cisco ISE on your virtual machine to properly detect and utilize the full disk allocation”.
 

The important detail she discovered was that the VM disk size is actually 600G, which would be normal for an ISE-VM-Small deployment.
Under normal circumstances, the /opt partition (/dev/sda7) should always be allocated around 90% of the available disk space, which would come to roughly 550G. In this case, however, the /opt partition is only 173G in total.

So how could this happen?

Well, honestly, I don’t know, since this deployment was installed by another company years ago. My best guess is that they either:

  • installed the 200G “poc/lab” version of the ISE OVA and increased the VM disk to 600G, or
  • installed the 600G VM-Small image but reduced the disk to 200G before first boot.

TAC provided me with the following action plan to resolve this issue:

Below are the steps which we need to do now :
 
 Save the output of the command “show udi” from both nodes.
 Export all certificates from both nodes.
 Promote the secondary node to primary.
 Take a configuration and operational backup from the node.
 Remove the secondary (old primary) from the deployment.
 Re-image the removed node.
 Install the patch.
 Add back the certificates.
 Add the node back to the deployment.
 Promote the node to primary.
 Remove the secondary from the deployment.
 Re-image the node.
 Install the patch.
 Add back the certificates.
 Add the node to the deployment.

I’m okay with this action plan. I did not initially install those VMs, they were clearly messed up by someone else, and since this is an unsupported deployment I, as a wireless professional, will clearly follow Cisco’s advice. BUT…

Swapping the primary and secondary roles just to get a backup from a VM which obviously has enough VM disk space, only not assigned to the filesystem?
That sounds like an unnecessary step to me. Why not just expand the filesystem, take a fresh backup from the primary VM, and then move on with the action plan by re-imaging both VMs? Agree?

Alternative workaround:

Here is how you can do it in 3 simple steps:

  • Check if there is free unallocated space
  • Use fdisk to edit the partition table
  • Enlarge the filesystem

1 Check if there is free unallocated space on the VM disk:

ade # parted /dev/sda print free
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 644GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
 
Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
1      1049kB  525MB   524MB   primary   ext4            boot
2      525MB   16.3GB  15.7GB  primary   ext4
3      16.3GB  16.4GB  105MB   primary   ext4
4      16.4GB  215GB   198GB   extended
5      16.4GB  24.7GB  8389MB  logical   linux-swap(v1)
6      24.7GB  26.8GB  2097MB  logical   ext4
        26.8GB  26.8GB  623kB             Free Space
7      26.8GB  215GB   188GB   logical   ext4
        215GB   644GB   429GB             Free Space

2 Use fdisk to edit the partition table:

ade # fdisk /dev/sda
Welcome to fdisk (util-linux 2.23.2).
 
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
 
 
Command (m for help): p
 
Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f2e67
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048    31746047    15360000   83  Linux
/dev/sda3        31746048    31950847      102400   83  Linux
/dev/sda4        31950848   419430399   193739776    5  Extended
/dev/sda5        31952896    48336895     8192000   82  Linux swap / Solaris
/dev/sda6        48338944    52434943     2048000   83  Linux
/dev/sda7        52436992   419430399   183496704   83  Linux

Take note of the start and end sectors of partitions 4, 5, 6 and 7.
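
To keep those numbers safe and to double-check how much space is really unallocated, the listing can be written down as data. The following Python sketch uses only the sector values printed by fdisk above; the names `ORIGINAL_LAYOUT` and `size_bytes` are made up for this illustration.

```python
SECTOR_SIZE = 512            # bytes, from "Units = sectors of 1 * 512"
DISK_SECTORS = 1258291200    # from the "Disk /dev/sda" line

# (start sector, end sector) per partition, copied from the listing above.
ORIGINAL_LAYOUT = {
    4: (31950848, 419430399),   # extended
    5: (31952896, 48336895),    # swap
    6: (48338944, 52434943),    # /tmp
    7: (52436992, 419430399),   # /opt
}


def size_bytes(start, end):
    """Size of a partition in bytes; the end sector is inclusive."""
    return (end - start + 1) * SECTOR_SIZE


# Everything between the last used sector and the end of the disk is free.
last_used_sector = max(end for _start, end in ORIGINAL_LAYOUT.values())
unallocated_bytes = (DISK_SECTORS - 1 - last_used_sector) * SECTOR_SIZE

print(f"/opt partition: {size_bytes(*ORIGINAL_LAYOUT[7]) / 2**30:.1f} GiB")
print(f"unallocated after partition 7: {unallocated_bytes / 10**9:.1f} GB")
```

The unallocated figure should agree with the last "Free Space" row reported by parted in step 1.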

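Those partitions need to be resized. To do this, you first need to delete partition number 4; the logical partitions 5, 6 and 7 all live inside this “extended partition” and are removed from the table along with it. Only the partition table entries change, the data on the disk is not touched.

Command (m for help): d
Partition number (1-7, default 7): 4
Partition 4 is deleted

Check the partition table after you have deleted the extended partition. (The listing below already shows partition 4 with the new end sector, so it appears to have been captured after the extended partition was recreated in the next step.)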
Command (m for help): p
Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f2e67
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048    31746047    15360000   83  Linux
/dev/sda3        31746048    31950847      102400   83  Linux
/dev/sda4        31950848  1258291199   613170176    5  Extended

Then you need to recreate the partitions.

First, create the extended partition and use the whole unallocated disk space for it:

Command (m for help): n
Partition type:
   p   primary (3 primary, 0 extended, 1 free)
   e   extended
Select (default e): e
Selected partition 4
First sector (31950848-1258291199, default 31950848): 31950848
Last sector, +sectors or +size{K,M,G} (31950848-1258291199, default 1258291199): 1258291199
Partition 4 of type Extended and of size 584.8 GiB is set

Then move on and recreate the other partitions with the same start and end sectors as before.

Command (m for help): n
All primary partitions are in use
Adding logical partition 5
First sector (31952896-1258291199, default 31952896): 31952896
Last sector, +sectors or +size{K,M,G} (31952896-1258291199, default 1258291199): 48336895
Partition 5 of type Linux and of size 7.8 GiB is set
 
Command (m for help): n
All primary partitions are in use
Adding logical partition 6
First sector (48338944-1258291199, default 48338944): 48338944
Last sector, +sectors or +size{K,M,G} (48338944-1258291199, default 1258291199): 52434943
Partition 6 of type Linux and of size 2 GiB is set

For the last partition (/opt), keep the original start sector and now use the maximum size (last sector):

Command (m for help): n
All primary partitions are in use
Adding logical partition 7
First sector (52436992-1258291199, default 52436992): 52436992
Last sector, +sectors or +size{K,M,G} (52436992-1258291199, default 1258291199): 1258291199
Partition 7 of type Linux and of size 575 GiB is set

The partition table should then look like this:

Command (m for help): p
 
Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f2e67
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048    31746047    15360000   83  Linux
/dev/sda3        31746048    31950847      102400   83  Linux
/dev/sda4        31950848  1258291199   613170176    5  Extended
/dev/sda5        31952896    48336895     8192000   83  Linux
/dev/sda6        48338944    52434943     2048000   83  Linux
/dev/sda7        52436992  1258291199   602927104   83  Linux

Note that partition 5 needs to be converted back to swap. You can do this with:

Command (m for help): t
Partition number (1-7, default 7): 5
Hex code (type L to list all codes): 82 
Changed type of partition 'Linux' to 'Linux swap / Solaris'

The final partition table should look like this:

Command (m for help): p
 
Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f2e67
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048    31746047    15360000   83  Linux
/dev/sda3        31746048    31950847      102400   83  Linux
/dev/sda4        31950848  1258291199   613170176    5  Extended
/dev/sda5        31952896    48336895     8192000   82  Linux swap / Solaris
/dev/sda6        48338944    52434943     2048000   83  Linux
/dev/sda7        52436992  1258291199   602927104   83  Linux
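
Before writing the table it is worth comparing it against the notes taken earlier: every logical partition must start at exactly the same sector as before, only partition 7 may have a new end sector, and partition 5 must be type 82 again. A minimal sketch of that comparison in Python, using the values from the listings in this post (the function name `layout_problems` is made up for this illustration):

```python
# (start, end, type id) copied from the fdisk listings in this post.
BEFORE = {
    5: (31952896, 48336895, "82"),
    6: (48338944, 52434943, "83"),
    7: (52436992, 419430399, "83"),
}
AFTER = {
    5: (31952896, 48336895, "82"),
    6: (48338944, 52434943, "83"),
    7: (52436992, 1258291199, "83"),
}


def layout_problems(before, after, grown):
    """List differences that would make the recreated table unsafe to write."""
    problems = []
    for number, (start, end, type_id) in before.items():
        if number not in after:
            problems.append(f"partition {number} is missing")
            continue
        new_start, new_end, new_type = after[number]
        if new_start != start:
            problems.append(f"partition {number}: start sector changed")
        if new_type != type_id:
            problems.append(f"partition {number}: type {new_type}, expected {type_id}")
        if number == grown:
            if new_end <= end:
                problems.append(f"partition {number}: did not grow")
        elif new_end != end:
            problems.append(f"partition {number}: end sector changed")
    return problems


print(layout_problems(BEFORE, AFTER, grown=7))
```

An empty list means the new table differs from the old one only in the end sector of partition 7. Run against the intermediate table, where partition 5 was still type 83, the same check reports the type mismatch.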

Now save the table!

Command (m for help): w  
The partition table has been altered!
 
Calling ioctl() to re-read partition table.
 
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
You have new mail in /var/mail/root
ade #

The appliance needs a reload so that the kernel picks up the new partition table:

ade # exit
exit
HKVISE-02/alpiq# reload
Save the current ADE-OS running configuration? (yes/no) [yes] ? yes     
Generating configuration...
Saved the ADE-OS running configuration to startup successfully
Continue with reboot? [y/n] y
 

3 Enlarging the new filesystem

After the reload there is one more thing to do in the root shell: enlarging the filesystem.

ade # resize2fs -p /dev/sda7
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/sda7 is mounted on /opt; on-line resizing required
old_desc_blocks = 22, new_desc_blocks = 72
The filesystem on /dev/sda7 is now 150731776 blocks long.
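
The block count reported by resize2fs can be cross-checked against the partition size from fdisk. The Python sketch below assumes a filesystem block size of 4096 bytes, which is the usual ext4 default but is not shown anywhere in the output above; the variable names are made up for this illustration.

```python
FS_BLOCKS = 150731776        # from the resize2fs output above
FS_BLOCK_SIZE = 4096         # assumption: usual ext4 default, not shown in the post
PARTITION_KIB = 602927104    # "Blocks" column of /dev/sda7 in fdisk (1 KiB units)

fs_bytes = FS_BLOCKS * FS_BLOCK_SIZE
partition_bytes = PARTITION_KIB * 1024

# The filesystem uses the whole partition if both sizes match.
fills_partition = fs_bytes == partition_bytes
print(f"filesystem: {fs_bytes} bytes, partition: {partition_bytes} bytes")
print(f"filesystem fills the partition: {fills_partition}")
```

If the two values differ on another system, the block size assumption is the first thing to verify, for example in the "Block size" field of `tune2fs -l /dev/sda7`, provided that tool is present on the appliance.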

You should now be able to see the new disk space on the /opt partition:

ade # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  2.1G   12G  16% /
devtmpfs        7.8G     0  7.8G   0% /dev
tmpfs           7.8G     0  7.8G   0% /dev/shm
tmpfs           7.8G  8.8M  7.8G   1% /run
tmpfs           7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/sda1       477M  106M  342M  24% /boot
/dev/sda6       1.9G  6.2M  1.8G   1% /tmp
/dev/sda3        93M  1.6M   85M   2% /storedconfig
/dev/sda7       566G   98G  444G  18% /opt
tmpfs           1.6G     0  1.6G   0% /run/user/440
tmpfs           1.6G     0  1.6G   0% /run/user/301
tmpfs           1.6G     0  1.6G   0% /run/user/0
tmpfs           1.6G     0  1.6G   0% /run/user/304
tmpfs           1.6G     0  1.6G   0% /run/user/303
tmpfs           1.6G     0  1.6G   0% /run/user/322
 
 
ade # exit
exit
 
HKVISE-02/alpiq# show disk
 
 
Internal filesystems:
/ : 16% used ( 2159940 of 14987616)
/dev : 0% used ( 0 of 8124480)
/dev/shm : 0% used ( 0 of 8134176)
/run : 1% used ( 8932 of 8134176)
/sys/fs/cgroup : 0% used ( 0 of 8134176)
/boot : 24% used ( 108052 of 487634)
/tmp : 1% used ( 6344 of 1983056)
/storedconfig : 2% used ( 1584 of 95054)
/opt : 18% used ( 101895372 of 593333176)
/run/user/440 : 0% used ( 0 of 1626836)
/run/user/301 : 0% used ( 0 of 1626836)
/run/user/0 : 0% used ( 0 of 1626836)
/run/user/304 : 0% used ( 0 of 1626836)
/run/user/303 : 0% used ( 0 of 1626836)
/run/user/322 : 0% used ( 0 of 1626836)
  all internal filesystems have sufficient free space

The workaround worked for this deployment without any issues. ISE is still working normally and is now able to take backups again. The workaround also reduced the time needed to get a fresh backup.
Maybe this would also be a valid approach in scenarios where you have to increase the disk space permanently without re-imaging.

 

Samuel Heinrich
Senior Network Engineer at Selution AG (Switzerland)
Works in the Basel area (Switzerland) as a Senior Network Engineer with more than 10 years of experience in networking and telecommunications.
