CTOP

CTOP is a Linux System Administration tool that I’ve written in Python 3. It gets all the information it needs from the System itself, without third-party libraries.

The purpose of this tool is to help identify and troubleshoot problems from a single view, in a single tool that has all the typical indicators.

It provides in a single view information that is typically provided by many programs:

  • top, htop for the CPU usage, process list, memory usage
  • meminfo
  • cpuinfo
  • hostname
  • uptime
  • df to see the free space in / and the free inodes
  • iftop to see real-time bandwidth usage
  • ip addr list to see the main IP for the interfaces
  • netstat or lsof to see the list of listening TCP Ports
  • uname -a to see the Kernel version
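Several of the indicators above can be read straight from files instead of running the commands. What follows is only a minimal sketch of that idea, not CTOP's actual code: the helper names (read_text, parse_uptime, format_uptime) are made up for illustration, and the choice of /proc/uptime, /proc/sys/kernel/hostname and /proc/sys/kernel/osrelease is my assumption about where such values can be found.

```python
def read_text(path):
    """Return the stripped content of a small text file such as those in /proc."""
    with open(path, "r") as handle:
        return handle.read().strip()


def parse_uptime(text):
    """Return the uptime in seconds from the content of /proc/uptime.

    The file holds two numbers; the first one is the uptime in seconds.
    """
    return float(text.split()[0])


def format_uptime(seconds):
    """Format a number of seconds as 'Nd HH:MM:SS'."""
    total = int(seconds)
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


def basic_indicators():
    """Collect hostname, kernel release and uptime without escaping to a shell."""
    return {
        "hostname": read_text("/proc/sys/kernel/hostname"),
        "kernel": read_text("/proc/sys/kernel/osrelease"),
        "uptime": format_uptime(parse_uptime(read_text("/proc/uptime"))),
    }
```

On a Linux box, calling basic_indicators() returns a small dictionary with the three values. The parsing functions take plain text, so they can be tried on any system.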

Other cool things it does:

  • Identifies if you’re inside an Amazon VM, VirtualBox, Docker or LXC
  • Uses colors, marking warnings in yellow and errors in red, for problems like little disk space remaining or high CPU usage relative to the available cores and CPUs.
  • Redraws the screen and adjusts to the size of the Terminal; a bigger terminal displays more information
  • It doesn’t use external libraries, and does not escape to shell. It reads everything from /proc, /sys or /etc files.
  • Identifies the Linux distribution
  • Shows the most repeated binaries, so you can identify DDoS attacks (like having 5,000 apache instances where you normally have 500, or many instances of Python)
  • Indicates if an interface has the cable connected or disconnected
  • Shows the Speed of the Network Connection (useful for Mellanox cards that can operate at 200 Gbit/sec, 100, 50, 40, 25, 10…)
  • It displays the local time and the Linux Epoch Time, which is universal (very useful for logs and to detect when there was an issue; for example, if your system restarted, your SSH Session would keep the latest Epoch captured)
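The "most repeated binaries" idea from the list above can be sketched in a few lines. This is an illustration under my own assumptions, not the way CTOP does it: it reads the process name from /proc/<pid>/comm, and the function names list_process_names and most_repeated are hypothetical.

```python
import os
from collections import Counter


def list_process_names(proc_root="/proc"):
    """Return the command name of every process found under proc_root."""
    names = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm"), "r") as handle:
                names.append(handle.read().strip())
        except OSError:
            continue  # the process ended while we were reading
    return names


def most_repeated(names, limit=5):
    """Return the (name, count) pairs of the most frequent names."""
    return Counter(names).most_common(limit)
```

Calling most_repeated(list_process_names()) on a web server under attack would put the web server binary at the top with an unusually high count.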

Limitations:

  • It only works on Linux, not on Mac or Windows, although the idea is to help with Linux Server Administration and Troubleshooting anyway
  • The list of processes of the System is read every 30 seconds, to avoid adding much overhead to the System
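Reading an expensive value only every 30 seconds can be done with a small time-based cache. The sketch below is a generic pattern and only my guess at how such a limit could be implemented. The class name CachedReader and its parameters are hypothetical.

```python
import time


class CachedReader:
    """Call loader() at most once every max_age seconds and reuse the result."""

    def __init__(self, loader, max_age=30.0, clock=time.monotonic):
        self._loader = loader
        self._max_age = max_age
        self._clock = clock  # injectable, so the cache can be tested without waiting
        self._value = None
        self._loaded_at = None

    def get(self):
        """Return the cached value, refreshing it when it is too old."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

The screen can then be redrawn as often as needed, calling get() each time, while the process list itself is only re-read when the cached copy is 30 seconds old.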

I decided to code name version 0.7 “Catalan Republic” to support the dreams, hopes and democratic requests of the Catalan people to become an independent republic.

I created this tool as Open Source, and if you want to help, I need people to test it under different versions of:

  • RedHat (I no longer have commercial licenses)
  • Atypical Linux distributions

If you are a Cloud Provider and want me to implement the detection of your VMs, so the tool knows that it is running on an instance of Amazon, Google, Azure, Cloudsigma, Digital Ocean…, contact me through my LinkedIn.

Monitoring an Amazon Instance, take a look at the amount of traffic sent and received

Dropping caches in Linux, to check if memory is actually being used

I came across this Server, a Xeon with 128 GB of RAM, 58 Spinning drives of 10 TB and 2 SSDs of 2 TB each, where I was testing the latest version of my Software.

Monitoring long-term tests, data validation, checking for memory leaks…
I noticed the Server was using 70 GB of RAM. Only 5.5 GB were used for buffers according to the usual tools (top, htop, free, cat /proc/meminfo, ps aux…) and no programs were eating that amount, so where was the RAM?
The rest of the Servers were working well, including the same model, 4U60 with 64 GB of RAM, 4U90 with 128 GB and All-Flash-Array with 256 GB of RAM, only using around 8 GB of RAM even under load.
iSCSI shares being used, with I/O, iSCSI initiators trying to connect and getting rejected, several requests per second, disk pulling, and the usual stuff. And this was the only unit using so much memory, so what was going on?
I checked some modules to see their memory consumption, but nothing was clear.
Ok, after a bit of investigation one member of the Team said “Oh, while you were on holiday we created a Ramdisk and filled it for some validations, we deleted that already but never rebooted the Server”.
Ok. The easy solution would have been to reboot, but that would have hidden a memory leak if that was the cause.
No, I had to find the real cause.

I requested the assistance of one of my colleagues, a specialist Kernel Engineer.
He confirmed that processes were not taking that memory, and asked me to try to drop the cache.

So I did:

sync
echo 3 > /proc/sys/vm/drop_caches

Then the memory usage dropped to 11.4 GB and stayed like that while I kept up the sustained load.
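For reference, writing 1 to drop_caches frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. If you want to do the same from a script, a minimal Python sketch could look like this. The function name drop_caches and its parameters are mine, and against the real path it needs root.

```python
import os


def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    """Flush dirty pages to disk and ask the kernel to drop clean caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    Writing to the real path requires root privileges.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    if hasattr(os, "sync"):
        os.sync()  # same purpose as running 'sync' first
    with open(path, "w") as handle:
        handle.write(f"{level}\n")
```

Dropping caches only discards clean, reclaimable data, so the sync beforehand lets more of the cache become droppable.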

That’s more normal taking into account that we have 16 Volumes shared, one host is attempting like crazy to connect to Volumes that no longer exist, Services and Cronjobs run in the background, and we conduct tests degrading the pool, removing drives, etc.

After the tests concluded, memory dropped to 2 GB, which is what we use when we’re not under load.

Note: To see the memory being used by the Kernel slab cache in real time, you can use the command:

 slabtop

You can also check:

sudo vmstat -m
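A quick total of the slab memory is also available without root, from the Slab, SReclaimable and SUnreclaim fields of /proc/meminfo. This is a small sketch with function names of my own (parse_meminfo, slab_summary), and it only gives totals, not the per-cache detail that slabtop shows.

```python
def parse_meminfo(text):
    """Parse the content of /proc/meminfo into a dict of integer values.

    Most fields are expressed in kB; the unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        name, separator, rest = line.partition(":")
        fields = rest.split()
        if not separator or not fields:
            continue
        try:
            values[name.strip()] = int(fields[0])
        except ValueError:
            continue  # skip anything that is not a plain number
    return values


def slab_summary(meminfo):
    """Return the slab totals in GiB from a parsed /proc/meminfo dict."""
    kib_per_gib = 1024 * 1024
    return {
        key: meminfo.get(key, 0) / kib_per_gib
        for key in ("Slab", "SReclaimable", "SUnreclaim")
    }


def read_slab_summary(path="/proc/meminfo"):
    """Read the file and return the slab totals in GiB."""
    with open(path, "r") as handle:
        return slab_summary(parse_meminfo(handle.read()))
```

A large SReclaimable value is memory the Kernel can give back under pressure or when caches are dropped, while SUnreclaim is the part that stays.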