
Solving Linux Load key “ssh_yourserver”: invalid format when provisioning from Jenkins

If you get this error when you provision using rsync, or run commands over SSH, from a Docker instance on a Jenkins worker node, and your SSH key is stored as a variable in Jenkins, here is a way to solve it.

These are the kinds of errors you’ll receive:

Load key "ssh_yourserver": invalid format

web@myserver.carlesmateo.com: Permission denied (publickey).

rsync: connection unexpectedly closed (0 bytes received so far) [sender]

rsync error: unexplained error (code 255) at io.c(235) [sender=3.1.3]

script returned exit code 255

This applies if you copied your .pem file as text and pasted it into a variable in Jenkins.

In that case you’ll run into the load key invalid format error.

I would suggest using tokens and Vault or Consul instead of pasting an SSH key, but if you just need to solve this ASAP, this is the trick you need.

First encode your key with base64 without any wrapping. This is done with this command:

cat keys/key_azure_myserver_carlesmateo_com.pem | base64 --wrap=0

In your Jenkins steps you’ll add this code:

#!/bin/bash
echo "Creating credentials"
# Quote the variable so the shell does not split or expand its content
echo "$SSH_YOURSERVER" | base64 --decode > ssh_yourserver
echo "Setting permissions"
chmod 600 ssh_yourserver
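If you prefer to do the decoding from Python instead of Bash, this is a minimal sketch of the same idea. The function name write_private_key and the fake key content are made up for the example, they are not part of Jenkins or of any library.

```python
import base64
import os
import tempfile


def write_private_key(b64_key, path):
    """Decode a base64 encoded key and write it with 600 permissions."""
    key_bytes = base64.b64decode(b64_key)
    # Create the file readable and writable only by the owner.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as key_file:
        key_file.write(key_bytes)
    os.chmod(path, 0o600)


# Made-up key material, only to show that the line breaks survive the trip.
fake_pem = b"-----BEGIN PRIVATE KEY-----\nabc123\n-----END PRIVATE KEY-----\n"
# One single line, like the output of base64 --wrap=0
encoded = base64.b64encode(fake_pem).decode("ascii")

key_path = os.path.join(tempfile.mkdtemp(), "ssh_yourserver")
write_private_key(encoded, key_path)

with open(key_path, "rb") as key_file:
    restored = key_file.read()
print(restored == fake_pem)  # True
```

The encoded value is a single line, so nothing is lost when it is pasted into a variable, and the decoded file gets its original line breaks back.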

Once you have the key file, you can define new steps that deploy to Production using rsync:

#!/bin/bash
echo "Deploying www..."
rsync -e "ssh -i ssh_yourserver -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" -av --progress --exclude={} --stats --human-readable -z www/ web@myserver.carlesmateo.com:/var/www/myawesomeproject/www/

Note that in this case I’m disabling Strict Host Key Checking, which is not the preferred option for security, but you may want to use it depending on your strategy and the characteristics of your Cloud deployments.

Note also that I’m setting the User Known Hosts File to /dev/null. You may want this if you provision using Docker containers that are destroyed immediately afterwards, and Jenkins has not created the user properly, so it is unable to write to ~/.ssh/known_hosts

I mention these because they are the typical errors that drive engineers crazy and take the most time to fix.

Fixing the problems installing napalm-base in Ubuntu 20.04 LTS

One of my friends wanted to use SaltStack and https://github.com/napalm-automation/napalm-salt

But he had problems installing the napalm-base package.

Note that the package is no longer maintained.

He tried the latest version and the previous one (0.25.0), but he always got the error: ModuleNotFoundError: No module named 'pip.req'

pip3 install napalm-base==0.25.0

Defaulting to user installation because normal site-packages is not writeable
Collecting napalm-base==0.25.0
  Using cached napalm-base-0.25.0.tar.gz (35 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-gzd07xzq/napalm-base_aace1b03ac0e4045bbc85e27c788ebc1/setup.py", line 5, in <module>
          from pip.req import parse_requirements
      ModuleNotFoundError: No module named 'pip.req'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

TL;DR: The problem is that pip version 10 changed its internal structure, and the pip.req module that this setup.py imports is no longer available.

There are several ways to make it work, but the easiest is to downgrade pip and install the package. Afterwards pip can be upgraded again.

python3 -m pip install pip==9.0.3
pip3 install napalm-base
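One of the other possible solutions, if you are able to patch the setup.py of a package, is to stop importing from pip at all and read the requirements file by hand. This is only a sketch of that idea: read_requirements and the demo file are hypothetical, and this is not the actual code of napalm-base.

```python
import os
import tempfile


def read_requirements(path):
    """Return the requirement lines of a file, skipping blanks and comments."""
    requirements = []
    with open(path, encoding="utf-8") as req_file:
        for line in req_file:
            line = line.strip()
            if line and not line.startswith("#"):
                requirements.append(line)
    return requirements


# Hypothetical requirements file, created only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "requirements.txt")
with open(demo_path, "w", encoding="utf-8") as req_file:
    req_file.write("# network libraries\njinja2\n\nnetaddr>=0.7\n")

install_requires = read_requirements(demo_path)
print(install_requires)  # ['jinja2', 'netaddr>=0.7']
```

The resulting list is what a setup.py would pass as install_requires, without depending on pip internals.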

Solving problems when updating to the latest version of GNS3, running with VirtualBox

Last Updated: 2022-01-19 12:05 Irish Time

So here I explain how to solve a problem that was happening to a friend.

He uses GNS3 for the university, and after installing the latest version, which in this case is 2.2.29, it stopped working.

He had it configured to use the local Server and VirtualBox in Windows 10.

The first thing to check and fix is the IP address for Host Only.

If you use Linux or Mac, only certain IP ranges can be used, or you’ll have to edit a config file inside /etc/vbox

So the first thing is to set an IP address in the VirtualBox VM that will keep you worry free.

Start the VirtualBox VM directly, and when the VM boots, use the text menu application to configure a valid IP from the range defined for Host Only.

You can check this range in VirtualBox in File > Host Network Manager

In my initial test I picked this IP for the VM:

192.168.56.100

But using 192.168.56.100 can cause problems, as the default DHCP Server is defined with this IP, so I switched to:

192.168.56.10
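If you want to double check a candidate IP before typing it in the VM, this is a small sketch with the ipaddress module. I'm assuming here that the Host Only network is 192.168.56.0/24 and that the DHCP Server uses 192.168.56.100; check your own values in the Host Network Manager.

```python
import ipaddress

# Assumptions for this sketch: adjust them to what VirtualBox shows you.
HOST_ONLY_NETWORK = ipaddress.ip_network("192.168.56.0/24")
RESERVED = {ipaddress.ip_address("192.168.56.100")}  # DHCP Server


def is_good_static_ip(candidate):
    """True if the IP is inside the Host Only range and not reserved."""
    address = ipaddress.ip_address(candidate)
    if address not in HOST_ONLY_NETWORK:
        return False
    special = (HOST_ONLY_NETWORK.network_address,
               HOST_ONLY_NETWORK.broadcast_address)
    return address not in special and address not in RESERVED


print(is_good_static_ip("192.168.56.10"))   # True
print(is_good_static_ip("192.168.56.100"))  # False
print(is_good_static_ip("10.0.0.5"))        # False
```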

Press CTRL + X to save and exit.

The VM will reboot automatically. Wait until it has booted.

Now open a Windows Command Prompt or a Linux/Mac Terminal on your computer and ping the IP:

ping 192.168.56.10

You should also be able to see the web interface going to:

http://192.168.56.10

If it works then power off the VM, as we will start it automatically when running GNS3 main program (not from VirtualBox).

Now launch GNS3 program. Wait 30 seconds until it initializes and go to Edit > Preferences

Make sure you have the configuration like this:

Pay special attention to the Port for the GNS3 VM.

It seems that my friend’s main problem was that he had updated from a previous version, and the settings from that version were kept. In his previous version he had configured port 3080, but the new GNS3 Server version 2.2.29 in the VM was using port 80, as you saw in my previous screenshots. So GNS3 was unable to connect to the VM.
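A quick way to see which port the GNS3 VM is really listening on is to try a TCP connection to each candidate. This is a minimal sketch, where is_port_open is a helper I made up for the example:

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, host unreachable...
        return False
```

With the VM started, calling is_port_open("192.168.56.10", 80) and is_port_open("192.168.56.10", 3080) should tell you which of the two ports has to be set in the Preferences.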

After fixing this, close GNS3, stop the VM if it was not stopped automatically, and start GNS3 again.

After about one minute of connecting, you’ll see it working fine.

Some weird things from Python 3 that you may not know

Last Update: 2022-06-06 10:29 IST

You can find those bizarre things and more in my book Python 3 Combat Guide.

I’m not talking about the wonderful things, like how big Integers can be, but about the bizarre things that may ruin your day.

What is the sum of 0.1 + 0.1 + 0.1 in Python?

0.3?

Wrong answer.

A bit of humor

Well, to be honest, the computer was wrong. The way programming languages handle Floats tends to be less than ideal.
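You can see it by yourself with this small example. The comparison with math.isclose and the Decimal version are my suggestions of how to deal with it, not the only ways.

```python
import math
from decimal import Decimal

total = 0.1 + 0.1 + 0.1
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# Decimal built from Strings keeps the exact decimal digits
exact = Decimal("0.1") + Decimal("0.1") + Decimal("0.1")
print(exact)                     # 0.3
```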

Floats

Maybe you know JavaScript and its famous NaN (Not a number).

You are probably sure that Python is much more exact than that…

…well, until you do a big operation with Floats, like:

10.0**304 * 10.0**50

It returns infinity (inf).

I see your infinity and I add one :)

However, if we try to define a number that is too big directly, it will raise OverflowError:

Please note that Integers are handled in a much more robust, cooler way:
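The three cases can be reproduced with this sketch. I'm assuming 10.0 ** 400 as the example of a number that is too big; any Float power that goes out of range behaves the same way.

```python
import math

big = 10.0**304 * 10.0**50
print(big)              # inf
print(big + 1)          # inf
print(math.isinf(big))  # True

exponent = 400
overflow_message = None
try:
    too_big = 10.0**exponent
except OverflowError as error:
    overflow_message = str(error)
    print("OverflowError:", overflow_message)

# The same power with Integers is exact, no matter how big it is
as_integer = 10**exponent
print(len(str(as_integer)))  # 401
```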

Negative floats

Ok. What happens if we define a number with a negative power, like 10 ** -300 ?

And if we go a bit further? Like 10 ** -329

It returns 0.0

Oops!
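A quick check of the two cases, as a minimal sketch:

```python
tiny = 10 ** -300
print(tiny > 0)        # True, still representable as a Float

underflow = 10 ** -329
print(underflow)       # 0.0
print(underflow == 0)  # True, the value was silently lost
```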

I mention in my books why it is better to work with Integers, and in fact most eCommerce sites, banks and APIs work with Integers. For example, if the amount is USD 10.00 they send it multiplied by 100, so they will send 1000. All the actors know that they have to divide by 100.
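This is a small sketch of that convention. The function names usd_to_cents and cents_to_usd are made up for the example, and it only handles positive amounts written like "10.00".

```python
def usd_to_cents(amount_text):
    """Convert a String like '10.00' to an Integer number of cents."""
    dollars, _, cents = amount_text.partition(".")
    cents = (cents + "00")[:2]  # "1" becomes "10", "" becomes "00"
    return int(dollars) * 100 + int(cents)


def cents_to_usd(cents):
    """Format an Integer number of cents as a String with two decimals."""
    return f"{cents // 100}.{cents % 100:02d}"


print(usd_to_cents("10.00"))  # 1000

# Three items of 10 cents: exact with Integers
total_cents = usd_to_cents("0.10") * 3
print(total_cents)                # 30
print(cents_to_usd(total_cents))  # 0.30
```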

Breaking the language innocently

I have always mentioned that I use MT Notation, the prefix notation I invented, inspired by Hungarian Notation and by an amazing C++ programmer I worked with at Volkswagen and at la caixa (now caixabank), who passed away many years ago.

That system names a variable with a prefix for its type.

It’s very useful and also prevents the next weird thing from Python.

Imagine a Junior wants to print a String and they put it in a variable. And unfortunately they call this variable print. Well…

print = "Hello World!"
print("That will hurt")

Observe the output of this and try not to scream:
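If you want to try it without breaking print for the rest of your program, you can reproduce the problem inside a function, so the built-in is only shadowed locally. The function shadow_print is just a wrapper for this demonstration.

```python
def shadow_print():
    print = "Hello World!"  # shadows the built-in only inside this function
    try:
        print("That will hurt")
    except TypeError as error:
        return str(error)
    return "no error"


message = shadow_print()
print(message)  # 'str' object is not callable
```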

Variables and Functions named equally

Well, most languages are able to differentiate a function, with its parentheses, from a variable.

The way Python does it hurts my coder heart:
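A minimal sketch of the problem, where total is a name I chose for the example: the variable simply replaces the function, because both live in the same namespace.

```python
def total():
    return 42


print(total())  # 42

total = 10  # the name now points to an Integer, the function is gone

error_message = None
try:
    total()
except TypeError as error:
    error_message = str(error)
print(error_message)  # 'int' object is not callable
```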

This is another good reason to use MT Notation for variables, to take Unit Testing seriously, and to give getters, setters and class Constructors a chance for implementing limits and sanitization.

Nested Loops

This will work in Python, although it doesn’t work in other languages (but please never do it).

for i in range(3):
    print("First Loop", i)
    for i in range(4):
        print("Second Loop", i)

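The code will not crash when the inner loop overwrites the i used in the first loop, but both loops share the same variable, so the new i masks the first one: after the inner loop finishes, i keeps the last value of the inner loop until the outer loop assigns its next value.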

And please, name variables properly.

Import… once?

Modules are imported only once, even if several of the imported files import the same file.

So don’t put code in the middle of them, outside functions/classes, unless you really know what you’re doing.

Define functions first, and execute code after if __name__ == "__main__":

Take a look at this code:

def first_function():
    print("Inside first function")
    second_function()

first_function()

def second_function():
    print("Inside second function")

Well, this will crash as Python executes the code from top to bottom, and when it gets to first_function() it will attempt to call second_function() which has not been read by Python yet. This example will throw an error.

You’ll get an error like:

Inside first function
Traceback (most recent call last):
  File "/home/carles/Desktop/code/carles/python_combat_guide/src/structure_dont_do_this.py", line 14, in <module>
    first_function()
  File "/home/carles/Desktop/code/carles/python_combat_guide/src/structure_dont_do_this.py", line 12, in first_function
    second_function()
NameError: name 'second_function' is not defined

Process finished with exit code 1

Add your code at the bottom always, under:

if __name__ == "__main__":
    first_function()

The code inside this if will only be executed if you directly call this code as main file, but will not be executed if you import this file from another one.

You don’t have this problem with classes in Python, as they are defined first, completely read, and then you instantiate or use them. To avoid messing and creating bugs, have the imports always on the top of your file.

…Ellipsis

Today is Halloween and one of my colleagues asked me for help improving his Automation project.

I found something weird in his code.

He had something like this:

class Router:

    def router_get_info(self):
        ...

    def get_help_command(self):
        return "help"

So I asked him: why do you use … (dot dot dot) in that empty method?

He told me that when he doesn’t want to implement code he just puts that.

Well, dot dot dot is Ellipsis.

And what is Ellipsis?

Ellipsis is an object that may appear in slice notation.

A good explanation of what is Ellipsis is found in this answer in StackOverflow.

In Python all methods, functions, if, while… are required to have at least one instruction.

So the instruction my colleague was looking for is pass.
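The code of my colleague works because ... is a valid expression, so it counts as an instruction, but it evaluates an object for nothing. A small sketch showing what ... really is, and the same class using pass:

```python
print(... is Ellipsis)     # True
print(type(...).__name__)  # ellipsis


class Router:

    def router_get_info(self):
        pass  # the explicit "do nothing" instruction

    def get_help_command(self):
        return "help"


router = Router()
print(router.router_get_info())   # None
print(router.get_help_command())  # help
```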

Just a variable?

In Python you can have just a variable on a line, without anything else: no operation with it, no call, nothing.

This makes it easy to make a mistake and not detect it.

As you see, we can have just the s_var variable, which is a String, on a line, and this does not raise an error.

If we do this interactively from the Python interpreter, it will print the String “I’m a pickle” (famous phrase from Rick and Morty).
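This sketch reproduces both behaviors from a single file. I'm using compile() in "exec" mode to imitate a script and in "single" mode to imitate the interactive interpreter; that is only a trick for the demonstration.

```python
import contextlib
import io

source = 's_var = "I\'m a pickle"\ns_var\n'

# Like a script: the lonely s_var is evaluated and silently discarded
script_output = io.StringIO()
with contextlib.redirect_stdout(script_output):
    exec(compile(source, "<script>", "exec"), {})
print(repr(script_output.getvalue()))  # ''

# Like the interactive interpreter: the value is echoed
namespace = {"s_var": "I'm a pickle"}
interactive_output = io.StringIO()
with contextlib.redirect_stdout(interactive_output):
    exec(compile("s_var", "<stdin>", "single"), namespace)
print(interactive_output.getvalue())
```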

Variables are case sensitive

So you can define variables named true, false, none… as they are different from True, False, None

Variables in Unicode

Python3 accepts variables in Unicode.

I would completely discourage you from using variables with accents or other characters different from a-z 0-9 and _

Python files with these names yes, but kaboom if you import them

So you can create Python files with a dash in the name or beginning with numbers, like 20220314_programming_class.py, and execute them, but you cannot import them with a regular import statement.
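The reason is that the name after import has to be a valid identifier. This sketch only compiles the import line, it doesn't import anything; the helper name and the module names are made up for the example.

```python
def can_be_imported_with_statement(module_name):
    """True if 'import <module_name>' is valid Python syntax."""
    try:
        compile(f"import {module_name}", "<check>", "exec")
    except SyntaxError:
        return False
    return True


print(can_be_imported_with_statement("programming_class"))           # True
print(can_be_imported_with_statement("20220314_programming_class"))  # False
print(can_be_imported_with_statement("programming-class"))           # False
```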


A Tuple of a String is not a Tuple, it’s a String

This can be very messy and confusing. Normally you define a tuple with parentheses, although you can use tuple() too.

Parentheses are the way we normally build tuples. But if we do:

print(type(('this is a String')))

You get that this is a String, I mean

<class 'str'>

If you want to get a tuple containing one String you can add a comma after the String, which is weird. You can also use tuple(), but note that tuple("this is a String") returns a tuple of the individual characters; to get a tuple with one element you have to pass the String inside another iterable, like tuple(["this is a String"]).

I think the definition of a tuple should be consistent, no matter if you use one or more elements. Parentheses are also used for other tasks, like invoking functions or methods, or grouping arithmetic operations, and IMO that reuse of the signs () for multiple purposes is what causes the mayhem of a different behavior depending on whether there is one element or more.

See some example cases.
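These are some cases you can run by yourself:

```python
print(type(("this is a String")))   # <class 'str'>
print(type(("this is a String",)))  # <class 'tuple'>

# The comma makes the tuple, not the parentheses
t_single = "this is a String",
print(type(t_single))               # <class 'tuple'>
print(len(t_single))                # 1

# tuple() iterates what it receives
print(tuple("abc"))                 # ('a', 'b', 'c')
print(tuple(["this is a String"]))  # ('this is a String',)
```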

Python simplifies the jump of line \n platform independent and some times it’s messy

If you come from a C background you will expect text files on different platforms: Linux, Mac OS X (it changed from old to new versions), Windows… to represent line endings differently. In some cases this is ASCII code 10 (LF), in others 13 (CR), and in others two characters: 13 and immediately after 10.

Python simplifies the Enter character by naming it \n like in C.

So, independently of the platform, whenever you read a text file you will get \n for any ASCII 10 [LF] or 13 [CR]. A [CR] will be converted to [LF] in Linux.

If, on a Linux system, where line endings are represented by [LF], you read a file that was generated on a Windows system, so it has [CR][LF] instead of [LF] at the end of each line, the pair is translated to a single \n in the default text mode. If the two characters come in the opposite order, [LF][CR], each one is translated separately and you’ll get \n two times.

And if you do len("\n") to know the length of that String, it returns 1 on all platforms.

To read the [LF] and [CR] (represented by \r) as they are, you need to open the file as binary. By default Python opens files as text.

You can check this by writing [LF] and [CR] in Linux and seeing how Python seamlessly reads the file as if it were [LF].

A file where every [LF] is followed by a [CR] will get \n\n, while a file generated by Windows, with [CR][LF], will get a single \n:
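This is a minimal sketch to check it, writing the bytes by hand to a temporary file (the file name and the content are made up) and reading it back as text and as binary:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newlines.txt")
with open(path, "wb") as demo_file:
    # [LF], [CR], [CR][LF] and [LF][CR] in the same file
    demo_file.write(b"unix\nold mac\rwindows\r\nreversed\n\rend")

# Default text mode: every kind of line ending becomes \n
with open(path, "r", encoding="ascii") as demo_file:
    text = demo_file.read()
print(repr(text))  # 'unix\nold mac\nwindows\nreversed\n\nend'

# Binary mode: the original bytes, untouched
with open(path, "rb") as demo_file:
    raw = demo_file.read()
print(raw)  # b'unix\nold mac\rwindows\r\nreversed\n\rend'

print(len("\n"))  # 1
```

If you need the original characters but still want a String, open() also accepts newline="" to disable the translation.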

Random code when the class is imported

In a procedural file, the code that is outside a function will be executed when the file is imported. But if this file is imported again it will not be re-executed.

Things are messier if you import a class file. Inside the body of the class, in the space you would reserve for the definition of static variables, you can have random code. And this code will only be executed on the first import, not on subsequent ones.
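You can verify both things with this sketch, which writes a hypothetical module called demo_import_once.py to a temporary folder and imports it two times:

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Hypothetical module, created only for this test
module_dir = tempfile.mkdtemp()
module_source = (
    'print("module level code executed")\n'
    "class Router:\n"
    '    print("class body code executed")\n'
    '    s_name = "router"\n'
)
module_path = os.path.join(module_dir, "demo_import_once.py")
with open(module_path, "w", encoding="utf-8") as module_file:
    module_file.write(module_source)

sys.path.insert(0, module_dir)
importlib.invalidate_caches()

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    first = importlib.import_module("demo_import_once")   # runs the file
    second = importlib.import_module("demo_import_once")  # comes from sys.modules

executed_lines = captured.getvalue().splitlines()
print(executed_lines)   # each message appears only once
print(first is second)  # True
```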

Disclaimer: the pictures from Futurama are from their respective owners.

Cloning a Windows Application running in Wine

I’ve some very old Windows Applications running in Wine in my Linux workstations.

It’s Software I bought years ago and that is not available anymore.

Keeping them, and migrating or cloning them to another Linux Workstation or Virtual Machine, is really easy.

I share the steps with you.

You just have to copy the contents from your /home/username/.wine folder.

Then, in the new workstation install wine. For Ubuntu this is:

sudo apt update && sudo apt install wine

Run winecfg so basic links and structures are created.

Then simply copy the .wine folder backup to your new machine /home/username/

Your programs will be in /home/username/.wine/drive_c/Program Files/ or /home/username/.wine/drive_c/Program Files (x86)/

If you want you can just copy your programs folder.

Remember that to cd to a directory with spaces you have to quote the name, for example with double quotes (").

For example:

$ pwd
/home/carles/.wine/drive_c
$ cd "Program Files"
$ pwd
/home/carles/.wine/drive_c/Program Files

You can also use \ (backslash followed by a space) to escape the space.

Then start your favorite program with:

wine yourprogram.exe

If that fails, creating a new configuration for a new user will very probably make things right.

Update 2022-01-05: Take into account that you will be copying the Windows registry when doing this. I use this trick to clone applications that are no longer downloadable from the Internet. I clone wine to dedicated Virtual Machines. You may need different Virtual Machines for different programs if the Windows registry is different for them.

Solving Oracle error ORA 600 [KGL-heap-size-exceeded]

Some time ago there was a web page that was rendered blank for a certain group of users.

The errors were coming from an Oracle instance. One SysAdmin restarted the instance, but the errors continued.

Often there are problems due to having two different worlds: Development and Production/Operations.

What works in Development, or even in Docker, may not work at Scale in Production.

That query that works with 100,000 products, may not work with 10,000,000.

I have programmed a lot for the web, so when I saw a blank page I knew it was an internal error, as the headers sent by the Web Server indicated 500. DBAs were seeing an elevated number of errors in one of the Servers.

So I went straight to Oracle’s logs for that Server.

I did a quick filter in bash:

cat /u01/app/oracle/diag/rdbms/world7c/world7c2/alert/log.xml | grep "ERR" -B4 -A3

This returned several errors of the kind “ORA 600 [ipc_recreate_que_2]”, but that was not the one we were after. Our bad guy was:

‘ORA 600 [KGL-heap-size-exceeded]’

The XML fragment was similar to this:

<msg time='2016-01-24T13:28:33.263+00:00' org_id='oracle' comp_id='rdbms'
msg_id='7725874800' type='INCIDENT_ERROR' group='Generic Internal Error'
level='1' host_id='gotham.world7c.justice.cat' host_addr='10.100.100.30'
pid='281279' prob_key='ORA 600 [KGL-heap-size-exceeded]' downstream_comp='LIBCACHE'
errid='726175' detail_path='/u01/app/oracle/diag/rdbms/world7c/world7c2/trace/world7c2_ora_281279.trc'>
<txt>Errors in file /u01/app/oracle/diag/rdbms/world7c/world7c2/trace/world7c2_ora_281279.trc  (incident=726175):
ORA-00600: internal error code, arguments: [KGL-heap-size-exceeded], [0x14D22C0C30], [0], [524288008], [], [], [], [], [], [], [], []
</txt></msg>

Just before this error there was an error with a query, and the PID matched, so it seemed clear to me that the query was causing the crash at Oracle level.

Checking the file:

/u01/app/oracle/diag/rdbms/world7c/world7c2/trace/world7c2_ora_281279.trc

The content was something like this:

<msg time='2016-01-24T13:28:33.263+00:00' org_id='oracle' comp_id='rdbms'
msg_id='7725874800' type='INCIDENT_ERROR' group='Generic Internal Error'
level='1' host_id='gotham.world7c.justice.cat' host_addr='10.100.100.30'
pid='281279' prob_key='ORA 600 [KGL-heap-size-exceeded]' downstream_comp='LIBCACHE'
errid='726175' detail_path='/u01/app/oracle/diag/rdbms/world7c/world7c2/trace/world7c2_ora_281279.trc'>
<txt>Errors in file /u01/app/oracle/diag/rdbms/world7c/world7c2/trace/world7c2_ora_281279.trc  (incident=726175):
ORA-00600: internal error code, arguments: [KGL-heap-size-exceeded], [0x14D22C0C30], [0], [524288008], [], [], [], [], [], [], [], []
</txt>
</msg>

Basically in our case, the query that was launched by the BackEnd was using more memory than allowed, which caused Oracle to kill it.

This limit is a tunable, introduced in Oracle 10g, that you can modify.

You can see the current values first:

SQL> select
  nam.ksppinm NAME,
  nam.ksppdesc DESCRIPTION,
  val.KSPPSTVL
  from
  x$ksppi nam,
  x$ksppsv val
  where nam.indx = val.indx and nam.ksppinm like '%kgl_large_heap_%_threshold%';

NAME                              | DESCRIPTION                       | KSPPSTVL
=============================================================================================
_kgl_large_heap_warning_threshold | maximum heap size before KGL      | 4194304
                                    writes warnings to the alert log
---------------------------------------------------------------------------------------------
_kgl_large_heap_assert_threshold  | maximum heap size before KGL      | 4194304
                                    raises an internal error

So, _kgl_large_heap_warning_threshold is the maximum heap before getting a warning, and _kgl_large_heap_assert_threshold is the maximum heap before getting the error.

Depending on your case, the solution can be either:

  • Breaking your query into several queries to reduce the memory used
  • Using pagination or LIMIT
  • Setting a bigger value for those tunables.

It will also work if you set these two variables to 0, although I don’t recommend it, as you want your Server to kill queries that take more memory than you want.

To increase the value of _kgl_large_heap_warning_threshold, you have to update it. Please note it is in bytes, so 32 MB is 32 * 1024 * 1024 = 33,554,432. Using spfile:

SQL> alter system set "_kgl_large_heap_warning_threshold"=33554432 scope=spfile;

SQL> shutdown immediate

SQL> startup

SQL> show parameter _kgl_large_heap_warning_threshold

NAME                              | TYPE    | VALUE
==================================|=========|===============
_kgl_large_heap_warning_threshold | integer | 33554432

Or if using the parameter file, set:

_kgl_large_heap_warning_threshold=33554432
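As the values are in bytes, it is easy to make a mistake with the zeros. This tiny sketch, with two helper functions I made up, double checks the numbers used above:

```python
def megabytes_to_bytes(megabytes):
    """Convert MB to bytes, using 1 MB = 1024 * 1024 bytes."""
    return megabytes * 1024 * 1024


def bytes_to_megabytes(number_of_bytes):
    return number_of_bytes / (1024 * 1024)


print(megabytes_to_bytes(32))       # 33554432, the new value
print(bytes_to_megabytes(4194304))  # 4.0, the value that was configured
```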

Fixing Wifi not enabled in Linux Kali with Dell XPS laptop

This is a trick I share, as I see many students having problems with this.

Assuming that your Kali distribution is recent (Linux Kernel newer than 5.3), the most typical problem students have is that XPS laptops from Dell, and laptops from other brands, have a combination of keys to enable or disable the Wifi.

On the Dell XPS it is on the PrtScr key, so if your Wifi is disabled, you can enable it in Kali Linux by pressing:

CTRL + ALT + Fn + PrtScr

As you can see, on the PrtScr key there is an icon of a Wifi signal.

The Fn key is on the bottom left, next to Ctrl.

This is a very simple problem to fix, but many people suffer from it and go crazy trying to update drivers, or even end up using an external USB dongle.

A simple Bash one line script to log the temperature of your HDDs and CPUs in Ubuntu

I’ve been helping to troubleshoot why a Commodity Server (with no iDrac/iLO IPMI) is powering off randomly. One of the hypotheses is the temperature.

This is a very simple script that will print the temperature of the HDDs and the CPU and keep it in a log file.

First you need to install hddtemp and lm-sensors:

sudo apt install hddtemp lm-sensors

Then this is the one line script, that you should execute as root:

while true; do date | tee -a /var/log/hddtemp.log; hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd | tee -a /var/log/hddtemp.log; date | tee -a /var/log/cputemp.log; sensors | tee -a /var/log/cputemp.log; sleep 2; done

Feel free to change sleep 2 for the number of seconds you want to wait, like sleep 10.

Press CTRL + C to interrupt the script at any time.

You can execute this inside a screen session and leave it running in Background.

Note that I use the tee command, so the output is printed to the screen and also written to the log file.
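If you prefer Python over a Bash one-liner, the same idea can be sketched like this. The functions log_command and monitor are made up for this example, and monitor assumes the same drives and log files as the one-liner.

```python
import datetime
import subprocess
import time


def log_command(command, log_path):
    """Run a command, print its output and append it, with a timestamp, to a log."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    timestamp = datetime.datetime.now().isoformat(timespec="seconds")
    entry = f"{timestamp}\n{result.stdout}"
    print(entry, end="")  # to the screen, like tee
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(entry)  # and to the log file
    return result.stdout


def monitor(interval_seconds=2):
    """Loop forever, like the one-liner. Interrupt it with CTRL + C."""
    while True:
        log_command(["hddtemp", "/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"],
                    "/var/log/hddtemp.log")
        log_command(["sensors"], "/var/log/cputemp.log")
        time.sleep(interval_seconds)
```

Calling monitor() as root starts the logging; monitor(10) waits 10 seconds between readings.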

News from the blog 2020-10-16

  • I’ve been testing and adding more instances to CMIPS. I’m planning on testing the Azure instance with 120 cores.
  • News: Microsoft makes remote work a permanent option

https://www.bbc.com/news/business-54482245

  • One of my colleagues showed me dstat, a very nice tool for system monitoring, and bandwidth of a drive monitoring. Also ifstat, as complement to iftop is very cool for Network too. This functionality is also available in CTOP.py
  • As I shared in the past news of the blog, I’m resuming my contributions to ZFS Community.

Long time ago I created some ZFS tools that I want to share soon as Open Source.

I equipped myself with the proper Hardware to test on SAS and SATA:

  • 12G Internal PCI-E SAS/SATA HBA RAID Controller Card, Broadcom’s SAS 3008, compatible for SAS 9300-8I.
    This is just an HBA (Host Bus Adapter), it doesn’t support RAID. It only connects up to 8 drives, or 1024 through an expander, to my computer.
    It has a bandwidth of 9,600 MB/s which guarantees me that I’ll be able to add 12 SAS SSD Enterprise grade at almost the max speed of the drives. Those drives perform at 900 MB/s so if I’m using all of them at the same time, like if I have a pool of 8 + 3 and I rebuild a broken drive or I just push Data, I would be using 12×900 = 10,800 MB/s. Close. Fair enough.
  • VANDESAIL Mini-SAS Cables, 1m Internal Mini-SAS to 4x SAS SATA Forward Breakout Cable Hard Drive Data Transfer Cable (SAS Cable).
  • SilverStone SST-FS212B – Aluminium Trayless Hot Swap Mobile Rack Backplane / Internal Hard Drive Enclosure for 12x 2.5 Inch SAS/SATA HDD or SSD, fit in any 3x 5.25 Inch Drive Bay, with Fan and Lock, black
  • Terminator is here.
    I ordered this T-800 head a while ago and it finally arrived.

Finally I will have my empty USB keys located and protected. ;)

Remember to be always nice to robots. :)

Fixing problems with audio not sounding after upgrade from Ubuntu 18.04 LTS to 20.04.1 LTS

Two days ago I upgraded my Ubuntu Linux 18.04 LTS Workstation to Ubuntu 20.04.1 LTS and I experienced some audio problems.

Basically I noticed that the system was not playing any sound.

When I checked the audio config I noticed that only the external output of my motherboard was detected, but not the HDMI output from the monitor.

I have a 28″ Asus monitor with speakers embedded.

It didn’t make any sense, so I decided to restart pulseaudio:

pulseaudio -k

That fixed my problem.

However, I noticed that when I lock my session, and the monitor goes off for power saving, my HDMI monitor output disappears from the list again.

Repeating the command pulseaudio -k will fix that again.

I checked that the power saving was enabled:

cat /sys/module/snd_hda_intel/parameters/power_save
cat /sys/module/snd_hda_intel/parameters/power_save_controller

I had 1 and Y.

To turn power saving off, I changed the power mode settings (values written under /sys apply until the next reboot):

sudo sh -c "echo 0 > /sys/module/snd_hda_intel/parameters/power_save" 
sudo sh -c "echo N > /sys/module/snd_hda_intel/parameters/power_save_controller"