Category Archives: Web development

News from the Blog 2021-11-11

New Articles

How to communicate with your Python program running inside a Docker Container, using Linux Signals

Hope you’ll have fun reading this article:

Communicating with Docker Containers via Linux Signals and Python

I migrated my last services from Amazon and the blog to Google Compute Engine (GCE / GCP)

I wrote a Postmortem analysis about the process of migrating my last services from my 11-year-old Amazon account.

Updates

Updates to articles

I updated the article about weird things in Python that you may not know, adding the Ellipsis (…).
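
As a quick taste of that update, this is the kind of oddity the article covers (a tiny, self-contained Python sample):

# The Ellipsis (...) is a real, built-in Python object
# and can be used as a placeholder body, similar to pass
def to_be_implemented():
    ...

print(... is Ellipsis)  # True
print(bool(...))        # True: the Ellipsis is truthy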

I’ve been working on some Cassandra examples. I may publish an article soon about using it from Python and Docker.

Updates to My Books

I updated my Python and Docker books.

I’m currently writing a book about using the Amazon AWS Python SDK (boto3).

Updates to Open Source projects

I have updated ctop, fixing two bugs and increasing Code Coverage.

I made a new tag and released the latest Stable Version:

https://gitlab.com/carles.mateo/ctop/-/tags/0.8.7

On top of my local Unit Testing, I have Jenkins checking that I don’t commit anything that breaks the Tests.

Some time ago I wrote some articles about how you can set up Jenkins in a Docker Container.

Miscellaneous

Charity

I’ve donated to Wikipedia.

Only 2% of the viewers donate, so I answered the call every time it was made.

This is my 5th donation to Wikimedia.

I consider that Freedom is very important.

I bought these new books

One of my secrets to staying on top is that I’m always studying.

I study all the time, at work and in my free time.

I use Linux Academy and I buy books on paper. I don’t connect with reading on tablets; I think information is retained better when read on paper. I also use a marker and pointers to keep direct access to the most interesting points in the books.

And I study all kinds of topics. Obviously I know a lot about Web Scraping, but there is always room for learning more. And whatever new things I learn help me to be better with my students and clearer when writing my books.

I’ve never been a Front End developer, but I’ve been able to fix bugs in the Front End engines of the companies I worked for, like Privalia. I was passed a bug that prevented Internet Explorer users from buying, just one hour before we launched a massive campaign. I debugged it and found a variable named “value”, so the html looked like <input name="value" value="">. In less than 30 minutes I proved to the incredulous Head of Development and the CTO that a bug in Internet Explorer was causing a conflict when fetching the value from the input named value. We deployed the update to Production and the campaign was a total success. So I consider knowing JavaScript and the Front End a need too, even if I don’t work directly with it. I want to be able to understand all the requirements, possibilities and weaknesses, so I can fix bugs and save the day. That allowed me to fix scalability problems in Node.js and PhantomJS projects too. (They are Server-Side, event-driven JavaScript projects.)

It seems that Amazon.co.uk works well again for Ireland. My last two orders arrived on time, and apparently with no problems with border taxes.

Nice Python article

I enjoyed this article a lot, because it explains part of what I did with my student and friend Albert, in a project that analyzes the Apache access logs for patterns of exploit attempts, feeds a database, and then blocks the offending IP Addresses in the Firewall.

The article only covers the Pandas part, reading the access.log file and working with it, but it is a very well written article:

https://mmas.github.io/read-apache-access-log-pandas
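
For flavor, here is a minimal sketch of that approach. It assumes the common Apache log format, and the names are mine, not the article’s:

import re
import pandas as pd

# Parse each line of the common Apache log format with a regex
o_pattern = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
                       r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)')

a_rows = []
with open("access.log") as o_file:
    for s_line in o_file:
        o_match = o_pattern.match(s_line)
        if o_match:
            a_rows.append(o_match.groupdict())

df = pd.DataFrame(a_rows)

# The IPs generating the most 4xx responses are good candidates to review
df_errors = df[df["status"].str.startswith("4")]
print(df_errors["ip"].value_counts().head(10))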

Nice Virtual Volumes article from VMware

I prefer Open Source, but there are very good commercial products too.

I liked this article about Virtual Volumes from VMware:

Understanding Virtual Volumes (vVols) in VMware vSphere 6.7/7.0 (2113013)

https://kb.vmware.com/s/article/2113013

Thanks Blizzard (again)

There is a very nice initiative where we can nominate 4 colleagues a year whom we think deserve recognition.

My colleagues voted for me, so I received a gift voucher that I can spend in stores in Ireland like Ikea, PC World, Argos, Adidas, App Store & iTunes…

So thanks a million buds. :)

Migrating my 11-year-old Amazon AWS account services (Postmortem Analysis)

I had started to explain that I was migrating some services from Amazon, that some of my sites were under Maintenance, and that I would provide more information.

Here is the complete history of why I migrated all the services from my 11-year-old Amazon account to other CSPs.

Some lessons can be learned from my adventure.

I migrated my last services from Amazon to GCP

Amazon sent me an email on October 6th of this year, 2021, telling me that they will disable EC2-Classic by August 2022. I thought I would not be able to keep my Static IPs, as in the past VPC IPs and EC2-Classic IPs were not transferable, so considering that I would lose my Static IPs anyway, I started to migrate some services to other providers like Digital Ocean.

Losing Static IP (Elastic IP in AWS) Addresses is not cool, as it is bad for SEO, so given that I thought I would lose the Static IPs that had been with me for years, I started to migrate certain services to much more economical providers.

Amazon is terrible at communicating, and I talked with some product managers about that in the past, when they lost one of my Volumes: the email was so cold and terrible that it actually hurt more than Amazon losing my Data. I believed it was a poorly made Scam, and when I realized it was true I reached out to one of my friends who is a manager there, as I know they care about doing things right, and he organized a meeting with two PMs so I could pass on my feedback.

The Cloud providers are changing things very fast, and nobody is able to keep up to date with the changes, unless their work position allows plenty of time to get updated. Even if pages of documentation are provided, you have to react to an event that they generated externally, forcing you to action: action to read all the documentation about EC2-Classic migrations, action to prepare to have migrated by August 2022.

So, August 2022… I was counting on having plenty of time, but I’m writing a new book about using the Amazon SDK for Python, boto3, and I was doing some API calls and they started to fail in a very unusual way: Exceptions with timeouts, but only for the one region where I had EC2-Classic.

urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f0347d545e0>: Failed to establish a new connection: [Errno -2] Name or service not known

My config was:

        o_config = Config(
            region_name="us-east-1a",
            signature_version="v4",
            retries={
                'max_attempts': 10,
                'mode': 'standard'
            }
        )

But if I switched to another region name, it would work:

            region_name='us-west-2',

I made a mistake here: the region name is “us-east-1” and not “us-east-1a”. “us-east-1a” is the availability zone. So the SDK was giving a timeout because, in order to connect to the endpoint, it uses the region name as part of the hostname. It doesn’t find that endpoint because it doesn’t exist.

I never understood why a company like Amazon is unable to provide the SDK with a sample project or projects that are 100% working, with the source code, so people have a working base to build on.

Every API that I have created, I have provided with documentation and also with examples in several languages showing how to use it.

In 2013 I was CTO of an online travel agency, and we had meta-searchers consuming our API at several hundred thousand requests per second. Everything was perfectly documented, examples were provided in several languages, and the documents and the SDK had version numbers…

Everybody forgets about Developers, and companies throw terrible, cold products at the poor Developers, so difficult to use. How many Developers would like to say: Listen, Mr. President of the big Cloud Company XXXX, I only want to spawn a VM that works, and fast, with easy wizards. I don’t want to learn for 50 hours before being able to use your overpriced platform, doing 20 things before your IPs are a reflection of your Microservices-based infrastructure. Modern JavaScript frameworks can create nice, gentle wizards even if you have super-cold APIs.

Honestly, I didn’t realize my typo in the region, so I connected to the Amazon Console to investigate, and I saw this.

When I read it, I understood that they were going to end my EC2 Networking on the 30th of October. It was the 29th. I misunderstood.

It was my fault for not reading it well to the end; I was shocked by the first part about the shutdown, and I didn’t fully understand that they were only going to shut down EC2-Classic in the zones where I had nothing running.

From the long errors (3 exceptions chained) I didn’t realize that the endpoint is built with the region name. (And I was passing the availability zone.)

botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.us-east-1a.amazonaws.com/"

This is where I say that a good SDM would have thought about and cared for the Developers more, and would have made the SDK check whether the region exists. How difficult is it to create an SDK a bit more clever, one that detects an invalid region id? It is not difficult.
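
As an illustration of the kind of check I mean, here is a minimal sketch. It uses boto3’s own list of known regions; the function name is mine:

import boto3

def is_valid_region(s_region):
    # boto3 knows the region names for each service, so an availability
    # zone like "us-east-1a" can be caught before any network call
    a_regions = boto3.session.Session().get_available_regions("ec2")
    return s_region in a_regions

print(is_valid_region("us-east-1"))   # True
print(is_valid_region("us-east-1a"))  # False: it is an availability zone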

It is true that it was late in the evening and I was tired from the whole day; on two days of the week, between work and Zoom university classes, I work 15 and 13 hours respectively, not counting the assignments, so by the end of the week I am very tired. But that’s why it is very important to follow methodology and to read well. I think Amazon has 50% of the fault for the way they do things: how they created the SDK, how they communicate, and the errors the console returned when I tried to create a VPC instance from an EC2-Classic AMI (they seem related to the fact that I had old VPC Network objects with a shorter hash than the ones currently used). The other 50% was my fault, for not identifying the source of the error and not reading the message on their website well.

But the fact that the APIs were returning those errors and timeouts made me believe they were going to cut the EC2-Classic Networking the next day.

All the mistakes fell together in a perfect storm.

I checked the documentation and saw it was possible to migrate my Static IPs to VPC Static IPs.

It was Friday evening, and I cancelled my plans in order to migrate the Blog to VPC, in an attempt to keep it running with Amazon.

As a Cloud Architect, I like to have running instances in several CSPs, as it allows me to stay up to date with the changes they make.

I checked the documentation for the migration. Disassociating the Static IP (Elastic IP in AWS jargon) was easy. Turning it into VPC as well.

As I progressed, what should have been easy turned into a nightmare, as I was getting many errors from the Amazon API, without any information, and my Instances were not being created.

I figured out that their API could have problems with old VPC objects I had created some time ago, so I had to create new objects for several things.

I managed to spawn my instances, but they were being launched and terminated instantly, without any information. Frustrating.

When launching a new instance from the AMI (a Snapshot of the blog), I was shown options to add more volumes that made no sense. My Instance was using 16GB of a 20GB total, and I was shown different volume configs depending on the instance: in some cases an additional 20GB volume, in others a small ephemeral SSD and 10GB for the AMI (which requires at least 16GB).

After some fighting I managed to make it work, after deleting the volumes that made no sense and keeping only one of 20GB, the same size as my AMI.

But then the nightmare of giving the VPC Instance Internet access, and making it reachable from outside, started. I had to create a new Internet Gateway, NAT, Network, etc…

As mentioned, the old objects I was trying to reuse were making the process fail.

I was running out of time, and I thought they were going to shut down the EC2-Classic network very soon (as I had not read correctly), so I decided to download everything and migrate to another provider. To do that, I first blocked all the traffic except for my IP.

I worked in parallel, creating the new config in Google Cloud, just in case I had forgotten something. I had created a document for the migration, and it was accurate.

I managed to do everything fast enough. The slowest part was downloading all the Data, as I keep entire VMs for projects like the Cassandra Universal Driver.

Then I powered off my Amazon Instance for the Blog forever.

In GCP I blocked all the traffic in the firewall, except for my IP, so I could work calmly.

When everything was ready, I had to redirect the DNS to the new Static IP from Google.

The DNS provider I used had implemented some changes in their API, so I was getting errors replacing my old entry ‘.’ (their JSON calls returned Internal Server Error). Finally I figured out how to work around it, and I was able to confirm that the first service was up and running.

I did some tests to make sure there were no unexpected permission problems, entries in the logs, etc…

Only then did I open the Google Firewall. I have a second firewall in each instance, where I block or open what I want at the iptables level: basically the IPs of abusive bots trying to find exploits or brute-force passwords by dictionary.

I checked with my phone, without Wifi, that the Firewall was all good. (It is always a good idea to use another external IP, different from the management one, to check.)

I added a post explaining that I was migrating some of my Services and that they were under maintenance.

I mentioned in the blog that some of my services were being migrated from Amazon to Digital Ocean.

For some reason, one user was lost in the Backup of the Database, so I recreated it in MySQL with the typical commands:

CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON mydatabase.* TO 'username'@'localhost';
FLUSH PRIVILEGES;

My Sites are under Maintenance

2021-11-08 Update: There is a Postmortem analysis of what happened with Amazon here.

TL;DR: All my sites are undergoing Maintenance.

The main reason was that I was getting unexpected API Exceptions from the AWS SDK for Python (boto3), so I connected to the AWS Console to get more information.

Then I saw a message indicating that they would stop EC2-Classic today, the 30th of October. (Please read the Update in the Postmortem analysis, as I understood that banner message incorrectly.)

I had already started migrating my Services; some I moved to other providers like Digital Ocean, others I had planned to keep in Amazon.

EOL (End of Life) was scheduled for August 2022, so when I saw the message from Amazon on the evening of the 29th, I decided to migrate my EC2-Classic Public IPs and Compute to VPC. Trying to deploy from an AMI, the Amazon APIs were returning many internal errors, and as I figured out where their failures were, I was able to get instances launched without them being Terminated immediately without an explanation. Still, I had many problems with the Internet Gateway, VPC NAT, etc… After hours fighting with their errors and their console, which is more a bunch of pages to manage Infrastructure than a user/developer friendly Cloud Tool, I decided that I had had enough.

After 11 years using Amazon AWS, including a trip to Dublin to be hired as Manager for CloudWatch, and giving them the idea to add AutoScaling (I was told the project was too easy for me and that I would get bored in a year or two, so I was not hired), I decided to move my Services to Google Cloud and to Digital Ocean.

I’m very polite, and I saw that when I told one Manager that the User Interface was terrible he didn’t like it, but I have to speak up and say that tools for developers cannot be as cold as your evil girlfriend. They cannot be API-like, standalone pages to manage infinite parts of the Architecture. Web services for developers cannot be created in cold SysAdmin style. If the infrastructure is hard to manage and internally you use APIs, build nice Wizards in JavaScript. I was leading a Team of Developers with infinitely fewer resources than Amazon or Google, and we wrote a Multi-Cloud product with nice, clever, easy to use Wizards, and they were infinitely better than those of the giant CSPs. We won a prize at European level at that time. But it was 2013.

I’ve migrated everything and moved all the data, statics, VMs… but I’m completing the adjustments for certain services, like Cassandra nodes, web sites, bootstrapping some of my sites based on my PHP Catalonia Framework, adding Firewall rules to GCP, making changes to the Ansible provisioning, deploying the Server scripts from IaC, Docker, etc…

I’ll be posting updates on Twitter.

Web Top – Displaying top with Python 3 Web Server and Carleslibs

This is a super simple example of how quickly you can create nice solutions with my package carleslibs.

In this example I use Python 3’s built-in Web Server and carleslibs to execute top and display its output in the browser.

Requirements:

Python 3, and carleslibs 1.0.1 or higher installed:

pip3 install carleslibs

Run it on Linux with top installed. All distributions come with top, as far as I know.

Here is the code for WebTop:

from http.server import BaseHTTPRequestHandler, HTTPServer

from carleslibs.subprocessutils import SubProcessUtils
from carleslibs.datetimeutils import DateTimeUtils


class Top():

    def __init__(self, o_subprocess):
        self.o_subprocess = o_subprocess

    def get_top(self):
        # Run top in batch mode, for a single iteration, and capture the output
        s_command = "/usr/bin/top -n 1 -b"
        i_code, s_stdout, s_stderr = self.o_subprocess.execute_command_for_output(s_command, b_shell=True, b_convert_to_ascii=True)

        return i_code, s_stdout, s_stderr


class WebServer(BaseHTTPRequestHandler):

    def do_GET(self):
        o_subprocess = SubProcessUtils()
        self.o_top = Top(o_subprocess)

        self.o_datetime = DateTimeUtils()

        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()

        WebTop.log(self.path)
        if self.path == "/favicon.ico":
            return

        s_html = "<html><body>"
        s_html = s_html + "<h1>Web Top</h1>"
        s_html = s_html + '<small>by Carles Mateo - <a href="https://blog.carlesmateo.com">blog.carlesmateo.com</a></small>'

        i_code, s_stdout, s_stderr = self.o_top.get_top()

        if i_code != 0:
            s_html = s_html + "Error Code: " + str(i_code) + "&lt;/br&gt;"
            s_html = s_html + "Message: " + s_stderr + "&lt;/br&gt;"
        else:
            s_html = s_html + "<pre>"
            s_html = s_html + s_stdout
            s_html = s_html + "</pre>"

        s_html = s_html + "</body>"
        s_html = s_html + "</html>"

        by_html = bytes(s_html, encoding="utf-8")

        self.wfile.write(by_html)


class WebTop():

    o_datetime = DateTimeUtils()

    @staticmethod
    def log(s_text):
        s_datetime = WebTop.o_datetime.get_datetime()
        print(s_datetime, s_text)


if __name__ == "__main__":

    o_webserver = HTTPServer(("localhost", 80), WebServer)
    WebTop.log("Server started")

    try:
        o_webserver.serve_forever()
    except KeyboardInterrupt:
        pass

    o_webserver.server_close()
    WebTop.log("Server stopped")

Just run the code and go to localhost with your favorite browser.

If you get an error like this, it means that you don’t have permission to bind to port 80 (ports below 1024 require root privileges), or that another process is already listening on it. Just run it as root, or use another port like 8080, 8181, etc…

Traceback (most recent call last):
  File "/home/carles/Desktop/code/carles/json-realm-live/web_top.py", line 74, in <module>
    o_webserver = HTTPServer(("localhost", 80), WebServer)
  File "/usr/lib/python3.8/socketserver.py", line 452, in __init__
    self.server_bind()
  File "/usr/lib/python3.8/http/server.py", line 138, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/lib/python3.8/socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
PermissionError: [Errno 13] Permission denied
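
A quick workaround, if you don’t want to run it as root, is to change the port in the HTTPServer constructor (8080 here is just an example):

o_webserver = HTTPServer(("localhost", 8080), WebServer)

Then browse to localhost:8080.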

And you will see the requests coming:

2021-10-09 10:47:46 Server started
127.0.0.1 - - [13/Oct/2021 10:47:48] "GET / HTTP/1.1" 200 -
2021-10-09 10:47:48 /
127.0.0.1 - - [13/Oct/2021 11:25:24] "GET / HTTP/1.1" 200 -
2021-10-09 11:25:24 /

So instead of using top, you can use ctop.py :)

Just replace the command with:

s_command = "ctop.py -b -n=1 --rows=30 --columns=200"

You can also create a Dockerfile very easily and run this in a Container.

News of the blog 2021-08-16

  • I completed my ZFS on Ubuntu 20.04 LTS book.
    I had an error in an actual hard drive so I added a Troubleshooting section explaining how I fixed it.
  • I paused the progress of my book Python: basic exercises for beginners for a while, as my colleague Michela is translating it into Italian. She is a great Engineer and I could not be happier to have her help.
  • I added a new article about how to create a simple web Star Wars game using Flask.
    As always, I use Docker and a Dockerfile to automate the deployment, so you can test it without messing with your local system.
    The code is very simple and easy to understand.
mysql> UPDATE wp_options set option_value='blog.carlesmateo.local' WHERE option_name='siteurl';
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0

This way I set an entry in /etc/hosts (for example, 127.0.0.1 blog.carlesmateo.local, matching the siteurl above) and I can do all the tests I want.

  • I added a new section to the blog; it is a link where you can see all the published articles, ordered by number of views.
    /posts_and_views.php

It is on the main page, just after the recommended articles.
Here you can see the source code.

  • I removed the Categories:
    • Storage
      • ZFS
  • In favor of:
    • Hardware
      • Storage
        • ZFS
  • So the articles with Categories in the deleted group were reassigned the Categories in the second group.
  • Visually:
    • I removed some annoying lines from the Quick Selection access.
      They came from inherited CSS properties from my WordPress, long time customized, and I created new styles for this section.
    • I adjusted the line-height to avoid too much separation between lines.
  • I added a link in the section of Other Engineering Blogs that I like, to the great https://github.com/lesterchan site, author of many super cool WordPress plugins.

My PHP Script to see WordPress Posts and Views ordered by Views

This article shows the code for my PHP program posts_and_views.php, which you can see at the bottom of the Quick Selection bar.

Page rendered with all my Published Posts sorted by Views

So I wrote this small PHP Code to display the number of Post Views recorded in my WordPress Database, sorted by Views in descending order.

In other words, sorted from Most Viewed.

Please note: In order to track the post views I use the excellent plugin WP-PostViews by Lester ‘GaMerZ’ Chan

Here is the code for posts_and_views.php

As you can see, I use the wp-config.php file to get the Database Settings (username, password, database), but I don’t use the WordPress Engine; it is just standalone PHP. So I wrote my own function to get the slug based on the category name. I believe the slug is in the Database, and I could have added this as a SubQuery with JOINs, which would be better, but I wanted to keep the Database workload lightweight and, especially, I did not want to invest more time investigating how to get the slug.

As I tested my function I didn’t find any Category failing, but I saw that I had the Category Storage repeated in two different trees in Categories, and I would always link only to the first one (so I was not linking to the slug storage-2). I fixed the Category I found repeated, but to be honest, if this script were a commercial solution or a properly maintained Open Source solution rather than just a sample, I would update it to have the Categories’ and Tags’ slugs come from the Database.

I would probably make it work with a Cron that would generate a cached page, updated every hour or every ten minutes. I did this in other projects I worked on, and in my PHP Catalonia Framework.

But honestly, the load of this page does not justify a major optimization effort in this case.

<!DOCTYPE html>
<html>
<head>
    <style>
        .first {

            background-color: blue;
            padding: 12px;
        }
        .second {
            background-color: rgba(254, 253, 252, 0.7);
            text-align: center;
            padding:20px 0;
            font-size: 20px;
        }

        .centered {
            text-align: center;
        }

        /* unvisited link */
        a:link {
            color: blue;
            text-decoration: none;
        }

        /* visited link */
        a:visited {
            color: blue;
            text-decoration: none;
        }

        /* mouse over link */
        a:hover {
            color: #00A8EF;
            text-decoration: underline;
        }

        /* selected link */
        a:active {
            color: blue;
        }
    </style>
</head>
<body>
<?php

include "wp-config.php";

$s_site_name = "Carles Mateo's blog";
$s_site_link = "https://blog.carlesmateo.com";

function get_category_slug($s_text) {
    $s_output_text = strtolower($s_text);
    $s_output_text = str_replace(" ", "-", $s_output_text);

    return $s_output_text;
}

?>
<h1><a href="<?php print $s_site_link; ?>"><?php print($s_site_name); ?></a></h1>
<?php


$s_sort = "views DESC, post_date, post_title DESC";

if (array_key_exists("sort", $_GET)) {
    if ($_GET["sort"] == "date") {
        $s_sort = "post_date, views, post_title";
    }
}

$s_servername = "localhost";
$s_database = DB_NAME;
$s_username = DB_USER;
$s_password = DB_PASSWORD;


// Create connection
$o_conn = new mysqli($s_servername, $s_username, $s_password, $s_database);

// Check connection
if ($o_conn->connect_error) {
    die("Connection failed: " . $o_conn->connect_error);
}

/*
mysql> DESCRIBE wp_posts;
+-----------------------+-----------------+------+-----+---------------------+----------------+
| Field                 | Type            | Null | Key | Default             | Extra          |
+-----------------------+-----------------+------+-----+---------------------+----------------+
| ID                    | bigint unsigned | NO   | PRI | NULL                | auto_increment |
| post_author           | bigint unsigned | NO   | MUL | 0                   |                |
| post_date             | datetime        | NO   |     | 0000-00-00 00:00:00 |                |
| post_date_gmt         | datetime        | NO   |     | 0000-00-00 00:00:00 |                |
| post_content          | longtext        | NO   |     | NULL                |                |
| post_title            | text            | NO   |     | NULL                |                |
| post_excerpt          | text            | NO   |     | NULL                |                |
| post_status           | varchar(20)     | NO   |     | publish             |                |
| comment_status        | varchar(20)     | NO   |     | open                |                |
| ping_status           | varchar(20)     | NO   |     | open                |                |
| post_password         | varchar(255)    | NO   |     |                     |                |
| post_name             | varchar(200)    | NO   | MUL |                     |                |
| to_ping               | text            | NO   |     | NULL                |                |
| pinged                | text            | NO   |     | NULL                |                |
| post_modified         | datetime        | NO   |     | 0000-00-00 00:00:00 |                |
| post_modified_gmt     | datetime        | NO   |     | 0000-00-00 00:00:00 |                |
| post_content_filtered | longtext        | NO   |     | NULL                |                |
| post_parent           | bigint unsigned | NO   | MUL | 0                   |                |
| guid                  | varchar(255)    | NO   |     |                     |                |
| menu_order            | int             | NO   |     | 0                   |                |
| post_type             | varchar(20)     | NO   | MUL | post                |                |
| post_mime_type        | varchar(100)    | NO   |     |                     |                |
| comment_count         | bigint          | NO   |     | 0                   |                |
+-----------------------+-----------------+------+-----+---------------------+----------------+
 */

/*
mysql> describe wp_postmeta;
+------------+-----------------+------+-----+---------+----------------+
| Field      | Type            | Null | Key | Default | Extra          |
+------------+-----------------+------+-----+---------+----------------+
| meta_id    | bigint unsigned | NO   | PRI | NULL    | auto_increment |
| post_id    | bigint unsigned | NO   | MUL | 0       |                |
| meta_key   | varchar(255)    | YES  | MUL | NULL    |                |
| meta_value | longtext        | YES  |     | NULL    |                |
+------------+-----------------+------+-----+---------+----------------+
 */

$s_sql = "SELECT DISTINCT post_title, post_content,
                 (SELECT CAST(meta_value AS SIGNED) FROM wp_postmeta WHERE wp_postmeta.meta_key = 'views' AND wp_postmeta.post_id = wp_posts.ID) AS 'views',
                 (SELECT group_concat(wp_terms.name separator ',') 
                         FROM wp_terms
                  INNER JOIN wp_term_taxonomy 
                          ON wp_terms.term_id = wp_term_taxonomy.term_id
                  INNER JOIN wp_term_relationships wpr 
                          ON wpr.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id
                  WHERE 
                         taxonomy= 'category' AND wp_posts.ID = wpr.object_id
                ) AS 'Categories',
                (SELECT group_concat(wp_terms.name separator ', ') 
                        FROM wp_terms
                 INNER JOIN wp_term_taxonomy 
                         ON wp_terms.term_id = wp_term_taxonomy.term_id
                 INNER JOIN wp_term_relationships wpr 
                         ON wpr.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id
                 WHERE 
                        taxonomy= 'post_tag' AND wp_posts.ID = wpr.object_id
                ) AS 'Tags',
               ID, post_name, post_date, post_modified, post_type
          FROM wp_posts
          WHERE 
                post_type = 'post' AND post_status = 'publish'
          ORDER BY
                $s_sort";

$o_result = $o_conn->query($s_sql);

if ($o_result->num_rows > 0) {
    ?><table style="border:1px solid">
    <tr class="first"><th>Title</th><th style="min-width:100px">Views</th><th style="min-width:150px">Creation Date</th><th>Categories</th><th>Tags</th></tr>

    <?php
    $i_total_views = 0;
    $i_articles = 0;

    // output data of each row
    while($row = $o_result->fetch_assoc()) {
        // Set to 'style="border:1px solid"' to debug the table layout
        $s_style = '';
        $s_url = $row['post_name'];
        print('<tr>');
        print("<td $s_style>");
        print('<a href="'.$s_url.'" target="_blank">');
        print($row["post_title"]);
        print('</a>');
        print("</td>");
        print('<td class="centered" '.$s_style.'>');
        print(number_format($row["views"]));
        print("</td>");
        print("<td $s_style>");
        print("<small>");
        print($row["post_date"]);
        print("</small>");
        print("</td>");
        print("<td $s_style>");
        $s_categories = $row["Categories"];
        $a_categories = explode (",", $s_categories);
        $s_categories_content = "";
        foreach($a_categories as $s_category) {
            $s_category_slug = "/category/".get_category_slug($s_category)."/";
            $s_categories_content = $s_categories_content .'<a href="'.$s_category_slug.'" target="_blank">';
            $s_categories_content = $s_categories_content .$s_category;
            $s_categories_content = $s_categories_content ."</a>, ";
        }

        if (strlen($s_categories_content) > 0) {
            $s_categories_content = substr($s_categories_content, 0, -2);
        }

        print($s_categories_content);
        print("</td>");
        print("<td $s_style>");
        print($row["Tags"]);
        print("</td>");
        // $row["post_content"];
        $i_total_views = $i_total_views + intval($row["views"]);
        $i_articles++;
        echo "</tr>";
    } ?></table><?php
    print("<strong>Total articles:</strong> ".number_format($i_articles)." <strong>Total Views:</strong> ".number_format($i_total_views));
    print("<br>");
} else {
    echo "<p>0 results</p>";
}
$o_conn->close();

?>
</body>
</html>

A sample Flask application

Today I bring you a game made with Python and Flask extracted from my book Python 3 Combat Guide.

It is a very simple game where you have to choose which Star Wars robot you prefer.

Then an internal counter, kept in a static variable, is updated.

I display the time as well, to show the use of an import and dynamically generated content.

I added a Dockerfile and a bash script to build the Docker Image, so you can run the Docker Container without installing anything on your computer.

You can download the code from here:

https://gitlab.com/carles.mateo/python-flask-r2d2

Or clone the project:

git clone https://gitlab.com/carles.mateo/python-flask-r2d2.git

Then build the image with the script I provided:

sudo ./build_docker.sh 

After the Docker Image flask_app is built, you can run a Docker Container based on it with:

sudo docker run -d -p 5000:5000 --name flask_app flask_app

After you’re done, in order to stop the Container type:

sudo docker stop flask_app

Here is the source code of the Python file flask_app.py:

#
# flask_app.py
#
# Author: Carles Mateo
# Creation Date: 2020-05-10 20:50 GMT+1
# Description: A simple Flask Web Application
#              Part of the samples of https://leanpub.com/pythoncombatguide
#              More source code for the book at https://gitlab.com/carles.mateo/python_combat_guide
#

from flask import Flask
import datetime


def get_datetime(b_milliseconds=False):
    """
    Return the datetime with milliseconds in format YYYY-MM-DD HH:MM:SS.xxxxx
    or without milliseconds as YYYY-MM-DD HH:MM:SS
    """
    if b_milliseconds is True:
        s_now = str(datetime.datetime.now())
    else:
        s_now = str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    return s_now


app = Flask(__name__)

# Those variables will keep their value as long as Flask is running
i_votes_r2d2 = 0
i_votes_bb8 = 0


@app.route('/')
def page_root():
    s_page = "<html>"
    s_page += "<title>My Web Page!</title>"
    s_page += "<body>"
    s_page += "<h1>Time now is: " + get_datetime() + "</h1>"
    s_page += """<h2>Who is more sexy?</h2>
<a href="r2d2"><img src="static/r2d2.png"></a> <a href="bb8"><img width="250" src="static/bb8.jpg"></a>"""
    s_page += "</body>"
    s_page += "</html>"

    return s_page


@app.route('/bb8')
def page_bb8():
    global i_votes_bb8

    i_votes_bb8 = i_votes_bb8 + 1

    s_page = "<html>"
    s_page += "<title>My Web Page!</title>"
    s_page += "<body>"
    s_page += "<h1>Time now is: " + get_datetime() + "</h1>"
    s_page += """<h2>BB8 Is more sexy!</h2>
                <img width="250" src="static/bb8.jpg">"""
    s_page += "<p>I have: " + str(i_votes_bb8) + "</p>"
    s_page += "</body>"
    s_page += "</html>"

    return s_page


@app.route('/r2d2')
def page_r2d2():
    global i_votes_r2d2

    i_votes_r2d2 = i_votes_r2d2 + 1

    s_page = "<html>"
    s_page += "<title>My Web Page!</title>"
    s_page += "<body>"
    s_page += "<h1>Time now is: " + get_datetime() + "</h1>"
    s_page += """<h2>R2D2 Is more sexy!</h2>
                <img src="static/r2d2.png">"""
    s_page += "<p>I have: " + str(i_votes_r2d2) + "</p>"
    s_page += "</body>"
    s_page += "</html>"

    return s_page


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)

As always, the naming of the variables is based on MT Notation.

The Dockerfile is very straightforward:

FROM ubuntu:20.04

MAINTAINER Carles Mateo

ARG DEBIAN_FRONTEND=noninteractive

RUN apt update && \
    apt install -y vim python3-pip &&  pip3 install pytest && \
    apt-get clean

ENV PYTHON_COMBAT_GUIDE /var/python_combat_guide

RUN mkdir -p $PYTHON_COMBAT_GUIDE

COPY ./ $PYTHON_COMBAT_GUIDE

ENV PYTHONPATH "${PYTHONPATH}:$PYTHON_COMBAT_GUIDE/src/:$PYTHON_COMBAT_GUIDE/src/lib"

RUN pip3 install -r $PYTHON_COMBAT_GUIDE/requirements.txt

# This is important so that, when executing python3 -m, the current directory is added to sys.path
# It is not necessary, as we already added it to PYTHONPATH
#WORKDIR $PYTHON_COMBAT_GUIDE/src/lib

EXPOSE 5000

# Launch our Flask Application
CMD ["/usr/bin/python3", "/var/python_combat_guide/src/flask_app.py"]

Solving Oracle error ORA 600 [KGL-heap-size-exceeded]

Some time ago there was a web page that rendered blank for a certain group of users.

The errors were coming from an Oracle instance. One SysAdmin restarted the instance, but the errors continued.

Often there are problems due to having two different worlds: Development and Production/Operations.

What works in Development, or even in Docker, may not work at Scale in Production.

That query that works with 100,000 products, may not work with 10,000,000.

I have programmed a lot for the web, so when I saw a blank page I knew it was an internal error, as the headers sent by the Web Server indicated 500. The DBAs were seeing an elevated number of errors in one of the Servers.

So I went straight to the Oracle logs for that Server.

I did a quick filter in bash:

cat /u01/app/oracle/diag/rdbms/world7c/world7c2/alert/log.xml | grep "ERR" -B4 -A3

This returned several errors of the kind “ORA 600 [ipc_recreate_que_2]”, but this was not our bad guy. The error was:

‘ORA 600 [KGL-heap-size-exceeded]’

The XML fragment was similar to this:

<msg time='2016-01-24T13:28:33.263+00:00' org_id='oracle' comp_id='rdbms'
msg_id='7725874800' type='INCIDENT_ERROR' group='Generic Internal Error'
level='1' host_id='gotham.world7c.justice.cat' host_addr='10.100.100.30'
pid='281279' prob_key='ORA 600 [KGL-heap-size-exceeded]' downstream_comp='LIBCACHE'
errid='726175' detail_path='/u01/app/oracle/diag/rdbms/world7c/world7c2/trace/world7c2_ora_281279.trc'>
<txt>Errors in file /u01/app/oracle/diag/rdbms/world7c/world7c2/trace/world7c2_ora_281279.trc  (incident=726175):
ORA-00600: internal error code, arguments: [KGL-heap-size-exceeded], [0x14D22C0C30], [0], [524288008], [], [], [], [], [], [], [], []
</txt></msg>

Just before this error, there was an error with a Query, and the PID matched, so it seemed clear to me that the query was causing the crash at the Oracle level.

Checking the file:

/u01/app/oracle/diag/rdbms/world7c/world7c2/trace/world7c2_ora_281279.trc

Its content reported the same incident (incident=726175) with the KGL-heap-size-exceeded error shown above.

Basically in our case, the query that was launched by the BackEnd was using more memory than allowed, which caused Oracle to kill it.

That limit is a tunable, introduced in Oracle 10g, that you can modify.

You can see the current values first:

SQL> select
       nam.ksppinm NAME,
       nam.ksppdesc DESCRIPTION,
       val.KSPPSTVL
     from
       x$ksppi nam,
       x$ksppsv val
     where
       nam.indx = val.indx
       and nam.ksppinm like '%kgl_large_heap_%_threshold%';

NAME                              | DESCRIPTION                       | KSPPSTVL
=============================================================================================
_kgl_large_heap_warning_threshold | maximum heap size before KGL      | 4194304
                                    writes warnings to the alert log
---------------------------------------------------------------------------------------------
_kgl_large_heap_assert_threshold  | maximum heap size before KGL      | 4194304
                                    raises an internal error

So, _kgl_large_heap_warning_threshold is the maximum heap before getting a warning, and _kgl_large_heap_assert_threshold is the maximum heap before getting the error.

Depending on your case, the solution can be:

  • Breaking your query into several queries, to reduce the memory used
  • Using pagination or LIMIT
  • Setting a bigger value for those tunables

Setting 0 for these two variables will also work, although I don’t recommend it, as you want your Server to kill queries that take more memory than they should.

To increase the value of _kgl_large_heap_warning_threshold, you have to update it. Please note it is in bytes, so 32MB is 32 * 1024 * 1024 = 33,554,432. Using spfile:

SQL> alter system set "_kgl_large_heap_warning_threshold"=33554432
scope=spfile ;
 
SQL> shutdown immediate 

SQL> startup
 
SQL> show parameter _kgl_large_heap_warning_threshold
NAME                               TYPE      VALUE
==================================|=========|===============
 
_kgl_large_heap_warning_threshold | integer | 33554432
 

Or if using the parameter file, set:

_kgl_large_heap_warning_threshold=33554432

Post-Mortem: The mystery of the duplicated Transactions into an e-Commerce

Together with 4 other Senior BackEnd Engineers, I wrote the new e-Commerce for a multinational.

The old legacy Software had evolved into different code for every country, making it impossible to maintain.

The new Software we created used inheritance to share the same base code for every country, and overloaded only the behavior specific to each country, like the payment methods: for example, Brazil supporting “parcelados” (installments), or Germany with specific payment players.
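
The real code was PHP, but the pattern is easy to sketch. Here it is in Python, with purely illustrative names:

# Base class with the behavior shared by all countries
class Payments:

    def get_payment_methods(self):
        return ["credit_card", "paypal"]


# Each country overloads only what differs from the base
class PaymentsBrazil(Payments):

    def get_payment_methods(self):
        # Brazil adds installment payments ("parcelados")
        return super().get_payment_methods() + ["parcelados"]


class PaymentsGermany(Payments):

    def get_payment_methods(self):
        # Germany adds its local payment players
        return super().get_payment_methods() + ["local_payment_player"]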

We rewrote the old procedural PHP BackEnd into modern PHP, with OOP and our own Framework, but we had to keep the transactional code in the existing MySQL Procedures, so the logic was split. There was a Front End Team consuming our JSONs. Basically, all the Front End code was cached in Akamai, and pages were rendered according to the JSONs served from our BackEnd.

It was a huge success.

This e-Commerce site had Campaigns that started at a certain time, so the amount of traffic that would come at the same time would be challenging.

The project was working very well, and after some time the original Team was split across different projects in the company, and a Team for maintenance and new features was hired.

At a certain point they started to encounter duplicated transactions, and nobody was able to solve the mystery.

I specialize in fixing impossible problems. They used to send me on Impossible Missions, and I am famous for solving impossible problems easily.

So I started the task with an SRE approach.

The System had many components and layers. The problem could be in many places.

In my arsenal of tools I had Software like the MySQL debugger, with which, in the past, I had found an unnoticed bug in decimal calculations, surprising everybody.

The Engineers previously involved believed the problem was on the Database side. They were having difficulties identifying the issue because of the random nature of the repetitions.

Sometimes the order lines were duplicated, and other times it was the payments, which meant charging the customer twice.

Redis Cluster could also play a part in this, as it stored the session information and the basket.

But I had to follow the logical sequence of steps.

If transactions from customers were duplicated, that meant that those requests had first arrived at the System. So that was a good starting point.

With a list of duplicated operations in hand, I checked the Webservers’ logs.

That was a bit tricky, as the Webserver was recording the IP of the Load Balancer, not the IP of the customer. But we were tracking the sessionid, so with that I could trace a user’s request history. Another good thing was that we were using cookies to stick the user to the same Webserver node. That has pros and cons, but in this case I didn’t have to worry about combining the logs of all the Webservers: I could identify a transaction in one node and stick to that node’s log.

I was working with SSH and Bash; the log aggregators that exist today were not available at that time.

So when I started to pull web logs and grep a bit, a smile was drawn on my face. :)

There were no transactions repeated because of bad behavior on the MySQL Masters, or because of BackEnd problems. The HTTP requests were actually performed twice.

And the explanation for that was much simpler.

Many Windows and Mac users are used to double-clicking on the Desktop to open programs, so when they started using the Internet, they did the same: they double-clicked the Submit button on the forms, causing two requests in parallel.

When I explained it they were really surprised, but then they started to worry about how they could fix that.

Well, there are many ways, like using a UUID in each request and not accepting two concurrent ones, but I came up with something that we could deploy super fast.

I explained how to change the JavaScript code so the buttons would have no default submit action and would trigger a JavaScript method instead. That method would set a boolean to True and also disable the button so it could not be clicked anymore; only if the variable was False would the submit be performed. It was almost impossible to get a double click in, as the JavaScript disabled the button so fast that the second click would not trigger anything. But even if that were possible, only one request would be made, as the variable was set to True on the first click event.
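
The server-side variant mentioned above, the UUID per request, can be sketched in a few lines. This is a minimal illustration with Flask, not the actual e-Commerce code, and all the names are mine:

import uuid
from flask import Flask, request, session

app = Flask(__name__)
app.secret_key = "change-me"  # required by Flask to sign the session cookie

st_used_tokens = set()  # in production this would live in Redis, not in memory


@app.route("/checkout")
def checkout_form():
    # Embed a one-time token in a hidden field of the form
    s_token = str(uuid.uuid4())
    session["checkout_token"] = s_token
    return ('<form method="post" action="/submit">'
            '<input type="hidden" name="token" value="' + s_token + '">'
            '<input type="submit" value="Buy"></form>')


@app.route("/submit", methods=["POST"])
def checkout_submit():
    s_token = request.form.get("token", "")
    # A second submit arriving with the same token is rejected
    if s_token != session.get("checkout_token") or s_token in st_used_tokens:
        return "Duplicated or invalid request", 409
    st_used_tokens.add(s_token)
    # ... charge the customer and create the order here ...
    return "Order accepted"


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)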

That case was very funny to me, because it was not necessary to go crazy inspecting the different layers of the system. The problem was detected simply with the HTTP logs. :)

People often forget to follow the logical steps, while many problems are much simpler than they seem.

As a curious note, I still see people double-clicking on links and buttons on the Web, and some Software not handling it. :)

News from the blog 2020-10-16

  • I’ve been testing and adding more instances to CMIPS. I’m planning on testing the Azure instance with 120 cores.
  • News: Microsoft makes permanent remote work an option

https://www.bbc.com/news/business-54482245

  • One of my colleagues showed me dstat, a very nice tool for monitoring the system and the bandwidth of the drives. Also ifstat, as a complement to iftop, is very cool for the Network. This functionality is also available in CTOP.py
  • As I shared in past news of the blog, I’m resuming my contributions to the ZFS Community.

A long time ago I created some ZFS tools that I want to share soon as Open Source.

I equipped myself with the proper Hardware to test on SAS and SATA:

  • 12G Internal PCI-E SAS/SATA HBA RAID Controller Card, Broadcom’s SAS 3008, compatible with SAS 9300-8I.
    This is just an HBA (Host Bus Adapter); it doesn’t support RAID. It only connects up to 8 drives to my computer, or 1024 through expanders.
    It has a bandwidth of 9,600 MB/s, which guarantees that I’ll be able to add 12 Enterprise-grade SAS SSDs at almost the max speed of the drives. Those drives perform at 900 MB/s, so if I’m using all of them at the same time, like if I have a pool of 8 + 3 and I rebuild a broken drive or just push Data, I would be using 12 × 900 = 10,800 MB/s. Close. Fair enough.
  • VANDESAIL Mini-SAS Cables, 1m Internal Mini-SAS to 4x SAS SATA Forward Breakout Cable Hard Drive Data Transfer Cable (SAS Cable).
  • SilverStone SST-FS212B – Aluminium Trayless Hot Swap Mobile Rack Backplane / Internal Hard Drive Enclosure for 12x 2.5 Inch SAS/SATA HDD or SSD, fit in any 3x 5.25 Inch Drive Bay, with Fan and Lock, black
  • Terminator is here.
    I ordered this T-800 head a while ago and it finally arrived.

Finally I will have my empty USB keys located and protected. ;)

Remember to be always nice to robots. :)