A quick video, of 3 minutes, that shows you how it works.
If you don’t have pandas installed you’ll have to install it and lxml, otherwise you’ll get an error:
File "/home/carles/Desktop/code/carles/blog.carlesmateo.com-source-code/venv/lib/python3.8/site-packages/pandas/io/html.py", line 872, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
You can install both from PyCharm or from command line with:
pip install pandas
pip install lxml
And here the source code:
import pandas as pd
if __name__ == "__main__":
# Do not truncate the data when printing
pd.set_option('display.max_colwidth', None)
# Do not truncate due to length of all the columns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 2000)
# pd.set_option('display.float_format', '{:20,.2f}'.format)
o_pd_my_movies = pd.read_html("https://blog.carlesmateo.com/movies-i-saw/")
print(len(o_pd_my_movies))
print(o_pd_my_movies[0])
Only 2% of the viewers donate, so I answered the call every time it was made.
This is my 5th donation to Wikimedia.
I consider that Freedom is very important.
I bought these new books
One of my secrets to be on top is that I’m always studying.
I study all the time, at work and in my free time.
I use Linux Academy and I buy books in paper. I don’t connect with reading in tablets. I think information is stored better when read in paper. I use also a marker and pointers to keep a direct access to the most interesting points on the books.
And I study all kind of themes. Obviously I know a lot of Web Scraping, but there is always room for learning more. And whatever new I learn helps me to be better with my students and more clear writing my books.
I’ve never been a Front End, but I’ve been able to fix bugs in the Front End engines from the companies I worked for, like Privalia. I was passed a bug that prevented the Internet Explorer users to buy just one hour before we launching a massive campaign. I debugged and I found a variable named “value” so the html looked like <input name="value" value="">. In less than 30 minutes I proved to the incredulous Head of Development and the CTO that a bug in Internet Explored was causing a conflict when fetching the value from the input named value. We deployed to Production the update and the campaign was a total success. So I consider knowing Javascript and Front also a need, even if I don’t work directly with it. I want to be able to understand all the requirements and possibilities, and weaknesses, so I can fix bugs and save the day. That allowed me to fix scalability problems in Nodejs and Phantomjs projects too. (They are Javascript Server Side, event driven, projects)
It seems that Amazon.co.uk works well again for Ireland. My two last orders arrived on time and I had no problems of border taxes apparently.
Nice Python article
I enjoyed a lot this article, cause explains part of what I did with my student and friend Albert, in a project that analyzes the access logs from Apache for patterns of attempts of exploits, then feeds a database, and then blocks those offender Ip Addresses in the Firewall.
The article only covers the part of Pandas, of reading the access.log file and working with it, but is a very well redacted article: