
Video: Parse the Tables from a Website with Python pandas

A quick 3-minute video that shows you how it works.

pandas.read_html needs an HTML parser, and by default it looks for lxml. If you don’t have pandas and lxml installed you’ll have to install both, otherwise you’ll get an error like this:

  File "/home/carles/Desktop/code/carles/blog.carlesmateo.com-source-code/venv/lib/python3.8/site-packages/pandas/io/html.py", line 872, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

You can install both from PyCharm or from the command line with:

pip install pandas
pip install lxml
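
If you want to verify the installation before running the script, a small check like the following can help. This is only a sketch: the function name missing_packages is made up for this example, and it only tells you whether Python can locate the packages, not whether their versions are compatible.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that the import system cannot locate."""
    # find_spec returns None when a top-level package is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pandas and lxml are the two packages the script in this post relies on
    missing = missing_packages(["pandas", "lxml"])
    if missing:
        print("Missing packages, install them with: pip install " + " ".join(missing))
    else:
        print("pandas and lxml are both available")
```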

And here is the source code:

import pandas as pd


if __name__ == "__main__":

    # Do not truncate the contents of each cell when printing
    pd.set_option('display.max_colwidth', None)
    # Show all columns and rows instead of eliding them
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_rows', None)
    # Use a wide output so long rows are not wrapped across lines
    pd.set_option('display.width', 2000)
    # pd.set_option('display.float_format', '{:20,.2f}'.format)

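    # read_html returns a list with one DataFrame per <table> found in the page
    o_pd_my_movies = pd.read_html("https://blog.carlesmateo.com/movies-i-saw/")
    # Number of tables found
    print(len(o_pd_my_movies))

    # Print the first table
    print(o_pd_my_movies[0])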
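
To try the same idea without hitting the network, you can feed read_html an HTML string. The sketch below is only an illustration: the HTML content and the helper name parse_tables are invented for this example and are not taken from the page above. It also uses the match parameter of read_html, which keeps only the tables whose text matches a string or regex.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML with two tables, standing in for a downloaded page
HTML = """
<html><body>
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Alien</td><td>1979</td></tr>
  <tr><td>Blade Runner</td><td>1982</td></tr>
</table>
<table>
  <tr><th>Director</th><th>Country</th></tr>
  <tr><td>Ridley Scott</td><td>UK</td></tr>
</table>
</body></html>
"""


def parse_tables(html, match=".+"):
    """Return a list of DataFrames, one per <table> whose text matches `match`."""
    # Wrapping the string in StringIO makes pandas read it as file content
    # instead of interpreting it as a URL or a path
    return pd.read_html(StringIO(html), match=match)


if __name__ == "__main__":
    tables = parse_tables(HTML)
    # Number of tables found in the HTML
    print(len(tables))
    print(tables[0])

    # Keep only the tables that contain the text "Director"
    directors = parse_tables(HTML, match="Director")
    print(directors[0])
```

As with the script above, this needs lxml (or another parser that pandas supports) to be installed.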