Video: Parse the Tables from a Website with Python pandas

First Published: .

A quick video, of 3 minutes, that shows you how it works.

If you don’t have pandas installed you’ll have to install it and lxml, otherwise you’ll get an error:

  File "/home/carles/Desktop/code/carles/blog.carlesmateo.com-source-code/venv/lib/python3.8/site-packages/pandas/io/html.py", line 872, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

You can install both from PyCharm or from command line with:

pip install pandas
pip install lxml

And here the source code:

import pandas as pd


if __name__ == "__main__":

    # Do not truncate the data when printing
    pd.set_option('display.max_colwidth', None)
    # Do not truncate due to length of all the columns
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_rows', None)
    pd.set_option('display.width', 2000)
    # pd.set_option('display.float_format', '{:20,.2f}'.format)

    o_pd_my_movies = pd.read_html("https://blog.carlesmateo.com/movies-i-saw/")
    print(len(o_pd_my_movies))

    print(o_pd_my_movies[0])

Views: 108 views

Rules for writing a Comment


  1. Comments are moderated
  2. I don't publish Spam
  3. Comments with a fake email are not published
  4. Disrespectful comments are not published, even if they have a valid point
  5. Please try to read all the article before asking, as in many cases questions are already responded

Leave a Reply

Your email address will not be published.

Time limit is exhausted. Please reload CAPTCHA.