Video: Parse the Tables from a Website with Python pandas
First Published: .
A quick 3-minute video that shows how it works.
If you don’t have pandas installed, you’ll need to install it along with lxml, which pandas uses to parse the HTML. Otherwise you’ll get an error like this:
  File "/home/carles/Desktop/code/carles/blog.carlesmateo.com-source-code/venv/lib/python3.8/site-packages/pandas/io/html.py", line 872, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
You can install both from PyCharm or from the command line with:
pip install pandas
pip install lxml
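If you are not sure which of the two packages is missing, a small check can tell you before you run the script. This is a minimal sketch using only the standard library (importlib.util.find_spec). The helper name missing_modules is hypothetical, chosen for illustration.

```python
import importlib.util

# Modules needed for pd.read_html in the setup described above:
# pandas itself, plus lxml as the HTML parser.
REQUIRED = ("pandas", "lxml")


def missing_modules(names=REQUIRED):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("pandas and lxml are installed")
```

find_spec only locates the module without importing it, so the check is cheap and cannot trigger the ImportError shown above.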
And here is the source code:
import pandas as pd

if __name__ == "__main__":
    # Do not truncate the data when printing
    pd.set_option('display.max_colwidth', None)
    # Do not truncate due to length of all the columns
    pd.set_option('display.max_columns', None)
    # Print every row instead of eliding the middle ones with "..."
    pd.set_option('display.max_rows', None)
    # Use a wide console so the columns are not wrapped
    pd.set_option('display.width', 2000)
    # pd.set_option('display.float_format', '{:20,.2f}'.format)

    # read_html returns a list with one DataFrame per <table> found on the page
    o_pd_my_movies = pd.read_html("https://blog.carlesmateo.com/movies-i-saw/")
    # How many tables were found
    print(len(o_pd_my_movies))
    # The first table
    print(o_pd_my_movies[0])
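The script above depends on the live page being reachable. To see what read_html returns without a network connection, here is a minimal sketch that parses a made-up HTML snippet instead (SAMPLE_HTML and its contents are hypothetical, for illustration only). It assumes pandas and lxml are installed. The HTML string is wrapped in io.StringIO because recent pandas versions deprecate passing literal HTML directly. It also uses pd.option_context, which applies the display settings only inside the with block rather than globally like set_option.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables, standing in for a real website.
SAMPLE_HTML = """
<html><body>
  <table>
    <tr><th>Title</th><th>Year</th></tr>
    <tr><td>Alien</td><td>1979</td></tr>
    <tr><td>Heat</td><td>1995</td></tr>
  </table>
  <table>
    <tr><th>Name</th><th>Score</th></tr>
    <tr><td>foo</td><td>7</td></tr>
  </table>
</body></html>
"""


def parse_tables(html):
    """Return one DataFrame per <table> in the given HTML text."""
    return pd.read_html(StringIO(html))


if __name__ == "__main__":
    tables = parse_tables(SAMPLE_HTML)
    print(len(tables))

    # Display settings apply only inside this block, then are restored.
    with pd.option_context("display.max_rows", None, "display.width", 2000):
        print(tables[0])

    # 'match' keeps only the tables whose text matches the string or regex.
    scored = pd.read_html(StringIO(SAMPLE_HTML), match="Score")
    print(len(scored))
```

A top row made only of <th> cells becomes the column header, so the first DataFrame has the columns Title and Year. To go back to the live page, pass the URL to pd.read_html instead of StringIO(SAMPLE_HTML).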