Video: Parse the Tables from a Website with Python pandas
First Published: .
A quick 3-minute video that shows you how it works.
If you don’t have pandas installed, you’ll have to install it along with lxml; otherwise you’ll get an error:
  File "/home/carles/Desktop/code/carles/blog.carlesmateo.com-source-code/venv/lib/python3.8/site-packages/pandas/io/html.py", line 872, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
You can install both from PyCharm or from the command line with:
pip install pandas
pip install lxml
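If you want to check up front whether both packages are available, instead of waiting for the ImportError, something like the following sketch could help. It only uses the standard library; the helper name `missing_packages` is made up for this example and is not part of pandas.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["pandas", "lxml"])
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("pandas and lxml are available")
```

Run it inside the same virtual environment you use for the script, since packages installed elsewhere will not be found.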
And here is the source code:
import pandas as pd

if __name__ == "__main__":
    # Do not truncate the data when printing
    pd.set_option('display.max_colwidth', None)
    # Do not truncate due to length of all the columns
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_rows', None)
    pd.set_option('display.width', 2000)
    # pd.set_option('display.float_format', '{:20,.2f}'.format)

    o_pd_my_movies = pd.read_html("https://blog.carlesmateo.com/movies-i-saw/")
    print(len(o_pd_my_movies))
    print(o_pd_my_movies[0])
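The code above prints `len(...)` and then index `[0]` because `pd.read_html` returns a list with one DataFrame per table found on the page. To try this without hitting a live website, here is a small sketch that parses an inline HTML string instead. The HTML content, the movie names and the variable names are invented for the example; it assumes pandas and lxml are installed as described above.

```python
import io

import pandas as pd

# Hypothetical inline page with two tables, used instead of a live URL
HTML = """
<html><body>
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Movie A</td><td>1999</td></tr>
  <tr><td>Movie B</td><td>2010</td></tr>
</table>
<table>
  <tr><th>Genre</th></tr>
  <tr><td>Sci-Fi</td></tr>
</table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> found
tables = pd.read_html(io.StringIO(HTML))

# match= keeps only the tables whose text matches the given string or regex
genre_tables = pd.read_html(io.StringIO(HTML), match="Genre")

if __name__ == "__main__":
    print(len(tables))
    print(tables[0])
    print(genre_tables[0])
```

Wrapping the string in `io.StringIO` avoids the deprecation warning that recent pandas versions show when literal HTML is passed directly. The `match` parameter can be handy when a page has many tables and you only want the ones containing a given text.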