Visualizing Wikidata with Python
Jay Winkler and Rebecca Y. Bayeck
For this project, we used Python to run SPARQL queries against Wikidata and turn the results into interactive visualizations. The main tools are SPARQLWrapper for querying and Plotly for visualization.
Setup
The notebooks were built in Google Colab, so setup is minimal:
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
Running a Query
After defining a SPARQL query string (see the SPARQL post for examples), you send it to Wikidata and convert the JSON response into a pandas DataFrame:
sparql.setQuery(query_string)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
df = pd.json_normalize(results['results']['bindings'])
The resulting DataFrame has columns like artistLabel.value and occupationLabel.value. Wikidata’s JSON structure is flattened out, so each SPARQL variable contributes a .value column holding the actual data, alongside metadata columns such as .type.
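To see the flattening without touching the network, here is a minimal sketch that runs pd.json_normalize on a hand-written response in the SPARQL JSON results format. The artist and occupation values are invented for illustration and do not come from the project's queries.

```python
import pandas as pd

# Hypothetical response shaped like the SPARQL JSON results format.
# The labels below are made up; a real query returns Wikidata's values.
results = {
    "head": {"vars": ["artistLabel", "occupationLabel"]},
    "results": {
        "bindings": [
            {
                "artistLabel": {"type": "literal", "xml:lang": "en", "value": "Artist A"},
                "occupationLabel": {"type": "literal", "xml:lang": "en", "value": "painter"},
            },
            {
                "artistLabel": {"type": "literal", "xml:lang": "en", "value": "Artist A"},
                "occupationLabel": {"type": "literal", "xml:lang": "en", "value": "sculptor"},
            },
        ]
    },
}

# Each binding is a nested dict; json_normalize joins nested keys with "."
df = pd.json_normalize(results["results"]["bindings"])

# Print the flattened column names to inspect what each variable produced.
print(sorted(df.columns))
```

Each variable expands into several columns (here .type, .value and .xml:lang), which is why the cleanup step keeps only the .value ones.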
Cleaning the Data
The raw DataFrame usually needs some cleanup:
- Rename columns to drop the .value suffixes
- Handle duplicates – artists with multiple occupations or residences appear in multiple rows. Use groupby with string aggregation to collapse them
- Parse dates – birth/death dates come as ISO strings and need trimming to just the year
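The three cleanup steps could be sketched as follows. This is an illustration rather than the notebooks' actual code: the sample rows, the birthDate variable name, and the output column names (occupations, birthYear) are assumptions, and the year parsing assumes four-digit, non-negative years.

```python
import pandas as pd

# Made-up rows shaped like the flattened query output (not real Wikidata values).
raw = pd.DataFrame({
    "artistLabel.value": ["Artist A", "Artist A", "Artist B"],
    "artistLabel.type": ["literal", "literal", "literal"],
    "occupationLabel.value": ["painter", "sculptor", "painter"],
    "birthDate.value": [
        "1901-05-17T00:00:00Z",
        "1901-05-17T00:00:00Z",
        "1923-11-02T00:00:00Z",
    ],
})

# 1. Keep only the ".value" columns and drop the suffix from their names.
suffix = ".value"
value_cols = [c for c in raw.columns if c.endswith(suffix)]
df = raw[value_cols].rename(columns=lambda c: c[: -len(suffix)])

# 2. Trim ISO date strings to the year (assumes four-digit, non-negative years).
df["birthYear"] = df["birthDate"].str[:4].astype(int)

# 3. Collapse one-row-per-occupation into one row per artist,
#    joining the distinct occupations into a single string.
artists = df.groupby("artistLabel", as_index=False).agg(
    occupations=("occupationLabel", lambda s: ", ".join(sorted(s.unique()))),
    birthYear=("birthYear", "first"),
)

print(artists)
```

Grouping on the label alone merges distinct artists who share a name; grouping on the Wikidata entity URI, if the query returns it, avoids that.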
Visualization
We used Plotly to build interactive charts from the cleaned data – bar charts of occupation distributions, maps of birthplaces using geocoded coordinates, and breakdowns of which museum collections hold the most works. The notebooks include the full visualization code with outputs.
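As a rough idea of what an occupation bar chart involves, the sketch below counts artists per occupation with pandas and passes the result to Plotly Express. The function names, column names, and sample rows are hypothetical; the notebooks contain the actual visualization code.

```python
import pandas as pd

# Made-up cleaned rows: one row per (artist, occupation) pair, with a repeat.
sample = pd.DataFrame({
    "artistLabel": ["A", "A", "A", "B", "C", "C", "D"],
    "occupationLabel": [
        "painter", "painter", "sculptor", "painter", "painter", "poet", "sculptor",
    ],
})


def occupation_counts(df):
    """Count distinct artists per occupation, most common first."""
    # Drop repeated (artist, occupation) pairs so no artist is counted twice.
    pairs = df[["artistLabel", "occupationLabel"]].drop_duplicates()
    return (
        pairs["occupationLabel"]
        .value_counts()
        .rename_axis("occupation")
        .reset_index(name="artists")
    )


def occupation_bar_chart(counts):
    """Build an interactive bar chart from the counts table."""
    # Imported here so the counting step also works where Plotly is not installed.
    import plotly.express as px

    return px.bar(counts, x="occupation", y="artists", title="Artists per occupation")


counts = occupation_counts(sample)
print(counts)
```

In a notebook, occupation_bar_chart(counts).show() renders the interactive figure.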
Notebooks
All notebooks are available in the wikidata repository:
- Querying_and_Visualizing_Wikidata.ipynb – Full analysis with SPARQL queries and Plotly visualizations
- Wikidata_SPARQL_Queries.ipynb – Reference guide for query syntax
- querying_and_visualizing_wikidata.py – Standalone script version (no Jupyter needed)
For a deeper discussion of the analysis, see the Scholars Studio blog post: Visualizing Wikidata: Using Python to Analyze Identity and Representation.