Collaborators: Howie Lan, Prof. Lewis Lancaster, Jeffrey Shaw
This project integrates the Koryo version of the Chinese Buddhist Canon, the Tripitaka Koreana, into the AVIE system (a project between ALiVE, City University of Hong Kong and UC Berkeley). This version of the Buddhist Canon, enshrined at Haeinsa, Korea, is inscribed as UNESCO World Heritage. The 166,000 pages of rubbings from the wooden printing blocks constitute the oldest complete set of the corpus in print format. The version is divided into 1,514 individual texts, and its complexity is challenging because the texts represent translations from Indic languages into Chinese over a 1,000-year period (2nd to 11th centuries). This is the world’s largest single corpus containing over 50 million glyphs, and it was digitized and encoded by Prof. Lewis Lancaster and his team in a project that started in the 1970s.
The Blue Dots project, undertaken at Berkeley as part of the Electronic Cultural Atlas Initiative, abstracted each glyph from the Canon into a blue dot and attached metadata to each of these Blue Dots, so that vast searches which would have taken scholars years can take place in minutes. In the search function, each blue dot also references an original plate photograph for verification. The shape of these wooden plates gives the blue dot array its form.
As a searchable database, it exists in prototype form on the Internet. Results are displayed in a dimensional array in which users can view and navigate within the image. The image uses both the abstracted form of a “dot” and color to inform the user about the information being retrieved. Each blue dot represents one glyph of the dataset, and alternate colors indicate the positions of search results. The use of color, form, and dimension to communicate the information quickly is essential for large data sets, where thousands of occurrences of a target word or phrase may be seen. Analysis across this vast text yields visual representations of word strings, clustering of terms, and automatic analysis of ring construction, with results viewable by time, creator, and place.
The Blue Dots method of visualization is a breakthrough for corpora visualization and forms the basis of the visualization strategies of abstraction undertaken in this project. The application of an omnispatial distribution of these texts solves problems of data occlusion and enhances network analysis techniques to reveal patterns, hierarchies and interconnectedness. Using a hybrid approach to data representation, audification strategies will be incorporated to augment interaction coherence and interpretation.
The data browser is designed to function in two modes: the Corpus Analytics mode for text only, and the Cultural Atlas mode, which incorporates original text, contextual images, and geospatial data. Search results can be saved and annotated.
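The dot-per-glyph representation and color-coded search results described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Dot` fields, the `layout`, `search`, and `render` helpers, the tiny 2 x 3 "plate", and the use of `#` for a highlighted dot are all assumptions made for illustration, and the simple row-by-row grid does not attempt to reproduce the real arrangement of glyphs on the wooden plates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dot:
    """One glyph abstracted into a dot, with illustrative metadata."""
    glyph: str    # the character this dot stands for
    text_id: int  # hypothetical identifier of the text the glyph belongs to
    plate: int    # hypothetical plate index, e.g. for finding the photograph
    row: int      # position of the dot on its plate
    col: int


def layout(text: str, text_id: int, rows: int, cols: int) -> list[Dot]:
    """Assign each glyph to a plate and a grid position on that plate."""
    per_plate = rows * cols
    dots = []
    for i, glyph in enumerate(text):
        plate, offset = divmod(i, per_plate)
        row, col = divmod(offset, cols)
        dots.append(Dot(glyph, text_id, plate, row, col))
    return dots


def search(dots: list[Dot], phrase: str) -> set[int]:
    """Return the indices of all dots that are part of an occurrence of phrase."""
    if not phrase:
        return set()
    glyphs = "".join(d.glyph for d in dots)
    hits: set[int] = set()
    start = glyphs.find(phrase)
    while start != -1:
        hits.update(range(start, start + len(phrase)))
        start = glyphs.find(phrase, start + 1)
    return hits


def render(dots: list[Dot], hits: set[int], plate: int) -> str:
    """Draw one plate: '.' is a plain dot, '#' is a dot in a search result."""
    lines: dict[int, list[str]] = {}
    for i, d in enumerate(dots):
        if d.plate == plate:
            lines.setdefault(d.row, []).append("#" if i in hits else ".")
    return "\n".join("".join(lines[r]) for r in sorted(lines))


# Usage: ten glyphs laid out on plates of 2 rows x 3 columns.
dots = layout("如是我聞一時佛在如是", text_id=1, rows=2, cols=3)
hits = search(dots, "如是")
print(sorted(hits))  # [0, 1, 8, 9]
print(render(dots, hits, plate=0))
print(render(dots, hits, plate=1))
```

Because every dot keeps its plate index alongside its glyph, a highlighted dot can be traced back to the plate it came from, which is the role the original plate photograph plays in verification.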
The current search functionality ranges from visualizing word distribution and frequency to other structural patterns such as chiastic structures and ring compositions. In the Blue Dots 360 version, the text is also visualized as a matrix of simplified graphic elements representing each of the words. This will enable users to identify new linguistic patterns and relationships within the matrix, as well as to access the words themselves and related contextual materials. Search queries will be applied across classical Chinese and eventually English, and will be accessed collaboratively by researchers, then extracted and saved for later re-analysis.
The data provides an excellent resource for the study of the dissemination of documents over geographic and temporal spheres. It includes additional metadata, such as present-day images of the monasteries where the translations took place, which will be included in the data array. The project will design new omnidirectional metaphors for the interrogation and graphical representation of complex relationships between these textual datasets, in order to solve the significant challenges of visualizing both abstract forms and close-up readings of this rich data. In this way, we hope to set benchmarks in visual analytics, scholarly analysis in the digital humanities, and the interpretation of classical texts.