People and Networks

Abstract

The most complicated entities involved in the exchange of learned letters are the people who exchange them and the networks created by that exchange. This complexity is compounded by the fact that people and networks cannot really be separated: typically, the exchange of letters rests directly or indirectly on pre-existing networks of social exchange, so correspondence networks can be fully understood only with reference to data documenting non-epistolary as well as epistolary contact. After identifying research questions and drawing inspiration from previous work (I), WG 2 will devise a data model for the prosopographical information required to answer these questions, together with an input form that is as clear, simple, intuitive, and flexible as possible (II). It will then identify major electronic sources of relevant structured data, enable access to them (III), and design and create tools to help scholars reconcile their data and fill gaps in it in a semi-automated fashion (IV). The final task will be to devise tools for visualizing and analysing data sets, from individual people to multi-dimensional networks, so that scholars can answer their research questions (V).

WG 2 will also be responsible for establishing the technical foundations of the infrastructure as a whole, which will be based on the following general principles:

  1. Use open Linked Data models and, where possible, modularize tools behind open APIs to maximize flexibility and reusability;
  2. Where possible, build on open-source tools and release the working group's own contributions as open source;
  3. In the future, following these two principles will enable
    • new actors to join the network more easily,
    • new visualization and other tools to be developed on top of the framework with less effort,
    • gradual improvement and evolution of the tools and infrastructure itself, and
    • gradual improvement of data quality.

WG 2 is led by Eero Hyvönen, Professor in both the Department of Media Technology at Aalto University and the Department of Computer Science at the University of Helsinki.

He is also Research Director of Aalto’s Semantic Computing Research Group.


Agenda


  1.1 Review prior work:
    • Study the harvesting, analysis, and visualization of material from the Oxford Dictionary of National Biography undertaken by the ‘Six Degrees of Francis Bacon’ project. Study the prospects for integrating this approach with the scholarly crowd-sourcing of prosopographical entries, and consider the feasibility of a similar approach for other leading national biographical dictionaries;
    • Analysis of epistolary metadata alone, which studies how information could be exchanged through networks documented solely by correspondence (Ruth and Sebastian Ahnert’s work with the State Papers Online for Tudor England);
    • Analysis of people mentioned (co-citation proximity analysis), which reveals which people were most often discussed in proximity to one another in collections of correspondence;
    • Analysis of relationships implicit in major biographical sources such as national biographical dictionaries;
    • Analysis of relationships within prosopographical data, which maps the familial, personal, professional, and other relationships captured prosopographically (for instance, in the Cultures of Knowledge project);
    • Any combination of the above.
  1.2 Formulate use cases and research questions (coordinated with step 3, identifying sources that could help answer these questions).
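The co-citation proximity analysis listed among the prior work above can be illustrated with a short Python sketch. It assumes, hypothetically, that each letter has already been split into sentences and that the people mentioned in each sentence have already been identified; every pair of mentions contributes 1 / (1 + d), where d is the distance between them in sentences. This weighting scheme and the function name co_mention_weights are illustrative assumptions, not the method of any project named above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input shape: a letter is a list of sentences, and each sentence
# is the list of person identifiers already recognized in it.
Letter = list[list[str]]


def co_mention_weights(letters: list[Letter]) -> Counter:
    """Score each unordered pair of people by how closely they are mentioned.

    Assumed weighting: two mentions d sentences apart contribute 1 / (1 + d),
    so mentions in the same sentence contribute 1.0. Scores are summed over
    all mention pairs in all letters.
    """
    weights: Counter = Counter()
    for letter in letters:
        # Flatten the letter into (sentence_index, person_id) mentions.
        mentions = [(i, person) for i, sentence in enumerate(letter) for person in sentence]
        for (i, a), (j, b) in combinations(mentions, 2):
            if a == b:
                continue  # repeated mentions of one person do not form a pair
            pair = tuple(sorted((a, b)))
            weights[pair] += 1.0 / (1 + abs(i - j))
    return weights


# Toy example with placeholder identifiers, not real correspondence data.
letters = [
    [["A", "B"], ["C"]],   # A and B in one sentence, C in the next
    [["B"], [], ["C"]],    # B and C two sentences apart
]
for pair, score in co_mention_weights(letters).most_common():
    print(pair, round(score, 3))
```

Summing across letters means that pairs discussed together often, and closely, rise to the top; other decay functions (or counting each pair once per letter) are equally defensible choices.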

  2.1 Review related work on representing people and networks (VIAF, ULAN, VIVO, BIO, CIDOC CRM, RELATIONSHIP).
  2.2 The basic model is likely to consist of ‘event streams’, each documenting a specific event in the biography of an individual, typically relating that individual to other persons at a specific place and time.
  2.3 The model should focus on the kinds of events typically central to the life of an early modern intellectual, such as schooling, university study, academic travel, membership in learned societies (formal and informal), and stages in learned careers.
  2.4 Reconstructing the social networks underlying correspondence networks likewise requires a focus on location and contact histories.
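The ‘event stream’ idea and its use for reconstructing contact histories can be made concrete with a minimal Python sketch. The Event fields and the possible_contacts helper are hypothetical and do not represent an agreed WG 2 schema: each event records one person, a kind of event, a place, a date range, and any other persons explicitly involved. Two people whose events place them in the same location during overlapping periods are flagged as a possible contact for a scholar to verify.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """One entry in a person's event stream (hypothetical minimal schema)."""
    person: str                   # identifier of the person whose biography this is
    kind: str                     # e.g. "schooling", "university study", "society membership"
    place: str                    # place identifier, ideally taken from a gazetteer
    start: date
    end: date
    others: tuple[str, ...] = ()  # identifiers of other persons explicitly involved


def possible_contacts(events: list[Event]) -> set[tuple[str, str, str]]:
    """Return (person_a, person_b, place) triples suggesting possible contact.

    A triple is produced when an event names another person explicitly, or when
    two different people have events at the same place with overlapping dates.
    Co-presence is evidence for contact, not proof of acquaintance.
    """
    contacts = set()
    # Explicit links recorded in the event itself.
    for e in events:
        for other in e.others:
            a, b = sorted((e.person, other))
            contacts.add((a, b, e.place))
    # Inferred links from co-presence in space and time.
    for e1, e2 in combinations(events, 2):
        if e1.person == e2.person or e1.place != e2.place:
            continue
        # Two closed date ranges overlap if each starts before the other ends.
        if e1.start <= e2.end and e2.start <= e1.end:
            a, b = sorted((e1.person, e2.person))
            contacts.add((a, b, e1.place))
    return contacts


# Invented toy data with placeholder person identifiers.
events = [
    Event("P1", "university study", "Leiden", date(1630, 9, 1), date(1634, 6, 30)),
    Event("P2", "university study", "Leiden", date(1633, 9, 1), date(1636, 6, 30)),
    Event("P3", "university study", "Leiden", date(1640, 9, 1), date(1642, 6, 30)),
]
print(possible_contacts(events))
```

Comparing every pair of events is quadratic, which is adequate for a sketch; a real implementation would first group events by place and sort them by date.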

  3.1 Evaluate known data sources for suitability (coordinated with step 1.2):
    • Authority files (VIAF, CERL, GND, ULAN, DBpedia, Freebase);
    • Publication data (BNF, DNB, BNB, OCLC WorldCat, EEBO, ECCO);
    • Appearances in letter metadata (EMLO, EE, CKCC, CEEC);
    • Appearances in letter texts (CKCC/ePistolarium, CEEC, EE) [coordinate with WG 3];
    • Geographical gazetteers (TGN, Pleiades) [coordinate with WG 1].
  3.2 Enable access to the data sources, either by using existing APIs or by importing the data and creating such APIs.
  3.3 As a matter of reciprocity, provide equivalent API access to data created as part of the work of WG 2 (e.g. EMLO internal authorities) and negotiate data exchange with relevant external agencies (e.g. VIAF/CERL).
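Principle 1 and the reciprocal API access described above both imply publishing internal authority data as Linked Data. The Python sketch below serializes a hypothetical internal person record as JSON-LD using schema.org terms (Person, name, alternateName, birthDate, deathDate, sameAs). The choice of schema.org, the AuthorityRecord fields, and the example.org URIs are all assumptions for illustration; a real service could equally use FOAF, CIDOC CRM, or another vocabulary reviewed under the data-model step.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Placeholder namespace; a real service would mint its own stable URIs.
BASE_URI = "https://example.org/person/"


@dataclass
class AuthorityRecord:
    """Hypothetical internal person authority record."""
    local_id: str
    preferred_name: str
    variants: list[str] = field(default_factory=list)
    birth_date: Optional[str] = None   # ISO 8601, e.g. "1600" or "1600-01-31"
    death_date: Optional[str] = None
    same_as: list[str] = field(default_factory=list)  # URIs in external authority files


def to_jsonld(rec: AuthorityRecord) -> dict:
    """Describe a record as a schema.org Person in JSON-LD, omitting empty fields."""
    doc = {
        "@context": "https://schema.org",
        "@id": BASE_URI + rec.local_id,
        "@type": "Person",
        "name": rec.preferred_name,
    }
    if rec.variants:
        doc["alternateName"] = rec.variants
    if rec.birth_date:
        doc["birthDate"] = rec.birth_date
    if rec.death_date:
        doc["deathDate"] = rec.death_date
    if rec.same_as:
        # Links to external authorities are what make reconciliation reusable.
        doc["sameAs"] = rec.same_as
    return doc


rec = AuthorityRecord(
    local_id="p1",
    preferred_name="Person One",
    variants=["P. One"],
    birth_date="1600",
    same_as=["https://example.org/external-authority/123"],
)
print(json.dumps(to_jsonld(rec), indent=2))
```

An open API could then return this document for each identifier, letting other projects follow the sameAs links in the same way they consume VIAF or CERL records.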

  4.1 Arrange user tests of the EMLO web form for the manual collection of prosopographical data.
  4.2 Investigate semi-automated matching of personal name variants (using e.g. Silk/OpenRefine), building on work undertaken by Cultures of Knowledge from April 2015 onward.
  4.3 Devise automated means of providing editors with relevant entries in standard biographical dictionaries, to inform record input, matching, and disambiguation. This work will build on the Letter Metadata Prototype devised in Halle and on further work being undertaken by Cultures of Knowledge in 2015.
    • Precondition: compile a list of web-mounted biographical dictionaries covering the early modern period (a task for a scholarly sub-committee).
  4.4 Devise automated techniques for identifying and encoding personal names within textual corpora, in order to encode letter texts and to enrich letter records with the people mentioned (building on work in WG 1.10).
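The semi-automated name matching and name identification described above can be sketched in Python using only the standard library. This is a simplified stand-in, not the Silk or OpenRefine workflow: difflib.SequenceMatcher scores normalized name strings, and plain dictionary lookup against known name forms replaces statistical named-entity recognition. The functions normalize, candidate_matches, and tag_mentions are hypothetical, and every match is meant as a candidate for an editor to confirm.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a name, strip diacritics and punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def candidate_matches(name: str, authorities: dict, threshold: float = 0.8) -> list:
    """Rank authority records whose known name forms resemble `name`.

    `authorities` maps an identifier to a list of name forms. The result is a
    list of (identifier, score) pairs at or above `threshold`, best first,
    meant for an editor to confirm or reject rather than to accept blindly.
    """
    target = normalize(name)
    scored = []
    for ident, forms in authorities.items():
        best = max(
            (SequenceMatcher(None, target, normalize(f)).ratio() for f in forms),
            default=0.0,
        )
        if best >= threshold:
            scored.append((ident, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def tag_mentions(text: str, authorities: dict) -> list:
    """Find known name forms in a letter text as (start, end, identifier) spans.

    Matching is case-insensitive but otherwise exact; longer forms are tried
    first so that a full name is not also tagged as a shorter overlapping form.
    """
    forms = sorted(
        ((form, ident) for ident, fs in authorities.items() for form in fs),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    taken, hits = [], []
    for form, ident in forms:
        # Lookarounds instead of \b so forms ending in "." still match.
        pattern = r"(?<!\w)" + re.escape(form) + r"(?!\w)"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # span already claimed by a longer form
            taken.append((m.start(), m.end()))
            hits.append((m.start(), m.end(), ident))
    return sorted(hits)


# Placeholder authority data for illustration only.
authorities = {"p1": ["Jan Kowalski", "Johannes Kowalski"], "p2": ["Anna Schmidt"]}
print(candidate_matches("Ján Kowalsky", authorities))
print(tag_mentions("Johannes Kowalski wrote to Anna Schmidt.", authorities))
```

Latin, vernacular, and abbreviated forms of early modern names often differ far more than string similarity can capture, which is why the variant lists (and editorial review) matter more than the threshold.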

  5.1 Evaluate the suitability of existing tools (e.g. Palladio, Europeana4D, VISU) for answering the research questions derived in step 1.2.
  5.2 Develop open-source functionality to fill gaps in existing tools.
  5.3 This exploration will be enriched and informed by one or more pilot projects based on suitable (sub)corpora, illustrating the potential for larger follow-up projects, for instance:
    • Epistolary metadata + people mentioned: Circulation of Knowledge/ePistolarium;
    • Epistolary metadata + prosopographical data: Cultures of Knowledge/EMLO.
  5.4 Evaluate results.
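For the visualization and analysis tasks above, a minimal Python sketch shows one way to turn letter metadata into a weighted correspondence network and export it for an existing tool. It assumes, hypothetically, that each letter has been reduced to a (sender, recipient) pair of person identifiers; direction is discarded, so each edge counts letters exchanged in either direction. The CSV output is a plain edge table of the kind general-purpose tools such as Palladio can import, but the column names and the helper functions build_network and edges_to_csv are assumptions.

```python
import csv
import io
from collections import Counter, defaultdict


def build_network(letters):
    """Aggregate (sender_id, recipient_id) pairs into an undirected network.

    Returns a Counter of edge weights (letters exchanged per pair) and a dict
    giving each person's number of distinct correspondents (degree).
    """
    edges = Counter()
    for sender, recipient in letters:
        if sender == recipient:
            continue  # ignore self-addressed or mis-keyed records
        edges[tuple(sorted((sender, recipient)))] += 1
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = {person: len(ns) for person, ns in neighbours.items()}
    return edges, degree


def edges_to_csv(edges) -> str:
    """Serialize the weighted edge list as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source", "target", "letters"])
    for (a, b), n in sorted(edges.items()):
        writer.writerow([a, b, n])
    return buf.getvalue()


# Toy metadata with placeholder identifiers.
letters = [("A", "B"), ("B", "A"), ("A", "C")]
edges, degree = build_network(letters)
print(degree)
print(edges_to_csv(edges))
```

Keeping the network model separate from the export format follows principle 1: the same aggregated data could be served through an open API to other visualization tools instead of being written to a file.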