Librarians and Open Source: We Need Code, Too!

Alex Byrne
Youth Services Librarian
Twitter: @HeofHIShirts


Open Source Bridge 2016
21 June 2016

The Standard Disclaimer

A few things before we begin.

  • The opinions expressed in this presentation are not necessarily those of my employer, whom I very much like working for. Please do not mistake my opinions for theirs.
  • I have attempted to accurately convey the opinions of others based on their published work. If I'm wrong about this, please let me know so I can make corrections.
  • No logo or image used in this presentation, or mention of any other project, implies endorsement from the owners and/or maintainers of those projects.

We Need Code For Ourselves

Libraries Use Code Already...

I should probably start by saying that libraries of all stripes, sorts, and sizes use a significant amount of open source code in our daily operations already. Many of the vendors that we use also have open source libraries or build on those libraries to deliver products and services to their library clients. It's not a wasteland of proprietary software everywhere, even though there's still a lot of that going around.

For example, Foss4Lib tries to provide a directory of some open source applications and (software) libraries used in various types of (people) libraries. And the Libraries Sharing Code list tries to collect all the institutional code repositories that it is aware of.

Apache Solr, for example, can be dropped in as an indexer for records generated by a library using any one of many libraries and programs used to create Machine Readable Cataloging records. Blacklight places a discovery layer on top of the records indexed and searchable by Solr so as to make the searching much more human-friendly. Or one can add a plug-in to many other languages and applications and grab Solr that way.

For a more complete idea, both Evergreen and Koha offer a full integrated library system - Koha uses modules run in web browsers for all of its functions, along with the Koha servers at an institution, while Evergreen has client programs and server programs for various operating systems.

The Library Freedom Project wants every public library to function as a Tor exit node, an idea that they were able to obtain funding from the Knight Foundation to try and produce, along with several excellent privacy toolkits and workshops. Because librarians try to respect your privacy against intrusion both from governments and corporations...although the latter is a bit more difficult when we're beholden to them for access to items our users need to do their work. (For more on that, go see last year's presentation, The Public Library Is An (Almost) Open Source Institution).

There's a lot of open source software underpinning a lot of what we do in libraries, whether it's explicitly mentioned or as part of the frameworks that we use to talk across machines and accomplish our work. Setting up infrastructure and making decisions about what software to run is really about choosing from a series of good products to find the one that works best for our applications and users. The problems start in two places: having staff who are expert enough in the chosen solution that they can build and support it, and wanting to take the data that we're generating and turn it into useful applications for front-line staff and librarians to put to use.

...But We're Looking For Unicorns

My library system, Pierce County Library System, serves an operating area of approximately 565,000 people. Let's say half of them have library cards, which is pretty consistent with data that I've had before. To accomplish the task of helping out approximately 282,500 people with their library tasks, we employ... 16 people, plus 2 managers to smooth the process. They're mostly concentrated on keeping the systems in our 18 locations running smoothly. Three of them are "Software engineers" or "User Experience Designers". One is a Database administrator, and one is a Library Systems Administrator who is our expert on the Integrated Library System (ILS) that we use, Polaris.

Suffice it to say, we have just enough people to keep the place running without being able to devote much time at all to thinking beyond that point. Slowly, things might be getting better (I hear that there are actual developer jobs making their way through all the approvals needed to fund and post them), but at this particular point, we just don't have the staff to do much more than put out fires when they appear and to keep things patched and working.

Come work for your public library, we say. The pay is decent, and you'll enjoy serving an appreciative public. We just need to make sure that you're an expert on several different technologies, to the point where we've already excluded the vast majority of people who might want to work for us, and then we're probably going to understaff them and expect them to be able to run the ship missing a sail or two, because that's what we have funding for.

Part of the problem stems from what Coral Sheldon-Hess calls the "tech pipeline problem" in a post from September of 2015. To wit: many libraries don't think they're in the business of technology, and so they don't provide opportunities for their staffers to grow beyond beginning aspects of technology, coding, and the use of their spaces. We'll cover how this affects the user side in a little bit, but for now, let's focus on the staff development side of the operation.

There aren't that many junior developer positions, or junior sysop positions, or junior database administration positions in libraries. Which leaves two options, according to Sheldon-Hess - learn on your own time, or quit public libraries altogether, learn the necessary skills, and then return to the place you came, bearing your newfound skills.

Learning the skills on your own is a nice option, if you have copious amounts of free time after your work shift (or, for most people not holding a degreed job, shifts) to teach yourself the material, and then go out and find good-looking projects that need development in the skills that you're trying to level up that also have a robust mentor community or a really friendly developer community that doesn't mind taking new people in and letting them whack at things until they get it to work correctly, then elegantly, then according to the style and design of the project. Subscribing to mailing lists like code4lib and reading their work may be a start, but a lot of what goes on there is systems work, useful to people who are in the business of indexing, cataloging, and building code that allows for finding and discovering of the elements in the collections. There are conference announcements, and a jobs list, but to fully understand code4lib, you're probably going to be in a specific segment of library work. There aren't a whole lot of threads on the mailing list about people making small code releases that perform specific functions or make one aspect of library life better, or even that are just an entertaining diversion that got created in a program somewhere.

Then, to add to the degree of difficulty, a large majority of the public-facing workers in libraries identify as women, so they're probably also having to deal with the additional emotional and time costs of household management, which may or may not include child care, and the dangers of being a woman on the Internet or in any other technological space. That, incidentally, is the major problem with the other option that Sheldon-Hess proposes, as being a woman or minority in tech is almost guaranteed to result in a table-flip. Both Stephanie Morillo's keynote and Kronda Adair's keynote from Open Source Bridge 2015 put into very stark terms what minorities are facing in trying to get into tech sector work. Either way, it's Clown Shoes, because there's no way for anyone to learn how to do this thing at work, being paid to learn. Because there aren't enough people on the staff to have any free time, there's also basically no method for someone to get mentorship on the actual systems they're using, so that interested parties that have data needs could start tinkering with APIs and possibly building their own queries against the database to get back what they want.

So, neither of these options is close to workable, if you are anything other than, well, me. Not me, specifically, as I don't have those necessary skills, but someone who presents, looks, and sounds like me, and can call forth the power of their privilege to gather resources to their side. Calling those options "Clown Shoes" is, at best, being polite.

Vendor Options Are Sometimes Not Helpful

Let's have an example of a regular library problem that could be helped greatly with access to code and development tools. There's some form of SQL database behind all the item and bibliographic records for each of the objects in the library system. Librarians and those in charge of the collection often have to make decisions about what to keep and what to discard in their collection based on the number of times an item has or has not been checked out, and whether those checkouts are recent or old.

Until somewhat recently, my library subscribed to a service called collectionHQ, which processed our records and generated a set of reports for us in our various categories. The two reports most used were:

  • Grubby Items, those that had a high enough lifetime circulation count that they might need replacing on their condition
  • Dead Items, which were items that had not circulated in a set period of time.

The limitations of the system were pretty apparent once we started using them. collectionHQ only allowed us to set global values for collections with regard to when they were grubby, and a global value of inactivity across all collections to see when they were dead. So as soon as an item hit the threshold for circulations for its collection, it would be on the grubby list, regardless of whether it had been mostly handled by people who were rough with their books or gentle. If an item hadn't circulated in the amount of time that the dead list prescribed, it was dead and appeared on a list to be weeded. That length of time was usually about 180 days, so a work would need to circulate approximately once every six months to avoid getting on the dead list.
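To make the two global rules concrete, here is a minimal sketch of how such a classification might behave. It is not collectionHQ's actual logic: the grubby threshold of 50 checkouts, the field names, and the sample items are all invented for illustration, and only the roughly 180-day dead window comes from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical values: only the ~180-day window comes from the talk;
# the grubby threshold is invented for illustration.
GRUBBY_CHECKOUTS = 50
DEAD_AFTER = timedelta(days=180)


@dataclass
class Item:
    title: str
    lifetime_checkouts: int
    last_checkout: date


def classify(item, today):
    """Apply the two global rules; an item can land on both lists."""
    lists = []
    if item.lifetime_checkouts >= GRUBBY_CHECKOUTS:
        lists.append("grubby")
    if today - item.last_checkout > DEAD_AFTER:
        lists.append("dead")
    return lists


today = date(2015, 6, 30)
# Invented sample items: a rarely used extra copy and a heavily used first copy.
seventh_copy = Item("The Hunger Games (copy 7)", 4, date(2014, 11, 1))
first_copy = Item("The Hunger Games (copy 1)", 48, date(2015, 6, 20))

print(classify(seventh_copy, today))
print(classify(first_copy, today))
```

With these made-up numbers, the little-used extra copy is flagged as dead while the worn first copy appears on neither list, because each rule looks at one item in isolation and knows nothing about the title as a whole.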

As one might guess, neither of these options was particularly effective - many "grubby" items that had been well taken care of would have more than a few circulations left in them, and many "dead" items were classics, assigned works, or had been really popular at the beginning of the year and were having a lull at that particular point. Or had just been recently added to the collection and were still winding their way through their popularity points. Things like the seventh copy of The Hunger Games would show up as a dead item, despite the series itself having circulated 75 times to that point in the year. The item that needed to be gone was likely the first copy, the one that had been read and checked out the majority of those 75 times, but because it was being checked out, we would have to wait for it to appear on the "grubby" list before the system would believe it was ready for anything. And while we could delay the reappearance of an item on the grubby list by up to 40 circulations (in increments of 10 only), there was no such cure for the Dead list except to think about checking it out and then back in so that it would have a circulation to count against being dead.

collectionHQ did have a useful feature, in that it would allow one location to request another location's dead copy to replace a grubby one of their own, but after the first few times, the collection had shifted to where it was going to stay and nobody was really in the market for trading as the larger locations settled into requesting the copies of the smaller locations to replenish their own higher-circulating stocks. Our selection team and already established processes could replicate that, and in a far more granular manner.

In-House Options Are Sometimes Not Helpful, Too

The library stopped subscribing to collectionHQ in 2015, when it was quite clear that we had exhausted all the use we were going to get out of it, returning to our own ingenuity and systems for collection management. So what do we have as in-house solutions? I know that our library system keeps track of at least three counts relating to checkout associated with each item:

  • The number of checkouts (circulation) the item had this calendar year
  • Its circulation from the last calendar year
  • Its circulation since it was added to the collection.

The system also knows when an item was added to the collection. Given this set of knowledge, it should be easy enough for someone with sufficient knowledge of the field names and SQL syntax to construct a query that asked for all records that met a certain combination of those aspects, as well as limiters like which branch the item belongs to, and whether or not the item itself is sitting on the shelf right now, as it is fiendishly difficult to evaluate an item that is currently out being used.
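As a rough idea of what such a query could look like, here is a sketch that runs against an in-memory SQLite database. Every table name, column name, and sample row is hypothetical; the real Polaris schema is not described here, and the real database is not SQLite. The sketch only shows that combining branch, acquisition date, circulation counts, and shelf status is a single WHERE clause once the field names are known.

```python
import sqlite3

# Hypothetical schema: the real table and column names are unknown,
# so every name and row below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        barcode TEXT PRIMARY KEY,
        title TEXT,
        branch TEXT,
        date_added TEXT,         -- ISO date, e.g. 2013-04-01
        circ_this_year INTEGER,
        circ_last_year INTEGER,
        circ_lifetime INTEGER,
        on_shelf INTEGER         -- 1 if checked in, 0 if checked out
    )
""")
conn.executemany("INSERT INTO items VALUES (?,?,?,?,?,?,?,?)", [
    ("1001", "Old Atlas", "Main", "2009-03-01", 0, 1, 12, 1),
    ("1002", "New Thriller", "Main", "2016-02-15", 0, 0, 0, 1),
    ("1003", "Popular Series 1", "Main", "2012-09-01", 9, 20, 75, 0),
    ("1004", "Quiet Cookbook", "Branch B", "2010-05-05", 0, 0, 3, 1),
])


def weeding_candidates(conn, branch, added_before, max_this_year, max_last_year):
    """Items at one branch, old enough to judge, low recent use, on the shelf."""
    query = """
        SELECT barcode, title, circ_lifetime
        FROM items
        WHERE branch = ?
          AND date_added < ?
          AND circ_this_year <= ?
          AND circ_last_year <= ?
          AND on_shelf = 1
        ORDER BY circ_lifetime
    """
    params = (branch, added_before, max_this_year, max_last_year)
    return conn.execute(query, params).fetchall()


rows = weeding_candidates(conn, "Main", "2015-01-01", 0, 1)
print(rows)
```

The acquisition-date limiter keeps recently added items off the list, and the shelf-status limiter skips anything currently checked out, which are the two cases the global dead list handled badly.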

We have some in-house reports, created by our IT department, that can handle some of these queries and produce reports that can be exported to one of many useful file formats.

  • One tells us which items haven't circulated since a certain date, based on collection and branch.
  • One tells us all the items that are in a collection at a particular branch and have circulated above the threshold asked for in their lifetime.
  • We can look at a particular book's bibliographic record and see a "preview" of how each book in the system did in terms of the three circulation counts. That would be great, except that records collected in the staff client aren't necessarily exportable to any useful file format with the permissions that a standard librarian gets; if they were, the power search and SQL search options built into the client would be a whole lot more useful.

There is no report that produces any of these elements all together, or that allows someone to select based on branch, acquisition date, and circulation criteria.

This is where I confess ignorance of the intricacies of SQL, so I genuinely do not know whether constructing such a query would be awful and horrible and a pain in the tuckus. Most of the languages and frameworks I have worked with, like Ruby on Rails, abstract out query formation in favor of using their language's syntax to build requests from the database. Even if I knew how to build the SQL, I don't know the table names where the records are stored and the names of the elements that I'm asking to match against.

If only there were a duplicate copy of the database somewhere that an enterprising coder could run queries against in their preferred language, or some sort of documented API for Polaris ILS that the library system gave access to so that a power user, staff member, or interested civic or community partner could build an application to go in, grab the requested resources, and kick out a set of records in some sort of useful format.

Unfortunately, the library system has just enough people to be able to keep things running. So they don't have time or resources to dedicate to one weirdo that wants to do things a particular way. Maybe if that weirdo went and gathered some allies who all wanted the same thing, they could devote time and resources to it, but it would have to be a pretty large set of allied people...or someone with true organizational power. Ideally, this sort of thing would be perfect for a developer to cut their teeth on, or use as practice for getting to know the database and/or API better so that a later, more complex request can build on those skills, and at the same time, making one (or more) of the staff people happy that they can get useful data out of the system. After all, if one person requests it, there might be three more that don't know how useful it will be to them until they try it or see how the other person uses it. Since libraries are about sharing useful things with each other, the developed module might spread across library systems and be useful to others.

If you want to see what I managed to accomplish without database or API access, by basically taking outputs of other reports as sample data sets and then building an application that would do what I wanted to do with them, you can examine WeedingHelper on my GitHub page - it's a little over 350 lines of code, including comments and whitespace, based on my limited abilities. Someone with greater Ruby skills than mine could probably take it and do amazing things. Or see how it's done and replicate it in their language, and do amazing things with it. I just can't go any further with it because I don't have access to the things I would want to make it more useful. Clown Shoes for all, I guess.
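For readers who want a feel for the report-export approach, here is a small sketch of the general idea in Python. It is not WeedingHelper's code, which is written in Ruby; the CSV column names and sample rows are invented stand-ins for exported reports. It reads two exports and sorts barcodes into piles according to which reports they appear on.

```python
import csv
import io

# Stand-ins for two exported reports; column names and rows are hypothetical.
NOT_CIRCULATED_CSV = """barcode,title,last_checkout
1001,Old Atlas,2014-02-11
1004,Quiet Cookbook,2013-08-30
"""
HIGH_LIFETIME_CSV = """barcode,title,lifetime_circ
1001,Old Atlas,62
1003,Popular Series 1,75
"""


def read_report(text, key="barcode"):
    """Index a CSV report's rows by barcode."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def combine(stale, worn):
    """Sort barcodes into three piles by which reports they appear on."""
    return {
        "stale_and_worn": sorted(stale.keys() & worn.keys()),
        "stale_only": sorted(stale.keys() - worn.keys()),
        "worn_only": sorted(worn.keys() - stale.keys()),
    }


stale = read_report(NOT_CIRCULATED_CSV)
worn = read_report(HIGH_LIFETIME_CSV)
piles = combine(stale, worn)
print(piles)
```

Working from exports like this needs no database or API access, which is its main appeal, but it can only be as current and as detailed as the reports that happen to exist.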

A Path Forward

Andromeda Yelton, in Chapter 6 of Coding for Librarians: Learning By Example, runs down a list of support that managers can provide to employees that are looking to improve or pick up code skills:

  • time: finding ways for planned projects to include learning new technologies, setting aside time for learning and experimentation, defending this time to upper management
  • books
  • software licenses
  • root privileges, development sandboxes, testing servers, quality hardware: in short, the ability to install and experiment with software
  • conference attendance: supported in time, money, or both
  • workshops: some paying for attendance, others teaching them personally
  • regular study groups, such as the one at the University of Maryland libraries or the George Washington University code reading group
  • courses: online (such as Code School, RailsCasts, Treehouse) or face-to-face, through tuition remission in the case of academic libraries
  • code review
  • mentorship
  • formal internship programs
  • making coding skills part of supervisees' performance goals, which helps justify other forms of support

That's a pretty big lift for a library system that doesn't have a lot of dollars around, if you think of this support list as something that requires proprietary systems, hefty hardware, and/or formal tuition at an institution of higher learning. Truthfully, a lot of code experiments and ideas, whether run as formalized programs meant for public attendance or as staff training and primarily peer-to-peer learning, can probably be done on old hardware sitting in a sandbox somewhere, running open source tools, development environments, and other free or donated equipment, assuming the IT department is okay with setting it up and making sure that things don't stray from their appointed paths, as well as providing tools for communication across distance (as it's likely the people who are most interested in learning these skills are sitting at desks in different locations in the same library system). Getting expertise on board might be the most difficult part of setting up a system like this that allows for experimental learning.

We Need Code For Others

In schools, there's a big move for STE(A)M education. In libraries, this often translates to the creation of or collaboration with Maker spaces to put on programs and provide interesting things for users to do with technology. Three-dimensional printers are a popular option for those spaces, as are more traditional Making tools such as saws, hammers, sewing machines, and video and audio production and editing tools. Creation is the big word at the moment, and working with technology to do something is the backbone of a lot of STE(A)M programs in public libraries.

My library system divides the kinds of questions and requests that the library receives into three large general buckets:

  • Get Me Started
  • Help Me Solve Problems
  • Keep Me Interested

The first bucket is a pretty common one, and what usually comes to mind when people first think of libraries. Someone needs to do something new to them, and the library is a place full of resources and helpful people that can get someone oriented and moving in the right direction. For example, during the Great Recession's worst points, public libraries were inundated with people who hadn't had to apply for work in decades, which meant getting up to speed on where jobs would be posted, learning how to create electronic versions of important documents like resumes and cover letters, and navigating the questions and forms of online job application sites. Our library created the Job and Business Center website and designated computers for extended sessions so that those new to the session could apply and navigate the wilds without worrying about the computer shutting down on them and losing all of their work.

When it comes to STE(A)M education and Maker spaces, public libraries are often touting programs like a Lego Mindstorms programming session or the Hour of Code - introductory elements that assume no prior experience and that are geared toward giving someone a good first experience testing the waters of coding and directing technology to do what you want, instead of solely using someone else's code and materials in the way they want. Many of the programs that you'll find at a local library that are for children are geared this way, as are many of the orientation and equipment usage classes for using the Maker space. These beginning classes have a purpose - it's not going to be productive to throw resources at a person that doesn't understand how to use them or how to take what you're giving them and synthesise it for their own needs.

A major sticking point happens at the shift from "Get Me Started" to "Help Me Solve Problems" and "Keep Me Interested". When working with robot programming interfaces, Hour of Code, and other tutorial exercises, the domain of required knowledge is small enough that people who may not have much experience working with code can still be reasonably expert enough to handle most problems and troubleshooting questions within the boundaries of their program or tutorial. Linda W. Braun details the ways that a single tutorial experience like the Hour of Code is insufficient for synthesis, suggesting that programs like Hour of Code need to re-seat themselves in a bigger picture that gives their attendees practical experience toward solving problems, meeting educational objectives, or accomplishing self-directed tasks, instead of teaching them that code acquisition is much like other subject learning in schools - learn enough to pass the test and not much else. To achieve this, though, the people putting on the class need more expertise than they currently have.

Clown Shoes Redux

As we saw above, the options are Clown Shoes for people who aren't sufficiently privileged to be able to learn and contribute to code projects in their spare time and can't manage to convince anyone that learning code is worth devoting clock time to, either in a junior developer position or as part of a formal agreement with their workplace. No, learning time that consists of a single hour a week doesn't cut it. We're talking something closer to the 80/20 split that is supposedly Google's way of keeping people interested, as well as the ability to have blocks of uninterrupted time away from public service so as to be able to dig into meaty projects and have mentorship activities, or attend conferences, hack jams, and do things that advance the skill sets needed to then be able to come back and help others solve their problems and go farther. If that's not possible, then the public library has to rely on their community of experts and developers to be willing to volunteer time to mentor, either the staff or the other people, or to be available at certain days and times to help people with their projects. This can mean setting up hack jam times with the library, or announcing and advertising your availability, whether in person or by a virtual method, in the Maker Space for people to come with questions and inquiries. If those are the ways you want to help, do talk with the staff first. You'll find that many of us will be very happy to assist and promote you and feed you as many people with questions and projects as you want to take.

If there's not a big community, or understandably, not a whole lot of interest in doing for free what one might do for pay, then we're back to figuring out how to do it yourself (Clown Shoes!) or we need better teaching tools. With children of sufficient age, tweens, teens, and adults, it's possible to engage in one-day intensives, or to start thinking about putting together series programming where sessions build on the previous sessions' knowledge. This is a better idea that helps to hit the "Help Me Solve Problems" and "Keep Me Interested" parts of experience that builds actual expertise and, incidentally, helps cement the value of the public library to groups that traditionally don't use it.

Camp Code: The Right Direction

An example that's going in the right direction is a recent program developed at Pierce County Library System - Camp Code: Game Lab. Designed to either be done as two sessions or a one-day intensive, Camp Code takes people with varying skill levels through concepts of programming and game design. Registrants are usually in pairs of one tween or teenager and their grownup. The first session explains various components of a game, then invites the participants to modify familiar games, adding or changing components and seeing how they play differently, before culminating in an exercise to design a playable game using components provided in a plastic baggie. Depending on the inclination of the designers, all sorts of tabletop games are possible from the component kit. By the end of the first session, everyone participating can say they have designed a game and played the games that others have designed. They're often pretty rough, since they have to design the game using a limited number of components within a short time frame, but by emphasising playability as a goal, the focus of the designers doesn't get lost in mythology, backstory, or other components that can be thought of later and added on.

The second session of Camp Code is when the participants actually get to touch computers - after a short video about the impact of the mechanic of Mario jumping first introduced into Super Mario Brothers, a short orientation and explanation of the various components of Scratch, developed by the Lifelong Kindergarten Group at the MIT Media Lab, and a short tutorial about pair programming and the driver/navigator concept, the participants are turned loose on a scavenger hunt to find various components and programming snippets that will accomplish tasks on screen, introducing them to the various menus and blocks of Scratch. Once the scavenger hunt is over, the participants go through a specific tutorial building a paddle-and-ball game, learning a few new and interesting bricks and their effects, before getting to choose between three different types of starter programs to go and play with, building new elements in, analyzing the elements that are already there, and changing elements to suit their own purposes. In this last session, the facilitators are on hand to help with sticky situations or to try and explain useful concepts to the participants, but the creativity of the participants really shines through in being able to build off an already-established structure of work. For example, in this session, one of the participants was trying to understand the nature of how games make movement work, and being able to provide a key insight about scrolling (character stays still, scenery moves) and map tiles (teleport character from right side of current map tile to left side of new map tile) opened up possibilities in that designer's mind.

Here's their review of the program: "This place is awesome because I made my own board game and video game. This place is amazing." That's what we're looking for in libraries.

Lacking the knowledge of how to accomplish this, though, might have made the review a little less awesome, or it could have turned it entirely - "This was a cool program, but the people who were there couldn't answer my questions." Which makes them less likely to come back for another program, or to ask us about questions that we might be able to solve, or to see if we offer resources that might further them on their journey. Without the right tools for the job, the public library loses out on the opportunity to help inspire someone's dreams or to help them continue to make progress toward them.

Please Help Us

Public libraries are great at adaptation, even when it seems like we're adapting to what just was the latest thing right before the new latest thing hits. One of the things your public library is looking for is feedback about what you want from them. If you want to build a hack jam, or a community of developers interested in working on projects together, or if you want your public library to carry materials that will help you with Making and/or teaching things, your public library is probably going to want to help you out. If you bring a lot of people in with you, who are also wanting the same sorts of things, they will probably want to help out even more. The easiest, lowest-effort way of helping a public library be more relevant to you is letting them know what's relevant to you. Insistently, repeatedly, and to the Board of Trustees as well as the regular staff.

If you want to be more involved, offering yourself up as a mentor to the staff is a good thing. Being flexible enough to mentor while staff is on working time is great. If you want to brand it as a coaching gig or something else where you might make a little coin while at it, please remember that while public libraries may appear to rake in a lot of money from their taxing authorities, most of that money is spent in wages and materials as soon as it arrives. There's not a lot of discretionary spending in public libraries, so the less expensive you feel you can be, the more likely you are to get hired. Bring program ideas with you, if you can, or projects that would be helpful for beginners to use and extend, so that those participating feel like they're making useful contributions.

Ask for non-confidential data from the library and show what you can do with that data to help the library better target its resources - circulation and door counts are clunky measures at best for determining effectiveness. A project like Measure the Future thinks that computer vision and liberally scattered sensors could help improve data collection. Maybe they're right, maybe they aren't, maybe you have a better idea. The expertise that you have and the work that you've done with Open Hardware and/or Open Source Software is still valuable and relevant, even if the context is different.

Build platform-agnostic tools for front-line staffers to use. If it's attached to an ILS like Evergreen or Koha, that's okay, but it would be (about 20%) cooler if the tools could ingest a report from anywhere, or be hooked up to any sort of database, and then pop out useful things. Ask your public librarians what their pain points and bottlenecks are in their workflows - it will give you a good idea where to target your efforts. I don't particularly like schlepping carts of books back and forth, or having to wand every book on the shelf to see if it needs to be weeded - far better to set some options into a database or report query and let the computer do the sifting, so that all I have to do in addition to plucking the weeding candidates is touch the books to see if they're physically damaged. It will still take time, but it will take a lot less time if the computer can be set up to evaluate based on my criteria beforehand.

And finally, if you can afford to, get a job with us! Having expertise in-house on open source software and hardware will make conversation about those tools, and hopefully their adoption, more likely. It changes the money being spent from vendors that may or may not listen to us, or worse, that will go out of business and leave us without the ability to adapt or continue using their technologies, to spending it on salaries and expertise that will advance the community of developers and the projects in use through the organic method of having to solve library-related problems with them.


This presentation is licensed under a Creative Commons 4.0 Attribution-ShareAlike License. Please credit Alex Byrne (abyrne at piercecountylibrary dot org) or HeofHIShirts for the attribution requirements of any derivative works, or if you like this work and would like to quote it at someone else.

Any content linked to or used in this presentation, and any video or audio recordings of this presentation may not be governed by this license - please check with the appropriate content owners before reusing any content not specifically governed by this license.