Posts Tagged ‘Microsoft Excel’

Java, John and JournalTOCs

Posted on April 17th, 2012 by Paul Stainthorp

I heard recently that the ticTOCs journal tables-of-contents service will close down in the next month or so. ticTOCs was a JISC-funded project which hasn’t been developed for several years now.

Screenshot of ticTOCs

It’s effectively been superseded by the JournalTOCs service, “the largest, free collection of scholarly Tables of Contents (TOCs)”. The outgoing service has published some advice for users on transferring saved lists of TOCs between ticTOCs and JournalTOCs.

ticTOCs did have one particularly useful feature: a text file of all the TOCs it contained (at http://www.tictocs.ac.uk/text.php), which I’ve been filtering and using since 2009 to create a custom package of RSS feeds for upload to the e-journals A-to-Z at Lincoln.

While JournalTOCs doesn’t provide the same simple text list feature, it does have a fully-documented API. This is much more powerful and flexible for developers, but it’s not quite so straightforward as /text.php to create my list (a subset of all the feeds in JournalTOCs, matching only those journals to which the University has full-text access) using desktop tools and no programming.

A chance comment from a colleague at another university about Lincoln having “developers coming out of its ears“(!) inspired me to ask on the LNCD development group for help.

Dr John Murray of the Lincoln School of Computer Science responded, and very kindly supplied a Java program which I can use to identify which journals in our A-to-Z are represented in JournalTOCs, and so build a list of links to valid RSS feeds. Starting with a comma-separated list of ISSNs (which I downloaded from the A-to-Z), the program takes each ISSN in turn and makes a call to the JournalTOCs journals API. Depending on the data returned by JournalTOCs, the program records each ISSN as ‘VALID’ or ‘INVALID’ (i.e. no RSS feed available) in a new .csv file.

Thank you very much, John!

[Aside: to use John's code I had to learn how to compile and run Java programs on my laptop (running Ubuntu 11.10). For the record—and because I imagine it'll be useful again in the future—I first had to install OpenJDK 6 by going to the terminal and running the command:

sudo apt-get install openjdk-6-jdk

…then, once OpenJDK had installed, using the following command to select the correct version of Java:

sudo update-alternatives --config java

…before compiling and running the program itself.]

Once all the ISSNs had been checked against the API and the validated list constructed (this took ~5hrs to run!), I used Microsoft Excel to filter out only the ‘VALID’ ISSNs matched in JournalTOCs, and used Excel’s =LOOKUP() function to pull in enough information about each journal from our managed title list (previously downloaded), to create a custom upload text file.

Screenshot of the A-to-Z

The updated package of journal article RSS feeds is now available to view on the A-to-Z. We’ll review and re-generate this every few months, as we do with all custom and publisher-generated e-journal packages. At the time of writing, it contains just over 10,000 journal article RSS feeds, each one corresponding to one of our full-text journals. I’ve also added an orange RSS icon and link to JournalTOCs for each one, using the A-to-Z’s public notes feature.

So: which other library APIs will accept an ISSN as an input, and what other custom packages could I create using John Murray’s code in the same way?

Library Impact Data Project: good news, everybody!

Posted on June 18th, 2011 by Paul Stainthorp

I think this is worth re-posting from the LIDP blog:

LIDP graphicWe are very pleased to report that we have now received all of the data from our partner organisations and have processed all but two already!

Early results are looking positive and our next step is to report back with a brief analysis to each institution. We are planning to give them our data and a general set of data so that they can compare and contrast. There have been some issues with the data, some of which has been described in previous blogs, however, we are confident we have enough to prove the hypothesis one way or another!

In our final project meeting in July we hope to make a decision on what form the data will take when released under an Open Data Commons Licence. If all the partners agree, we will release the data individually; otherwise we will release the general set for other to analyse further.

I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).

The library activity data represents:

  1. The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
  2. The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ’0′ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break in to the Sentry system and extract this data!
  3. The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.

I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007-, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.

Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.

Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method—it clearly worked well for at least one of the other LIDP partners—but it my PC nearly melted under the strain.)

The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.

*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.

We got print journals in your electronic journals! We got electronic journals in your print journals!!

Posted on September 14th, 2010 by Paul Stainthorp

While my colleagues are busy adding 856 $u tags to MARC records, I’ve been using our Library’s list of current subscriptions to update information about print journal holdings on our e-journals website.

I first did this in May 2009: I wasn’t even sure if it was a job worth doing until I discovered that links to current print holdings wers accounting for one in every fifteen links followed out from the University’s [supposedly "electronic"] journals A-to-Z site.

Photo - me, "reading" a "journal", "yesterday"

(Photo - me, “reading” a “journal”, “yesterday”.)

The Library has only recently transferred many of its print-only (or print+”e”) journal subscriptions to electronic-only, so a freshly-dumped file of our print holdings data was needed. (The shift in subscription format aside, dozens of minor title changes had left confusing, semi-orphaned records in the A-to-Z.)

Screenshot of print holdings on the University of Lincoln's e-journals A-to-Z websiteIt turns out that, from 600 print+”e” titles in May 2009, the University now has 319: the rest, presumably, having gone “e”-only or having been cancelled outright.

The custom ‘package’ of data for our e-journals A-to-Z site was created by taking a list of the ISSNs, plus holdings, for all of our current subscriptions; filtering out those ISSNs which don’t appear in our A-to-Z (i.e. those which are print-only); then using a =LOOKUP() function in Microsoft Excel to pull across a matching title from our e-journals knowledgebase.

Next: to see if usage of this package, as a percentage of total e-journal usage at the University of Lincoln, changes [from 7% last year] over the coming academic year, now that—presumably—the relative demands for electronic vs. print journals have themselves changed at the University …and now that there are only half the print+”e” titles to choose from.

Google magazines (slight return)

Posted on August 20th, 2010 by Paul Stainthorp

I’ve just recreated my list of magazines from Google Books for the University’s e-journals site.

Google now hosts 199 digitised magazine titles, and for the sake of 10 minutes’ work every few months it would be a shame to miss out on the extra full-text coverage, which often complements the “library” sources for a title.

E.g. for the frankly un-put-downable Estonian Journal of Archaeology (available as an Open Access (OA) journal from 2006-, and indexed in Art Full Text), Google provides the missing articles from 1997 (vol.1) up to 2006.

Record for the 'Estonian Journal of Archaeology' on the University of Lincoln's list of e-journals.

I’d like to be able to harvest the Google Books content to build my list using the standard mashlib toolkit (Google spreadsheets; Yahoo! Pipes; some coffee)… but while use of Google’s =ImportHtml() function is limited to 50 per spreadsheet, and because Google search pages block robots.txt files, I can’t figure out a way of doing so.

Instead, I’ve been copying-and-pasting the search results pages into an ordinary Microsoft Excel spreadsheet (thanks, again, Google, for making this possible through your magazine browse page), then using a custom Excel function to ‘unmask’ the URL hidden behind each hyperlinked magazine title.

Google Books magazine browse page, pasted into an Excel spreadsheet.

Finally, I use a bit of text-to-column splitting, search/replace, and filling-in of package-wide fields, to give me a compatible, tab-delimited text file which I then upload to our e-journals knowledge base (which happens to be EBSCO A-to-Z) – I used EBSCO’s custom notes feature to link to Google’s cover image to each entry in the file.