  • Introduction to Internet Marketing Content Studies Pt. 2: Poorly collected data makes for poor analyses

    In the last post, I detailed the motivation, approach, and challenges of collecting native advertising and suggested content from popular websites. In this post I explore the data collected and conclude with some remarks on observations, potential insights, and shortcomings. Ultimately this remains an experiment and will not be pursued further. The data is available to anyone who might be interested.

  • Introduction to Internet Marketing Content Studies Pt. 1: The Web is (still) not an API

    Suggested content on the web, often ignored or blocked, is both ubiquitous and invisible. It's everywhere, and yet goes largely unseen, because many elect not to see it or mentally filter it out when it is present. Information can take on a different value when aggregated and presented outside of its original context. This post details how I originally sought to take on the task of collecting, analyzing, and showcasing those clickbait, suggested content articles that accompany most commercial, high-traffic websites. But it also explores what I found about how fragile, bloated, and unpredictable the web is, and how far it is from being an API if you seek to handle its content like structured data.

  • Handling Large JSON Files with Streaming

    jq is an amazing tool for querying JSON, but loading large JSON files into memory is often not possible. Fortunately, its --stream option lets it process JSON without loading the whole document into memory.

  • Saving Stupid Videos: Preserving YouTube

    Attempting to preserve a small, intimate piece of the Internet reveals how threadbare the prospects are for keeping digital content for more than a few years.

  • Review: Linked Data for Libraries, Archives and Museums

    A review of the book 'Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata' by Seth van Hooland and Ruben Verborgh.

  • The Datenbank Entartete Kunst (Degenerate Art Database)

    The Nazis' attempt to control or destroy the art they found to be degenerate resulted in meticulous record keeping of that artwork. The Freie Universität Berlin has undertaken the process of turning these records into a publicly accessible database. I took a pass at collecting that data into a sqlite database and went over some of the readily available summary statistics. There are extreme gaps in this data, due both to the incompleteness of the online database itself and to a lack of information about its artists on services like DBpedia. As such I don't believe it is fit for deep analysis, but a surface report of some of its attributes is worthwhile and provides a proof of concept for datasets in art history. The statistical accumulation of artistic works within a particular context could offer an interesting launch pad for researchers inquiring into that domain.
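The post's actual schema isn't reproduced here, but the kind of surface summary it describes can be sketched with Python's stdlib sqlite3 module. The table layout and sample rows below are invented for illustration; they are not the Datenbank Entartete Kunst's real fields or records.

```python
import sqlite3

# Hypothetical schema: the real database's fields differ; this only
# illustrates the kind of surface summary the post describes
# (counts grouped by an attribute, with gaps preserved as NULLs).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artworks (
        id INTEGER PRIMARY KEY,
        artist TEXT,
        medium TEXT
    )
""")

# Invented sample rows; a NULL artist stands in for the kinds of
# gaps the post notes in the online database.
rows = [
    ("Emil Nolde", "painting"),
    ("Emil Nolde", "watercolor"),
    ("Paul Klee", "painting"),
    (None, "print"),
]
conn.executemany("INSERT INTO artworks (artist, medium) VALUES (?, ?)", rows)

# Summary statistic: number of works per medium.
summary = dict(conn.execute(
    "SELECT medium, COUNT(*) FROM artworks GROUP BY medium"
).fetchall())
```

Even this trivial GROUP BY shape is enough to surface the incompleteness the post mentions, since missing values can be counted rather than silently dropped.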

  • The Archive as Artistic Spectacle

    Upon visiting the Biennial at the Whitney Museum in New York and seeing an artist's archive there, the author wondered about the artistry of the archive and how the artistic process might be documented in the age of digital tools.

  • The Database Novel

    A post detailing the novel as a relational model.

  • The Special Collection in the Age of Digital Reproduction - Digital Libraries Part I

    Preliminary thoughts on sharing digital representations of special collections and rare books.

  • Historical GIFs (AKA dpla.gif)

    A Twitter bot that pops into the Digital Public Library of America's collection of moving images and posts GIF excerpts as it goes.

  • Artdealer

    A Blacklight (Solr and Rails) implementation for an art dealer's poster collection.

  • Wikisewer

    A fork of Wikistream focused on capturing the vandalism that occurs on Wikipedia. Uses Flask and MongoDB.

subscribe via RSS