  • Introduction to Internet Marketing Content Studies Pt. 2: Poorly collected data makes for poor analyses

    In the last post, I detailed the motivation, approach, and challenges of collecting native advertising and suggested content from popular websites. In this post I explore the data collected and conclude with some remarks on observations, potential insights, and shortcomings. Ultimately this remains an experiment and will not be pursued further. The data is available to anyone who might be interested.

  • Introduction to Internet Marketing Content Studies Pt. 1: The Web is (still) not an API

    Suggested content on the web, often ignored or blocked, is both ubiquitous and invisible. It's everywhere, and yet goes largely unseen, because many elect not to see it or mentally filter it out when it is present. Information can take on a different value when aggregated and presented outside of its original context. This post details how I originally sought to take on the task of collecting, analyzing, and showcasing those clickbait, suggested content articles that accompany most commercial, high-traffic websites. But it also explores what I found about how fragile, bloated, and unpredictable the web is, and how far it is from being an API if you seek to handle its content like structured data.

  • Handling Large JSON Files with Streaming

    jq is an amazing tool for querying JSON, but loading large JSON files into memory is often not possible. Fortunately, its --stream option lets it process JSON without loading the whole document into memory.

  • Saving Stupid Videos: Preserving YouTube

    Attempting to preserve a small, intimate piece of the Internet reveals how threadbare the prospects are for keeping digital content for more than a few years.

  • Review: Linked Data for Libraries, Archives and Museums

    A review of the book 'Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata' by Seth van Hooland and Ruben Verborgh.

  • The Datenbank Entartete Kunst (Degenerate Art Database)

    The Nazis' attempt to control or destroy the art they found to be degenerate resulted in meticulous record keeping of that artwork. The Freie Universität Berlin has undertaken the process of turning these records into a publicly accessible database. I took a pass at collecting that data into a sqlite database and went over some of the readily available summary statistics. There are extreme gaps in this data, due both to the incompleteness of the online database itself and to a lack of information about its artists on services like DBpedia. As such I don't believe it is fit for deep analysis, but a surface report of some of its attributes is worthwhile and provides a proof of concept for datasets in art history. The statistical accumulation of artistic works within a particular context could offer an interesting launch pad for researchers inquiring into that domain.
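The post's actual schema isn't reproduced here, but the kind of surface summary it describes can be sketched with Python's stdlib sqlite3 module. The table layout and sample rows below are invented for illustration; they are not the Datenbank Entartete Kunst's real fields or records.

```python
import sqlite3

# Hypothetical schema: the real database's fields differ; this only
# illustrates the kind of surface summary the post describes
# (counts grouped by an attribute, with gaps preserved as NULLs).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE artworks (
        id INTEGER PRIMARY KEY,
        artist TEXT,
        medium TEXT
    )
""")

# Invented sample rows; a NULL artist stands in for the kinds of
# gaps the post notes in the online database.
rows = [
    ("Emil Nolde", "painting"),
    ("Emil Nolde", "watercolor"),
    ("Paul Klee", "painting"),
    (None, "print"),
]
conn.executemany("INSERT INTO artworks (artist, medium) VALUES (?, ?)", rows)

# Summary statistic: number of works per medium.
summary = dict(conn.execute(
    "SELECT medium, COUNT(*) FROM artworks GROUP BY medium"
).fetchall())
```

Even this trivial GROUP BY shape is enough to surface the incompleteness the post mentions, since missing values can be counted rather than silently dropped.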

  • The Archive as Artistic Spectacle

    Upon visiting the Biennial at the Whitney Museum in New York and seeing an artist's archive there, the author wondered about the artistry of the archive and how the artistic process might be documented in the age of digital tools.

  • The Database Novel

    A post detailing the novel as a relational model.

  • The Special Collection in the Age of Digital Reproduction - Digital Libraries Part I

    Preliminary thoughts on sharing digital representations of special collections and rare books.

  • Historical GIFs (AKA dpla.gif)

    A Twitter bot that pops into the Digital Public Library of America's collection of moving images and posts GIF excerpts as it goes.

  • Artdealer

    A Blacklight (Solr and Rails) implementation for an art dealer's poster collection.

  • Wikisewer

    A fork of Wikistream focused on capturing the vandalism that occurs on Wikipedia. Uses Flask and MongoDB.

subscribe via RSS