Archive News #2 – Tags

Welcome to our second Archive news post! These regular posts are a venue for us to answer some frequently asked questions about the Archive of Our Own. Everything you wanted to know about tags–right under the cut!

Please leave your questions about the Archive in comments and we’ll answer them in upcoming posts. (This is a space for more general questions – if you have specific comments about the design or usability of the Archive, please send feedback on the Archive site itself, so it goes into our bugfix and design process.)

This week we’re looking at something quite specific: the way tags are used on the Archive. This is a bit more detailed than a lot of the posts we plan to have in this slot, but as tags work a bit differently in the Archive than on other sites you may use and we’ve had a lot of questions on them, we thought we’d do a special feature.

Where are tags used on the Archive?

Tags can be used on works (i.e. your fics) and on bookmarks. For the rest of this post, we’ll mainly be discussing the way tags work on works (bookmarks are a bit simpler).

For the purposes of the Archive, almost all the metadata on works – that is, Category, Warnings, Rating, Fandom, Characters, Pairings, and what is shown as Tags (when you enter them on the work form they’re known as ‘Freeform tags’) – is treated as a tag. So, all the bits in the grey header box on a work are tags:

[Image: Archive work header box]

The advantage of doing it this way is that it should be easier for people to search on any field they wish. If you click on a tag, then you will be shown all the works using that tag. Tags are also used to populate our search filters, which help you drill down through all the available fics.

What format do tags have?

  • They can have spaces – so you can put Harry Potter rather than Harry_Potter.
  • They can be up to 42 characters long – we may allow longer tags in the future, but we have to have some limit on the length, or our database might melt.
  • They can’t have commas – tags are comma separated at input, so if you include a comma, then the database assumes you’re making a new tag (apologies to fans of ‘At Swim, Two Boys’ and other comma-loving fandoms).
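For the code-inclined, here is a minimal sketch of how those three rules could play out when a tag field is read. It is an illustration only, not the Archive's actual code: the function name `parse_tags` is made up, and trimming whitespace and dropping duplicates are assumptions on our part.

```python
MAX_TAG_LENGTH = 42  # the limit mentioned in the list above


def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag field into individual tag names.

    Hypothetical helper: whitespace trimming and de-duplication are
    assumptions, not documented Archive behaviour.
    """
    tags = []
    for piece in raw.split(","):
        # Spaces inside a tag are kept; stray spaces around it are trimmed.
        name = " ".join(piece.split())
        if not name:
            continue  # ignore empty entries such as a trailing comma
        if len(name) > MAX_TAG_LENGTH:
            raise ValueError(
                f"tag is {len(name)} characters, limit is {MAX_TAG_LENGTH}: {name!r}"
            )
        if name not in tags:
            tags.append(name)
    return tags


# The comma in 'At Swim, Two Boys' splits it into two separate tags.
print(parse_tags("Harry Potter, At Swim, Two Boys"))
# ['Harry Potter', 'At Swim', 'Two Boys']
```

The same splitting is why a crossover entered as two fandoms separated by a comma ends up as two ordinary fandom tags.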

What’s so special about the Archive tags?

The tags are a bit more organised than you’ll find on sites like Delicious. For those of you who are metadata-inclined, we’re using a sort of hybrid of folksonomy (user-defined tags) and classification. What this means is:

  • In most fields, our users can enter any tag, in exactly the form they want it.
  • Behind the scenes, our team of tag wranglers classify and make connections between tags, building a structure which adds extra meaning and helps make tags as useful as possible to all users.

The reason for this is to ensure maximum flexibility for authors, and maximum ease of searching for readers.

What else do tag wranglers do? How do the tag relationships work?

Our tag wranglers work to ensure that tags give our users as much meaningful information as possible, by hooking up related tags and clearing up ambiguities. The full details of how tags work behind the scenes and what tag wranglers do with them are fairly complex. However, some of the most important things are as follows:

* Some tags are marked as ‘canonical’. These are the tags that are the most meaningful when viewed alone. So, Dean Winchester/Sam Winchester would be canonical, while Sam/Dean wouldn’t. All the other versions are connected to this tag. Only canonical tags are visible in our filters, so you won’t see every tag you ever used in there, but since the canonical versions are hooked to all the others the filters still search every possible version of a tag.
* Some tags are ‘ambiguous’. These are the tags which could mean more than one thing. For example, the tag Dean could refer to Dean Winchester in Supernatural, Dean Forester in Gilmore Girls, or a whole host of other Deans. We are planning to introduce a special behind the scenes tag category called ‘Ambiguity’. In this case, the tag wranglers will mark the tag as ambiguous but also hook it up to all the possible canonical tags it could refer to.
* Tags are given relationships which put them in context. So, character tags John Sheppard and Rodney McKay ‘belong’ to the fandom tag Stargate Atlantis.
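If you like to think in data structures, the three ideas above (canonical tags, ambiguous tags, and parent relationships) could be modelled roughly as follows. This is a sketch under our own assumptions: the `Tag` class and its field names (`canonical`, `ambiguous`, `merges_into`, `parents`) are invented for illustration and are not the Archive's real schema.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare tags by identity, not by field values
class Tag:
    name: str
    category: str
    canonical: bool = False
    ambiguous: bool = False
    # Canonical tags this tag is hooked up to: one for a plain variant,
    # several for an ambiguous tag (hypothetical field).
    merges_into: list["Tag"] = field(default_factory=list)
    # Tags that give this one context, e.g. a character's fandom.
    parents: list["Tag"] = field(default_factory=list)


def filter_names(tags: list[Tag]) -> list[str]:
    """Only canonical tags are shown in the filters."""
    return sorted(t.name for t in tags if t.canonical)


def resolve(tag: Tag) -> list[Tag]:
    """Return the canonical tag(s) a given tag stands for."""
    return [tag] if tag.canonical else list(tag.merges_into)


supernatural = Tag("Supernatural", "Fandom", canonical=True)
gilmore = Tag("Gilmore Girls", "Fandom", canonical=True)
dean_w = Tag("Dean Winchester", "Character", canonical=True, parents=[supernatural])
dean_f = Tag("Dean Forester", "Character", canonical=True, parents=[gilmore])
dean = Tag("Dean", "Character", ambiguous=True, merges_into=[dean_w, dean_f])

print(filter_names([supernatural, gilmore, dean_w, dean_f, dean]))
# ['Dean Forester', 'Dean Winchester', 'Gilmore Girls', 'Supernatural']
print([t.name for t in resolve(dean)])
# ['Dean Winchester', 'Dean Forester']
```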

Some tags appear with ‘freeform’ or ‘character’ after them. What’s that all about?

Each tag belongs to a category – Pairing, Character, Fandom, etc. Tag names must be unique, so if a tag already exists in one category, then when it is used in another category, the Archive will automatically add the name of the category.

For example:
* Buffyfan1 puts ‘Buffy the Vampire Slayer’ in as a fandom tag. A new tag is created.
* Buffyfan2 puts ‘Buffy the Vampire Slayer’ in as a character tag. Oh noes – this tag name is taken and can’t be reused! So, the Archive changes it to ‘Buffy the Vampire Slayer – character’.

Result:
* XanderGirl5 can click on ‘Buffy the Vampire Slayer – character’ and find all fics which feature Buffy as a character.
* The Archive has unique tag names and so the database does not melt.
* Everyone is happy.
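The renaming step in the Buffy example can be sketched in a few lines. This is only an illustration: the `TagRegistry` class is hypothetical, and the exact suffix format is an assumption based on the example above.

```python
class TagRegistry:
    """Keeps tag names unique across categories (illustrative only)."""

    def __init__(self) -> None:
        self._category_by_name: dict[str, str] = {}

    def add(self, name: str, category: str) -> str:
        """Register a tag and return the name it is actually stored under."""
        existing = self._category_by_name.get(name)
        if existing is None:
            # Name is free: a new tag is created as entered.
            self._category_by_name[name] = category
            return name
        if existing == category:
            # Same name, same category: reuse the existing tag.
            return name
        # Name is taken by another category: append the category name.
        qualified = f"{name} – {category.lower()}"
        self._category_by_name.setdefault(qualified, category)
        return qualified


registry = TagRegistry()
print(registry.add("Buffy the Vampire Slayer", "Fandom"))
# Buffy the Vampire Slayer
print(registry.add("Buffy the Vampire Slayer", "Character"))
# Buffy the Vampire Slayer – character
```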

People might use all kinds of crazy tags! Do I need to search for every possible variation of a tag?

Nope. Thanks to the work of our tag wranglers, different tags which mean the same thing are marked as synonymous. So, some authors will mark their fic Harry Potter/Severus Snape, others will put Harry/Severus and still others will put Snarry. Behind the scenes, the tag wranglers hook all these together so the Archive knows that if you search for Snarry you also want fics tagged with the other possible ways of describing the pairing.
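To make that concrete, here is a rough sketch of a search that follows synonym links. The lookup table, the function names and the sample works are all invented for illustration; the real Archive search may work quite differently.

```python
# Variant tag -> the tag the wranglers have hooked it up to (sample data).
SYNONYM_OF = {
    "Harry/Severus": "Harry Potter/Severus Snape",
    "Snarry": "Harry Potter/Severus Snape",
}


def equivalent_tags(query: str) -> set[str]:
    """Return every tag name that means the same thing as the query."""
    canonical = SYNONYM_OF.get(query, query)
    names = {canonical}
    names.update(v for v, target in SYNONYM_OF.items() if target == canonical)
    return names


def search(works: dict[str, list[str]], query: str) -> list[str]:
    """Find works carrying the queried tag or any of its synonyms."""
    wanted = equivalent_tags(query)
    return [title for title, tags in works.items() if wanted & set(tags)]


works = {
    "Fic A": ["Harry Potter/Severus Snape"],
    "Fic B": ["Harry/Severus"],
    "Fic C": ["Snarry"],
    "Fic D": ["Ron Weasley/Hermione Granger"],
}
print(search(works, "Snarry"))
# ['Fic A', 'Fic B', 'Fic C']
```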

My tag has been categorised wrongly! What should I do?

From time to time, there’s a chance the tag wranglers will make a mistake when they wrangle your tag. Maybe you only write Death Note fic based on the anime, and yet your fic is coming up when you search for Death Note (manga). Our tag wranglers are only human, and they don’t know every fandom in existence, so what’s obvious to you might not be clear to them. If you notice a mistake, then please let us know via the Archive Feedback Form.

How can I make sure the tag wranglers don’t make mistakes with my tags?

Be generous with your taggings! The more information you put in the easier it is for tag wranglers to make sense of what you meant.

  • Give full names (first name, family name) for characters and pairings, unless that’s not appropriate for some reason (e.g. ‘Angel’ from AtS and BtVS just goes by the one name).
  • If your fandom could be ambiguous – for example, you write Death Note fic for the anime only, not the manga – then add more detail. You can put Death Note (anime) in the Fandom field, or just add anime in the freeform tags.

What about RPF?

The Archive of Our Own is RPF-friendly! However, RPF is one of the most difficult areas for tag wranglers, since the way people classify their fandoms varies a lot. Do we use fandom name? Network name? What about general groups of people, like “Canadian Actors”, or historical figures?

We’ve come up with a concept for dealing with this, but we haven’t finished building it, so please bear with us.

What about crossovers?

We love crossovers too! However, we don’t have specific tags for crossovers, since crossovers are just fics with more than one fandom.

Please just enter each fandom in your crossover in the Fandom field, separated by commas. Don’t separate with slashes, as this creates a tag which we can’t wrangle into our search structure. Feel free to add Crossover as a freeform tag, though.

Can I become a tag wrangler?

Yes, probably! We will always need tag wranglers to keep the Archive tags in line. We are looking to add more diverse fandoms to our teams, so while we won’t take on everyone who applies (some fandoms are already well-represented) we’d love to hear from anyone who thinks they can help. If you’ve noticed a fandom languishing unwrangled for a while, it probably means that we have nobody with expertise for it – if you could fix that, please let us know! Anime, manga and comics are currently particularly under-represented.

Some known bugs with tags

We think we’ve fixed all our known bugs with tags, but there are a few which were extremely noticeable / annoying, so we’re listing them here. If you’ve encountered some other weirdness, do let us know via the feedback form.

  • Filtering not specific enough in some cases – e.g. filtering for Stargate Atlantis also brought up fics for SG-1. This was caused by the way tag relationships were working for fandoms with common characters. This should be fixed as of our deploy in late May.
  • ‘No fandom’ showing up as a possible fandom in the filters. This was caused by our need to mark tags which don’t belong to a specific fandom (for example, schmoop). Its appearance in the filters was a bug, and should be fixed as of our deploy in late May.

We hope this post answers a few of your questions! Please leave other questions and comments here. We won’t be answering comments on this post directly – we’ll put your feedback into our pool of things to answer in future posts.

Notes from the Open Video Conference, Day Two

Summary of a couple of panels on Day 2:

Automated DMCA Takedowns and Web Video: Scott Smitelli, a professional sound designer and editor, is the fellow who wrote Fun with YouTube’s Audio Content ID System, in which he tried to test out the limits of YouTube’s fingerprinting system for audio. Conclusions: the software is mainly interested in the first 30 seconds of a song, and can be thwarted by pitch or time alterations of over 6% (which may be unhelpful to the musically sensitive among us, but there you go.) Kevin Driscoll and others from YouTomb discussed the January Massacre: the massive increase of takedowns in December, 2008 and January, 2009. On a graph, it looks like takedowns have dropped off since then, but that may be deceptive: in fact, it seems like things are being detected so fast (within ten minutes) that YouTomb can’t keep track of them, or to put it another way: takedowns are low because stuff’s never getting UP in the first place. A suggestion: that it would be great if every takedown left a webpage with a card saying, “This has been taken down,” because in many cases, people are not aware of what they can’t have. Oliver Day, also from YouTomb, told a chilling story: the original filmmaker who shot the clouds that were used in the Anonymous anti-Scientology ads had his original footage taken down–not in deference to those ads, but in deference to a Huffington Post anti-Giuliani parody of those ads. As Day put it, “The power is with the powerful”: even though the original filmmaker’s footage was there first, it was assumed that he was infringing the Huffington Post, and not the other way around.

Who Owns Popular Culture? Remix and Fair Use in the Age of Corporate Mass Media: This was the panel hosted by Jonathan McIntosh and featuring animator Nina Paley (of Sita Sings The Blues), Neil Sieling from the Center for Social Media, political remixer Elisa Kreisinger, Karl Fogel from questioncopyright.org, and OTW Board Member Francesca Coppa. The panel largely discussed what the policing of online video and the over-enforcement of copyright mean for artists, remixers, and those interested in free speech. Nina Paley answered the question literally, by providing a list of who owns popular culture–or, in her case, the songs, mostly from 1927-28, that she used in Sita Sings The Blues, while Elisa Kreisinger evoked many of the important visual artists, from Duchamp to Koons to Kruger to Lichtenstein to Warhol, for whom remixing and recontextualizing pop culture was a key artistic move. (She also showed her remixes of the Queer Housewives of New York City.)

Notes from the Open Video Conference, Day One

Francesca Coppa, Naomi Novik, and head coder Elz spent the day at the Open Video Conference in NYC today. The conference is primarily about building architecture for online video as well as open source software more generally, so you can see why we were interested. (We’re keeping a close eye on the emerging technologies that might make a Vidding Archive Of Our Own more feasible and efficient.)

Some highlights from today’s programming:

Independent Video Platforms: Representatives from various independent video spaces, mostly dealing with issues of social justice or alternative media, showcased their sites. (My favorite was India’s Pad.ma, a beautifully designed digital archive built to contextualize its footage and to work in both high-bandwidth and low-bandwidth situations.)

Emerging P2P Technologies: This was a glimpse into a wildly exciting and very near future: streaming from BitTorrent. The guys at P2P Next are working on something called the Swarmplayer, which allows you to stream from torrents, which means that you can create a YouTube-like video archive with none of the server or infrastructure costs. Imagine a video archive where you can stream or download or both, and where having a popular vid doesn’t kill your bandwidth, it increases your download speed. Imagine being able to watch anything currently being torrented through streaming, on-demand. (You can test Swarmplayer now, though you can only watch two videos; the researchers say we can expect a full version to be released in November, 2009.)

How to Make a Political Remix Video: Political remixer and friend of the OTW Jonathan McIntosh has been showcasing fan vids on his site, politicalremixvideo.com. Now he’s made what he calls a vidding-influenced political remix video critiquing Twilight, Edward Meets Buffy (Twilight Remixed), which he premiered at the conference. Vidders, he’d love to hear what you think, so check out the video (embedded below, or linked on blip, which provides higher quality; vidders might check out blip as a replacement for YouTube or iMeem.)