Reading File Attributes And Metadata Using C#
March 10, 2014
I like learning new languages - programming languages of course, but not exclusively. That’s why I am attending Spanish lessons once a week. In one of these lessons, a fellow student approached me with a USB drive full of useful vocabulary mp3 files. Each of these mp3 files contains one German-Spanish word or phrase. Very useful for brushing up one’s vocabulary skills when driving to work, for example.
But this fellow student had noticed another useful detail of this vocabulary list. The advanced properties of each mp3 file contained the full text for both the Spanish word and the German translation (and vice versa) in the file’s Title attribute. Even more metadata contained information about the word class (e.g. noun, adjective, verb, …) and the category of the vocabulary (e.g. sports, food and drink, …).
So his question was: could we come up with a small piece of software that would extract the vocabulary and its translations as well as the additional metadata and create a nice little Excel spreadsheet out of it? And boy we could …
NuGet Package - Windows 7 API Code Pack
The easiest way to do this kind of metadata extraction is with a NuGet package called Windows 7 API Code Pack. It can be installed as usual using the graphical interface or the command line:
PM> Install-Package Windows7APICodePack-Shell
This package provides a couple of really useful helper objects around Windows Shell functionality. In this case the ShellObject class together with the vast collection of SystemProperties came in very handy for accessing the files’ metadata. All that was left to do was slicing up strings and writing all the data to a CSV file. Cake walk. Here is all the code it took:

using System.IO;
using System.Text;
using Microsoft.WindowsAPICodePack.Shell;
using Microsoft.WindowsAPICodePack.Shell.PropertySystem;

namespace Vocab
{
    static class Program
    {
        static void Main(string[] args)
        {
            // args[0] is the directory that contains the mp3 files.
            CreateCSV(args[0]);
        }

        static void CreateCSV(string directory)
        {
            using (var stream = new StreamWriter("vocab.csv", false, Encoding.UTF8))
            {
                foreach (var f in new DirectoryInfo(directory).GetFiles())
                {
                    using (var so = ShellObject.FromParsingName(f.FullName))
                    {
                        //
                        // The Title property contains the vocabulary separated by " - "
                        //
                        var title = so.Properties.GetProperty(SystemProperties.System.Title).ValueAsObject.ToString();
                        var index = title.IndexOf(" - ");
                        // Everything up to, but not including, the separator.
                        var left = title.Substring(0, index);
                        // Everything after the three-character separator.
                        var right = title.Substring(index + 3);

                        //
                        // The Album property stores the category (e.g. Food and Drink).
                        // Substring(3) skips the first three characters of the album title.
                        //
                        var category = so.Properties.GetProperty(SystemProperties.System.Music.AlbumTitle).ValueAsObject.ToString().Substring(3);

                        //
                        // The Artist property stores the type of vocabulary (verb, noun, ...).
                        //
                        var kind = so.Properties.GetProperty(SystemProperties.System.Music.DisplayArtist).ValueAsObject.ToString();

                        // One semicolon-separated line per file.
                        var line = string.Format("{0};{1};{2};{3}", left, right, category, kind);
                        stream.WriteLine(line);
                    }
                }
            }
        }
    }
}
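The string slicing in the program assumes that every file has a Title and that the title contains " - ". A file without that separator would make Substring throw. As a minimal sketch of a more defensive variant, the hypothetical helper below (TitleParser and TrySplit are names made up for this illustration, not part of the Code Pack or the original program) splits a title and reports failure instead of throwing. The sample title in the usage note is invented as well.

```csharp
using System;

// Hypothetical helper for illustration only.
static class TitleParser
{
    // Splits "left - right" at the first " - " separator.
    // Returns false when the title is empty, has no separator,
    // or when either side is blank after trimming.
    public static bool TrySplit(string title, out string left, out string right)
    {
        left = string.Empty;
        right = string.Empty;

        if (string.IsNullOrEmpty(title))
        {
            return false;
        }

        const string separator = " - ";
        int index = title.IndexOf(separator, StringComparison.Ordinal);
        if (index < 0)
        {
            return false;
        }

        // Text before the separator, and text after it.
        left = title.Substring(0, index).Trim();
        right = title.Substring(index + separator.Length).Trim();
        return left.Length > 0 && right.Length > 0;
    }
}
```

Inside the loop this could be used as `if (!TitleParser.TrySplit(title, out left, out right)) continue;` (with `left` and `right` declared beforehand) so that files with an unexpected title are skipped rather than aborting the whole export.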
We then converted the resulting CSV file into a proper Excel file and applied some table formatting to the data.
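One thing to watch when opening such a file in a spreadsheet: a phrase that itself contains a semicolon or a double quote would shift the columns. This was not handled in the program, so the following is only a sketch of how fields could be quoted, following the usual CSV convention of wrapping the field in double quotes and doubling any embedded quotes. The Csv class and its method names are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only.
static class Csv
{
    // Quotes a field if it contains the delimiter, a double quote,
    // or a line break; embedded double quotes are doubled.
    public static string Escape(string field, char delimiter = ';')
    {
        if (field == null)
        {
            return string.Empty;
        }

        bool needsQuotes = field.IndexOf(delimiter) >= 0
            || field.IndexOf('"') >= 0
            || field.IndexOf('\n') >= 0
            || field.IndexOf('\r') >= 0;

        if (!needsQuotes)
        {
            return field;
        }

        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Escapes every field and joins them with the delimiter.
    public static string Join(char delimiter, params string[] fields)
    {
        var escaped = Array.ConvertAll(fields, f => Escape(f, delimiter));
        return string.Join(delimiter.ToString(), escaped);
    }
}
```

With this in place, the `string.Format` call could be replaced by something like `Csv.Join(';', left, right, category, kind)`.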
The result can be viewed or downloaded from this OneDrive location.
So, now that I have written this code and blogged about it, I will hopefully find the time and motivation to actually study this vocabulary.