(Ab)using the MetaCPAN API for Fun and Profit
Source: YAPC::NA 2013, 2013-06-03.
Speaker: Olaf Alders (oalders)
MetaCPAN aims to make it fun and easy to get data about CPAN modules, distributions, favourites and even CPAN authors themselves, but sometimes it's just not easy enough. This talk will show you how to avoid some of the pitfalls of working with the MetaCPAN API, creating ElasticSearch queries and building your own MetaCPAN-powered application. Some sample code will be made available prior to the talk for anyone who'd like to review it ahead of time, but it's by no means compulsory for attendance. The links to some prep code and slides are posted at blogs.perl.org.
The aim of this session is to arm both MetaCPAN beginners and intermediate users with enough knowledge to build the next MetaCPAN-powered web app, mobile app or even contribute back to MetaCPAN itself.
(From the original talk announcement.)
Clicking ++ on MetaCPAN favorites a distribution, even when the button is shown on the page of a module.
Base URL http://api.metacpan.org/v0
There are two different types of endpoints: convenience endpoints and ElasticSearch endpoints. There is some overlap, though. Every type in the system has a corresponding endpoint. You've got distributions, modules, releases, favorites, etc., but not every endpoint has a corresponding type.
- /author/DOY - get the author record that has this ID
- /search/autocomplete?q=Moose (JSON reply)
There are actually no module and pod types. The above will retrieve the latest authorized version of Moose.
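The convenience endpoints listed above are plain paths appended to the base URL. A minimal sketch of building them, assuming the base URL given in these notes; the helper name `endpoint_url` is hypothetical and not part of any MetaCPAN client:

```python
from urllib.parse import quote, urlencode

# Base URL as given in the talk notes.
BASE_URL = "http://api.metacpan.org/v0"


def endpoint_url(*parts, **params):
    """Join path parts onto the base URL and append optional query parameters.

    Hypothetical helper for illustration only.
    """
    # Keep ":" readable so names such as Moose::Role survive unescaped.
    path = "/".join(quote(str(part), safe=":") for part in parts)
    url = f"{BASE_URL}/{path}"
    if params:
        url += "?" + urlencode(params)
    return url


# The two convenience endpoints mentioned in the notes.
print(endpoint_url("author", "DOY"))
print(endpoint_url("search", "autocomplete", q="Moose"))
```

The function only builds URLs; fetching them is left to whichever HTTP client you prefer.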
Versioned Convenience Endpoints
You might want a specific version of Moose:
Don't send JSON in your request. Don't expect JSON in your response.
By default it sends back HTML. You can pass a content-type as a query parameter:
Or send a content-type header.
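As a rough illustration of the query-parameter variant, the sketch below builds a URL that asks for a non-HTML rendering. It assumes the endpoint in question is `/pod/<module>` and that `text/plain` is an accepted value; neither is spelled out in these notes, and `pod_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Base URL as given in the talk notes.
BASE_URL = "http://api.metacpan.org/v0"


def pod_url(module, content_type=None):
    """Build a documentation URL, optionally asking for a specific content type.

    Assumption: the path is /pod/<module> and the parameter is "content-type".
    """
    url = f"{BASE_URL}/pod/{module}"
    if content_type:
        # urlencode escapes the "/" in the media type as %2F.
        url += "?" + urlencode({"content-type": content_type})
    return url


print(pod_url("Moose"))                # default rendering (HTML per the notes)
print(pod_url("Moose", "text/plain"))  # explicit content type
```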
The easy way to do it is to use the MetaCPAN::API::Tiny module.
The (real) Endpoints
A module is a file.
The user endpoint
You need to be logged in and you need to use HTTPS!
- /author/OALDERS (the html result page)
- /author/OALDERS (the API result)
- /author (the first few authors)
- /author/_search (the first 10 results)
- MetaCPAN::Document::Author (the source code)
- Or set the appropriate Accept-Encoding: gzip request header
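A small sketch of the gzip option, assuming the header meant is the standard HTTP `Accept-Encoding: gzip`. The request is only constructed here, not sent, and both function names are hypothetical:

```python
import gzip
import urllib.request


def gzip_request(url):
    """Build (but do not send) a request that asks for a gzip-compressed reply."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw, content_encoding):
    """Decompress the body only if the server reports it as gzip-encoded."""
    if content_encoding and content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


request = gzip_request("http://api.metacpan.org/v0/author/_search")
print(request.header_items())
```

When the request is actually sent with `urllib.request.urlopen`, the response's `Content-Encoding` header tells you whether `decode_body` needs to decompress.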
Send a query that will calculate the results and give an id back. Then use that id to fetch parts of the result. (The result set is normally limited to 5,000 entries.) The ElasticSearch module abstracts this away.
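The two-step pattern described above (get an id, then fetch the result in parts) can be sketched as a loop. This is not the ElasticSearch module's implementation: `start_scroll` and `fetch_page` are hypothetical callables standing in for the real HTTP requests, and the in-memory backend exists only to make the sketch runnable.

```python
def scroll_all(start_scroll, fetch_page):
    """Yield every hit: obtain an id, then fetch pages by id until one is empty."""
    scroll_id = start_scroll()
    while True:
        # Each fetch returns the id to use next together with one page of hits.
        scroll_id, hits = fetch_page(scroll_id)
        if not hits:
            break
        yield from hits


def make_fake_backend(items, page_size):
    """In-memory stand-in for a server, used only to exercise scroll_all."""
    state = {"pos": 0}

    def start_scroll():
        state["pos"] = 0
        return "scroll-0"

    def fetch_page(scroll_id):
        start = state["pos"]
        page = items[start:start + page_size]
        state["pos"] = start + len(page)
        return f"scroll-{state['pos']}", page

    return start_scroll, fetch_page


start, fetch = make_fake_backend(["Moose", "Moo", "Mouse", "Mo"], page_size=3)
for name in scroll_all(start, fetch):
    print(name)
```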
Query vs filter
Use a query if you need to sort your results by relevance.
Generally you want to use a filter. (e.g. all the distributions on CPAN)
Filters use fewer resources and are faster.
See the README there for instructions.
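To make the distinction concrete, here is a hedged sketch of the two kinds of request body, written as Python dictionaries. It assumes the ElasticSearch query DSL of that era, where a top-level `filter` key sits beside `query`; the `status` field and the search text are illustrative assumptions, not taken from the talk.

```python
import json


def filter_search(field, value, size=10):
    """Body that matches everything and narrows it with a filter (no scoring)."""
    return {
        "query": {"match_all": {}},
        "filter": {"term": {field: value}},  # assumed older top-level filter syntax
        "size": size,
    }


def relevance_search(text, size=10):
    """Body that uses a scored query, for results sorted by relevance."""
    return {
        "query": {"query_string": {"query": text}},
        "size": size,
    }


print(json.dumps(filter_search("status", "latest"), indent=2))
print(json.dumps(relevance_search("Moose"), indent=2))
```

The first body is the shape to reach for when you want "everything matching X"; the second only earns its cost when the ordering of results matters.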
Hack on MetaCPAN
- Download the pre-configured VM
- Requires VirtualBox + Vagrant
- See more details on GitHub
About and resources
The statuses latest, cpan and backpan are separate.