Thursday, April 12, 2007
Announcement: data marts now available
By popular demand of our user base (and some hard work by our developers, especially ruphus_13), we now provide data marts for Sourceforge data.
The new package, called DataMarts, contains all the SQL create and insert statements for creating your own version of the FLOSSmole database, covering multiple data sources (Sourceforge, Freshmeat, Rubyforge, ObjectWeb, FSF).
The marts are created after each of our data collections. We collect and parse the data, load it into our database, and create the raw flat-file data dumps, as we have been doing since 2004. What is new today is that we also provide SQL data dumps, so you can load our data straight into your own local database for easier processing and more complex mining tasks.
So, there are now numerous ways to get our data:
--install the data marts into your own MySQL database
--download and analyze the flat, delimited data files
--play around with the query tool
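As a rough sketch of the first option: the data mart files are plain SQL create and insert statements, so loading them amounts to running the script against a database. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name, columns, and rows are made up for illustration; the real marts define their own schema.

```python
import sqlite3

# Hypothetical stand-in for a data mart file: the real files contain their
# own CREATE and INSERT statements with the actual FLOSSmole schema.
SAMPLE_MART_SQL = """
CREATE TABLE projects (
    proj_unixname TEXT PRIMARY KEY,
    dev_count INTEGER
);
INSERT INTO projects VALUES ('example-a', 3);
INSERT INTO projects VALUES ('example-b', 12);
"""


def load_mart(conn, sql_text):
    """Run every statement in a SQL dump against an open connection."""
    conn.executescript(sql_text)
    conn.commit()


def projects_with_at_least(conn, min_devs):
    """Return names of projects that have at least min_devs developers."""
    rows = conn.execute(
        "SELECT proj_unixname FROM projects WHERE dev_count >= ? "
        "ORDER BY proj_unixname",
        (min_devs,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
load_mart(conn, SAMPLE_MART_SQL)
print(projects_with_at_least(conn, 5))
```

With a real MySQL server you would instead feed the downloaded file to the mysql command-line client or a MySQL driver. Any MySQL-specific syntax in the dumps may not run unchanged under SQLite.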
Good work, moles!! Another milestone in our open source data collection and dissemination effort.
Thursday, April 05, 2007
April 2007 data released for all forges
April 2007 data is released for all forges. Here is a summary of the data we have and where to get it:
Each set of data described here is released in three ways:
1) As text files, ready for you to grab and import into your favorite stats package, spreadsheet, or even into your own database.
2) As datamarts, with SQL code for you to input into your own database. (Get the datamarts)
3) If you want a sense of what raw data we have, play around with the new query tool for a while.
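For the text-file route, a minimal sketch of reading one of the delimited files with Python's standard csv module follows. The column names, the sample rows, and the tab delimiter are assumptions for illustration only; check the header and delimiter of the file you actually download.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: real FLOSSmole files have their own columns and
# delimiter, so check the file header before reusing this.
SAMPLE_TEXT = (
    "proj_unixname\tlicense\n"
    "example-a\tGPL\n"
    "example-b\tBSD\n"
    "example-c\tGPL\n"
)


def count_by_column(handle, column, delimiter="\t"):
    """Count how often each value appears in one column of a delimited file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return Counter(row[column] for row in reader)


counts = count_by_column(io.StringIO(SAMPLE_TEXT), "license")
print(counts.most_common())
```

To run this against a downloaded file, pass an open file handle in place of the io.StringIO object.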
- Sourceforge data
- General Forge Information (Get it)
- Project code names, project display names, developer counts, date project was registered, long project descriptions
- Developer Information (Get it)
- Developer login names, real names, developers per project, each developer's role on that project, and whether they are an admin
- Data about Projects (Get it)
- Database type by project, number of downloads per project, rank of project, intended audience, topic of project, status of project, license(s), operating system(s), programming language(s), real URL of project, tracker data, donors to projects, user interfaces
- Freshmeat data (Get it)
- project names, descriptions, authors per project, project URLs (real urls and freshmeat urls), project licenses, vitality/popularity ranks by freshmeat
- Rubyforge data (Get it)
- user interfaces, programming languages, developers per project & roles, licenses, environment, intended audience, list of all projects, natural language, status, topics
- ObjectWeb data (Get it)
- user interfaces, programming languages, developers per project & roles, licenses, environment, intended audience, list of all projects, natural language, status, topics
- Free Software Foundation data (Get it)
- Developers per project, project registration dates, project descriptions, interfaces, languages, licenses, full list of projects, project URLs
- SourceKibitzer data (Get it)
- Metrics on a variety of FLOSS projects. Metrics include: loc, cloc, ncloc, dc, nom, wmc, ncss, npath, fnaout, abstr_coupl, todo_count, bool_exp (see release notes for more details on these metrics)