
I've been digging into big data lately because I find it fascinating.

Clare Corthell published a data science curriculum that reads like an encyclopedia: http://datasciencemasters.org/

The US government publishes a site, http://www.data.gov/, with over 160,000 datasets. It describes itself as "The home of the U.S. Government’s open data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more." As I browsed around, I found a link to the Census Bureau's TIGER database, which I knew from when I first got out of college.

The image you see at the beginning of this article is from Citymapper, an app that uses public data you can get from Data.gov. I'm about to download it. Judging from the comments, it's a life changer for some people.

Are you working with big data? Tell me about it!

O'Reilly Media is offering unlimited 30-day access to the Safari library of over 30,000 books and videos from over 200 publishers and imprints. Keith Spurr called Safari the “best technical learning resource on the internet.” You’ll have instant access to hundreds of books and videos on all aspects of big data. Check out this recommended data reading list. Keep up to date on Safari-related news at @safari or @safaribot.

Link: http://www.oreilly.com/pub/cpc/1551

O’Reilly Releases Its 2015 Data Science Salary Survey

For the third consecutive year, O’Reilly Media conducted an anonymous survey to expose the tools that successful data scientists and engineers use, and how those tool choices might relate to their salary. 

Want to know which skills earn the big bucks? Knowledge of certain tools can increase your salary more than getting a Ph.D. Curious which clusters of tools are most commonly used together? Or which job titles pay the best? It's all there.

Gain insight from these potentially career-changing findings, and plug your own variables into one of the linear models to predict your own salary.

Link: http://www.oreilly.com/pub/cpc/1549
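
To make "plug your own variables into a linear model" concrete, here is a minimal Python sketch of what such a prediction looks like. The intercept, the feature names, and every coefficient below are made up for illustration and are not taken from the O'Reilly survey. Only the shape of the calculation (a base value plus a weighted sum of your inputs) is the point.

```python
# Hypothetical linear salary model. All numbers and feature names here are
# invented for illustration; they are NOT the survey's published values.
INTERCEPT = 60_000.0
COEFFICIENTS = {
    "years_experience": 2_000.0,  # added per year of experience
    "uses_spark": 10_000.0,       # 1 if you use the tool, 0 if not
    "has_phd": 5_000.0,           # 1 if you hold a Ph.D., 0 if not
}


def predict_salary(features: dict[str, float]) -> float:
    """Return the intercept plus the sum of coefficient * feature value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )


example = {"years_experience": 5, "uses_spark": 1, "has_phd": 0}
print(predict_salary(example))  # 60000 + 5*2000 + 1*10000 + 0*5000 = 80000.0
```

A feature name that is not in the coefficient table raises a KeyError, which is a deliberate choice in this sketch: it is better to fail loudly than to silently ignore an input.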
