What We Do
Data science and machine learning: Big data collection and processing including text mining and topic modeling – applied predictive modeling including regression, random forest, support vector machines, boosting and time series analysis
Interactive data applications: Web and mobile applications built with technologies such as Node.js, D3, Highcharts, Express, Java, Spring, Oracle, GeoServer and Apache – creation of interactive web maps with Leaflet, Google Maps, CartoDB, Mapbox and other technologies – development of Shiny applications
Spatial analysis and mapping: Geoprocessing, change detection and cartography using PostGIS, QGIS and ArcGIS – spatial statistics including geostatistics, nearest-neighbor and point pattern analysis, and hot spot identification
Programming solutions: Custom tools for data processing and analysis; custom R packages
The New York City Department of Health (DOH) contracted with us to help develop and implement one of the largest air monitoring networks in the world. We worked with DOH to develop a sampling plan that distributes 150 monitors throughout the city, which involved computing proximity to emissions sources and applying spatial statistics to identify gaps in coverage. In the 10 years since the network was established, we have worked closely with the DOH research team to analyze trends in neighborhood air pollution. As part of this collaboration, we developed statistical models to predict air pollutant levels and co-authored 10 scientific publications. You can find out more about the New York City Community Air Survey and the reports we were involved in here. The project involved extensive use of R, Python, ArcGIS and other tools.
Live Nation Entertainment, the parent company of Ticketmaster, contracted with us to provide consulting on a field study evaluating movement patterns at outdoor concerts. We worked with Live Nation analysts to identify appropriate mini-GPS units for volunteers to wear at the concert selected for a pilot study. We then analyzed the movement patterns and delivered a report and visualizations documenting the results. Tools used on this project include R, QGIS and PostGIS.
We worked with scientists at Columbia’s Mailman School of Public Health and Lamont-Doherty Earth Observatory to develop software to process and store environmental measurement data from a study on exposure to air pollution. We designed and built the back-end SQL database, developed a custom R package to process raw input data, and created an interactive front end that allows lab technicians to upload data files. We also worked with Columbia and WNYC, New York City's NPR affiliate, to create an interactive biking application where users upload GPS files of their rides and receive air pollution estimates. A link to the beta version of this tool can be found at WNYC’s Map My Air site. The project was featured on an episode of Science Friday that can be viewed here. The project drew on a wide range of tools: R, Shiny and PostgreSQL for the data-processing software, and Node.js, Express.js and CartoDB for the web tool.
We developed a data-driven online application for viewing and evaluating trends in energy-related data from China. The application allows users to view maps, create charts and post results to Chinese and American social media sites. The site is available in both English and Chinese and can be viewed here. Tools used include Backbone.js and the Google Maps and Google Visualization APIs; the final application is hosted on Heroku.
We worked closely with the research team at a national cell phone company to conduct spatial and temporal analysis of call patterns and of how those patterns relate to landscape features. The project involved building a PostgreSQL database for spatial analysis of large datasets, extensive data visualization, classical statistics such as logistic regression, and machine learning techniques such as random forest.