Working with data can be very challenging and messy! Explore the sections below to learn about tools that you can use to wrangle your data and create meaningful visualizations.
A spreadsheet is a tool that is used to store, manipulate and analyze data. Data in a spreadsheet is organized in a series of rows and columns and can be searched, sorted, calculated and used in a variety of charts and graphs.
Microsoft Excel is a spreadsheet developed by Microsoft. It features calculation or computation capabilities, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications.
Google Sheets is a spreadsheet application included as part of the free, web-based Google Docs Editors suite offered by Google. This application allows for simultaneous multi-user editing and collaboration.
Apache OpenOffice is an open-source office productivity software suite. It is one of the successor projects of OpenOffice.org and the designated successor of IBM Lotus Symphony.
Apple Numbers is a spreadsheet application developed by Apple Inc. as part of the iWork productivity suite alongside Keynote and Pages.
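To make the spreadsheet operations described above concrete (searching, sorting, and calculating over rows and columns), here is a minimal sketch that uses Python's standard csv module on a small, made-up table. The sample data, column names, and helper function names are hypothetical and not taken from any of the applications listed; a real spreadsheet would typically be exported to CSV first.

```python
import csv
import io

# Hypothetical sample data standing in for a small spreadsheet export.
SAMPLE_CSV = """name,department,score
Avery,Biology,88
Blake,History,92
Casey,Biology,75
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per spreadsheet row."""
    return list(csv.DictReader(io.StringIO(text)))


def search(rows, column, value):
    """Return the rows whose given column exactly equals value."""
    return [row for row in rows if row[column] == value]


def sort_by(rows, column, numeric=False, descending=False):
    """Return a new list of rows sorted by one column."""
    if numeric:
        key = lambda row: float(row[column])
    else:
        key = lambda row: row[column]
    return sorted(rows, key=key, reverse=descending)


def column_average(rows, column):
    """Compute the arithmetic mean of a numeric column."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)


rows = load_rows(SAMPLE_CSV)
biology = search(rows, "department", "Biology")
ranked = sort_by(rows, "score", numeric=True, descending=True)
average = column_average(rows, "score")

print([row["name"] for row in ranked])
print(average)
```

Every cell arrives from the CSV reader as text, which is why the numeric helpers convert values with float before sorting or averaging.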
Data analysis applications help you clean messy data before you create visualizations. These tools can help normalize and transform your data to uncover meaningful trends.
ArcGIS is a family of client, server, and online geographic information system (GIS) software developed and maintained by Esri. ArcGIS was first released in 1999 as the successor to ARC/INFO, a command-line GIS for manipulating data.
OpenRefine is an open-source desktop application for data cleanup and transformation to other formats, an activity commonly known as data wrangling. It is similar to spreadsheet applications, and can handle spreadsheet file formats such as CSV, but it behaves more like a database.
RStudio is an integrated development environment for R, a programming language for statistical computing and graphics. It is available in two formats: RStudio Desktop is a regular desktop application while RStudio Server runs on a remote server and allows accessing RStudio using a web browser.
SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation.
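As a rough illustration of what "normalizing and transforming" messy data can involve, the sketch below cleans a few made-up records using only the Python standard library. The field layout, the sample values, and the function names are assumptions chosen for the example; it does not reproduce how any of the applications above work internally.

```python
def normalize_name(raw):
    """Trim whitespace, collapse repeated spaces, and title-case a name."""
    return " ".join(raw.split()).title()


def parse_amount(raw):
    """Convert text such as ' $1,200.50 ' to a float.

    Returns None when the value is blank or not a number.
    """
    cleaned = raw.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_records(records):
    """Normalize each (name, amount) pair and drop exact duplicates,
    keeping the first occurrence of each cleaned row."""
    seen = set()
    cleaned = []
    for name, amount in records:
        row = (normalize_name(name), parse_amount(amount))
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


# Hypothetical messy input: inconsistent case, spacing, and number formats.
MESSY = [
    ("  ada   lovelace ", "$1,200.50"),
    ("Ada Lovelace", "1200.5"),
    ("GRACE HOPPER", "n/a"),
    ("grace hopper", ""),
]

result = clean_records(MESSY)
print(result)
```

Treating unparseable amounts as missing (None) rather than zero is a design choice for this sketch; it keeps invalid entries from silently skewing later calculations.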
Data visualization applications help with the representation of data through use of graphics, charts, plots, infographics, and even animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand.
The Google Chart API is an interactive Web service that creates graphical charts from user-supplied data. Google servers create a PNG image of a chart from data and formatting parameters specified by a user's HTTP request. The service supports a wide variety of chart information and formatting.
Google Data Studio is a web-based data visualization tool that helps users build customized dashboards and easy-to-understand reports. It helps in tracking key performance indicators (KPIs) for customers, visualizing trends, and comparing performance over time.
Microsoft Power BI is an interactive data visualization software product developed by Microsoft with a primary focus on business intelligence. Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights.
Tableau is a visual analytics platform that helps people and organizations use data to solve problems. Tableau makes it easier to explore and manage data, and faster to discover and share insights.
A text editor is computer software used for editing plain text. It is distinguished from a word processor because it does not manage document formatting or other features commonly used in desktop publishing. Some text editors are small and simple, while others offer a broad and complex range of functionality. A text editor is important to use while performing data analysis because it can help strip away metadata and unwanted characters and code from data strings.
Notepad is a simple text editor for Windows that creates and edits plain text documents. It was first released in 1983 as a mouse-based program for MS-DOS and has been included in every version of Windows since.
Notepad++ is a free and open-source text and source code editor for use with Microsoft Windows. It supports multiple tabbed open files in a single window, and offers programming language-based color-coded text.
Sublime Text is a shareware text and source code editor available for Windows, macOS, and Linux. It natively supports many programming languages and markup languages. Users can customize it with themes and expand its functionality with plugins, typically community-built and maintained under free-software licenses.
TextEdit is the text editor included with macOS. With TextEdit, you can open and edit rich text documents created in other word processing apps, including Microsoft Word and OpenOffice. You can also save your documents in a different format, so they're compatible with other apps.
UltraEdit is a commercial text editor for Microsoft Windows, Linux, and macOS, created in 1994. The editor contains tools for programmers, including macros, configurable syntax highlighting, code folding, file type conversions, project management, regular expressions for search-and-replace, a column-edit mode, remote editing of files via FTP, and interfaces for APIs or command lines of choice.
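The kind of cleanup a text editor supports, such as stripping leftover markup and invisible characters from data strings, can also be scripted. The following is a minimal sketch using Python's re and unicodedata modules; the sample string and the clean_text helper are hypothetical, and the tag pattern only handles simple HTML-style tags rather than arbitrary markup.

```python
import re
import unicodedata

# Matches simple HTML-style tags such as <b> or </span>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_text(raw):
    """Remove simple tags and invisible characters, then tidy spacing."""
    text = TAG_PATTERN.sub("", raw)
    # Treat non-breaking spaces as ordinary spaces.
    text = text.replace("\u00a0", " ")
    # Drop control (Cc) and format (Cf) characters, such as NUL bytes,
    # byte order marks, and zero-width spaces, but keep newlines and tabs.
    text = "".join(
        ch
        for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


# Hypothetical string with a byte order mark, tags, a non-breaking space,
# a NUL byte, and a trailing zero-width space.
sample = "\ufeff<b>Total:</b>\u00a0 42\x00 items\u200b"
cleaned = clean_text(sample)
print(cleaned)
```

The same regular expressions can usually be pasted into the search-and-replace dialog of an editor that supports them, which is often faster for a one-off file.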
Developer tools are supplementary tools that researchers and computer programmers use to work more efficiently. These tools can be used in conjunction with one another to clean data, store data, write code for analysis of data, and annotate findings.
GitHub is used for storing, tracking, and collaborating on software projects. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project. GitHub also serves as a social networking site where developers can openly network and collaborate.
JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality.
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourage program modularity and code reuse.
SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers and comes bundled inside countless other applications that people use every day.
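Because SQLite is self-contained, it can be tried without installing a database server. The sketch below uses the sqlite3 module from Python's standard library with an in-memory database; the table name, column names, and sample readings are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (site TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings (site, value) VALUES (?, ?)",
        [("north", 1.5), ("north", 2.5), ("south", 4.0)],
    )
    conn.commit()
    return conn


def average_by_site(conn):
    """Return (site, average value) pairs, ordered by site name."""
    cursor = conn.execute(
        "SELECT site, AVG(value) FROM readings GROUP BY site ORDER BY site"
    )
    return cursor.fetchall()


conn = build_demo_db()
averages = average_by_site(conn)
print(averages)
conn.close()
```

Replacing ":memory:" with a file path would store the same data in a single file on disk, which is one reason SQLite is convenient for small research datasets.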