So, you have data – now what?
This article is a general overview of some of the data analysis tools available to University staff, from a performance analyst’s perspective. What are they good for? And how much effort do they take to learn and use?
Background
I joined the Prospective Student Web Content Team as a performance analyst about six months ago. In my role, I deal with data of all shapes and forms, and in my short time here I have carried out:
- data transformation and processing
- statistical analysis
- qualitative data analysis
- and website analytics
In the process, I have come across a variety of tools and software that help me do my job. Some I already knew and some are new to me. Here is a very brief overview of the tools I know and use, for anyone who works with data.
As a disclaimer, I come from an academic background, so I am somewhat biased towards open-source and free tools.
Quantitative analysis
The bulk of the data processing and analysis I do is quantitative. My two favourite tools for this purpose are Excel and R. I could go on for ages about them, but I will keep it short.
Excel
Excel is like the Swiss Army knife of spreadsheets – it’s widely used and very versatile. It features calculation capabilities, pivot tables and graphing tools, and supports automation via macros. This is my go-to for run-of-the-mill calculations and simple tasks. Excel on the web is free for individual use. Another free alternative is Google Sheets, which features a lot of the same capabilities and is, in my opinion, better for collaborative work.
PROS | CONS |
Easy to use for basic tasks like data organisation and simple visualisations | No way of checking for human error |
Batch calculations are easy | Struggles with big datasets. It starts to lag at ~10K rows. |
Been around a long time so it has great community support | Lacks automation capabilities and manual input takes time |
Integrates with other Microsoft products | Advanced features and functions have a steeper learning curve |
 | Cost! |
Learn more about Google Sheets
R and RStudio
R is one of the most widely used programming languages for statistical modelling and one of the most widely used languages in data science. RStudio is simply the user interface I use. Being open-source, R has an avid community of developers who provide continuous improvement and support. At the time of writing, it features over 18,000 packages.
In simple terms: there is an R package for almost anything. Here are some of the things I have used it for in this role and in the past:
- data cleaning, transformation, processing and analysis
- statistical modelling
- data visualisation
- text mining
- natural language processing
- machine learning
- predictive analytics
PROS | CONS |
Open source! | Steep learning curve |
Platform independent | Can use up a lot of memory if not managed correctly |
Always growing and improving | |
Outstanding community behind it – I have not come across an issue that the R community cannot solve. Package owners are particularly helpful and receptive to constructive criticism | |
Website Analytics
Another part of my job is to use web analytics as a tool to understand and optimise our website. We mainly use Google Analytics for that purpose.
Google Analytics
Google Analytics (GA) is a web analytics service that provides basic analytical tools and statistics about your website traffic and users. GA is easy to use and learn, with great training resources via Analytics Academy.
PROS | CONS |
Insights into user behaviour | Not the most sophisticated visuals (in my opinion) |
Set and track goals | Gradual learning curve |
Create custom reports | The number of options can feel overwhelming at the start |
Training via Analytics Academy | |
Integration with Data Studio for reporting and dashboarding | |
Learn more about Google Analytics
Qualitative analysis
Qualitative analysis tools are useful when it comes to trend analysis on unstructured data, like interviews or recorded meetings.
Miro
Miro is a mind-mapping and diagramming tool. It can also be used to create flowcharts and presentations. This tool is great for collaborative work and a good whiteboard substitute in an online world.
PROS | CONS |
Focus on collaborative work | If you are not a visual person (like myself), it might not be the most useful |
Great for visual people | Can become disorganised quickly |
 | Fiddly to use |
Reporting and data visualisation
Microsoft Power BI
Power BI is a reporting and dashboarding tool from Microsoft with great capabilities for data transformation, analysis, and visualisation. I only started using Power BI in my role at the University, and I find it easy to use, although some features and behaviours are not the most intuitive.
PROS | CONS |
Connects to data from a variety of sources | Can lag with large datasets |
Interactive visualisations | Cannot handle complex table relationships |
Custom visualisations | Need a Pro license to publish reports |
Integrates with other Microsoft software | Cost! |
Data Studio
Data Studio (DS) is the reporting and dashboarding tool from Google. This tool integrates easily with GA, which makes it an attractive alternative to tools like Power BI. Like most Google software, DS is free for individuals and small teams.
PROS | CONS |
Cloud-based | Basic visualisations |
Supports live data – no need to schedule updates | Lack of functions |
Flat learning curve | Can lag with a live connection |
Integrates with Google applications | |
Training available through Analytics Academy | |
Available templates make it easy to get started | |
Free | |
How do you choose?
Choosing the right data analytics tool can be challenging. It all depends on the questions you need to answer and how much resource you can put into it. No tool fits every need, but you can start with some basic questions:
- What kind of analysis do you need to perform?
- What is the size of your dataset?
- Is there a tool you are already familiar with?
- Is it simple to implement and support?
For example, suppose you need to run a simple statistical analysis on a numerical dataset with one hundred thousand records. Ask yourself these questions:
- What tools are available to me? The University has Excel and R available to handle simple statistics. I would pick the one I am most familiar with.
- What is my dataset size? Depending on the complexity of the data, Excel is likely to lag with large datasets, and at one hundred thousand records it might crash. That leaves me with R as the best tool for this task.
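The two questions above can be sketched as a small decision helper. This is only an illustration written in Python: the function name `suggest_tool` and the constant `EXCEL_COMFORT_ROWS` are hypothetical, and the 10,000-row threshold is an assumption based on the rough figure quoted in the Excel section, not a hard limit.

```python
# Assumed threshold, based on the rough figure of ~10K rows at which
# Excel starts to lag; adjust it for your own machine and data.
EXCEL_COMFORT_ROWS = 10_000


def suggest_tool(row_count, available_tools, familiar_tools):
    """Suggest a tool for a simple statistical analysis (hypothetical helper)."""
    # Dataset size: rule out Excel when the dataset is too big for it.
    candidates = [
        tool for tool in available_tools
        if not (tool == "Excel" and row_count > EXCEL_COMFORT_ROWS)
    ]
    # Familiarity: prefer a remaining tool you already know.
    for tool in candidates:
        if tool in familiar_tools:
            return tool
    # Otherwise fall back to the first remaining option, if there is one.
    return candidates[0] if candidates else None


# The example from the text: one hundred thousand records, Excel and R available.
print(suggest_tool(100_000, ["Excel", "R"], ["Excel", "R"]))
```

With a few hundred rows and only Excel familiarity, the same helper would suggest Excel, which matches the advice to pick the tool you know best when size is not a problem.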
On a final note, always consider the feasibility of a tool. Do not waste your time implementing anything that will not be used routinely and effectively.
Where can I find these tools?
For staff at the University, most software is available from the Software Centre or the Software Services website.
This is far from an exhaustive list, and there are lots of other tools available out there. If you start using a new tool, make sure that:
- you follow the license terms and conditions
- it complies with data protection laws
Future posts
I will be sharing more information about dealing with data in future posts.
In the meantime, if you have any comments or questions, feel free to get in touch.