Why Visualize Data When Coding? Preliminary Categories for Coding in Jupyter Notebooks
Issued Date
2022-01-01
Resource Type
ISSN
15301362
Scopus ID
2-s2.0-85149180580
Journal Title
Proceedings - Asia-Pacific Software Engineering Conference, APSEC
Volume
2022-December
Start Page
462
End Page
466
Rights Holder(s)
SCOPUS
Bibliographic Citation
Proceedings - Asia-Pacific Software Engineering Conference, APSEC Vol.2022-December (2022), 462-466
Suggested Citation
Settewong T., Ritta N., Kula R.G., Ragkhitwetsagul C., Sunetnanta T., Matsumoto K. Why Visualize Data When Coding? Preliminary Categories for Coding in Jupyter Notebooks. Proceedings - Asia-Pacific Software Engineering Conference, APSEC Vol.2022-December (2022), 462-466. doi:10.1109/APSEC57359.2022.00063 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/84309
Title
Why Visualize Data When Coding? Preliminary Categories for Coding in Jupyter Notebooks
Author's Affiliation
Other Contributor(s)
Abstract
Data visualization has become a crucial component of data analytics, especially for data exploration, understanding, and analysis. Effective data visualization impacts decision-making and aids in discovering and understanding relationships. It also benefits data-intensive software development tasks, e.g., feature engineering in machine learning-based software projects. However, it is unknown how visualizations are used in competitive programming. This paper reports early results on which visualizations are prevalent in competitive programming. Grandmaster is the highest of the levels a competitor can reach (novice, expert, master, and grandmaster). Analyzing the visualizations of 7 high-ranking competitors (i.e., Grandmasters) on Kaggle, we identify and present a catalog of visualizations used both to tell a story from the data and to explain the processes and pipelines behind their coding solutions. Our taxonomy includes nine types drawn from over 821 visualizations in 68 Jupyter notebooks. Furthermore, most visualizations serve data analysis, with distribution (DA Distribution) and frequency (DA Frequency) being the most used. We envision that this catalog can be useful for better understanding the different situations in which to employ these visualizations.