Lab-00: Science, Human Experience, Experiments, and Data

Some Data Games

Let us start with a couple of small rumpus/games.¹

Game-1: Making Sushi

I will call out a few random characteristics, such as “People who wake up before 0800 hours”, or “People who love Sushi”. We will see how our class group spontaneously reorganises itself based on these characteristics.

Questions to Ponder:

Did you stay in the one group you chose?
If you moved, why did you move?
How did you know “where to stand”? (like Archimedes seems to have known)
Did some groups feel “cooler” than, say, the groups you were in?
If you were to look down at our classroom arrangement from the ceiling, how would you know which group was which?

Game-2: Thinking like Kandinsky

Look around the room, at the people, furniture, walls, fittings…. and write down as many abstract nouns that pertain to concrete things as you can.

A concrete noun is a noun that can be identified through one of the five senses (taste, touch, sight, hearing, or smell).

An abstract noun names a quality or an idea that cannot be physically perceived with the senses. Instead, it symbolises an abstract concept, such as a feeling, a quality, or an idea. In other words, abstract nouns are intangible concepts.

Questions to Ponder:

Did any of the Abstract Nouns “show up” in the way you formed Sushi groups?
How did you know “where to stand”? (like Archimedes seems to have known)
Did some groups feel “cooler” than, say, the groups you were in?
If you were to look down at our classroom arrangement from the ceiling, how would you know which group was which?
How could you possibly use some of the Abstract Nouns in the Sushi-group-making?

The Nature of Data

Why Visualize?

So now that we know where data comes from, why do we want to visualize it?

We can digest information more easily when it is pictorial.
Our Working Memories are both short-term and limited in capacity. So a picture abstracts the details and presents us with an overall summary, an insight, or a story that is easy to retain and recall.
Data Viz includes shapes that carry strong cultural memories and impressions for us. These cultural memories help us to use data viz in a universal way to appeal to a wide variety of audiences. (Do humans have a gene for geometry?)
It helps separate facts from mere statements. For example:

Figure 1: Source https://www.deccanherald.com/national/india-is-known-as-the-rape-capital-of-the-world-rahul-783495.html

Figure 2: Source https://datareveals.org/crime-data/

Why Code? Why not use no-Code?

There are good arguments in favour of using code to produce charts. There are of course also situations and needs where you may decide to not use code.

Let us paraphrase the arguments from Data Viz expert Claus Wilke:

Ideally, (charts) should come out of the pipeline ready to be sent to the printer, no manual post-processing needed.
- First, the moment you manually edit a figure, your final figure becomes irreproducible. A third party cannot generate the exact same figure you did. This matters in scientific and research disciplines, certainly, but also when you are part of a larger team of collaborators and have to swap roles and work products.
- If you use, say, Adobe Illustrator to spruce up a chart, how does another person know why you made the changes? Code can show what decisions you made.
- No chart is ever done-done in one go. If you add a lot of manual post-processing to your figure-preparation pipeline, you will be more reluctant to make changes or redo your work. Code makes it easier to iterate, especially since you may not be in a position to ignore reasonable requests for change made by collaborators or colleagues.
- You may yourself forget what exactly you did to prepare a given figure, or you may not be able to generate a future figure on new data that visually matches your earlier one. What do you do if the underlying data changes, the figure must change with it, and you can’t remember what you did?

So, we will play it safe and do both: Code and No-Code.
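To make the reproducibility argument concrete, here is a minimal Python sketch. The choice of Python, the function name `text_bar_chart`, and the survey answers are our own assumptions for illustration; none of them come from Wilke. The chart is produced entirely from the data, so rerunning the script on changed data regenerates the figure with no manual touch-up.

```python
from collections import Counter


def text_bar_chart(values, width=20):
    """Render category counts as a plain-text column chart, one line per category."""
    counts = Counter(values)
    if not counts:
        return ""
    largest = max(counts.values())
    lines = []
    # Sort by label so the output does not depend on the order of the input.
    for label in sorted(counts):
        bar = "#" * round(counts[label] / largest * width)
        lines.append(f"{label:<10} {bar} {counts[label]}")
    return "\n".join(lines)


# Hypothetical survey answers, invented for illustration only.
answers = ["sushi", "pizza", "sushi", "dosa", "sushi", "pizza"]
chart = text_bar_chart(answers)
print(chart)
```

Every decision (bar width, ordering, labels) is visible in the code, so a collaborator can see why the chart looks the way it does and can rerun it unchanged.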

What are Data Types?

https://www.youtube.com/watch?v=dwFsRZv4oHA

In more detail:

How do we Spot Data Variable Types?

By asking questions!

Pronoun	Answer	Variable / Scale	Example	What Operations?
What, Who, Where, Whom, Which	Name, Place, Animal, Thing	Qualitative / Nominal	Name	Count of cases, Mode
How, What Kind, What Sort	A Manner / Method, Type or Attribute from a list, with list items in some “order” (e.g. good, better, improved, best)	Qualitative / Ordinal	Socio-economic status (“low income”, “middle income”, “high income”); education level (“high school”, “BS”, “MS”, “PhD”); income level (“less than 50K”, “50K-100K”, “over 100K”); satisfaction rating (“extremely dislike”, “dislike”, “neutral”, “like”, “extremely like”)	Median, Percentiles
How Many / Much / Heavy? Few? Seldom? Often? When?	Quantities with Scale. Differences are meaningful, but not products or ratios	Quantitative / Interval	pH; SAT score (200-800); credit score (300-850); year of starting in college	Mean, Standard Deviation
How Many / Much / Heavy? Few? Seldom? Often? When?	Quantities, with Scale and a Zero Value. Differences and Ratios / Products are meaningful (e.g. Weight)	Quantitative / Ratio	Weight, length, height; temperature in Kelvin; enzyme activity, dose amount, reaction rate, flow rate, concentration; pulse; survival time	Correlation, Coefficient of Variation

As you go from Qualitative to Quantitative data types in the table, I hope you can detect a movement from fuzzy groups/categories to more and more crystallized numbers. Each variable/scale can also be subjected to the operations of all the groups before it. In the words of S.S. Stevens,

the basic operations needed to create each type of scale is cumulative: to an operation listed opposite a particular scale must be added all those operations preceding it.
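As a rough illustration of this cumulative idea, the sketch below applies the operations named in the table to one small variable of each scale, using only Python's standard `statistics` module. All the data values and variable names are invented for illustration, and encoding the ordinal levels as list positions is just one simple way to express their order.

```python
import statistics

# Hypothetical data, invented for illustration only.
favourite_food = ["sushi", "pizza", "sushi", "dosa"]               # nominal
satisfaction = ["dislike", "neutral", "like", "like", "neutral"]   # ordinal
start_year = [2019, 2020, 2020, 2021]                              # interval
weight_kg = [50.0, 60.0, 70.0]                                     # ratio

# Nominal: we can only count cases and find the most common one.
mode_food = statistics.mode(favourite_food)

# Ordinal: the levels have an order, so a median makes sense.
# The position in this list encodes that order.
levels = ["extremely dislike", "dislike", "neutral", "like", "extremely like"]
ranks = [levels.index(answer) for answer in satisfaction]
median_satisfaction = levels[statistics.median_low(ranks)]

# Interval: differences are meaningful, so mean and standard deviation apply.
mean_year = statistics.mean(start_year)
sd_year = statistics.stdev(start_year)

# Ratio: a true zero makes ratios meaningful, so the coefficient of
# variation (standard deviation divided by mean) applies.
cv_weight = statistics.stdev(weight_kg) / statistics.mean(weight_kg)

# Cumulative: ratio data also supports every earlier operation.
median_weight = statistics.median(weight_kg)

print(mode_food, median_satisfaction, mean_year, round(sd_year, 3),
      round(cv_weight, 3), median_weight)
```

Going the other way does not work: a mean of `favourite_food` has no meaning, which is why the scale of a variable should be identified before choosing a summary.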

What Are the Parts of a Data Viz?

How to pick a Data Viz?

Most Data Visualizations use one or more of the following geometric attributes or aesthetics. These geometric aesthetics are used to represent qualitative or quantitative variables from your data.

Figure 3: From Claus Wilke, Fundamentals of Data Visualization

What does that mean? We can think of simple visualizations as combinations of these aesthetics. Some examples:

Aesthetic #1	Aesthetic #2	Shape
Position X = Quant Variable	Position Y = Quant Variable	Points/Circles with Fixed Size
Position X = Qual Variable	Position Y = Count of Qual Variable	Columns
Position X = Qual Variable	Position Y = Qual Variable	Rectangles, with area proportional to joint(X,Y) count
Position X = Qual Variable	Position Y = Rank Ordered Quant Variable	Box + Whisker, Box length proportional to Inter-Quartile Range, whisker length proportional to upper and lower quartile respectively
Position X = Quant Variable	Position Y = Quant Variable + Qual Var
Quant Variable	Shape = Line with Quant Variable
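One way to read the table is as a lookup from the kinds of variables placed on X and Y to a shape. The sketch below encodes only the first four rows; the dictionary `SHAPES`, the function `suggest_shape`, and the short labels `"quant"`, `"qual"`, and `"count"` are hypothetical names chosen for illustration, not part of any charting library.

```python
# Lookup built from the first four rows of the table above.
# Keys are (kind of X variable, kind of Y variable);
# "count" stands for a count of the categories on X.
SHAPES = {
    ("quant", "quant"): "points",
    ("qual", "count"): "columns",
    ("qual", "qual"): "rectangles with area proportional to joint count",
    ("qual", "quant"): "box and whisker",
}


def suggest_shape(x_kind, y_kind):
    """Return the shape the table pairs with these variable kinds, if any."""
    return SHAPES.get((x_kind, y_kind), "no match in this table")


print(suggest_shape("quant", "quant"))
print(suggest_shape("qual", "count"))
print(suggest_shape("quant", "qual"))
```

A real chart usually maps more than two aesthetics (colour, size, shape), so treat this as a starting point for thinking rather than a rule.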