name: title-slide
class: title-slide, center, middle, inverse

# The Nature of Data

#.fancy[Understanding Human Experience, Experiments, and Where Data Comes From]

<br>

.large[by Arvind Venkatadri]

Written: July 13, 2022

Updated: July 23, 2022

.footer-large[.right[.fira[
<br><br><br><br><br>[The Foundation Series](https://the-foundation-series.netlify.app/courses/7-data-visualization-with-no-code/)
]]]

---

## What Makes Human Experience?

<img src="images/Anecdote-spotting-a-business-story.png" width="120%" style="display: block; margin: auto;" />

### How would we begin to describe this experience?

.small[
- Where? When?
- Who?
- How?
- How big? How small? How frequent? How sudden?
- And... how surprising! How shocking! How sad... how wonderful!!!

So: our .orange[Questions] and our .orange[*Surprise*] lead us to create Human Experiences.
]

.footnote[https://www.anecdote.com/2014/09/story-framework/]

---

## Is This a Surprise?

.pull-left[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Needs to be celebrated. Spotted in a men's washroom at <a href="https://twitter.com/BLRAirport?ref_src=twsrc%5Etfw">@BLRAirport</a> - a diaper change station. <br><br>Childcare is not just a woman's responsibility. <br><br>👏🏻✨ <a href="https://t.co/Za4CG9jZfR">pic.twitter.com/Za4CG9jZfR</a></p>— Sukhada (@appadappajappa) <a href="https://twitter.com/appadappajappa/status/1541366922545369088?ref_src=twsrc%5Etfw">June 27, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]

.pull-right[
<img src="images/toilet.jpg" width="250" style="display: block; margin: auto;" />
]

---

.pull-left[
<img src="images/P&P.jpg" width="567" />
]

.pull-right[
## The Element of Surprise?

Jane Austen knew a lot about human information processing, as these snippets from **Pride and Prejudice** *(published in 1813, over 200 years ago)* show:

.small[
- She was a woman of mean understanding, little .orange[*information*], and uncertain temper.
- Catherine and Lydia had .orange[*information*] for them of a different sort.
- When this .orange[*information*] was given, and they had all taken their seats, Mr. Collins was at leisure to look around him and admire,...
- You could not have met with a person more capable of giving you certain .orange[*information*] on that head than myself, for I have been connected with his family in a particular manner from my infancy.
- This .orange[*information*] made Elizabeth smile, as she thought of poor Miss Bingley.
- This .orange[*information*], however, startled Mrs. Bennet ...
]

.footnote[.small[https://www.cs.bham.ac.uk/research/projects/cogaff/misc/austen-info.html]]
]

---

## Claude Shannon and Information

<img src="images/InfoSurprise.png" width="750px" height="450px" style="display: block; margin: auto;" />

.footnote[https://plus.maths.org/content/information-surprise]

---

## Human Experience is... Data??

.pull-left[
<iframe width="860" height="500" src="https://www.youtube.com/embed/sFIDCtRX_-o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
]

.pull-right[
<img src="images/dear-data.jpg" width="300px" height="450px" style="display: block; margin: auto;" />
]

---

## Experiments and Hypotheses

.right-column[
![](images/DoE.png)
]

.left-column[
### A Kitchen Experiment

- Inputs: Ingredients, Recipes, Processes
- Outputs: Taste, Texture, Colour, Quantity!!
]

.footnote[Used *without permission* from https://safetyculture.com/topics/design-of-experiments/]

---

## What is the Result of an Experiment?

.pull-left[
### All experiments give us data about phenomena

- We obtain data about the things that happen: **Outputs**
- What makes things happen? **Inputs**
- How? **Process**
- When? **Factors**
- How much "output" is caused by how much "input"?
**Effect Size**
]

.pull-right[
> All Experiments stem from Human Curiosity, a Hypothesis, and a Desire to Find Out and Talk About Something
]

---

## A Famous Lady and Her Famous Experiment

.pull-left[
<img src="images/nightingale.jpeg" width="428" />

.small[In 1853, Turkey declared war on Russia. After the Russian Navy destroyed a Turkish squadron in the Black Sea, Great Britain and France joined with Turkey. In September of the following year, the British landed on the Crimean Peninsula and set out, with the French and Turks, to take the Russian naval base at Sevastopol. What followed was a tragicomedy of errors: failure of supply, failed communications, international rivalries. Conditions in the armies were terrible, and disease ate through their ranks. They finally did take Sevastopol a year later, after a ghastly assault. It was ugly business all around. Well over half a million soldiers lost their lives during the Crimean War.]
]

.pull-right[
<img src="images/rose.jpg" width="450px" height="450px" />
]

---

## Florence Nightingale's Data

<table>
<thead>
<tr> <th style="text-align:left;"> Month </th> <th style="text-align:right;"> Year </th> <th style="text-align:right;"> Disease.rate </th> <th style="text-align:right;"> Wounds.rate </th> <th style="text-align:right;"> Other.rate </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Apr </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> 7.0 </td> </tr>
<tr> <td style="text-align:left;"> May </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 6.2 </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> 4.6 </td> </tr>
<tr> <td style="text-align:left;"> Jun </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> 2.5 </td> </tr>
<tr> <td style="text-align:left;"> Jul </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 150.0 </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> 9.6 </td> </tr>
<tr> <td style="text-align:left;"> Aug </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 328.5 </td> <td style="text-align:right;"> 0.4 </td> <td style="text-align:right;"> 11.9 </td> </tr>
<tr> <td style="text-align:left;"> Sep </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 312.2 </td> <td style="text-align:right;"> 32.1 </td> <td style="text-align:right;"> 27.7 </td> </tr>
<tr> <td style="text-align:left;"> Oct </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 197.0 </td> <td style="text-align:right;"> 51.7 </td> <td style="text-align:right;"> 50.1 </td> </tr>
<tr> <td style="text-align:left;"> Nov </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 340.6 </td> <td style="text-align:right;"> 115.8 </td> <td style="text-align:right;"> 42.8 </td> </tr>
<tr> <td style="text-align:left;"> Dec </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 631.5 </td> <td style="text-align:right;"> 41.7 </td> <td style="text-align:right;"> 48.0 </td> </tr>
<tr> <td style="text-align:left;"> Jan </td> <td style="text-align:right;"> 1855 </td> <td style="text-align:right;"> 1022.8 </td> <td style="text-align:right;"> 30.7 </td> <td style="text-align:right;"> 120.0 </td> </tr>
</tbody>
</table>

.footnote[Rates are annual death rates per 1,000 soldiers, by cause: disease, wounds, and other causes.]

---

## What Does Data Look Like, Then?

<img src="images/Ratio Interval Ordinal Nominal.PNG" width="750px" height="500px" style="display: block; margin: auto;" />

---

## Types of Variables

.pull-left[
### Using Interrogative Pronouns

- Nominal: What? Who? Where? (Factors, *Dimensions*)
- Ordinal: Which Types? What Sizes? How Big? (Factors, *Dimensions*)
- Interval: How Often? (Numbers, *Facts*)
- Ratio: How many? How much? How heavy?
(Numbers, *Facts*)
]

.pull-right[
<img src="images/Ratio Interval Ordinal Nominal.PNG" width="600px" height="450px" style="display: block; margin: auto;" />
]

---

## Types of Variables in Nightingale Data

.leftcol30[
.small[
### Using Interrogative Pronouns:

- Nominal: (Factors, *Dimensions*)
  - .orange[HOW?] `Wounds, Disease, Other` (causes of death have no natural order)
- Ordinal: None
- Interval: (Numbers, *Facts*)
  - .orange[WHEN?] `Year, Month`
- Ratio: (Numbers, *Facts*)
  - .orange[HOW MANY?] `Rate of Deaths` (Wounds, Disease, Other)
]
]

.rightcol70[
<table>
<thead>
<tr> <th style="text-align:left;"> Month </th> <th style="text-align:right;"> Year </th> <th style="text-align:right;"> Disease.rate </th> <th style="text-align:right;"> Wounds.rate </th> <th style="text-align:right;"> Other.rate </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Apr </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 7.0 </td> </tr>
<tr> <td style="text-align:left;"> May </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 6.2 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 4.6 </td> </tr>
<tr> <td style="text-align:left;"> Jun </td> <td style="text-align:right;"> 1854 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 2.5 </td> </tr>
</tbody>
</table>

<img src="images/Ratio Interval Ordinal Nominal.PNG" width="400px" height="300px" style="display: block; margin: auto;" />
]

.footnote[Nightingale's table codes a *dimension*, the cause of death, into its column names (`Disease.rate`, `Wounds.rate`, `Other.rate`). By modern standards, this is not **tidy** data.]

---

## Nightingale's Rose

.pull-left[
.small[Nightingale created a remarkable and original graphical display to show us just what had really gone on in the War.
It was a .orange[Polar-Area Diagram] that showed how soldiers had died during the two years from April 1854 to March 1856. Each of its two roses is like a pie chart, cut into twelve equal angles. These slices advance in a clockwise direction, one each month. The *area* of each slice, not just its radius, shows the death rate in that month. We see little short slices in April, May and June of 1854. After the troops land in the Crimea, the slices begin reaching far outward in the radial direction. There's more: each slice has three sections, one for deaths from wounds in battle, one for "other causes", and one for disease. Once you see Nightingale's graph, the terrible picture is clear. The Russians were a minor enemy. **The real enemies were cholera, typhus, and dysentery**. Once the military looked at that eloquent graph, the modern army hospital system was inevitable.
]
]

.pull-right[
<img src="images/rose.jpg" width="450px" height="450px" />
]

---

## So, Did the Sanitation Commission Succeed?

<img src="figs/unnamed-chunk-14-1.png" width="850px" height="500px" style="display: block; margin: auto;" />

---

## Nightingale's Famous Coxcomb or Rose Plot

<img src="figs/unnamed-chunk-15-1.png" width="850px" height="450px" style="display: block; margin: auto;" />

.footnote[.small["Engines of Our Ingenuity", <https://www.uh.edu/engines/epi1712.htm>]]

---

### From Data -> Geometry

- How did we arrive at shapes, colours, lines and points... from data?
- All statistical graphs do a Kalidasa:
  - They transform a variable with a .orange[`stat`] (`count`, `bin`, `sort`)
  - They use metaphors to map .orange[data variables] and .orange[computed stats] to .orange[geometrical aspects], aka .orange[aesthetics]

.pull-left[
<img src="images/common-aesthetics-1.png" width="512" style="display: block; margin: auto;" />
]

.pull-right[
- Commonly used aesthetics in data visualization: position, shape, size, color, line width, line type.
- Some of these aesthetics can represent both continuous and discrete data (position, size, line width, color).
- Others can usually represent only discrete data (shape, line type).
]

---

.leftcol70[
<img src="images/perceptual-ranking.png" width="550px" height="600px" style="display: block; margin: auto;" />
]

.rightcol30[
### Each of these geometries is perceived with a different accuracy
]

---

## The Need for Answers: Questions to Visuals

<img src="images/Workflow.png" width="650px" style="display: block; margin: auto;" />

---

## Variables and Graphs: Qualitative Variables

.pull-left[
<img src="https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/amounts_multi-1.png" width="300px" height="125px" />

### Amounts and Counts

- Variable: Ordinal / Nominal
- Stat: `count`
- Geometry: height and colour
- Questions:
  - .orange[How many] of each type of #Var1?
  - .orange[How many] of each type of #Var1, broken up by #Var2?
]

.pull-right[
<img src="figs/unnamed-chunk-20-1.png" width="504" height="450px" />
]

---

## Variables and Graphs: Quantitative Variables

.pull-left[
<img src="https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/single-distributions-1.png" style="display: block; margin: auto;" />

### Distributions

- Variable: Interval / Ratio
- Stat: `bin` and `count`
- Geometry: x = bins, y = count, and colour
- Questions: What is the range of the Interval/Ratio variable, and how often do values fall in each part of that range?
]

.pull-right[
<img src="figs/unnamed-chunk-22-1.png" width="504" height="450px" />
]

---

## Variables and Graphs: Quantitative Variables

.pull-left[
<img src="https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/multiple-distributions-1.png" style="display: block; margin: auto;" />

.small[
### Distributions (Many of Them at Once)

- Variable: Interval/Ratio + Nominal/Ordinal
- Stat: `sort` (boxplot), `density` (violin)
- Geometry: x = Nom/Ord, y = Int/Ratio, and colour = Nom/Ord
]
]

.pull-right[
<img src="figs/unnamed-chunk-24-1.png" width="504"
height="450px" />
]

---

## Variables and Graphs: Quantitative Variables

.pull-left[
<img src="https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/basic-scatter-1.png" style="display: block; margin: auto;" />

### X-Y Relationships

- Variable: Interval/Ratio + Nominal/Ordinal
- Stat: none
- Geometry: x = Int/Ratio, y = Int/Ratio, and colour = Nom/Ord
]

.pull-right[
<img src="figs/unnamed-chunk-26-1.png" width="504" height="450px" />
]

---

## Conclusion

### - We question the world and form *Hypotheses* out of surprise
### - Hypotheses lead us to define *Questions*
### - Questions lead to *Variables*
### - Questions *with* Variables lead to *Graphs*
### - With Graphs, we can write *Stories* (Next Time!!)

---
class: hidden

## Variables and Graphs

.pull-left[
- Quantitative Variables

> Proportions can be visualized as pie charts, side-by-side bars, or stacked bars (Chapter 10), and as in the case for amounts, bars can be arranged either vertically or horizontally.
]

.pull-right[
![](https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/proportions-comp-1.png)
]

---
class: hidden

## Variables and Graphs

.pull-left[
- Quantitative Variables

> When proportions are specified according to multiple grouping variables, then mosaic plots, treemaps, or parallel sets are useful visualization approaches.
]

.pull-right[
![](https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/proportions-multi-1.png)
]

---
class: hidden

## Variables and Graphs

.pull-left[
- More than one Quantitative Variable

> Scatterplots represent the archetypical visualization when we want to show one quantitative variable relative to another. If we have three quantitative variables, we can map one onto the dot size, creating a variant of the scatterplot called bubble chart.
For paired data, where the variables along the x and the y axes are measured in the same units, it is generally helpful to add a line indicating x = y.
]

.pull-right[
![](https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/basic-scatter-1.png)
]

---
class: hidden

## Variables and Graphs

.pull-left[
- More than one Quantitative Variable

> For large numbers of points, regular scatterplots can become uninformative due to overplotting. In this case, contour lines, 2D bins, or hex bins may provide an alternative. When we want to visualize more than two quantities, on the other hand, we may choose to plot correlation coefficients in the form of a correlogram instead of the underlying raw data.
]

.pull-right[
![](https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/xy-binning-1.png)
]

---
class: hidden

## Variables and Graphs

.pull-left[
- More than one Quantitative Variable

> When the x axis represents time or a strictly increasing quantity such as a treatment dose, we commonly draw line graphs. If we have a temporal sequence of two response variables, we can draw a connected scatterplot where we first plot the two response variables in a scatterplot and then connect dots corresponding to adjacent time points. We can use smooth lines to represent trends in a larger dataset.
]

.pull-right[
![](https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/xy-lines-1.png)
]

---
class: hidden

## Variables and Graphs

.pull-left[
- Geospatial Data

> The primary mode of showing geospatial data is in the form of a map. In addition, we can show data values in different regions by coloring those regions in the map according to the data (choropleth). In some cases, it may be helpful to distort the different regions according to some other quantity (e.g., population number) or simplify each region into a square. Such visualizations are called cartograms.
]

.pull-right[
![](https://clauswilke.com/dataviz/directory_of_visualizations_files/figure-html/geospatial-1.png)
]

---
class: middle, center

# Thanks!

## Slides created with <i class="fab fa-r-project faa-vertical animated "></i>

### via the R packages:<br> ⚔️ [**xaringan**](https://github.com/yihui/xaringan)<br> +<br/>😎 [**gadenbuie/xaringanExtra**](https://github.com/gadenbuie/xaringanExtra) <br> +<br/> ⚔️ [**the tidyverse**](https://tidyverse.tidyverse.org/)
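---

## Appendix: Surprise, in Bits

.small[
The *Claude Shannon and Information* slide links information to surprise. In Shannon's terms, an event with probability *p* carries log2(1/*p*) bits: the rarer the event, the bigger the surprise. Below is a minimal Python sketch of that formula. The function name `surprise_bits` and the example probabilities are illustrative assumptions, not part of the original slides.
]

```python
import math

def surprise_bits(p: float) -> float:
    """Shannon self-information (surprise) of an event with probability p, in bits."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)  # rarer event -> larger surprise

# A certain event carries no surprise; each halving of p adds one bit.
for p in (1.0, 0.5, 0.25, 0.125):
    print(f"p = {p:<6} surprise = {surprise_bits(p):.1f} bits")
```

The loop prints 0.0, 1.0, 2.0 and 3.0 bits. A one-in-a-million event would carry just under 20 bits.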
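---

## Appendix: Making Nightingale's Table Tidy

.small[
The footnote on the *Types of Variables in Nightingale Data* slide notes that the cause of death is hidden in the column names. One fix is to reshape the table from wide to long, with one row per month per cause. In the tidyverse this is the job of tidyr's `pivot_longer()`. The sketch below does the same thing in plain Python on the first three rows of the table. The helper `pivot_longer` here is a hypothetical stand-in named after tidyr's function, not a real Python library call.
]

```python
# First three rows of Nightingale's table, as shown on the slides (wide format).
wide = [
    {"Month": "Apr", "Year": 1854, "Disease.rate": 1.4, "Wounds.rate": 0.0, "Other.rate": 7.0},
    {"Month": "May", "Year": 1854, "Disease.rate": 6.2, "Wounds.rate": 0.0, "Other.rate": 4.6},
    {"Month": "Jun", "Year": 1854, "Disease.rate": 4.7, "Wounds.rate": 0.0, "Other.rate": 2.5},
]

def pivot_longer(rows, id_cols, value_cols, names_to="Cause", values_to="Rate"):
    """Hypothetical helper: turn each value column into its own (name, value) row."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            record = {key: row[key] for key in id_cols}  # keep Month and Year
            record[names_to] = col.split(".")[0]         # "Disease.rate" -> "Disease"
            record[values_to] = row[col]
            long_rows.append(record)
    return long_rows

tidy = pivot_longer(wide, ["Month", "Year"], ["Disease.rate", "Wounds.rate", "Other.rate"])
for record in tidy:
    print(record)
```

Three wide rows become nine long rows. `Cause` is now a variable in its own right, so it can be mapped to an aesthetic such as fill colour.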
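---

## Appendix: The `count` and `bin` Stats by Hand

.small[
The *From Data -> Geometry* slide says that a graph first transforms a variable with a `stat`. This minimal sketch applies two of them to columns of Nightingale's ten-month table:

- `count` on `Year` gives bar heights for a qualitative variable.
- `bin` followed by `count` on `Disease.rate` gives histogram bars for a quantitative one.

The names echo ggplot2's `stat_count()` and `stat_bin()`, but these are hypothetical helpers. The fixed bin width of 250 is an arbitrary assumption. ggplot2 picks its bins differently, so its counts may not match.
]

```python
import math
from collections import Counter

# Two columns of Nightingale's ten-month table (Apr 1854 to Jan 1855).
year = [1854] * 9 + [1855]
disease_rate = [1.4, 6.2, 4.7, 150.0, 328.5, 312.2, 197.0, 340.6, 631.5, 1022.8]

def stat_count(values):
    """Qualitative variable -> number of rows per category (bar heights)."""
    return dict(Counter(values))

def stat_bin(values, width):
    """Quantitative variable -> number of values in each bin [lo, lo + width).

    Assumes all values are non-negative, so the first bin starts at 0.
    """
    per_bin = Counter(math.floor(v / width) for v in values)
    last = max(per_bin)
    return {(i * width, (i + 1) * width): per_bin[i] for i in range(last + 1)}

print(stat_count(year))
for (lo, hi), n in stat_bin(disease_rate, width=250).items():
    print(f"{lo:>5}-{hi:<5} {'#' * n}")  # a tiny text histogram
```

The empty 750-1000 bin is kept on purpose: a histogram shows the gaps in a range, not just the occupied values.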
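---

## Appendix: The `sort` Behind a Boxplot

.small[
The *Distributions (Many of Them at Once)* slide lists the boxplot's stat as `sort`: once the values are sorted, the box is drawn from the quartiles. This minimal sketch computes them for the ten `Disease.rate` values. The helper names are hypothetical. The interpolation rule is assumed to match R's default `quantile()` (type 7), which ggplot2's `stat_boxplot()` uses. Note that ggplot2 draws whiskers only out to 1.5 times the interquartile range, not all the way to the minimum and maximum.
]

```python
import math

disease_rate = [1.4, 6.2, 4.7, 150.0, 328.5, 312.2, 197.0, 340.6, 631.5, 1022.8]

def quantile_sorted(xs, q):
    """Quantile of an already sorted list by linear interpolation (R's type 7)."""
    h = (len(xs) - 1) * q  # fractional position in the sorted list
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def five_number_summary(values):
    xs = sorted(values)  # the "sort" step behind a boxplot
    return {
        "min": xs[0],
        "q1": quantile_sorted(xs, 0.25),
        "median": quantile_sorted(xs, 0.50),
        "q3": quantile_sorted(xs, 0.75),
        "max": xs[-1],
    }

for name, value in five_number_summary(disease_rate).items():
    print(f"{name:>6}: {value:8.2f}")
```

The median (about 254.6) sits far below the maximum (1022.8, January 1855). The distribution is strongly skewed, which is exactly what a boxplot shows at a glance.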