Meet a Data Scientist
Institute for Data Science and Computing
Timothy B Norris, PhD - tnorris@miami.edu
December 9 2020

John Locke, Political Economy, and the Secret Life of a Bitpusher
Philosophical Transactions of the Royal Society
  • Oldenburg, Henry (1665). Philosophical Transactions of the Royal Society, March 6th. The Royal Society, London.
  • de Sallo, Denis (1665). Journal des sçavans, January 5. Paris.
What is Data Science & What is a Data Scientist?
  • Beginnings
  • Data Curation
  • A Few Examples
  • A Political Economy of Data
  • Some Time for Questions and Discussion
  • 2.048 MHz Zilog Z80
  • 16K RAM
  • 100K 5 1/4" floppy disk
  • HDOS
  • BASIC/PASCAL
  • Serial Port
  • Intel 8088 4.7 MHz
  • 128K RAM
  • 2 x 360K 5 1/4" floppy disk
  • MS-DOS
  • IBM PC Compatible
  • WordStar, SuperCalc
Beginnings
University of California Industry-University Cooperative Research Program
  • Data-driven maps of legislative districts and research investment
  • Print, presentation (film slides), and online mapping system
  • California Legislature, National Labs, and UC Office of the President
Beginnings
GeoMap - UC Berkeley Geography
  • Participatory Mapping
  • Rights and resources
  • International courts, World Bank, Indian Law Resource Center
  • web map
    • ~ 200,000 records in streets + other features
    • ~ 120,000 map tiles -> 4+ days to render on P4
    • python + php + MapServer
  • search engine
    • custom address normalizer with synonyms and common errors
    • custom geocoder for Lima2000 address data
    • php + mysql
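To make the idea of an address normalizer with synonyms and common errors concrete, here is a minimal sketch in Python. The original search engine was built with php + mysql and its code is not shown here. The lookup tables, their entries, and the function name `normalize_address` are hypothetical and serve only as illustration.

```python
import re
import unicodedata

# Hypothetical lookup tables; the real system's synonym lists are not shown in the slides.
SYNONYMS = {"avenida": "av", "ave": "av", "jiron": "jr", "calle": "ca", "cl": "ca"}
COMMON_ERRORS = {"avenda": "av", "jrion": "jr"}


def normalize_address(raw: str) -> str:
    """Return a lowercase, accent-free address with street-type words abbreviated."""
    # Decompose accented characters and drop the combining marks ("Jirón" -> "Jiron").
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, then turn anything that is not a letter, digit, or space into a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = []
    for token in text.split():
        token = COMMON_ERRORS.get(token, token)  # fix known misspellings first
        token = SYNONYMS.get(token, token)       # then map synonyms to one canonical form
        tokens.append(token)
    return " ".join(tokens)
```

With this kind of normalization, "Jirón de la Unión, 300" and "jiron de la union 300" reduce to the same string, so a geocoder can match both against a single address record.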
Data Curation
Federal Movement Towards Open Data
Adapted from: Whitmire, Amanda L. (2014). Research Data Management Curriculum, Lecture 2: Introduction to Research Data Management. Oregon State University Libraries. http://figshare.com/articles/GRAD521_Research_Data_Management_Lectures/1003835
Data Curation
"It is an unfortunate accident of history that the term datum ... rather than captum ... should have come to symbolize the unit-phenomenon in science. For science deals, not with 'that which has been given' by nature to the scientist, but with 'that which has been taken' or selected from nature by the scientist in accordance with his [sic] purpose." (Kitchin 2014, p. 2)
Jensen, H. E. (1950). 'Editorial Note' in H. Becker (1952). Through Values to Social Interpretation. Duke University Press, Durham, p. ix. Cited in Kitchin, R. (2014). The Data Revolution. Washington DC: SAGE.
Data Curation
Some Useful Abstractions
 


“Information is not knowledge.
Knowledge is not wisdom.
Wisdom is not truth.
Truth is not beauty.
Beauty is not love.
Love is not music.
Music is THE BEST.”

― Frank Zappa  
https://www.youtube.com/watch?v=yKvxWk5wwUU
Data Curation
Who Owns Your Data?

“The PI owns the data.”

“The university owns the data.”

“Nobody can own it; data isn’t copyrightable.”

... So who really owns research data? Well, the short answer is “it depends.” (emphasis in original).

Fortney, Katie (2016). "Who Owns Your Data". University of California Office of Scholarly Communication [ webpage ]. Accessed from https://osc.universityofcalifornia.edu/2016/09/who-owns-your-data/ on July 20th 2019.
Data Curation
UM and Data Ownership
Innovations: patentable or un-patentable inventions, discoveries, processes, compositions, research tools, data, ideas, databases, know-how, copyrightable works that are not scholarly or artistic Creations and tangible property, including biological organisms, engineering prototypes, drawings, and software created, conceived or made by Applicable Personnel within their normal duties (including clinical duties), course of studies, field of research or scholarly expertise or making more than Incidental Use of University’s resources. (p. 141)
3.3 Innovations are owned by the University; revenues derived from commercialization of Innovations will be shared with the Applicable Personnel as detailed in Section VI. (p. 142)
University of Miami Faculty Manual 2018 - 2019. https://fs.miami.edu/_assets/pdf/facultysenate/Documents/FacultyManual.pdf
Data Curation
!?
Data Curation
Why Researchers Have a Hard Time Sharing
Stuart, D., G. Baynes, I. Hrynaszkiewicz, K. Allin, D. Penny, M. Lucraft, and M. Astell (2018). Research Data: Practical Challenges for Researchers in Sharing Data. Springer Nature. DOI: 10.6084/m9.figshare.5975011.
  • 46% - Organising Data in a Presentable and Useful Way
  • 37% - Unsure About Copyright and Licensing
  • 33% - Not Sure Which Repository to Use
  • 26% - Lack of Time to Share
  • 19% - Cost of Sharing is too High
Some Examples
Embedded Data Curation
  • UM Libraries
  • Center for Computational Science
    • Pulley Ridge (RSMAS with Peter Ortner, Chris Mader among others)
    • The Turtlebox Project (NOAA with Paul Richards, Chris Mader among others)
    • Las Flores Participatory Mapping (SOA with Chris Mader, Adib Cure, Carie Penabad among others)
Pulley Ridge Data Curation Experience
"represents a collaboration of more than thirty scientists at ten different universities and two federal laboratories (NOAA’s Atlantic Oceanographic and Meteorological Laboratory and Southeast Fisheries Science Center) pooling their expertise through NOAA’s Cooperative Institute for Marine and Atmospheric Studies at the University of Miami in coordination with the Cooperative Institute for Ocean Exploration, Research, and Technology at Florida Atlantic University."
NCCOS - Understanding Coral Ecosystem Connectivity in the Gulf of Mexico from Pulley Ridge to the Florida Keys - https://coastalscience.noaa.gov/project/coral-ecosystem-connectivity-gulf-florida-keys/
Pulley Ridge Decision Support Resource. https://mesophotic.ccs.miami.edu.
VirTu - Sea Turtle Density Estimator
https://virtu.mesophotic.ccs.miami.edu
Participatory Mapping: Las Flores, Barranquilla, Colombia
https://mapeolasflores.ccs.miami.edu
point cloud: Chris Mader and Amin Sarafraz
6 Pages
81 Questions
18 Observations

  • 8 - Information about the interviewee
  • 14 - Building characteristics
  • 13 - Socio-economic
  • 8 - Services
  • 14 - Community and culture
  • 10 - Agricultural activities
  • 11 - Environment
  • 18 - Built environment (observations)

Signed Consent Form

Summary of the Door to Door Survey

2578 buildings edited in the database (100%)

2414 buildings observed from the street (94%)

2338 buildings with complete survey instrument (91%)

2125 buildings with complete survey and signed consent form (82%)
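The percentages in this summary can be reproduced from the building counts. The short Python sketch below does so; the building counts come from the summary above, while the function name `completion_rates` and the stage labels are chosen here for illustration.

```python
from typing import Dict

# Building counts reported in the door-to-door survey summary.
TOTAL_BUILDINGS = 2578
STAGES = {
    "observed from the street": 2414,
    "complete survey instrument": 2338,
    "complete survey and signed consent form": 2125,
}


def completion_rates(total: int, stages: Dict[str, int]) -> Dict[str, int]:
    """Return each stage's share of all buildings as a whole-number percentage."""
    return {name: round(100 * count / total) for name, count in stages.items()}


rates = completion_rates(TOTAL_BUILDINGS, STAGES)
for name, pct in rates.items():
    print(f"{pct}% - {name}")
```

Each rate uses all 2578 buildings in the database as the denominator, which is how the 94%, 91%, and 82% figures are obtained.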

!?

Written charter co-drafted by community members, UM CCS and pro-bono lawyers from Tecnoglass

Data Ownership, Control, Access, Possession

All data owned by FUMUJEM, except for orthographic images which are co-owned by FUMUJEM and UM CCS

8 members of the board:
- 4 from community
- 2 from UM CCS (UM has deciding vote)
- 2 from Fundación Tecnoglass

137 variables in census:
- 63 are public
- 62 are restricted
- 12 are internal

All data sharing must be voted on by the board (actas firmadas, i.e. signed minutes)
- all data under Open DB licenses
- all assets under CC licenses
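One way to picture the three access tiers is as a small catalog that tags every census variable with its tier. The Python sketch below is only an illustration: the variable names are hypothetical, and the release rule (public variables always releasable, restricted ones only after a board vote, internal ones never) is an assumption made for the example, not the charter's actual wording.

```python
from enum import Enum


class Access(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"
    INTERNAL = "internal"


# Tier sizes reported for the census: 63 + 62 + 12 = 137 variables.
TIER_COUNTS = {Access.PUBLIC: 63, Access.RESTRICTED: 62, Access.INTERNAL: 12}
assert sum(TIER_COUNTS.values()) == 137

# Hypothetical variable names, for illustration only.
CATALOG = {
    "roof_material": Access.PUBLIC,
    "household_income": Access.RESTRICTED,
    "interviewer_notes": Access.INTERNAL,
}


def releasable(catalog, board_approved=False):
    """List variables that may be shared under the assumed rule described above."""
    allowed = {Access.PUBLIC}
    if board_approved:
        # Assumption: a signed board vote unlocks the restricted tier.
        allowed.add(Access.RESTRICTED)
    return sorted(name for name, tier in catalog.items() if tier in allowed)
```

Keeping the tier next to each variable makes the governance decision auditable: any export can be checked against the catalog and the corresponding board vote.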

Some Examples
Lessons Learned and Food for Thought
  • Data Sharing and Data Publication: interest is apparent!
    Great tools exist; UM Libraries, in cahoots with IDSC, is developing more robust services.
  • Storytelling and Data Curation go together, much like curation in museums.
    Similar to other projects, such as the Pulley Ridge experience (Norris and Mader 2019, in further reading, last slide).
  • Data does not tell stories, humans tell stories with data.
    Extreme care should be taken with curation of data. It is the stories we tell as scientists, reporters, and other interested parties that shape societal reaction.
Some Examples
GIS, Data Curation, and Storytelling
  • Quality metadata
  • Proper sharing licenses/agreements
  • Interdisciplinary problem solving
  • Synthetic analysis and visualization
  • Relevant applied science
Indeed, a well-constructed GIS is a special collection made purposefully to curate cartographic exhibits with dynamic geospatial data.
At the CCS visualization wall, Paul Richards of NOAA explains the relevance of modeling turtle distributions in the context of oil spills in the Gulf of Mexico.
Political Economy of Data
Data Science
Big Data Analytics
Data Visualization
Machine Learning
... and so on ...
UM Institute for Data Science and Computing
MPS in Data Science at UM
High Performance Computing
"Standard" Data Science / "Critical" Data Science
Political Economy of Data
Critical Data Science
Critique is not a rejection of some practice or technology, but instead a careful interrogation of the categories and assumptions used by certain approaches or technologies. It is in this vein that "critical GIS" should be understood (Crampton and Krygier, 2005).
... perhaps the same for a critical data science?
Political Economy of Data
The Open Revolution
"Will the digital revolution give us digital dictatorships or digital democracies? Forget everything you think you know about the digital age. It's not about privacy, surveillance, AI, or blockchain -- it's all about ownership." (Pollock 2018, back cover)

"Today, in a digital age, who owns information owns the future." (Pollock 2018, p. 7)
Pollock, R. (2018). The Open Revolution. A/E/T Press.
Political Economy of Data
"The benefits of sharing identified by researchers relate mainly to the impact of their work (i.e., combining data increases the validity and reproducibility of the research), research efficiency (i.e., saving time and costs), generation of new ideas and contributions to the field, and transparency and collaboration. These answers confirm previous findings that data sharing helps develop a democratic society ... enhances transparency ... allows for reproduction ... and unleashes the potential of data to solve complex societal problems ..." (Elsevier and CWTS 2017)
Elsevier and the Centre for Science and Technology Studies (CWTS) (2017). Open Data: The Researcher Perspective. Elsevier and CWTS. https://www.elsevier.com/about/open-science/research-data/open-data-report.
Political Economy of Data
"Moving forward, it is worth questioning the value of open data and data sharing. Given the possibility of unanticipated challenges, concerns, and potential impacts to the time available to scholars to conduct actual scientific research, is open data always a good thing?" (Tenopir et al 2015)
Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., Pollock, D. and Dorsett, K. (2015). Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide. PLoS ONE, 10(8), e0134826. doi: 10.1371/journal.pone.0134826.
Political Economy of Data
"It is very difficult to get open data to pay for itself because open data is almost a perfect example of a public good, a type of resource markets almost inevitably fail at supplying. And yet despite public policy interest in open research data, nobody seems to know how to finance it, even at the level of the White House." (Kansa 2014)
Kansa, E. C. (2014). The Need to Humanize Open Science. In: Moore, S. A. (ed.) Issues in Open Research Data. Pp 31-58. London, Ubiquity Press. DOI: 10.5334/ban.c.
Political Economy of Data
The rise of Digital Feudalism not only raises questions over who controls, owns and benefits from the value of data, it also raises ethical questions related to privacy, accuracy and accessibility. This, in turn, leads to questions around the rise of digital monopolies and the power imbalance that could create; as well as the resulting data asymmetry and its impact on the global society. It is essential that we advance the dialogue on developing ethical principles and solutions as a means to help address these issues.
Global Open Data for Agriculture and Health (GODAN). Ethical Dimensions of Feudalism in Digital Agriculture. Accessed on December 6 2020 from https://www.godan.info/news/ethical-dimensions-digital-feudalism-agriculture.
Political Economy of Data
"As the foregoing makes clear, establishing and maintaining information-sharing partnerships—much less data-sharing partnerships ... is not easy. In absence of active, ongoing efforts to tend communication and to open policy and legislative channels, the default position appears to be closure. As such, it is crucially important to build up formal sharing arrangements and joint data governance structures ... In terms of organizational structure, data partnerships entail co-governance of the data asset." (Bruhn 2014, p. 6)
Bruhn, J. (2014). "Identifying Useful Approaches to the Governance of Indigenous Data." The Governance of Indigenous Information 5(2): Art. 5.
Political Economy of Data
Food for Thought
Political Economy of Data
"The information ethic in this critique is based upon explicitly acknowledging that information systems are human creations, not natural phenomena. It follows the same argument that the free market is a myth; better said, the so-called “invisible hand” of Adam Smith is subordinate to institutions created by humans. This does not deny the possibility of creating greener and friendlier information systems; instead it creates opportunities to emphasize the human role in these processes."
Norris, T., & Suomela, T. (2017). Information in the ecosystem: Against the “information ecosystem”. First Monday, 22(9). doi: https://doi.org/10.5210/fm.v22i9.6847.
Thanks!
Timothy B Norris
tnorris@miami.edu
Coda: data curation is a human activity. There is no way to express gratitude to all of the people who have supported this work - you know who you are.
Data Curation and Data Science
Further Reading
  • Bruhn, J. (2014). "Identifying Useful Approaches to the Governance of Indigenous Data." The Governance of Indigenous Information 5(2): Art. 5.
  • Data Curation Network (2018). "Checklist of CURATED Steps Performed by the Data Curation Network." https://datacurationnetwork.org
  • Elsevier and the Centre for Science and Technology Studies (CWTS) (2017). Open Data: The Researcher Perspective. Elsevier and CWTS. https://www.elsevier.com/about/open-science/research-data/open-data-report.
  • Fecher, B., Friesike, S., Hebing, M. (2015). What Drives Academic Data Sharing? PLoS ONE, 10(2), e0118053. doi: 10.1371/journal.pone.0118053.
  • Kansa, E. C. (2014). The Need to Humanize Open Science. In: Moore, S. A. (ed.) Issues in Open Research Data. Pp 31-58. London, Ubiquity Press. DOI: 10.5334/ban.c.
  • Kitchin, R. (2014). The Data Revolution. Washington DC: SAGE
  • Norris, T. and S. Shreeves (2017). University of Miami Data Curation Initiative: Report and Recommendations. University of Miami Libraries.
  • Pollock, R. (2018). The Open Revolution. A/E/T Press.
  • Stuart, D., G. Baynes, I. Hrynaszkiewicz, K. Allin, D. Penny, M. Lucraft, and M. Astell (2018). Research Data: Practical Challenges for Researchers in Sharing Data. Springer Nature. DOI: 10.6084/m9.figshare.5975011.
  • Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., Manoff, M., Frame, M. (2011). Data Sharing by Scientists: Practices and Perceptions. PLoS ONE, 6(6). DOI: 10.1371/journal.pone.0021101
  • Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., Pollock, D. and Dorsett, K. (2015). Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide. PLoS ONE, 10(8), e0134826. doi: 10.1371/journal.pone.0134826.
  • Tuan, Yi-Fu (1977). Space and Place: The Perspective of Experience. University of Minnesota Press, St. Paul.
  • University of Miami Faculty Manual 2020 - 2021. https://fs.miami.edu/_assets/pdf/facultysenate/Documents/FacultyManual.pdf
  • Weber, Max & Andreski, Stanislav (1983). Max Weber on capitalism, bureaucracy, and religion : a selection of texts, London ; Boston: Allen & Unwin.
  • Whitmire, Amanda L. (2014). GRAD 521 Research Data Management Lectures. Oregon State University Libraries. http://figshare.com/articles/GRAD521_Research_Data_Management_Lectures/1003835.
  • Wilkinson, M. D., M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-Beltran, A. J. G. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. C. ’t Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao and B. Mons (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. doi: 10.1038/sdata.2016.18.