FF21 workshop: Carpentries Incubator Introduction to AI for GLAM
Mike Trizna (he/him)
17:27
Carpentries Code of Conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
Mike Trizna (he/him)
17:40
AI4LAM Code of Conduct: https://sites.google.com/view/ai4lam/about?authuser=0#h.p_zuKrz82B5sYs
Jennifer Giaccai
58:03
Any chance we could go to the last definition slide for a minute?
Richard Arias-Hernandez
59:30
- What are labels like? A categorical controlled vocabulary? A quantitative metric? Does it matter?
- How are they applied? Exclusively? Exhaustively? Does it matter?
- What does an ML model look like? A multidimensional matrix of values for categories, as in the fruit example?
Richard Arias-Hernandez
01:00:02
haha no need to take on all questions … these are my notes
Mike Trizna (he/him)
01:01:18
@Jennifer, did that work on your device, at least?
Mark Bell
01:01:51
I'll start with the first one. A label could be anything: it could be something categorical - like dog/cat; it could also be a numeric value like the temperature on a given day; or it could be a whole paragraph of text - like a summary of a document.
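To make Mark's three kinds of labels concrete, here is a minimal Python sketch; every file name and value below is invented purely for illustration:

# Three kinds of labels a supervised model might learn from.

# Categorical label (classification): dog vs cat
image_labels = [
    ("photo_001.jpg", "dog"),
    ("photo_002.jpg", "cat"),
]

# Numeric label (regression): temperature on a given day
temperature_labels = [
    ("2021-06-01", 18.5),
    ("2021-06-02", 21.3),
]

# Free-text label (e.g. summarisation): a whole paragraph as the target
summary_labels = [
    ("report_042.txt", "A short summary of what the document says."),
]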
Jennifer Giaccai
01:02:26
Menti? Yes! (Except for when my phone keeps locking me out because I have the child-unfriendly short lock time on it)
Nora McGregor/British Library
01:14:52
https://www.robots.ox.ac.uk/~vgg/research/
Richard Arias-Hernandez
01:16:51
Thank you @Mark
Nora McGregor/British Library
01:19:14
Fun example: https://www.robots.ox.ac.uk/~vgg/research/face_paint/
Mathieu-Alex Haché
01:27:22
Transfer learning is often mentioned in the literature, what is it exactly and what can it be useful for in GLAMs?
Nora McGregor/British Library
01:28:50
Good question! Mike will be covering that in a few slides, right before we break :)
Nora McGregor/British Library
01:29:45
And HTR (Handwritten Text Recognition) like https://readcoop.eu/transkribus/
Daniel van Strien
01:33:57
I want an image of an avocado in the shape of an armchair 😛
Philo van Kemenade
01:34:42
What's the name of that avocado chair model again?
Daniel van Strien
01:35:51
https://openai.com/blog/dall-e/
Juja Chakarova
01:36:13
Hi. Can anybody help me figure out how to use Menti, please?
Philo van Kemenade
01:37:13
Thanks Daniel!
Juja Chakarova
01:38:16
Thanks Nora!
Nora McGregor/British Library
01:38:30
https://www.menti.com/pbrviz1yx9
Mark Bell
01:38:55
Here is a nice example of text summarisation which reduces research papers to a sentence: https://tldrthis.com/
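For anyone who wants to try something similar locally, here is a rough sketch using the Hugging Face transformers summarisation pipeline; the model checkpoint is just one common public choice, not necessarily what tldrthis uses:

# Abstractive summarisation sketch with the transformers pipeline API.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

abstract = (
    "Machine learning offers GLAM institutions new ways to enrich large "
    "digitised collections, but it also raises questions about bias, "
    "labour, and the provenance of the data used for training."
)

result = summarizer(abstract, max_length=25, min_length=5, do_sample=False)
print(result[0]["summary_text"])  # a one-sentence condensation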
Dorothée Benhamou-Suesser
01:39:05
Is transfer learning or fine-tuning necessarily supervised?
Daniel van Strien
01:41:04
It doesn’t need to be, depending on how you approach it. For example, you could fine-tune a language model that has been trained on contemporary English on historic English texts. Since language models (often) don’t need supervised data, this fine-tuning could happen without any labelled data.
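A rough sketch of the kind of unsupervised fine-tuning Daniel describes - continuing masked-language-model training on raw historic text. The checkpoint, file name, and hyperparameters here are placeholders:

# Domain-adapt a contemporary-English model to historic text using only
# raw (unlabelled) text, via masked-language-model fine-tuning.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "historic_texts.txt" is a placeholder for your own unlabelled corpus.
dataset = load_dataset("text", data_files={"train": "historic_texts.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

# The collator masks random tokens, so no human labels are needed.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="historic-bert", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()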
Mike Trizna (he/him)
01:41:58
Good question @Dorothee. This is actually where semi-supervised learning might come in: where you have a lot of images or documents and you want to fine-tune the parameters to “separate” the examples as much as possible.
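As a concrete toy illustration of one flavour of semi-supervised learning (self-training), scikit-learn's SelfTrainingClassifier mixes a few labelled examples with many unlabelled ones marked -1; all data below is synthetic:

# Semi-supervised toy example: 10 labelled points, 90 unlabelled (-1).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
# Two synthetic clusters standing in for, say, two document categories.
X = rng.normal(size=(100, 2)) + np.repeat([[0.0, 0.0], [3.0, 3.0]], 50, axis=0)

y_partial = np.full(100, -1)   # -1 means "no label yet"
y_partial[:5] = 0              # five labelled examples from cluster 0
y_partial[50:55] = 1           # five labelled examples from cluster 1

clf = SelfTrainingClassifier(LogisticRegression())
clf.fit(X, y_partial)          # iteratively pseudo-labels the rest
print(clf.predict([[0.2, -0.1], [2.8, 3.1]]))  # expected: [0 1]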
Jennifer Giaccai
01:56:44
Aha, I knew I’d seen that image before, but couldn’t remember where—must have read that blogpost
Philo van Kemenade
01:57:20
https://ai.googleblog.com/2018/09/introducing-inclusive-images-competition.html?m=1
Dorothée Benhamou-Suesser
02:04:52
How do you give feedback / correct the bias in the ML process: do you have to correct the bias on a large amount of data, or correct the same bias multiple times? Or can you just change the learned "rule" / "model" / "inference" at a more general level, and faster?
Mike Trizna (he/him)
02:07:18
Great question, Dorothee. I think Nora is about to answer this question, but we can definitely follow up after the next set of slides.
Mike Trizna (he/him)
02:10:55
Datasheets for Datasets: https://cacm.acm.org/magazines/2021/12/256932-datasheets-for-datasets/fulltext
Mike Trizna (he/him)
02:12:03
Lessons from Archives PDF: https://dl.acm.org/doi/pdf/10.1145/3351095.3372829
Daniel van Strien
02:12:23
A GLAM dataset which tries to include a useful datasheet: https://huggingface.co/datasets/blbooksgenre
Philo van Kemenade
02:13:15
Also see IBM's AI fairness toolkit https://aif360.mybluemix.net/
Mark Bell
02:14:37
On the correcting bias question, I'd say we communicate with the model through the data. So you wouldn't go in and adjust the weightings, but you could pass new data into it. Or you could process your original data differently - for example, removing gendered terms from the Amazon CVs.
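A toy sketch of that kind of data-side intervention - stripping gendered terms from CV text before it reaches the model. The term list and replacements are invented and far too crude for real use:

# Debiasing by changing the data, not the model weights.
import re

# Made-up term list; real interventions need domain expertise and care.
GENDERED_TERMS = {
    r"\bhe\b": "they", r"\bshe\b": "they",
    r"\bchairman\b": "chairperson", r"\bchairwoman\b": "chairperson",
    r"\bwomen's\b": "", r"\bmen's\b": "",
}

def neutralise(text):
    """Replace or drop gendered terms before the text reaches the model."""
    for pattern, replacement in GENDERED_TERMS.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()

cv_lines = ["Captain of the women's chess team", "She chaired the committee"]
print([neutralise(line) for line in cv_lines])
# ['Captain of the chess team', 'they chaired the committee']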
Nora McGregor/British Library
02:15:26
Thanks Mark! Perfect.
Dorothée Benhamou-Suesser
02:17:27
Thank you very much @Mark, @Daniel and @Mike for answering my questions (this one and the former) !
Mike Trizna (he/him)
02:17:59
👍
Dorothée Benhamou-Suesser
02:18:00
So correcting bias can be a substantial amount of work
Mark Bell
02:20:37
The first step is identifying that the system is biased before it gets used. The whole model-training process is very iterative: train, refine, train again.
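In pseudocode, that loop might look like the sketch below; every function name is a placeholder you would implement for your own data and fairness criteria, not a real library call:

# Pseudocode for the train-refine-train loop.
def iterate_until_acceptable(data, max_rounds=5):
    for _ in range(max_rounds):
        model = train(data)                   # fit a model to the current data
        report = audit_for_bias(model, data)  # e.g. compare error rates across groups
        if report.acceptable:
            return model                      # only now should it be deployed
        data = refine(data, report)           # fix the data, then train again
    raise RuntimeError("still biased after max_rounds of refinement")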
Nora McGregor/British Library
02:24:42
I think this set of principles is useful. Lots of detail here. https://ethical.institute/principles.html
Nora McGregor/British Library
02:29:28
Our Case Study: "A Museum is keen to make a newly acquired digitised collection of 20,000 Southeast Asian late 19th Century photographs more discoverable within the main catalogue. The photographs are the work of an English traveller, and aside from captions handwritten by him, the individual photographs have very little in the way of item-level description."
Nora McGregor/British Library
02:30:38
Donor has demanded it be accessible online quickly.
Mathieu-Alex Haché
02:30:46
An interesting possibility could be to detect the emotions on the faces of the individuals appearing in the photographs, in order to let users navigate the collection by emotion.
Nora McGregor/British Library
02:30:50
;P
Nora McGregor/British Library
02:34:06
Emotional AI could be challenging as it's such a subjective thing. Would need to be very aware of potential for bias with that one :)
Jennifer Giaccai
02:34:28
I’m not sure I understand the difference between / the roles of software developers, ML experts, and IT
Mike Trizna (he/him)
02:35:09
At underfunded institutions, they might be all the same person.
Mike Trizna (he/him)
02:35:44
I see the ML experts as the people that train the models, and the software engineers do the frontend work.
Philo van Kemenade
02:35:56
Gotta run, thanks for the great workshop!
Mark Bell
02:37:03
Software developers might create the interface through which the user interacts with the ML system. They may also build the pipeline that moves data through the system. IT would be in charge of hosting the machine it all runs on. The ML people turn the data into a model - doing all the training and testing, trying different models out, etc.
Mark Bell
02:37:19
Mike is right though, that it may well be one person!
Mathieu-Alex Haché
02:37:37
@Nora McGregor I totally agree, particularly in the case of historical photos of marginalized communities or victims of colonialism: the emotions conveyed in the photos are sometimes not the real ones that were experienced.
Nora McGregor/British Library
02:38:20
TOTALLY. Also, particularly for our example, facial recognition AI at the moment is still quite imbalanced when it comes to black vs white faces, so utilising it on Southeast Asian photographs would be troublesome.
Nora McGregor/British Library
02:39:08
https://theconversation.com/emotion-reading-tech-fails-the-racial-bias-test-108404
Mike Trizna (he/him)
02:40:37
If anyone else has to leave early, thanks so much for staying as long as you did! But please also fill out our survey when you get a chance: https://forms.gle/mpuY3xcFpwWqz5DF6
Nora McGregor/British Library
02:42:17
I once put my own photograph into the Cloud Vision API many years ago and it was like 70% certain I was a cocker spaniel. 😂 It's improved since.
Richard Arias-Hernandez
02:42:34
lol
Mark Bell
02:42:50
I was labelled as Tom Hanks.
Mike Trizna (he/him)
02:48:24
Survey one more time: https://forms.gle/mpuY3xcFpwWqz5DF6
Jan Philipp Richter
02:49:55
Dear all, unfortunately I have to leave now, but thanks so so much for this great, well prepared and very informative workshop. A lot to take home. Kudos!
Nora McGregor/British Library
02:50:13
Thank you Jan!
Jennifer Giaccai
02:51:29
Facial recognition can’t differentiate me from my mother, although to be fair, when I was in first grade and my grandfather showed me a picture of my mother in first grade, I was convinced he had sent some spy to take a picture of me—so maybe the AI is more correct than we thought.
Tom RICHARD
02:53:03
Unfortunately, I have to go. Thank you for this amazing and interesting workshop! Great work, and what a discovery Menti is :)
Nora McGregor/British Library
02:54:03
https://programminghistorian.org/en/lessons/
Nora McGregor/British Library
02:55:16
https://transkribus.eu/lite/
Mike Trizna (he/him)
02:55:25
https://github.com/YaleDHLab/pix-plot
Mike Trizna (he/him)
02:56:57
Review of GLAM Training Resources: https://sites.google.com/view/ai4lam/news/training-resources?authuser=0
Nora McGregor/British Library
02:57:21
https://www.fast.ai/
Reinhardt Hartzenberg
02:59:45
Thank you so much for the intro to AI. These workshops are great for people like me in South Africa who have limited exposure to AI in the local market.
Nora McGregor/British Library
03:00:13
👍🏻Thanks for coming!