
Dataclysm: Who We Are (When We Think No One's Looking) by Christian Rudder (2014)

5 May 2015

Dataclysm is a fun, popular approach to scientific data analysis and interpretation. This is a very enjoyable, fascinating read.

There are many good things about this book: it deals with data in a passionate and entertaining way, and makes something that is, a priori, not that interesting to people who do not love data really interesting! The book does so in very clear and approachable language. Besides, Rudder is an insider who knows what he is talking about first hand, so everything he says is worth listening to. After all, he is not one of those lorikeets who repeat data analysis and statistics without understanding anything about them or even questioning the results. Rudder comes through as a lovely chap with an inquisitive mind, passionate about the work he does. Most importantly, he also comes through as an unpretentious guy who wants to connect with the reader. We are connected now, baby.

Perhaps the main takeaway from the book, to me, is that mathematicians and data analysts are coming round to something that the Social Sciences and Humanities wanted them to come to decades ago and, most importantly, they now have the tools to deal with humongous amounts of real-life data about real people's lives to do that. Reading the book made clear to me the need for interdisciplinary studies between Scientists and Social Scientists because, whatever a-priori bullxeet the "academic establishment" has perpetuated for many decades, the two fields need each other. Dataclysm shows the many possibilities open to data analysis as a specific branch of Science and how data collection and interpretation affect us. It is like Mackos - I am loving it.

The most interesting parts of the book, at least to me, are chapters 12 (Know Your Place) and 14 (Breadcrumbs). The first because it shows, even at an embryonic level, that geographical maps are sometimes just lines drawn on a piece of paper, while other factors, beyond the place you live, have way more importance. The Dolly project seems to me the most fascinating thing in the world and I would have loved more details from the expert instead of having to go to Mr Goo to ask him about it. The latter chapter is, by far, the most interesting (to me) because Rudder is an insider and anything and everything he has to say about the collection, storage and use of our data or metadata is relevant and important and needs to be taken into consideration. I would have loved that chapter to be way more developed and detailed. Rudder is very clear about how things are and should be, or perhaps should not be, and I wanted more. Also, I wanted to know his opinion on the use of IP blinders, and of browsers and search engines like Mozilla Firefox or DuckDuckGo, which are not that keen on recording our data or sharing it with anybody.

I am a critical reader, not a data muncher, so I tend to question or think about what any non-fiction writer says, as much as my limited knowledge allows me. I was pleased to find that some of the questions or objections that I found myself raising to Rudder's statements while reading were later presented and discussed. Those very questions are the ones that can make any researcher transcend the data itself. In a way, Rudder has a Humanities sort of soul, which shines now and then when dealing with his mathematician core. I love the combo.

I also loved all the details about data collection and use, the games that Rudder and his pals at OkCupid play, and especially Rudder's trend analysis examples. That is Rudder's forte and it does show! I loved some of his reflections on Google's auto-complete trends, on gauging the health of a couple by looking at the chart of interconnected dots on Facebook, and on racial attitudes in the USA.

The charts are beautifully presented and coloured, with so many different styles and ways of organising the data. I am a tables kinda lady. There is nothing that cannot be presented in a table and be understood. And some of those were there. I love squares and red. So the book was visually enthralling.

Dataclysm could have been a better book on so many fronts that it is a pity that it is not. Allow me the analogy - I have this distinct impression that, in a way, Rudder beheads himself for the sake of applause in a reality show. That is painful to watch.

TOMATOS OR TOMATOES?
The main downside of the book is the lack of a proper editor and of proper editing. A good editor can work wonders for any book, no matter how brainy you are. A good editor works not only on making the text more readable in terms of spelling, sentence and paragraph structure, but also on book structure, approach and level of focus, so the book is not only polished, but also makes sense and conveys the author's message better. Unfortunately, the book is not polished and the structure does not do Rudder any favours. Mind you, the use of verbal contractions is not advisable in a published book, unless you are translating or reproducing direct speech, while it is preferred in blogging. The use of long paragraphs with bad punctuation turns a stroll by the beach into a walk through thorny bushes. You get the picture.

GOING BANANAS - THE BOOK STRUCTURE
It is a pity that the author decided, or was advised, to present us with the book's current structure. I do not have a problem with general unrelated chapters presented as such and bunched together in three parts, because they make sense to me and they are well connected despite their diversity. However, I do have a problem with the general structure of the book, and with the endnotes/notes system.

The chapter on sources and data is relegated to the end of the book, before the index. I consider this a big failing because Rudder is asking the reader to believe what he says with some sort of theological trust, while he could have easily earned the reader's trust and respect by just placing the "coda" (Italian for tail or epilogue) at the very beginning. Why? Because this chapter explains exactly how he has approached the data, his methodology, what he has done and how he has done it. This is especially relevant in this book, because a good deal of its chapters are re-takes on his own blog posts, so it would have benefited him to state clearly, upfront, that those re-takes used fresh data, that the testing was done again from scratch and that they were not a copy-and-paste sort of thing. We have to wait until the end of the book to learn that. That is, to me, a "going-bananas" sort of decision.

I am a bit anal about footnotes/endnotes while writing and while reading anything coming from academics or people with a high level of education. I also understand that if you want to write a scientific book on data for the general public you cannot do that, just for practical reasons. So, I consider Rudder's restraint in using endnotes sensible. Then, we get to the tail end of the book, and we find this statement:
"We no longer live in a world where a reader depends on endnotes for “more information” or to seek proof of facts or claims. For example, I imagine any reader interested in Sullivan Ballou will have Googled him long before"
Yes, it is true, even I do that, but it worries me that any person coming from a decent University would say that or do that in a book. We are relying more and more on what Wikipedia or the Internet (who is the Internet here?) is saying, and not on what peer-reviewed, properly edited and discussed scholarly periodicals, books or encyclopaedias say about anything. I would strive to provide "serious" reference material, and add as many footnotes, endnotes or references as necessary.

Confession. I would have forgiven him for this if Rudder had not then gone bananas again and contradicted himself by providing a "chapter" called "notes", right after the space devoted to the endnotes. Rudder wants to provide us with extra information on certain points mentioned in the book. Well, if that is the case, add more footnotes/endnotes. That is what they are used for, sweetheart! Those "notes" are actually embryo endnotes that Rudder birthed and gave up for adoption to himself. It sounds ridiculous, doesn't it? It is. This is even more painful in the Kindle edition. The link from the note to the text works backwards, and takes you to the part of the text it relates to, but does not allow you to go forward from the text to the note, because, hello Houston, there is no endnote in the text to do so properly.

If this were my book, I would work on fixing this and introducing that information as endnotes in the text, properly. I would also link the references forward properly in the Kindle edition.

Then comes the Index, a proper scholarly index, one of those beautifully made indexes that are so awesome to have in a book and so expensive to produce in printed books. There for us... Well, useful if you have a hard copy. Otherwise, no, because it is not properly linked in the Kindle edition and is therefore useless. This is something easily fixable, and worth fixing if you want to charge the client full price for the book.

To add to this going-totally-bananas sort of trance, the book, per se, ends when my Kindle showed 65% read. Yes, that is right. The rest is the footnotes, notes, index and info about the author and the publisher. I felt ripped off again.

How anybody with the brilliance of Rudder could behead himself is something that escapes me. And here comes the main culprit for the failure of the book - Rudder's struggle to please both the general public and academia. An eeny-meeny-miny-moe sort of struggle (my impression). Many of his statements about methodology strive to convey a serious scientific way of working that matters and gets the approval of his academic peers, because he really is a serious scientist. That struggle also explains why the "coda" and the "notes" were relegated to the end of the book but were not totally discarded.

A scientist can present their findings and knowledge to the general public while being rigorous and respected by their academic peers, without trying to please both. Look at Kaku, and the way he is able to do so with ease. For that you have to be clear about who the target of your book is, and therefore what you have to sacrifice and what not. Not an easy task, but easier if you have a good editor.

FLAW WITH ME
There are a few flaws in the a-priori reasoning used. Perhaps things were not explained sufficiently, so I give Rudder the benefit of the doubt, just because he is a gorgeous-looking guy. Here are some examples of the sort of arguments that should be polished and looked at with a furrowed brow, if you know what I mean:
+ Although not everybody is on online networks and sites, most people are or will be, so the analysis of the data and its results have some sort of universality. And well, Facebook and Google are the kings and everybody is there, not to mention the phone and Internet companies, which are also collecting your data. Yes, it is true. However, my mini-me-on-the-shoulder sort of question pops up. Where do we put the gazillion Chinese on Planet Earth who do not use FB or Google or Western sites? What about Middle Eastern cultures like, say, Yemen, or Saudi Arabia, or Qatar, or Afghanistan? Do we include them by default in the findings and analysis in the book and decide that we are all one?

+ Sometimes I had the impression that Rudder could not see, although I am sure he does, that the USA is not the world, and that the Western World is not the whole world. For example, are his analyses (which I really loved) of race in the USA pertinent, say, in Bolivia? In South Africa? In Botswana? Rudder probably never intended to imply that, for sure, but at times the book comes across as if the contrary were true. I think part of the epilogue should have been devoted to stating what he is doing and what sort of limits his analysis has. That is, unless he is using data from around the world, from China to Bhutan, Uganda to Darfur. Then, I will vanish and disappear out of embarrassment. Of course there are some things that are universal because we are all humans, and we all have a human body, and want to relate: "no man is an island". However, the other is there, in those places where life is most deeply affected by the religion you have, your gender, or the part of the world you live in. Way different.

+ The author recognises that the important thing is not just what the data says about what humans do, but why they do it. Bingo! That got me excited. It was a quickie-sort-of excitement. Not for long, because beyond some truisms, nothing of substance is said or argued or even presented as a reply. That is because the data, to me, has a limit. It can reveal what we do, even that we say one thing and do another, those hidden secrets of ours "on the Internet", but it cannot always explain why, or reveal the intention behind it. Psychology can be very helpful in that regard.

+ What a person searches for often gives you the person himself. Really? Well, sometimes, not always. For example, if I search Google for skin rash photos, that might mean I have a skin rash, or I am studying dermatology, or I am doing an assessment for High School, or my baby has a rash and I want to find out what exactly it is, or I have a sort of sickly morbid fascination with photos of skin diseases. You get the picture: searches on Google are never straightforward, or at least not all the time. Now, how do you interpret the intention behind the search?

TRUISMS ARE TRUE NO MATTER THE HUE
Let me ask you some questions. Be honest with yourself. If I made the statements below, would you be surprised, or think that a humongous amount of data has to be analysed, charted and studied for you to learn them? Would you be wowed?
* Men usually prefer younger women, no matter their own age.
* At the end of the day, looks aren't that important when you meet a person in real life; the things you have in common matter more.
* People say they are one thing but then they are another.
* People tend to hide or not to say things that are not politically correct regarding race, gender and whatnot.
* Men and women connect better when they do not see each other's photos.
* People vote for somebody and lie about it on leaving the polling booth, especially if the candidate is not popular.
* Asian Americans talk more about Korean pop or Korean films than white people, while the music that South Americans mention is Salsa or Bachata rather than country music.
* With the Internet we all have a voice now and a larger audience.
* The better interconnected in the family a couple is, the more chances their relationship has of succeeding.
* And so on.

Yes, that is right. A series of truisms, common-sense evaluations presented through flashy, mathematically crafted charts and complex data analysis. Isn't this a bit of a waste of the author's talent (to me undeniable) and time?

HIGHLIGHTED SENTENCE
"The era of data is here; we are now recorded"
Is that so new? Have you ever visited a historical archive? Yes, of course it is not the same, but I can imagine Sumerian bookkeepers feeling on top of the world just as Rudder does now; mind the difference in the volume of data, the detail and the people recorded, of course. Yet, everything is relative. We have been recording our data, and our data has been used, for ages, literally, just a bit differently. Yes, Rudder possibly did not intend to imply this either, but we do not know. The flash is sometimes too bright to let us see properly.

BRIGHT IDEA
The cover of the book is dreadful. Go and get a decent designer, Rudder! And another editor, did I mention that?

FINAL CONFESSION
I would not have written such a long review if there weren't something intrinsically good and thought-provoking in Rudder's book, so take it as it is. I still recommend reading it and I think it is really entertaining.