Ellie Asks Why Annex

25 October 2013

Account hijackers

If a message originates from a familiar name or email address, its likelihood of making it through spam filters is greater.

Google described their efforts to minimize harm to users due to email account hijacking:
"Our security team...saw a trend of spammers hijacking legitimate accounts to send their messages. [We developed] a system that uses 120+ signals to...detect whether a log-in is legitimate, beyond just a password."
Less than 1% of spam emails make it into a Gmail inbox.
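Google has not published its 120+ signals, so the following is only a toy illustration of the general idea of judging a log-in "beyond just a password": a few weighted risk signals are summed, and a suspicious but correctly authenticated log-in is challenged rather than silently allowed. Every signal name, weight and the threshold below is a hypothetical assumption, not Google's method.

```python
from dataclasses import dataclass

# Hypothetical signals and weights, purely for illustration.
# Google has not disclosed its real signals or how it combines them.
WEIGHTS = {
    "new_country": 0.4,
    "new_device": 0.25,
    "unusual_hour": 0.1,
    "many_failed_attempts": 0.35,
}


@dataclass
class LoginAttempt:
    password_ok: bool
    signals: dict[str, bool]  # which hypothetical signals fired for this log-in


def risk_score(attempt: LoginAttempt) -> float:
    """Sum the weights of the signals that fired, capped at 1.0."""
    score = sum(w for name, w in WEIGHTS.items() if attempt.signals.get(name, False))
    return min(score, 1.0)


def decide(attempt: LoginAttempt, challenge_at: float = 0.5) -> str:
    """Return 'reject', 'challenge' or 'allow' for a log-in attempt."""
    if not attempt.password_ok:
        return "reject"
    # A correct password alone is not enough when the context looks suspicious.
    if risk_score(attempt) >= challenge_at:
        return "challenge"  # e.g. ask for an extra verification step
    return "allow"


print(decide(LoginAttempt(True, {"new_device": True})))
print(decide(LoginAttempt(True, {"new_country": True, "new_device": True})))
```

Challenging instead of blocking is one plausible way to keep false positives cheap: a legitimate user who happens to trip a signal is inconvenienced, not locked out.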

Chart: Legitimate Gmail accounts blocked for sending spam versus time (compromised accounts decreased to nearly zero since 2010)

The number of compromised accounts has decreased by 99.7% since 2011. That is an impressive, sustained reduction! How does Google avoid false positives? I am so curious about the specific details of their filtering rules!

The blog post was written in March 2013. It is remarkable that the same methods continue to be effective, since spammers targeting Gmail would surely perceive them as a new challenge to overcome.

120 Signals

I suspect that Google's methods are analogous to the medically unlikely edits (MUEs) that the U.S. Department of Health & Human Services' Centers for Medicare & Medicaid Services (CMS) uses to detect improbable claims. The problems that MUEs catch can be accidental, due to claim coding or data entry errors. They can also be deliberate, when there is fraudulent intent, e.g. filing for more services, or for more expensive services, than were actually provided. Regardless of intent, identifying them reduces paid claims error rates.
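An MUE is, roughly, a per-code ceiling on the units of service a provider would report for one beneficiary on one date of service. The sketch below assumes claim lines are totalled per beneficiary, date and procedure code, then compared with that ceiling. The codes and limits are hypothetical placeholders, not real CMS values.

```python
from collections import defaultdict
from typing import NamedTuple


class ClaimLine(NamedTuple):
    beneficiary_id: str
    service_date: str  # ISO date, e.g. "2013-10-25"
    code: str          # procedure code (hypothetical placeholders below)
    units: int


# Hypothetical per-code limits: maximum units of service for one
# beneficiary on one date of service. Not actual CMS MUE values.
MUE_LIMITS = {"CODE_A": 1, "CODE_B": 4}


def flag_unlikely(lines):
    """Return (beneficiary, date, code, total_units) for totals above the limit."""
    totals = defaultdict(int)
    for line in lines:
        totals[(line.beneficiary_id, line.service_date, line.code)] += line.units
    flagged = []
    for (bene, date, code), units in totals.items():
        limit = MUE_LIMITS.get(code)
        if limit is not None and units > limit:
            flagged.append((bene, date, code, units))
    return flagged


claims = [
    ClaimLine("b1", "2013-01-01", "CODE_A", 1),
    ClaimLine("b1", "2013-01-01", "CODE_A", 1),  # duplicate entry: 2 units > 1
    ClaimLine("b2", "2013-01-01", "CODE_B", 3),  # within the limit of 4
]
print(flag_unlikely(claims))
```

Note that the check says nothing about intent: a keying error and a padded bill are flagged in exactly the same way.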

How will the Affordable Care Act affect existing processes for detecting MUEs, and for setting benchmarks? CMS does not publish all of its MUE criteria, for the same reasons that Google will not reveal details about its 120 signals.

Continuous improvement is a part of life for email account hijackers, for Google, and for the fraud detection team at the Centers for Medicare and Medicaid Services.

I wrote a post about health care, with a much more Ellie-centric theme, a few years ago. That was when I worked as a statistician for AHCCCS, Arizona's state-administered Medicaid program, monitoring program performance and quality of care.

11 March 2013

Compressed data for prayer, anagrams and digital rights management

I found an oddly contemporary-looking New York Times article that is, in fact, quite vintage for the Internet. It begins with a review of a most peculiar e-commerce company:
doing business with Newprayer.com may require a leap of faith.
- Compressed Data: Beaming Prayers to God's Last Known Residence
via The New York Times Online, 31 August 1999.

Example of e-commerce in 1999: the last known location of the divine is easier to find than this website.
Image provided courtesy of archive.is
The Internet Fraud Watch for the National Consumers League was deluged with complaints about fraud on the Net, having received 7,700 last year and 6,000 through the first six months of 1999.
If only they had known what would follow, in less than ten short years.

Digital rights management

The next article was about a new "pact" between Adobe and Xerox, to address the needs of companies
...seeking a way to prevent the rampant piracy that has plagued the digital music industry from overtaking digital publishing. The technology, called Content Guard, is to be announced at the Seybold 21st Century Publishing Conference in San Francisco.
When was the last Seybold 21st Century Publishing Conference, I wonder? There hasn't been one for a while. The proposed approach seems so straightforward! It would be
integrated... with Adobe's existing PDF format for distributing documents on line... publishers that have agreed to adopt the technology, include Thomson Learning, the National Music Publishers Association, and Haymarket Publications, a European business publisher.


Content Guard was expected to be superior as a form of digital rights management software, as it was
based on an industry standard: Java, an Internet programming language developed by Sun Microsystems.
I received my nth zero-day patch for Java just last week. Yet Java lived up to this part of its promise, and still does:
The flexibility of Java would allow users to read Xerox protected documents [and non-Xerox protected documents too] on various types of software operating systems using any of the standard Web browser programs.
I don't think Adobe had fully enabled the following functionality in PDFs viewed with Adobe Reader until much later; I have rarely seen it used, even though it is available:
Publishers, corporations or individuals could specify who had access to the document, set a time frame for protection and even designate the type of authentication (like a password or a fingerprint) needed to read the document.
Adobe introduced these features in 2009, with the exception of fingerprint authentication for most of us, and offered them for digital signatures and general-purpose security rather than for digital rights management.

Anagrams for free

I'll end on a more positive note, rather than on gloomy nostalgia. The wonders of natural language processing were just reaching the general public.
The letters that form the name Boeing can be rearranged to spell "big one." Time Warner can be converted to "mean writer." And the title of Rupert Murdoch's sexy London tabloid The News of the World is an anagram for "tender, hot flesh -- wow." These are just a few of the possibilities in business anagrams, a game being played by office workers throughout the English-speaking world.
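Checking a claimed anagram is simple: two phrases are anagrams when they use exactly the same letters the same number of times, ignoring case, spaces and punctuation. The sketch below verifies the three business anagrams quoted above. It only checks a given pair; generating good anagrams, as the anagram servers did, is a much harder search problem.

```python
from collections import Counter


def letters(phrase: str) -> Counter:
    """Count letters only, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in phrase.lower() if ch.isalpha())


def is_anagram(a: str, b: str) -> bool:
    """True when both phrases contain exactly the same multiset of letters."""
    return letters(a) == letters(b)


for name, phrase in [
    ("Boeing", "big one"),
    ("Time Warner", "mean writer"),
    ("The News of the World", "tender, hot flesh -- wow"),
]:
    print(f"{name!r} -> {phrase!r}: {is_anagram(name, phrase)}")
```

All three pairs print True.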
The language in the following paragraph caught my attention for several reasons. First, the exact and accurate wording, "contact the server", would be uncommon in a daily newspaper now.
To play, contact the Internet Anagram Server at www.wordsmith.org/anagram, which provides immediate answers, or another site called Anagram Genius Server at www.anagramgenius.com/server.html, which gives a more considered response and replies by e-mail after a few minutes or hours, depending on traffic volume.
Second, there is the reminder that web apps did not yet exist: the requested anagram arrives by e-mail, in minutes. Or hours.
At no charge, these sites will attempt to create anagrams from any word or phrase, not just company names. But somehow there's a special mischievous thrill...
Emphasis mine. If you want to find out what that thrill is, read the New York Times article, linked above. I only hope that the New York Times will remain extant, rather than joining so many worthwhile news and information services, preserved for us only through Internet archives.

I'm sorry. I tried. Gloom won.

25 December 2012

Summer days and nights of 2009

This video was recently featured on the HPC Wire YouTube channel. It is an animation of the summer weather of 2009, as only supercomputers can render it! HPC stands for "High Performance Computing". Cray was one of several contributors to the project. I still think of Cray as THE supercomputer developer, though those days are probably past.

What's so special here?

A recent HPC Wire article about climate change explained why simulation at such a fine (7-kilometer) resolution was so difficult: it required
a special allocation of computing time on the Athena supercomputer at the National Institute for Computational Sciences (NICS)... For six months, the entire 18,048-core system was at the disposal of the team. Among the results ... were simulations that represented boreal summer climatology at 7-kilometer resolution
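A back-of-envelope calculation hints at why a 7-kilometer grid needed an entire 18,048-core machine. The sketch below simply divides the Earth's surface area by the area of one grid cell. The 100-kilometer comparison spacing is an assumption chosen as a round "coarse model" figure, and the sketch ignores vertical levels entirely.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth
SURFACE_KM2 = 4 * math.pi * EARTH_RADIUS_KM ** 2  # about 5.1e8 square km


def grid_columns(spacing_km: float) -> float:
    """Approximate number of horizontal grid columns at a uniform spacing."""
    return SURFACE_KM2 / spacing_km ** 2


fine = grid_columns(7)
coarse = grid_columns(100)  # assumed coarse spacing, for comparison only
print(f"7 km:   {fine:,.0f} columns")
print(f"100 km: {coarse:,.0f} columns")
print(f"ratio:  {fine / coarse:.0f}x more columns at 7 km")
```

The 7-kilometer grid has roughly 10 million columns, about 200 times as many as the 100-kilometer grid. In many models the time step must also shrink as the grid spacing shrinks, so the total cost tends to grow even faster than the column count alone.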
Notice the shifting cloud cover and precipitation, in shades of gray, during the summer months of 2009. The quality is exceptional.

I appreciated that the production group chose NOT to use any music or narration during this 1 minute, 38 second animation. I wish that were more common, especially for brief, well-annotated videos like this!

Climate change perception v. evidence-based reality

I read a rather comprehensive technical paper that should be enough to convince anyone that something has changed, for the worse, in the Earth's climate: Distributions and Trends of Death and Destruction from Hurricanes, 1900–2008, Willoughby, H. (Jan 2012); Nat. Hazards Rev., 13(1), 57–64. This led to some thoughts that I wrote up regarding climate change and New York City's physical infrastructure, in light of the recent storm, Sandy.

Finally, I find it difficult to ignore the odd perception gap between climate change denialists and the growing body of climate change evidence. I found an analysis of that discrepancy and its possible cause in an unexpected source: an article in Nature, "Why we are poles apart on climate change", by a professor at Yale Law School. A few months earlier, he had co-written a bona fide scholarly journal article with some distressing conclusions, which I think are correct, though I don't fully understand their cause. See The polarizing impact of science literacy and numeracy on perceived climate change risks, Kahan et al. (Apr 2012); Nature Climate Change 2, 732–735:
Members of the public with the highest degrees of science literacy and technical reasoning capacity were not the most concerned about climate change. Rather, they were the ones among whom cultural polarization was greatest.

HTML5 video

If possible, try to view this in full screen mode for optimal effect. The video supports up to 720p.

I suggest trying the YouTube HTML5 player. It is in beta, but works well, and has been available for nearly a year. Most videos, whether on YouTube or Vimeo, seem better when viewed with HTML5 instead of Adobe Flash. There is less of the dreaded "Flash Crash", although HTML5 playback can get laggy. I always enjoy the comparison!

03 December 2012

MintChip denouement

The Royal Canadian Mint is the official mint of the Canadian government. In March 2012, it was announced that the Mint would discontinue all future production of penny coins. A week later, the Toronto Star ran a news story in which the Mint introduced the first national digital currency in North America, the MintChip.

A Royal Canadian Mint spokesman provided the following description:
MintChip doesn’t plan to link to a person’s bank account or credit card information. And unlike BitCoin, a peer-to-peer hosted digital currency with a fluctuating value, MintChip is simply a new way to exchange Canadian dollars. Plus, it’s backed by the Canadian government. 
The MintChip doesn't satisfy criteria for what I would consider a bona fide currency. Rather, it seems more like a type of electronic payment network for the Canadian Dollar.

Golden prize

A rather intriguing contest, the MintChip Challenge, was announced in the same Toronto Star article. It was an app developer contest sponsored by the Royal Canadian Mint, with top prizes including the equivalent of CAD 50,000 in gold bars and coins of gold bullion, i.e. 99.99% gold.

The top comment on the Toronto Star article offered this suggestion:
Did you know that one of the leading proposals for how to use MintChip is for purchasing bitcoin? Because of the irreversibility of MintChip transactions, this would solve a lot of issues. See paragraph 6 of MintChip Misses the Point of Digital Currency via Forbes.
MintChip Challenge generated much excitement. The 500 entry spots were filled in merely four days! Prize winners were to be announced on 25 October 2012.

What's up with MintChip? 

The official website hasn't provided much information. I was curious. Erstwhile gAt0mAl0 was curious too:
So what happened with MintChip – Canada’s digital currency? It has disappeared into the Bermuda Triangle of digital currency holes – a news blackout. 
The denouement of the MintChip Challenge was distinctly anticlimactic. gAt0mAl0 explains more about the Canadian MintChip, and Bitcoin too. Alternatively, you may prefer to explore gAt0's rather impressive Bitcoin mind map, featured in his prior post, Bitcoin and Forex Trading, which I enjoyed much more than the entire MintChip mess, from start to muted finish.

04 August 2012

Craft work

This ornate butterfly is an antimacassar. It is one of many in a set of Lepidoptera-themed craft work. Clicking on the image will take you to the rest. It is not my work. I can knit. Poorly.
Crochet decoration
Crocheted butterfly
Although the image description says "crochet", I think this resembles embroidery or needlework, as it is so finely detailed. It is beautiful, especially those curled antennae.

Antimacassars are those little covers on the armrests and backs of chairs. They aren't doilies. I tried to find a less arcane-sounding word, to no avail. Alternative word suggestions are welcome as comments!

03 August 2012

Short Storage Story

Napkin Story: How sMash works

IBM sMash data storage method
Via The IBM Curiosity Shop, CC/by-nc/2.0/

This is so sweet! I don't know what sMash (SMASH?) is. As a first guess, I would infer that it is a storage protocol rather than database-type software. Perhaps it is a new way to organize data, an alternative to DB2, CICS and IMS, each of which uses different block sizes and partition types, among other things.

I will try to get a definitive answer, and return with an update.

I hope to see new items in The IBM Curiosity Shop set, Napkin Stories on Flickr. It doesn't seem likely though. This friendly drawing was uploaded on 22 December 2010, yet no others have joined it in the interim.

Flickr-to-Blogger did not offer the option of "publish to draft", else I would have prepared this post more thoroughly.

08 July 2012

Statistical analysis of science fiction authors and fans

The classic science-fiction-related excerpt that follows after the jump is neither up-to-date nor analytically robust. I tidied it a bit, but doing a decent job would require re-running the numbers... not to mention collecting data of a more recent vintage. But it is entertaining, and the concept may be of use to others. To whom? Well, I have spent a fair amount of time on Stack Exchange sites recently.

Let me tell you all about it.

What is Stack Exchange?

Question and answer websites are popular in the online world. Stack Exchange is a free, mostly user-run network of question and answer sites. It was co-founded and managed by Jeff Atwood, a.k.a. Coding Horror, and Joel Spolsky, about whom I clearly know less, but who is no less worthy. EDIT: Joel now runs Stack Exchange, as Jeff Atwood (Coding Horror) has departed.

The original site was known as Stack Overflow, and it continues to thrive. There are many stacks on Stack Exchange... many mansions... well, you get the idea. Most are computing or analytically themed, e.g. programming, systems administration, website design, mobile applications development, mathematics and quantitative finance. Others are more eclectic, and thus of a more experimental nature. They are labelled as such by a beta designation, and guided along by the cleverly designed and whimsically named Area 51 Stack Exchange site.

Trajectory correction

Now that you've been enlightened by that tangential aside, I'll get to the point. I was thinking of Literature Stack Exchange in particular.

The problem at hand

Literature Stack Exchange was initially overrun by book-recommendation inquiries. This was unfortunate. Why? Because suggestions about subjective matters are nearly impossible to provide even to friends and relatives, let alone to an online forum of knowledge seekers. Fortunately, the issue has resolved itself for the time being, through better site administration.

EDIT: The issue has resolved itself permanently, because the site was closed in early May of this year due to a general lack of interest. Didn't I mention something earlier, though, about how my father's house has many mansions? Yes. Well, the analogy can be extended. Stack Exchange has a thriving Science Fiction community, which enjoys a great deal of activity! So let us continue, along the same, still relevant theme.

Perhaps the following approach might provide inspiration for those seeking reading material recommendations?

Via io9 (2010):
Politics has not only reared its ugly head, but pushed much of its slimy body into the world of science fiction.

A Political-Scientific Mapping

SciFi tribute circa 1977
Correlation results

Classic science fiction writers and reader politics


11 June 2012

eDiscovery and demise of News of the World

A new use case for text analysis is emerging in the legal field. It is referred to as eDiscovery. Such methods are not yet widely accepted, let alone implemented, but they are receiving increasing attention.

What is eDiscovery?

eDiscovery is a platform combining algorithms, software and productivity tools. It is most obviously useful for expediting in-house legal document retrieval. I learned of the existence of eDiscovery quite recently. Inside Counsel gives this definition as part of an 8 June 2012 post on the limitations of eDiscovery:
eDiscovery offers search methodologies to rein in time spent on electronic document review. One strategy is “computer assisted review,” also known as "predictive coding" or “predictive analytics.” Predictive analytics is the nonspecific term for a computer program that uses algorithms to sample and predict relevancy across large collections of electronically stored information.
Both terms, "predictive analytics" and "predictive coding", were confusing to me. They resemble terms used in quantitative analysis, and may almost be considered applications of the same methodologies in a legal context, though with a greater emphasis on text. There are other details that I haven't read enough about, so I cannot hazard a better guess as yet.
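Vendors generally do not publish their algorithms, so this is not any eDiscovery product's method. It is a minimal sketch of the "sample and predict relevancy" idea using a classic text classifier, multinomial naive Bayes: a reviewer labels a small seed sample, and the model ranks the unreviewed documents by estimated relevance. The seed documents are invented for illustration, and the sketch assumes the seed set contains both relevant and irrelevant examples.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase words made of letters only."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed):
    """seed: list of (text, is_relevant) pairs labelled by a human reviewer."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in seed:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def relevance_score(model, text: str) -> float:
    """Log-odds that a document is relevant (naive Bayes, Laplace smoothing)."""
    counts, docs, vocab = model
    total = {c: sum(counts[c].values()) for c in (True, False)}
    score = math.log(docs[True] / docs[False])  # prior log-odds from the seed set
    for w in tokenize(text):
        if w not in vocab:
            continue  # words never seen in the seed set carry no evidence
        p_rel = (counts[True][w] + 1) / (total[True] + len(vocab))
        p_irr = (counts[False][w] + 1) / (total[False] + len(vocab))
        score += math.log(p_rel / p_irr)
    return score


def rank(model, unreviewed):
    """Most likely relevant documents first, for human review."""
    return sorted(unreviewed, key=lambda t: relevance_score(model, t), reverse=True)


seed = [
    ("voicemail hacking of phones", True),
    ("phone hacking payments to police", True),
    ("cricket scores and weather", False),
    ("holiday recipes and weather", False),
]
model = train(seed)
print(rank(model, ["weather for the cricket match", "hacking of voicemail messages"]))
```

The human reviewer stays in the loop: the model only decides which documents to read first, which is where the promised savings in review time would come from.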

Further refinement needed

According to a 2010 Duke University survey of major companies (via the same Inside Counsel article), emphasis all mine:
The expense of electronic discovery is the most rapidly increasing item in the average litigation budget... This growth in e-discovery expenses is even more alarming [because] there is no evidence that it has resulted in a corresponding increase in the volume of relevant or important material being produced in litigation.

Fun: An Analysis of 'Hackgate'

I read an unusually eclectic article (1 June 2012) that pulled together many diverse threads of interest. It is about the recent demise of 'News of the World', the much-publicized and scandal-ridden Rupert Murdoch flagship publication. Here's the premise:
What if the analysis were to have been approached with an eDiscovery-enabled perspective?

23 May 2012

Especially useful curation

A list of uncommonly useful links and news items by an uncommonly astute person, Greg Linden (formerly of Amazon, in its early days), follows below. This is the best of all worlds: having access to someone who has superior insight in his field of expertise, is reasonably articulate, and is willing to share without ulterior motive or bias.

I first heard of Greg Linden back in my days of using Google Buzz. At first, I thought he was a Linden of Second Life's Linden Lab! This isn't to say that he is my online friend or contact or anything like that. I miss Google Buzz. It was my introduction to Web 2.0-style online interaction, and it was very positive and genuine.

Okay, that's enough pre-ambling from me. Have a look at those links and annotations.

Geeking with Greg: More quick links:
What has caught my attention recently: $1B for Instagram was silly and caused by fear ( [1] [2] [3] [4] ), but it is impressive ...

This would be worth paying for, if Greg Linden wanted to sell a subscription newsletter on technology investing. That does not seem likely.

I stopped wondering "Why does he do this?!" a while ago. Now I am quietly appreciative. I often forget entirely about visiting his weblog for months at a time, as it is such a low-key and pleasantly ad-free corner of the internet!