What ‘The Office’ Gets Wrong About the Office

I start a new job next week and so I’ve been working on documenting all of my old tasks and projects in preparation for the transition. As I was going through old e-mails, I came across the introductory note my manager sent out to the department on my first day back in June 2004. Comparing it to the departure e-mail from my current manager, it’s amazing to see the changes in personnel over a seven-year span.

I prepared this chart using the distribution lists from both e-mails, a drawing program, and a site that creates proportional Venn diagrams. Only eight people are listed twice, including me and a person who left the company and has since returned. Some of the people who are only listed once have more tenure than me; they may just have gone to or come from another department. Still, it represents an interesting fact about the modern office. Change is constant.

Thick as a [LEGO] Brick

A few weeks ago, Samuel Arbesman wrote an article in Wired touching on the mathematical properties inherent in LEGO structures. In it, he discussed the results of a 10-year-old study of natural and human-made networks that described how the number of distinct components in a network increased with the overall size of the network.

The study showed that the LEGO systems did indeed follow this rule. However, Arbesman noted that the relationship increased sublinearly, suggesting that LEGO systems were under some form of selection pressure (like the economics of production) that made it more expensive to grow the system and create new types of pieces. He was curious to see whether or not these findings would hold true with a more complete list of LEGO sets available today (n=389 in the 2002 study).

After using a webcrawler to pull the data for the available sets and their component pieces, I was presented with a list of over 6,800 individual toys or kits. Not all of these kits fit the criteria of the original study, which investigated sets that were designed to build something specific as opposed to generic collections of pieces.

Paring down this list turned out to be the most difficult part of this exercise. I ended up eliminating any set that had words like “accessories,” “supplemental,” or “universal building set” in the name. I also removed entire toy lines such as DUPLO, Clikits, and Primo/Baby, which didn’t seem to fit in the standard LEGO system. Basically, I tried to include anything with a brick, plate, or tile that had a picture of a single object on the box. I ended up with about 3,750 sets … or about ten times the number in the original study.
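The exclusion rules above could be sketched roughly like this. It is only an illustration: the `name` and `theme` fields and the sample records are hypothetical, and the real crawler output may be shaped differently.

```python
# Keywords and toy lines taken from the exclusion rules described above.
EXCLUDED_NAME_WORDS = ("accessories", "supplemental", "universal building set")
EXCLUDED_THEMES = {"duplo", "clikits", "primo", "baby"}


def is_specific_model(lego_set):
    """Return True if a set looks like a kit for building one specific object."""
    name = lego_set["name"].lower()
    theme = lego_set["theme"].lower()
    if any(word in name for word in EXCLUDED_NAME_WORDS):
        return False
    return theme not in EXCLUDED_THEMES


# Hypothetical sample records, for illustration only.
sample_sets = [
    {"name": "Taj Mahal", "theme": "Sculptures"},
    {"name": "Castle Accessories", "theme": "Castle"},
    {"name": "Farm Animals", "theme": "DUPLO"},
]

kept = [s["name"] for s in sample_sets if is_specific_model(s)]
print(kept)
```

A keyword filter like this only approximates the “single object on the box” test, so some manual review would probably still be needed.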

So, do the results hold up with the new data? At first glance, it appears they do. Both the log-log and semi-log plots described in the study are reproduced here with the larger counts. Note that a power-law relationship still appears to fit the data better than a logarithmic relationship.
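For anyone who wants to try the same comparison, here is a minimal sketch of fitting both curves with ordinary least squares. The data are synthetic (generated from an exact power law with an assumed exponent of 0.7), not the LEGO counts, and the helper names are my own.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sum_squared_error(actual, predicted):
    """Total squared difference between observed and fitted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Synthetic data: total pieces per set and distinct piece types per set,
# generated from an exact power law so the expected answer is known.
total_pieces = [10, 50, 100, 500, 1000, 5000]
piece_types = [2.0 * n ** 0.7 for n in total_pieces]

log_n = [math.log(n) for n in total_pieces]

# Power law: types = c * pieces**k, a straight line on a log-log plot.
log_c, exponent = linear_fit(log_n, [math.log(t) for t in piece_types])
power_pred = [math.exp(log_c) * n ** exponent for n in total_pieces]

# Logarithmic: types = a + b * ln(pieces), a straight line on a semi-log plot.
a, b = linear_fit(log_n, piece_types)
log_pred = [a + b * x for x in log_n]

power_sse = sum_squared_error(piece_types, power_pred)
log_sse = sum_squared_error(piece_types, log_pred)
print(f"exponent = {exponent:.3f} (sublinear if below 1)")
print(f"power-law SSE = {power_sse:.3g}, logarithmic SSE = {log_sse:.3g}")
```

Both fits are scored in the original (unlogged) units so that the two error totals are comparable.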



Once I had access to all of that cool LEGO data, of course, I couldn’t resist a few more visuals. The first thing I developed was an interactive chart that lets you navigate the size and complexity data to see specific kits. Check out the links for pictures and parts lists.

This display was interesting because the LEGO kits with the most pieces tended to be elaborate secret bases or fortresses while the LEGO kits with the most variety of pieces were cultural artifacts like the Taj Mahal or the Statue of Liberty. Ironically, the Death Star (which might be considered both a cultural icon and a fortress) fits neatly in the upper right corner.

The following charts look at the trend of unique pieces over time as well as the distribution of color over the distinct LEGO sets available (this includes all LEGO products, not just the specific “objects” used in the logarithmic plots above). Note both the increasing variety of the LEGO pieces and the move away from the traditional color palette. The mottled gray represents the “other” category.

It is interesting to note that the shift toward more complexity in both pieces and colors corresponds with the deal LEGO inked with Lucasfilm in 1999 that allowed the company to sell toys based on the “Star Wars” universe. These changes came at a time of turmoil for LEGO as it struggled to remain true to its roots while competing with a flood of specialty toys and video games. Licensing products from Lucasfilm was a big step for LEGO but one that seems to have paid some creative dividends … four of the top ten largest LEGO structures ever released commercially are spaceships from the “Star Wars” series.

This trend toward replicating such specific visions (LEGO has also licensed themes from Harry Potter, Toy Story, Pirates of the Caribbean, and others) explains some of the incredible variety of pieces now in circulation. Many of these new kits introduced pieces that are used only once.

On the opposite end of the spectrum, the most commonly shared LEGO piece in the database is a black 1 x 2 plate (part number 3004). The other pieces in the top 10 are also very simple and very monochromatic. I found it interesting that all the colors in the top ten reflected the sequence of Berlin and Kay’s basic color terms (in which Stage I cultures have only the colors black (dark–cool) and white (light–warm) and Stage II adds Red).

One thing this database does not cover is the huge market for non-standard kits and free-form LEGO bricks. According to Chris Anderson’s Long Tail blog:

“… 90% of Lego’s products are not available in traditional retail. They’re only available in the catalogs and online … [o]verall, those non-retail parts of the business represent 10-15% of Lego’s annual $1.1 billion in sales.”

User-created structures represent an amazingly creative use of the standard set of parts available. Check out this football stadium or this minifig-scaled Saturn V rocket. Some of these models were created using the old LEGO Factory/Design by Me software but some are done on the fly. It would be interesting to see if some of the above findings apply to these custom structures.

For more stats and a company timeline, check out this site.

Wisconsin Voters Banished to NULL Island

The top headline in my local paper this morning was “Glitch puts some Wisconsin voters in Africa” … an interesting thing to ponder over a bowl of Quaker Oatmeal Squares. I suppose this problem merits at least some attention given the heated political climate surrounding the state’s voter redistricting process. But headline news? Above the fold? Sounds like a slow news day to me.

Online, of course, the debate has already devolved into the standard round of mudslinging and name-calling so good luck trying to find out what’s going on from that crowd. The reporters themselves focused on the political fallout of the issue rather than an explanation so no help there either. I guess it’s up to the humble folks at Ideas Illustrated to offer up some insight!

The first clue to the problem can be found in the article’s pullout quote, which describes the voter’s location as the “coast of Africa” and not a specific country in Africa. The second clue can be found deep within the article when it is mentioned that clerks have recently made changes to the way voters are being entered into the voter registration database:

“… voters are [now] being entered into different districts by the physical location of their address in computerized maps. Previously, they were entered into different districts in the state voter database according to where their address fell in certain address ranges.”

These two hints point to a very common problem associated with geocoding, which is the process of converting a postal address to a set of map coordinates. Let’s backtrack. An online mapping tool like Google Maps uses specific geographic coordinates (latitude and longitude) to place a location on a map. However, because none of these physical locations are actually stored in a database anywhere, the tool needs to interpolate the coordinates from a vector database of the road network (i.e. a mathematically represented set of lines).

For example, if you look up the address for Trump Tower, you find that it is located at 725 Fifth Avenue in Midtown Manhattan. When you enter this address into Google Maps, the tool finds Fifth Avenue on the underlying road grid and then uses an algorithm to determine that the “725” address is somewhere between 56th and 57th streets. It will also determine which side of the street the address is located on, based on stored knowledge of the “odd” and “even” numbering pattern. In other words, it’s guessing.
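The guess can be sketched as simple linear interpolation along one block. This is an illustration under stated assumptions, not Google’s actual algorithm: the `segment` record, its field names, and the coordinates are all invented for the example.

```python
def interpolate_address(house_number, segment):
    """Estimate (lat, lon, side) for a house number along one street segment.

    `segment` is a hypothetical record holding the address range on the block
    and the coordinates of its two end points.
    """
    low, high = segment["from_number"], segment["to_number"]
    if not low <= house_number <= high:
        raise ValueError("house number is outside this segment's range")
    # Fraction of the way along the block, assuming evenly spaced addresses.
    fraction = (house_number - low) / (high - low) if high != low else 0.0
    start_lat, start_lon = segment["start"]
    end_lat, end_lon = segment["end"]
    lat = start_lat + fraction * (end_lat - start_lat)
    lon = start_lon + fraction * (end_lon - start_lon)
    # Odd/even parity decides which side of the street the address is on.
    side = segment["odd_side"] if house_number % 2 else segment["even_side"]
    return lat, lon, side


# Invented block: addresses 700-750 between two made-up end points.
block = {
    "from_number": 700,
    "to_number": 750,
    "start": (40.0, -74.0),
    "end": (40.001, -74.001),
    "odd_side": "east",
    "even_side": "west",
}

lat, lon, side = interpolate_address(725, block)
print(lat, lon, side)
```

The even-spacing assumption is where the error comes from: a block with one large building and several small ones will not have its addresses spread uniformly along the street.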

Google Map detail of the area around Trump Tower

 

TIGER/Line® Shapefile detail of the same area

These guesstimates work pretty well in dense urban environments where there are a lot of cross streets to serve as reference points. In rural areas, the curvilinear streets and widely-spaced buildings make things a little more difficult. When the situation gets really muddled, some mapping tools essentially “punt” and enter a default set of coordinates. In the case of the Wisconsin voter addresses, these default coordinates are 0.00 degrees latitude and 0.00 degrees longitude. Where is this exactly? It is the intersection of the Prime Meridian and the Equator … which occurs just off the coast of Africa.

Geographers have actually given this place a rather fanciful name: NULL island (it is not, in fact, a real island). It even has its own web site and unofficial flag (below right).

So there are no nefarious schemes behind this situation … just normal, everyday data problems. The state clerks need to tell their IT guys to flag the errant voter addresses and then they can assign them to the appropriate districts by hand. Problem solved. However, they should be aware that interpolation is an imperfect process and, in addition to assigning blocks of voters to NULL island, the geocoding process may also assign voters to the wrong districts. This could be particularly true for people who live close to a district boundary. It might actually make sense to keep the old method around for backup.
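Flagging the errant records is about as simple as data cleanup gets. Here is a rough sketch, assuming each voter record carries hypothetical `lat` and `lon` fields; the sample records are invented and the state’s actual database layout is unknown to me.

```python
def is_null_island(lat, lon, tolerance=1e-6):
    """Return True when a coordinate pair is effectively (0.0, 0.0)."""
    return abs(lat) < tolerance and abs(lon) < tolerance


# Invented sample records, for illustration only.
voters = [
    {"voter_id": "A1", "lat": 43.0731, "lon": -89.4012},
    {"voter_id": "B2", "lat": 0.0, "lon": 0.0},
    {"voter_id": "C3", "lat": 44.5133, "lon": -88.0133},
]

# Anything that landed on NULL island goes to a clerk for manual assignment.
needs_review = [v["voter_id"] for v in voters if is_null_island(v["lat"], v["lon"])]
print(needs_review)
```

Note that a check like this only catches failed lookups. It does nothing for addresses that geocoded successfully but landed on the wrong side of a district line.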

Canine Cop in Constitutional Crisis

[There have been some interesting topics flitting about the blogosphere in the past weeks but I've been too busy with other stuff to comment. To eliminate some of the backlog, I've decided to try and do a few quick takes. First up: Franky the chocolate lab.]

There is a Florida legal case winding its way through the court system that pits the skills of a drug-sniffing police dog named Franky against the Fourth Amendment rights of alleged marijuana grower Joelis Jardines. Back in 2006, Franky’s keen nose detected the smell of $700,000 in marijuana plants wafting out of a Dade County home while he and his handlers were standing on the front porch. Franky signaled the police, who subsequently obtained a warrant, searched the home, found the pot, and arrested Jardines. The question is this: does a dog sniffing the air outside a privately owned house represent an illegal search?

The variables make it interesting. Use of a thermal imaging device to look into the interior of someone’s home constitutes a search and is not legal without a warrant. However, the use of dogs in airports and other public places is allowed under the law because people in those locations do not necessarily have an expectation of privacy. (This is similar to recent arguments favoring the warrantless use of a GPS tracking device on a private vehicle.) Complicating matters is the fact that a dog is trained to detect only one thing (drugs) while a mechanical device like thermal imaging might show other things (like you sitting naked on the toilet).

All of this becomes more interesting if you start to think about the trends in miniaturization and computing power that could be applied to today’s surveillance drones. How soon before these things migrate from the skies of Afghanistan to the air space above your own neighborhood?

Other questions that spring to mind:

  • Does it matter that Franky’s talent is a natural one? Would the case be any different if this involved a drug sniffing machine?
  • What would happen if the police themselves were augmented in some way (genetically? cybernetically?) that would allow them to detect drugs without the aid of anything else? Will robocops get more legal leeway when it comes to searches just because of their “innate” talents?
  • How fast will police be able to get search warrants in the future? Will judges allow instantaneous decisions on these matters?
  • Does the fact that people are willing to provide detailed personal information to their social networks change our society’s expectations of privacy?

The U.S. Supreme Court will hear the case on Friday, January 13, 2012. Given the pace of development of modern technology, constitutional scholars could be reading about the exploits of Franky the dog for years to come.

 

Favre-a-Palooza

Now that we can safely say that Brett Favre has retired (notwithstanding rumors to the contrary), I thought it was time to pull out some data on the indecisive quarterback’s career touchdown passes. Stats on passes say a lot about the relationship between a quarterback and his receivers so I wanted to create a visual that captured some of these stories.

The chart below shows each touchdown pass that Brett Favre threw during his NFL career and breaks it down by receiver (vertical axis), season (horizontal axis), average yardage per month (size of marker), and team (color of marker).

Packer fans will immediately recognize the significance of some of the data points. For the rest of you, here are a few highlights:

  • Sterling Sharpe caught Brett Favre’s first touchdown pass as a Green Bay Packer in 1992 and continued to be the quarterback’s primary receiver for the next three years. The 5x All-Pro led the NFL in touchdown receptions in both 1992 and 1994 and would certainly have played a major role in the team’s subsequent success if he hadn’t suffered a career-ending neck injury at the end of the 1994 season.
  • Following Sharpe’s early exit from football, Favre was forced to distribute his passes among a broader range of players, chief among them wide receivers Robert Brooks and Antonio Freeman. These two players would serve as the primary pillars of the passing game throughout Favre’s most successful period with Green Bay.
  • During the 1996 season (the year the Packers won Super Bowl XXXI), Favre threw touchdowns to ten different receivers, a career high. His total touchdown pass yardage that year also reached a high water mark.
  • Following Favre’s two Super Bowl appearances, there was a noticeable dropoff in the number of new players catching touchdowns. It is not clear whether it was because the receiving corps had stabilized or the coaches were focused on developing other aspects of the team but there were no fresh faces in the 1998 season and only two (Corey Bradford and Donald Driver) in 1999.
  • Favre did not have another pair of favorite “big play” receivers until his last two seasons with the Packers, when he had both Driver and Greg Jennings.
  • After Favre’s retirement from the Packers, he was introduced to an entirely new slate of receivers with the New York Jets in 2008. This situation was repeated in 2009 when he signed up with the Minnesota Vikings. He threw his final touchdown pass to Percy Harvin in December 2010.

Earnings and Unemployment by College Major

The Wall Street Journal recently published a table of income and unemployment data that presented pay and employment rates for various college majors. The original study by Georgetown University’s Center on Education and the Workforce contained enough additional details that I thought it might be worth trying to incorporate the information into a Tableau visualization.

After a little data massaging, I created charts for both the high-level fields of study and the more detailed individual majors. Each level contains unemployment rates, income levels, and popularity of major measured by number of enrollees.

One of the first things you notice is that, despite frequent claims to the contrary, college graduates with a degree in Education have the lowest median earnings overall. The Education field also has the narrowest range of income and includes four of the ten majors with the lowest median earnings. On the plus side, fifteen of the sixteen Education majors have (or had at the time of the study) unemployment rates below 5.5% — the weighted average rate of unemployment for all majors in the study.
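A note on that weighted average: each major’s unemployment rate is weighted by its number of graduates, so large majors count for more than small ones. A minimal sketch with invented numbers (these are not figures from the Georgetown study):

```python
def weighted_unemployment_rate(majors):
    """Average the unemployment rates, weighting each major by its graduates."""
    total_grads = sum(m["grads"] for m in majors)
    return sum(m["rate"] * m["grads"] for m in majors) / total_grads


# Invented sample data, for illustration only.
sample_majors = [
    {"major": "Major A", "grads": 1000, "rate": 0.04},
    {"major": "Major B", "grads": 3000, "rate": 0.06},
    {"major": "Major C", "grads": 1000, "rate": 0.10},
]

weighted = weighted_unemployment_rate(sample_majors)
simple = sum(m["rate"] for m in sample_majors) / len(sample_majors)
print(f"weighted: {weighted:.4f}, simple mean: {simple:.4f}")
```

The two averages differ whenever the high-unemployment majors are unusually large or small, which is why the weighted figure is the fairer benchmark.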

Graduates with an Engineering degree have the highest median earnings overall and a relatively low unemployment rate compared to other disciplines. In addition, seven of the ten majors with the highest median earnings were found in Engineering.

Other majors with good earnings potential included the usual suspects (Computers & Mathematics, Health, and Business) while the best employment prospects were found in Education, Health, Physical Sciences, and Agriculture & Natural Resources.

As for individual majors, the winners in my completely fictitious categories are as follows:

  • Most Popular - Business Management & Administration takes this category with nearly 2.8 million grads holding this degree. The next two majors in line (also in the Business field) weren’t even close, trailing by over a million people.
  • Best Prospects - Actuarial Science beat out four other fully-employed competitors by coming in with a median income of over $80K.
  • Worst Prospects - Clinical Psychology tops this category with an estimated unemployment rate of nearly 20%. Yikes! I also noticed that a number of other majors in the Psychology field had unemployment rates above 10%, which means that intra-discipline career changes for people with this major would be difficult.
  • Most Deceptive - The “winner” here is Architecture, an outlier with the lowest median earnings and the highest unemployment rate of all of the Engineering majors. For this category, I wanted a relatively popular major with an uncommonly high unemployment rate … the kind of major that churns out grads and then strands them in the unemployment line. An educational Judas, if you will. (Full disclosure: I have an Architecture degree, but I can’t say I wasn’t warned.)
  • Hidden Gem - I’m going to call this one a tie between Petroleum Engineering and Pharmacy Pharmaceutical Sciences & Administration. Petroleum Engineering has a slight edge on median earnings ($127K vs. $105K) but the Pharma major has a lower overall unemployment rate (3.2% vs. 4.4%). You probably can’t go wrong with either one but keep an eye on the horizon … Petroleum Engineering is notoriously dependent on the boom/bust cycles of the oil and gas industry while workers in the pharmaceutical industry are facing major changes as companies try to adjust to globalization and increasing costs of product development.

Have the Mainstream Media Jumped the Shark?

There was a recent article in Slate that asked why the mainstream media was having such a tough time figuring out the message of the Occupy Wall Street protestors. Now, I wouldn’t call myself a full-throated supporter of #OWS, but I do think that it’s pretty easy to understand why they’re PO’d. You know the drill: 14 million unemployed, crony capitalism, income inequality, and rising costs for just about everything, including health care and education.

So what’s the great mystery? Things are bad … and they haven’t been good for a while. People are concerned about the future and they’re upset because the country’s leaders are so busy fighting each other that they aren’t even trying to find a comprehensive solution. I can only assume that it is the very complexity of the issues that is causing so much angst among the pundits and political commentators.

The mainstream media thrives on simple solutions. It has no idea whatsoever of how to report on a story that isn’t about easy fixes so much as it is about anguished human frustration and fear. The media prides itself on its ability to tell you how to clear your clutter, regrout your shower, or purge your closet of anything that makes you look fat—in 24 minutes or less. It is bound to be flummoxed by a protest that offers up no happy endings.

People on the right side of the political spectrum have never been happy with the liberal bias they perceive in the mainstream media. If the political left is also starting to tune out these news outlets because of their inability to explore and explain serious issues, how long is it before these sources are abandoned in favor of something more thoughtful and informative?

Politicians Discover Data Science

During the 2008 U.S. Presidential campaign, the online design community devoted a lot of pixels to comparisons of the two candidates’ web sites (a few great examples here, here, and here). The overall consensus was that Obama won the war for eyeballs by emphasizing design, web usability, multimedia, and robust social networking. According to an in-depth study by the Pew Research Center’s Project for Excellence in Journalism, Obama’s online network was over five times larger than McCain’s by election day and his site was drawing almost three times as many unique visitors each week.

There is no doubt that the web has fundamentally transformed the way political campaigns are run. Voters are no longer tied to traditional media outlets for information and they can participate directly in a campaign in ways that were unimaginable only a few years ago. Adam Nagourney, columnist for the New York Times, summed it up nicely:

[The Internet has] rewritten the rules on how to reach voters, raise money, organize supporters, manage the news media, track and mold public opinion, and wage — and withstand — political attacks.

So, with the next campaign season gearing up, what technology-driven changes can we expect for 2012? If the rumblings are true, this election may see the ascendancy of data science as a formal part of the campaign toolkit.

In a recent CNN article, Micah Sifry wrote about the Obama campaign’s establishment of a “multi-disciplinary team of statisticians, predictive modelers, data mining experts, mathematicians, software developers, general analysts and organizers.” The article goes on to discuss the importance of data harmonization (a fancy term for master data management), geo-targeting, and integrated marketing.

Obama may be struggling in the polls and even losing support among his core boosters, but when it comes to the modern mechanics of identifying, connecting with and mobilizing voters, as well as the challenge of integrating voter information with the complex internal workings of a national campaign, his team is way ahead of the Republican pack.

All this has some GOP supporters concerned. Martin Avila, a Republican technology consultant, states in the same article that he doesn’t think that anyone on the opposing side fully understands the power of organizing and analyzing all of this data. According to Avila, the current GOP use of information technology is still largely shaped by its pre-Internet experience in broadcast advertising.

In some ways, this cavalier attitude toward the value of data shouldn’t come as a complete surprise. One trait that many members of the so-called “party of business” share with executives in the private sector is a strong attachment to a “gut based” approach to making decisions.

A recent Accenture Analytics survey of over 600 managers at more than 500 companies found that senior managers rarely used data-driven analysis when making key business decisions and instead relied heavily on intuition, peer-to-peer consultation, and other soft factors. According to the study, 50% of companies weren’t even structured in a way that would allow them to use data and analytical talent to generate enterprise-wide insight. In addition, those organizations that did make analytics-based decisions often depended on inconsistent, inaccurate, or incomplete data.

Savvy voters, like savvy customers, have come to expect a certain level of performance and consistency from the IT systems they use. This is bad news for businesses that still think that things like social media, data analytics, and master data management are gimmicks:

Organizations that fail to tackle the issues around data, technology and analytics talent will lose out to the high-performing 10 percent who have leveraged predictive analytics to become more agile and gain competitive advantage.

Creating a structured program for better targeting and more efficient communications seems like a no-brainer these days, but, for now, there doesn’t seem to be a lot of competition.

[UPDATE 1/30/2012: Slate recently published an article that talks about the different philosophies guiding the development of Democratic and Republican voter databases.

Catalist, an independent data initiative, is focused less on profit and more on becoming "an indispensable tactical resource for the American left" with a privately-funded data warehouse containing records of the entire voting-age population combined with other commercially available data. Its customers include many traditionally liberal groups who consider the Democratic National Committee's database insufficient. In response, the DNC has stepped up development of its own database, the Voting List Management Cooperative (or "Co-op"). In order to take advantage of the increased desire for voter information, the DNC has also developed statistical models that are particularly valuable for candidates.

Meanwhile, the Republican National Committee established the Data Trust, a private company filled to the brim with former RNC staffers and committee members. The goal of this organization is to create robust voter profiles that can be shared with political allies. However, because of concerns about outside influence, the RNC is modeling it more along the lines of the DNC's data co-operative instead of the more independent Catalist. The Data Trust development model is also less focused on data mining activities and more on basic data.]

Unemployment vs. Underemployment

The Bureau of Labor Statistics releases the results of two major surveys on the first Friday of every month (the Current Employment Statistics or CES and the Current Population Survey or CPS). Although the amount of information in these two surveys is quite extensive, the general public is probably familiar with only a few specific metrics.

First and foremost among these is the unemployment rate, which represents the ratio of unemployed workers to the overall civilian labor force. As with anything involving the government, this simple number is more complex than it seems. For one thing, the BLS has no fewer than six different methods of calculating unemployment … and each one comes in a seasonally adjusted and unadjusted format. The standard unemployment rate (the one that makes all the headlines) is called U-3 and it is usually seasonally adjusted.

Many economists feel that U-3 is misleading because, over the years, it has slowly excluded many of the factors that used to go into how the U.S. reported unemployment. They prefer to use the “underemployment” rate or U-6, which is the BLS’s broadest measure of unemployment.

The basic definitions:

  • U-3 – Total unemployed persons, as a percent of the civilian labor force (the official unemployment rate).
  • U-6 – Includes those people counted by U-3, plus marginally attached workers (not looking, but want and are available for a job and have looked for work sometime in the recent past), as well as persons employed part time for economic reasons (they want and are available for full-time work but have had to settle for a part-time schedule).
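In code, the two definitions come down to a pair of ratios. This is a sketch using invented round numbers rather than actual BLS figures. One detail that is easy to miss: because marginally attached workers are outside the labor force, the BLS adds them to the U-6 denominator as well as the numerator.

```python
def u3_rate(unemployed, labor_force):
    """Official unemployment rate: unemployed as a percent of the labor force."""
    return 100.0 * unemployed / labor_force


def u6_rate(unemployed, marginally_attached, part_time_economic, labor_force):
    """Broadest BLS measure: adds marginally attached and involuntary part-timers.

    Marginally attached workers are outside the labor force, so they are
    added to the denominator too.
    """
    numerator = unemployed + marginally_attached + part_time_economic
    denominator = labor_force + marginally_attached
    return 100.0 * numerator / denominator


# Invented round numbers, for illustration only (think "thousands of people").
labor_force = 1000
unemployed = 90
marginally_attached = 20
part_time_economic = 60

u3 = u3_rate(unemployed, labor_force)
u6 = u6_rate(unemployed, marginally_attached, part_time_economic, labor_force)
print(f"U-3 = {u3:.1f}%, U-6 = {u6:.1f}%")
```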

Keeping all of these terms straight can be difficult for the average person, so, despite Stephen Few’s objections, I have created a pie chart that attempts to explain all of the various relationships. The central pie shows the basic division of the working age population into the civilian labor force and people who are outside of the labor force. Each subsequent pie divides these categories into smaller and more specific subcategories.

The calculations for U-3 and U-6 can then be represented as slices of the pie:


Right off the bat you can see that there is a problem with some of the various categories. For one thing, there is an entire group of people who are listed as Want a Job Now but aren’t working and aren’t counted as unemployed. This category includes people who have been out of work for over a year and have officially fallen out of the civilian labor force. Although the U-6 figure includes a portion of this group, many critics still feel that this practice understates unemployment.

Another way to show the calculation of the two metrics is graphically, using the color coding of the legend from the chart to show the details for each metric:

This exercise highlights another potential issue for measurement of the economy by showing the importance of the denominator (in this case, the Civilian Labor Force). Variations in this number have a tremendous effect on the outcome of both calculations. By reclassifying certain groups of unemployed (the Want a Job Now crowd), people are siphoned off from both the numerator and the denominator. The end result is a slight reduction of both the U-3 and U-6 rates. Not a big deal … unless you happen to be running for office.
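The arithmetic behind that reduction is easy to check. Here is a small sketch for U-3, again with invented round numbers: removing the same number of people from both the numerator and the denominator lowers the rate, because the rate is well below 100% to begin with.

```python
def rate(unemployed, labor_force):
    """Unemployed as a percent of the labor force."""
    return 100.0 * unemployed / labor_force


# Invented round numbers, for illustration only.
labor_force = 1000
unemployed = 90
reclassified = 10  # unemployed people moved out of the labor force

before = rate(unemployed, labor_force)
after = rate(unemployed - reclassified, labor_force - reclassified)
print(f"before = {before:.2f}%, after = {after:.2f}%")
```

Nobody found a job in this example, yet the headline number improved.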

King Bhumibol is a Noob

In an effort to suppress disparaging remarks about the monarchy, the government of Thailand has recently established an official agency called the Office of Prevention and Suppression of Information Technology Crimes. The sole purpose of this department is to enforce the country’s lèse-majesté laws by combing the Internet for anything offensive to King Bhumibol Adulyadej and his family and then either eliminating or blocking the offensive material.

Agency technicians have apparently blocked over 70,000 pages so far, including those with pictures of the king with a foot above his head (considered very rude) and those that misuse informal pronouns before the king’s name.

Punishment for such disrespect for authority can be harsh. Under Thai law, even the digital distribution of information that threatens the “good morals of the people” will get you five years in prison. For anyone who insults or defames the royal family, sentences can stretch to 15 years.

I am always surprised at the lengths to which repressive regimes will go in order to “safeguard” the sensibilities of their citizens while trying to maintain the openness and flexibility of the Internet. I’m even more surprised at this particular effort to shield a grown man from the forms of mild online abuse and disagreements that confront other world leaders every day.

This kind of experience can certainly be frustrating. However, is white-washing the Internet really the answer? Is it even possible? Wouldn’t everyone’s time be better spent teaching the king to deal with a few negative comments rather than censoring the entire Web? I understand the desire for people to protect someone they love from getting hurt but, in the long run, such heavy-handed tactics will probably fail. The Internet is just too irrepressible.

In an old discussion of online ethics, Simon Waldman notes:

“I find the views expressed on many organizations’ sites repellent. But one of the greatest achievements of the Internet has been to create the greatest gallery of human opinion in history, and that is something we should marvel at, rather than shake our heads in dismay.”

Would the people of Thailand deny their king access to such a place?

BTW: Sawatdee-krap, Mr. Surachai