Finally back with a data post after a long time! I moved to San Francisco to join the Visual.ly team – took me almost a month to find my way around this city.
About the data set: This data is described by the website as follows –
The “Physician Payments Sunshine Provision” of the Health Care and Education Reconciliation Act of 2010 requires that medical device, pharmaceutical and biotech companies begin reporting gifts or payments starting January 1, 2012 with detailed specifications as to how payment data is to be submitted made available October 1st 2011.
Wow – so, essentially all the payments made by pharmaceutical companies and medical device manufacturing companies to physicians. As gifts or otherwise.
Medical Data Informatics did not disclose to me the name of the physician or organization each payment was made to, although that information is available. More details on the data set and the tools used are at the end!
Let’s formulate a set of questions I’m interested in answering. The analysis today is centered around exploration.
Note: The more interesting the data set is – the more interesting exploration is. If a data set is generic it can be made more interesting by coupling it with another data set.
1. Which pharmaceutical and medical device manufacturing companies make out the most payments?
2. Which companies spend the most per “gift” or payment?
3. As a physician which state will give you a higher probability of receiving a payment from these companies?
1. Which pharmaceutical companies make out the most payments?
To answer this question I want to reduce the size of the raw data set. The way I will do this is to break it down by a few large categories (type of payment, type of company, state) and then turn each breakdown into its own smaller data set.
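A minimal sketch of that reduction step, in plain Python rather than the Tableau and Excel workflow I actually used. The field names (`company`, `state`, `amount`) and the sample rows are made up for illustration; they are not the real column names or real payments.

```python
from collections import defaultdict

# Hypothetical sample rows; the real file has roughly 550,000 of these.
rows = [
    {"company": "Company A", "state": "CA", "amount": 1000.0},
    {"company": "Company A", "state": "TX", "amount": 500.0},
    {"company": "Company B", "state": "CA", "amount": 200.0},
    {"company": "Company C", "state": "NY", "amount": 50.0},
]

def summarize(rows, key):
    """Collapse raw payment rows into (value, total, count) per value of `key`."""
    buckets = defaultdict(lambda: [0.0, 0])
    for row in rows:
        bucket = buckets[row[key]]
        bucket[0] += row["amount"]  # running total paid
        bucket[1] += 1              # number of payments
    summary = [(value, total, count) for value, (total, count) in buckets.items()]
    # Largest total first, so the top payers come out on top
    return sorted(summary, key=lambda item: item[1], reverse=True)

by_company = summarize(rows, "company")
by_state = summarize(rows, "state")
print(by_company)
print(by_state)
```

Each summary has one row per company or per state, which is small enough for any charting tool.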
Here is the answer to the first question:
As you see this is a sorted distribution of the 22 companies. The answers to this one –
Nearly 2 billion dollars in payments were made out to physicians by these 22 companies in those 4 years. That’s incredible. The leading companies are:
1. Zimmer – $750 million
2. Pfizer – $250 million
3. Eli Lilly – $160 million
Let’s not stop here but try and understand how the payments by these three companies are reported.
Looking into Zimmer, most of the larger amounts are just not disclosed. Only the smaller amounts, less than a thousand dollars, have the type classified: lodging, meals, gifts. Not much insight there.
In Pfizer, the larger payments are in a category called Pfizer sponsored research. It looks like (of course, contingent on them telling the truth) Pfizer is spending a lot of its resources on research – hopefully a good thing.
In Eli Lilly, most of the larger amounts are in a category called Healthcare Education. Again, hopefully a good thing.
A quick note on the display here: none of the charts today are interactive. The reason is that I didn’t have access to data visualization software that supports interactivity on really large data sets and can be published publicly. If you want to go down the interactive route here, the best way is to use one of the JS frameworks.
2. Which companies spend the most per “gift” or payment?
I could alternatively have worded this as “As a physician, which company will pay you the most per payment?”
As you can see here are the top 3:
1. Zimmer – Almost 300k on average per payment!
2. Medtronic – Over 100k per payment on average.
3. Stryker – Nearly 100k per payment on average.
More interesting stuff: These answers are interesting, but we can drill down into even more interesting ones. The first comes from a caveat: these are average payments and by themselves don’t mean much. Hypothetically, there can be lots of $1 payments and a few really high ones. And from what I’ve seen in practice, that’s usually the case. It’s always a lot more interesting to drill into what sits behind an average.
Exploring each category a bit more, I find the highest payment ever made in this data set was by Zimmer – a sum of 18 million over six months in 2007! Wow, that’s insane. The type of this payment wasn’t even disclosed. I need to ask the folks at Medical Data Informatics who the organization/person was.
Continuing the thread on how misleading averages can be: only 52 out of 2620 records are over 300k! This implies that if you were to bet on an unknown payment by Zimmer, there is less than a 2% chance it will be above the average. The average is too high 98% of the time here!
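To make the point concrete, here is a small sketch comparing the mean with the median and counting how many payments actually sit above the mean. The list of amounts is invented purely to mimic the shape described above (many small payments, one huge one); only the 52 and 2620 figures come from the data set.

```python
from statistics import mean, median

# Made-up amounts: several small payments plus one very large one.
amounts = [50, 120, 75, 300, 90, 200, 1_000_000]

avg = mean(amounts)    # pulled far upward by the single large payment
mid = median(amounts)  # the "typical" payment
share_above_avg = sum(a > avg for a in amounts) / len(amounts)

print(f"mean: {avg:,.0f}  median: {mid:,.0f}")
print(f"share of payments above the mean: {share_above_avg:.1%}")

# The same share for the Zimmer records quoted above: 52 of 2620 over 300k.
zimmer_share = 52 / 2620
print(f"Zimmer records above 300k: {zimmer_share:.2%}")
```

When the mean and the median are this far apart, the median (or the full sorted distribution) is usually the more honest number to report.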
The most interesting finding for me from this question: Zimmer made 19 payments in 2007 – above 11 million dollars where the type was not disclosed!
3. As a physician, which state should you be in to have a higher probability of receiving a payment from a company?
Most payments are made out to California, Texas and NY!
Additional exploration: I looked into payments declared as “gifts” and the largest was by Zimmer for just $258! I’m pretty cynical about the numbers reported by those companies! Even most companies’ travel expenses seemed fairly accurate – no nuggets of gold there.
Data set and tools used:
The data set is fairly huge: it contains 550,000 rows, one for each payment made from Jan 1 2007 to Jan 1 2011. The data fields break down as follows: the name of the company that made the payment, the year the payment was made, the period over which the payment was made, the city and state in which the physician was located and the estimated amount.
Tools used: The tools I used for this analysis were Tableau (the desktop version) and Excel pivot tables to summarize some of the initial data. Most tools and spreadsheet software will not handle more than 5 MB of data or more than 100k rows. The best alternative is to run this in R and work on the data from there, or to reduce the scope and size in SQL and then feed the result into another analysis tool.
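For the SQL route, a rough sketch of what “reduce the scope and size, then feed it into another tool” could look like, using Python’s built-in sqlite3 module as a stand-in for whatever database you have. The table name, column names and sample rows are assumptions for illustration, not the real schema or real payments.

```python
import sqlite3

# Hypothetical rows: (company, year, state, amount)
sample = [
    ("Company A", 2007, "CA", 100.0),
    ("Company A", 2008, "CA", 250.0),
    ("Company B", 2007, "TX", 40.0),
    ("Company B", 2009, "NY", 60.0),
    ("Company C", 2010, "CA", 10.0),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute(
    "CREATE TABLE payments (company TEXT, year INTEGER, state TEXT, amount REAL)"
)
conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", sample)

# Collapse the raw rows to one row per state before handing off to a chart tool
summary = conn.execute(
    """
    SELECT state, COUNT(*) AS n_payments, SUM(amount) AS total
    FROM payments
    GROUP BY state
    ORDER BY n_payments DESC, state
    """
).fetchall()
conn.close()

for state, n_payments, total in summary:
    print(state, n_payments, total)
```

The result has one row per state instead of one row per payment, so it fits comfortably in a spreadsheet or a desktop visualization tool.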
Side Note: A huge thanks to Mark Hubbard and Julia from Medical Data Informatics for giving me access, answering my questions and patiently waiting till I settled down in this city.