I recently came across one of those “10 things about X” blog posts. However, this one happened to be about data science. It’s a list of 11 points pulled from reddit comments. (Cue kitten/bacon/narwhal comments…) It’s actually well worth a read, despite me poking fun at it.

What’s interesting, though, is that most of those statements apply equally to any technical field. Those that don’t (for example, deep learning and big data) can be substituted with the appropriate term du jour in the field.

For example, I’d argue they apply very well to seismic interpretation in a business setting. The connotations change a bit, but the overall sentiment is identical.

Here’s my attempt at translating “11 facts about data science” into petroleum geology:

11 Facts About ~~Data Science~~ Seismic Interpretation


1. Data is never clean

Yep! And it’s never documented, either. More on this shortly…

2. You’ll spend a lot of time cleaning it

Indeed! Import. Export. Filter. Georeference. Digitize. Silly little awk scripts to make things talk to each other. Half the battle is just getting all of the data in one place. Additionally, for interpretation, you’ll probably spend most of your time cleaning your intermediate data products (fault and horizon surfaces).
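To make that concrete, here's a minimal sketch of the sort of glue work involved: merging horizon picks exported from two different packages into one consistent table. The file names, column names, and units are invented for the example.

```python
# Minimal sketch of interpretation "glue" work: merge horizon picks exported
# from two different packages into one consistent table. File names, column
# names, and units are invented for illustration.
import pandas as pd

picks_a = pd.read_csv("horizon_picks_landmark.csv")            # columns: x, y, twt (ms)
picks_b = pd.read_csv("horizon_picks_petrel.txt", sep=r"\s+")  # columns: X, Y, TWT (s)

# Make column names and units consistent before combining.
picks_b = picks_b.rename(columns={"X": "x", "Y": "y", "TWT": "twt"})
picks_b["twt"] = picks_b["twt"] * 1000.0  # seconds -> milliseconds

merged = pd.concat([picks_a, picks_b], ignore_index=True)
merged = merged.dropna(subset=["x", "y", "twt"]).drop_duplicates()
merged.to_csv("horizon_picks_merged.csv", index=False)
```

It's never quite that tidy in practice, but most cleanup steps are variations on this theme.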

3. There is no fully automated interpretation. You need to get your hands dirty.

This, this, and this again. It’s not about picking reflectors. Yes, you can automate that. That’s not the point. You’re trying to build a physically reasonable 4D mental model of the rocks – not just how they are today, but how they got that way. To do that, you need to look at a lot of data. Often the best way to start is to dive in and explore your data, figuring out the questions you need to ask as you go.

On the other hand, if you’re spending most of your time clicking a mouse and following reflectors, you’re doing it wrong. Don’t waste your time where things are unambiguous. Move on from where things are well imaged and spend your time on geologically complex areas. Autotracking is your best friend. Use it extensively, but use it to test geological hypotheses.

The job of autotracking is to pick reflectors. Your job as an interpreter is to interpret. What happened? Where? Why? What does that mean for this prospect? What does that predict “around the corner”? The great thing about modern datasets (note: not just seismic!) is that you can ask these questions and answer many of them relatively quickly. The key is to dive in, observe, ask questions of the data, and iterate.
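To underline how mechanical the picking itself is, here is a deliberately naive autotracker sketch. It is not any vendor's algorithm; the synthetic section and parameters are made up, and a real tracker would add correlation, dip steering, and quality control.

```python
# A deliberately naive horizon autotracker: starting from a seed pick, follow
# the strongest amplitude from trace to trace within a small search window.
# This only shows that picking reflectors is the mechanical part of the job.
import numpy as np

def autotrack(section, seed_trace, seed_sample, window=5):
    """section: 2D array of shape (n_traces, n_samples). Returns one pick per trace."""
    n_traces, n_samples = section.shape
    picks = np.full(n_traces, -1, dtype=int)
    picks[seed_trace] = seed_sample

    for step in (1, -1):  # track outward from the seed in both directions
        prev = seed_sample
        stop = n_traces if step > 0 else -1
        for i in range(seed_trace + step, stop, step):
            lo, hi = max(prev - window, 0), min(prev + window + 1, n_samples)
            prev = lo + int(np.argmax(section[i, lo:hi]))  # strongest sample nearby
            picks[i] = prev
    return picks

# Synthetic example: a gently dipping reflector buried in noise.
rng = np.random.default_rng(0)
section = rng.normal(0.0, 0.1, size=(200, 300))
reflector = 100 + (0.2 * np.arange(200)).astype(int)
section[np.arange(200), reflector] += 1.0

horizon = autotrack(section, seed_trace=0, seed_sample=100)
```

The tracker finds the reflector just fine. Deciding what that surface means, and whether it is even the right surface to care about, is the interpretation.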

4. 95% of tasks do not require fancy reprocessing.

(Everyone loves made-up statistics!) Use what’s well imaged to constrain what’s not. No matter how good your imaging is, your interpretation needs to make geologic sense. Focus on understanding the geology, not making a perfect image of it. That having been said, there are plenty of times where you really do need to pull out the big hammer and run multiple reverse time migrations with multiple different velocity models. Just make sure you can’t answer your question with simple structural or stratigraphic constraints projected from the areas where things are well imaged before you pour millions into reprocessing.
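For a sense of what "simple constraints projected from the well-imaged areas" can look like, here is a hypothetical first-order check (all numbers synthetic):

```python
# Minimal sketch of projecting a simple structural constraint from a
# well-imaged area into a poorly imaged one: fit a linear dip trend to the
# good picks and extrapolate. All numbers are synthetic.
import numpy as np

x_good = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # km, well-imaged area
z_good = np.array([2.00, 2.10, 2.25, 2.35, 2.50])   # km, depth picks

slope, intercept = np.polyfit(x_good, z_good, 1)    # first-order dip trend
x_poor = np.array([5.0, 6.0, 7.0])                  # km, poorly imaged area
z_expected = slope * x_poor + intercept

print(z_expected)  # the cheap expectation to test before paying for reprocessing
```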

5. 3D seismic is just a tool

(substitute your favorite data type or method for “3D seismic”)

Just because you don’t have 3D doesn’t mean you can’t do anything. Your regional 2D surveys give you a lot more information than a postage-stamp of a 3D survey. Use them. Use potential fields. Use onshore geology. Use every scrap of data you can find. Geology is as broad as it gets. You need every tool you can handle in your toolbox and you need to use multiple tools on everything.

6. You should embrace the Bayesian approach

In geology, your main concern should be a priori information. You always interpret with a bias, and you always should (an “unbiased interpretation” will violate basic laws of physics). All “observations” are made with a bias. The key is to be aware of bias and test it. Use a priori information (a.k.a. “bias”) and weight it accordingly. Test multiple working hypotheses. These are the core of the scientific method for “forensic” sciences such as geology. They’re also the core of the “Bayesian approach” in both the broader and the statistical sense.
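As a toy numerical version of weighting a priori information, here is what a single Bayesian update looks like. The scenario and every probability in it are invented for illustration.

```python
# Toy Bayesian update: how much should a bright amplitude anomaly shift our
# belief that a prospect is gas-charged? All numbers are invented.
prior_gas = 0.2              # a priori: regional success rate ("bias")
p_bright_given_gas = 0.7     # bright spots are likely if gas is present
p_bright_given_no_gas = 0.3  # but they also occur without gas (lithology, tuning, ...)

evidence = (p_bright_given_gas * prior_gas
            + p_bright_given_no_gas * (1.0 - prior_gas))
posterior_gas = p_bright_given_gas * prior_gas / evidence

print(f"P(gas | bright spot) = {posterior_gas:.2f}")  # ~0.37: updated, not certain
```

The interesting part is not the arithmetic. It is being explicit about the prior and the likelihoods, which is exactly where the bias lives and where it can be tested.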

7. No one cares how you did it

Very true. Most technical presentations in industry are 1) far too long and 2) focused on the portions that don’t matter (the details of the method applied) while hiding the parts that do (what it means for this prospect). However, I think a more constructive way to phrase this is “know your audience”. Focus on answering the question your audience cares about. You’ve spent all of your time building up a detailed 4D model of how everything formed. Your boss doesn’t care. At all. They “just want a number” (be it volumes, risk, porosity, net-to-gross, whatever). They’d far rather you lie to them than give them a range of numbers, but they’ll tolerate the latter. A technical review board will probably care a great deal about how you got to those numbers, but there are other things that they won’t care about at all. On the other hand, if you’re documenting processing steps for a dataset, or something similar, you’ll need to focus on method details at the expense of business impact.

You’ll always hide 95% of what you did and make everything look far too simple. The key is to show the right 5%, depending on your audience.

8. Academia and business are two different worlds.

This is a frequent topic of discussion. There’s a different focus in industry, and “It’s an interesting problem” is not something you want to say. It’s vitally important to get to what matters and focus on information that will actually change the decision that’s being made. When you’re told “that’s too academic”, it usually means you didn’t explain your point well enough or used the wrong terminology. Always 1) ask about and 2) keep in mind the decision that your customer is trying to make. Frame everything in terms of that decision.

However, there’s a deeper difference in culture. In industry, asking questions or discussing a problem is strongly discouraged. People want answers. One answer, in fact. Furthermore, they want answers that are unrealistically optimistic and that never include uncertainty. Giving an order-of-magnitude estimate in academia can be very useful. Oftentimes, it’s the best that’s physically possible. Give the same estimate in industry, however, and you’ll be eviscerated when the “true” answer turns out to be off by a factor of two. Uncertainty and iteration are perceived as failure, while false (or naive) confidence is seen as leadership. Overall, I think it’s business that needs to learn from academia, rather than the other way around. Discussion of complex issues is not something to be despised and shouted down.

9. Presentation is key

No one can disagree with this. However, there’s more to it. Presentation may be key, but a clear message is even more important.

Most of us (or well, me, anyway) tend to focus too much on individual figures/slides instead of focusing on how things fit together. “Flow” is every bit as important as clarity of any given figure/slide. However, we hardly ever write an outline for a presentation before we start making slides and figures. Try it sometime. Everyone recommends outlines for writing, but far too few recommend it for presentations.

Clarity of message is king. Outlines can really help focus your presentation. To re-iterate an earlier point, hide 95% of what you did and really focus on the 5% that matters most. What points did you want to make? Do those come across above all else? Streamline your presentation until you’re confident that people will walk away knowing the most important points.

10. All models are wrong, but some are useful

I think this is every geoscience professor’s favorite quote. It’s used incessantly in geology and is often mis-attributed to various giants of the field. The saying is actually the title of a section from a paper by the statistician George E. P. Box (Box, 1979). The text of that section is well worth a read, in addition to its better known title:

Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an “ideal” gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules.

For such a model there is no need to ask the question “Is the model true?”. If “truth” is to be the “whole truth” the answer must be “No”. The only question of interest is “Is the model illuminating and useful?”.

Over-simplified “back-of-the-envelope” calculations are often “outrageously effective”. (To steal another term du jour…) In the geosciences, we often get caught up in all of the different physical processes that can influence a question we’re trying to solve. We tend to mentally go through a list of different geological factors: “This would affect this, and that would… Oh! That would have an influence too!”. Using a “wrong” model and over-simplifying the calculation can often narrow down the possibilities when trying to solve a complex problem. Wrong models are still useful.
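Sticking with Box's own gas-law example, here is what such a "wrong but useful" back-of-the-envelope calculation looks like. The temperature and pressure are arbitrary round numbers, not from the post.

```python
# Back-of-the-envelope use of the "wrong" ideal gas law: molar volume of a gas
# at roughly reservoir-like conditions. A real gas deviates (its compressibility
# factor is not 1), but the simple model gets into the right ballpark for free.
R = 8.314   # J / (mol K)
T = 350.0   # K, an arbitrary warm temperature
P = 20e6    # Pa, an arbitrary pressure of ~20 MPa

V_ideal = R * T / P  # m^3 per mole, straight from PV = RT
print(f"Ideal-gas molar volume: {V_ideal * 1e6:.0f} cm^3/mol")  # ~145 cm^3/mol
```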

As an addendum to this, in geophysics we typically spend far too little time thinking about what happens if we change the model. The numerical part is easy. It’s easy to get caught up in. It’s very important to step back and think about the different physical mechanisms at work, rather than getting caught up in the best way to solve a particular PDE. Make sure you’re solving the right PDE (or have the right boundary conditions) before spending too much time worrying about details. Multiple models are often “illuminating and useful”. It’s crucial to step back and consider multiple alternatives.
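In that spirit, a quick sensitivity sweep over the model is often more illuminating than a careful solve of the wrong problem. A hypothetical example (times and velocities invented) of checking how much an assumed velocity actually matters:

```python
# Quick "what happens if the model changes?" check: how sensitive is the depth
# to a reflector to the assumed average velocity? Times and velocities are
# invented for the example.
import numpy as np

twt = 2.4                                    # two-way time to target, seconds
velocities = np.linspace(2500.0, 3500.0, 5)  # candidate average velocities, m/s

depths = velocities * twt / 2.0              # simple straight-ray depth conversion
for v, z in zip(velocities, depths):
    print(f"v_avg = {v:4.0f} m/s  ->  depth = {z:4.0f} m")

# A spread of 3000 m vs 4200 m says velocity uncertainty dominates; a fancier
# migration won't help until that is narrowed down.
```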

11. Just because an interpretation is great doesn’t mean it will see the light of day

In industry, you get recognition for solving problems that have a direct influence on a business decision. (Ideally, anyway… Stay with me, here…) To get there, you usually wind up solving a lot of other problems on the way. Sometimes you wind up with a beautiful solution to the wrong problem. I’d go as far as to say, “most of the time”. A lot of really great interpretations serve only to advance your knowledge of the area. No one else will ever see it. Resist the urge to include work you did unless there’s absolutely no way to leave it out. Again, you’ll always hide 95% of what you do. Sadly, that includes 95% of your best work as well as 95% of your worst. In the longer run, though, those “elegant solutions to the wrong problem” are usually what makes you mentally connect the dots somewhere else for something that does get seen. If you put them in your back pocket and don’t dwell on whether or not the work was “wasted”, they’ll help you out somewhere down the road.

References

Box, G. E. P. (1979), Robustness in the strategy of scientific model building, in Launer, R. L., and Wilkinson, G. N. (eds.), Robustness in Statistics, Academic Press, pp. 201–236. http://www.dtic.mil/get-tr-doc/pdf?AD=ADA070213