Here’s my summary of some talks at the recent PyCon JP in Tokyo. I chaired the English track on both days, so most of the talks covered here are from that track, but I caught a handful of Japanese presentations as well.

Using Machine Learning to Try and Predict Taxi Availability by @supercoderhari

  • Video
  • Slides
  • Jupyter notebooks
  • Gentle intro to machine learning in general
  • Singapore provides location data for all of its taxis - cool!
  • Goal: given past data, predict availability in the future
  • Jupyter can do interactive plots: I didn’t know this!
  • Try linear regression, polynomial regression, random forests, etc.
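
To make the "given past data, predict the future" goal concrete, here is a minimal sketch of the simplest model on that list, linear regression, in pure Python. The hourly counts are toy numbers I made up for illustration (not the Singapore data from the talk), and `fit_line` and `predict` are hypothetical helper names, not anything from the speaker's notebooks.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how y moves together with x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Toy data, invented for illustration: available taxis at hours 0..5.
hours = [0, 1, 2, 3, 4, 5]
available = [100, 112, 119, 131, 139, 151]

model = fit_line(hours, available)
next_hour = predict(model, 6)  # extrapolate one hour past the data
print('Predicted availability at hour 6: %.1f' % next_hour)
```

A straight line is only a baseline. Real availability rises and falls over the day, which is presumably why the talk moved on to polynomial regression and random forests.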

Geospatial Data Analysis and Visualization in Python by @wrongbat

How (and Why) We Speak in Unicode by @dproi

  • Video
  • An interesting dive into the history of Unicode
  • Separate concepts: character sets and encoding
  • Best practices: doing Unicode correctly in Py2 and Py3
  • Unicode sandwich: bytes on the outside, Unicode on the inside
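
The Unicode sandwich can be sketched in a few lines. This is my own minimal Python 3 example, not code from the talk, and `reverse_text` is a hypothetical function name: decode bytes as soon as they arrive, work only with text in the middle, and encode again on the way out.

```python
def reverse_text(raw_bytes, encoding='utf-8'):
    # Bottom slice of bread: decode incoming bytes to text right away.
    text = raw_bytes.decode(encoding)
    # The filling: all processing happens on text, never on raw bytes.
    reversed_text = text[::-1]
    # Top slice of bread: encode back to bytes only when leaving the program.
    return reversed_text.encode(encoding)


raw = '東京 PyCon'.encode('utf-8')  # stands in for bytes read from a file or socket
out = reverse_text(raw)

print(len(raw))             # number of bytes, not characters
print(out.decode('utf-8'))  # the reversed text
```

Reversing `raw` directly would scramble the three-byte UTF-8 sequences for 東 and 京, which is exactly the kind of bug the sandwich is meant to prevent.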

This was a good talk, but it failed to mention Han Unification, which remains a controversial topic in Japan to this day. I’m not sure how people will solve this problem - perhaps they will just get over it? I regret not being able to attend the talk, because it would have been a great question to ask.

Pythonにおけるデバッグ手法 (Debugging Techniques in Python) by @TakesxiSximada

  • Nice beginner-level introduction to debugging
  • Covers the basic tools: print statements, pdb, .set_trace(), PyCharm
  • Debugging approaches for various scenarios:
    • unit tests
    • Django
    • Celery
    • Jupyter: %debug magic

Personally, I prefer TDD: test-driven debugging. Basically, you find bugs by writing tests around your function. Unlike regular debugging, the big benefit is that the effort you put in also helps prevent future regressions.

For example, check out this broken FizzBuzz code:

def fizzbuzz(limit):
    for i in range(1, limit + 1):
        if i % 3 == 0:
            print('Fizz')
        elif i % 5 == 0:
            print('Buzz')
        elif i % 15 == 0:
            print('FizzBuzz')
        else:
            print(i)

Instead of peppering the code with debug statements, why not write tests for it? Start by decoupling fizzbuzz from print to make it easier to test:

def fizzbuzz(limit):
    for i in range(1, limit + 1):
        if i % 3 == 0:
            yield 'Fizz'
        elif i % 5 == 0:
            yield 'Buzz'
        elif i % 15 == 0:
            yield 'FizzBuzz'
        else:
            yield '%d' % i

The test is then simple:

def test_fizzbuzz():
    expected = ('1 2 Fizz 4 Buzz Fizz 7 8 Fizz '
                'Buzz 11 Fizz 13 14 FizzBuzz').split(' ')
    actual = list(fizzbuzz(15))
    assert expected == actual

If you run this through py.test, you get very helpful output:

bash-3.2$ py.test fizzbuzz.py -q
F
======== FAILURES ========
_____ test_fizzbuzz ______

    def test_fizzbuzz():
        expected = ('1 2 Fizz 4 Buzz Fizz 7 8 Fizz '
                    'Buzz 11 Fizz 13 14 FizzBuzz').split(' ')
        actual = list(fizzbuzz(15))
>       assert expected == actual
E       AssertionError: assert ['1', '2', 'F..., 'Fizz', ...] == ['1', '2', 'Fi..., 'Fizz', ...]
E         At index 14 diff: 'FizzBuzz' != 'Fizz'
E         Full diff:
E         ['1',
E         '2',
E         'Fizz',
E         '4',
E         'Buzz',...
E
E         ...Full output truncated (13 lines hidden), use '-vv' to show

fizzbuzz.py:17: AssertionError
1 failed in 0.05 seconds
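
The diff at index 14 points straight at the problem: for 15 the function yields 'Fizz' instead of 'FizzBuzz', because the `i % 3` branch matches before the `i % 15` branch is ever reached. A minimal sketch of one possible fix is to check the most specific case first:

```python
def fizzbuzz(limit):
    for i in range(1, limit + 1):
        # Multiples of 15 are also multiples of 3 and 5,
        # so this check has to come before the other two.
        if i % 15 == 0:
            yield 'FizzBuzz'
        elif i % 3 == 0:
            yield 'Fizz'
        elif i % 5 == 0:
            yield 'Buzz'
        else:
            yield '%d' % i


def test_fizzbuzz():
    expected = ('1 2 Fizz 4 Buzz Fizz 7 8 Fizz '
                'Buzz 11 Fizz 13 14 FizzBuzz').split(' ')
    actual = list(fizzbuzz(15))
    assert expected == actual
```

The test stays in the file after the fix, so if someone reorders the branches again later, it fails right away.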

Pythonで大量データ処理!PySparkを用いたデータ分析のきほん (Large-Scale Data Processing in Python! The Basics of Data Analysis with PySpark) by @chie8842