# pocket-tensor

pocket-tensor is a fork of [arquolo's](https://github.com/arquolo) [Kerasify](https://github.com/moof2k/kerasify) designed for running trained Keras models from a C++ application on embedded devices.

## Design goals

* Compatibility with sequential networks generated by Keras 2.x using the TensorFlow backend.
* Multithreaded CPU support.
* Low RAM usage.
* Easy to build and run (no external dependencies).
* Fast build times.

## Improvements over Kerasify

* Thanks to the awesome [libsimdpp library](https://github.com/p12tic/libsimdpp), tensor operations have been rewritten using SIMD instructions to improve prediction performance.
* Predictions run across multiple CPU cores.
* Memory (re)usage has been improved in order to reduce memory allocations.
* Apart from `float`, `double` precision tensors are supported (see `pt_tweakme.h` file).
* Tensor dimensions are rigorously validated on each layer to prevent incorrect model usage.
* Besides GCC and Clang, the Visual Studio compiler is properly supported.
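
To illustrate the kind of per-layer dimension validation mentioned above, here is a minimal sketch for a dense layer. `isValidDenseInput` and its parameters are hypothetical names chosen for this example; they are not part of the pocket-tensor API, and the library's real checks may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the pocket-tensor API: checks that an
// input shape is compatible with a dense layer whose weight matrix expects
// `inputFeatures` values per sample.
bool isValidDenseInput(const std::vector<std::size_t>& inputDims,
                       std::size_t inputFeatures)
{
    // A tensor without dimensions can never be a valid input.
    if (inputDims.empty())
    {
        return false;
    }

    // The last dimension must match the number of features the layer
    // was trained with.
    return inputDims.back() == inputFeatures;
}
```

Rejecting a mismatched shape up front lets a prediction fail cleanly instead of reading past the end of a buffer.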

## Hardware requirements

Since there's no GPU support, by default pocket-tensor requires the following CPU SIMD instruction sets:

* ARM: NEON with floating point support.
* x86: AVX.

Required SIMD instruction sets are specified in the `pt_tweakme.h` file, so they can be modified with ease.
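
Before building, it can help to confirm that your compiler is actually targeting one of the default instruction sets. The sketch below relies only on macros predefined by the compiler (`__AVX__`, `__ARM_NEON`, `__ARM_NEON__`); it does not read `pt_tweakme.h`, and `detectedSimd` is a hypothetical name used for this example.

```cpp
#include <string>

// Reports which of the default SIMD instruction sets the compiler is
// targeting, based on compiler-predefined macros only.
// Hypothetical helper, not part of the pocket-tensor API.
std::string detectedSimd()
{
#if defined(__AVX__)
    // Defined by GCC/Clang with -mavx and by MSVC with /arch:AVX.
    return "x86 AVX";
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Defined by GCC/Clang when NEON is enabled for the target.
    return "ARM NEON";
#else
    return "none";
#endif
}
```

If this returns `"none"`, the compiler flags probably need adjusting (for example `-mavx` on x86 with GCC or Clang) before the default configuration will build.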

## Software requirements

Since a copy of libsimdpp comes bundled with this library, there are no external dependencies, so the only software requirements are a C++11-compatible compiler and CMake >= 3.4.

pocket-tensor has been tested with these compilers: 

* GCC 4.9.
* MSVC 2017.
* Whatever Clang comes with Apple LLVM 9.1.0.
* Whatever Clang comes with Android Studio 3.1.3 (see Android section).

## How to build

A CMakeLists.txt is provided with this library, so in order to use it you only need to include this file in your CMake project.  

To build and run the unit tests, you need to generate them first:

```
python make_tests.py
mkdir tests_build
cd tests_build
cmake -DPT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
make
./tests/pocket-tensor-tests
```

## Usage

1) Use Keras to build (`model.compile(...)`) and train (`model.fit(...)`) your model as usual.

2) Now convert it to the Kerasify file format with `kerasify.export_model(model, 'example.model')`.

3) Finally load it in C++ (`pt::create("example.model")`) and use `model->predict(...)` to perform a prediction with your data.

The following example shows the full workflow:

```python
# make_model.py:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from kerasify import export_model

test_x = np.random.rand(10, 10).astype('f')
test_y = np.random.rand(10).astype('f')

model = Sequential()
model.add(Dense(1, input_dim=10))

model.compile(loss='mean_squared_error', optimizer='adamax')
model.fit(test_x, test_y, epochs=1)

print(model.predict(np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])))

export_model(model, 'example.model')
```

```cpp
// main.cpp:

#include <iostream>
#include <utility>  // std::move
#include "pt_model.h"
#include "pt_tensor.h"

int main()
{
    // Initialize model:
    auto model = pt::create("example.model");
    // REQUIRE(model);

    // Create input tensor:
    pt::Tensor in(10);
    in.setData({0, 1, 2, 3, 4, 5, 6, 7, 8, 9});

    // Run prediction:
    pt::Tensor out;
    bool success = model->predict(std::move(in), out);
    // REQUIRE(success);
	
    // Print output:
    std::cout << out << std::endl;
    return 0;
}
```

## Supported layer types

The most common layer types used in image recognition and sequence prediction are supported, making many popular model architectures possible:

* Convolutions: `Conv1D`, `Conv2D`, `LocallyConnected1D`.
* Sequence-related: `LSTM`, `Embedding`.
* Activations: `Linear`, `ReLU`, `ELU`, `SeLU`, `LeakyReLU`, `Softplus`, `Softsign`, `Tanh`, `Sigmoid`, `HardSigmoid`, `Softmax`.
* Other: `Dense`, `Flatten`, `MaxPooling2D`, `BatchNormalization`, `ELU`.

## Performance

A benchmark application is included with this library. To build and run it:

```
mkdir benchmark_build
cd benchmark_build
cmake -DPT_BUILD_BENCHMARK=ON -DCMAKE_BUILD_TYPE=Release ..
make
./benchmark/pocket-tensor-benchmark
```

The prediction time of the following models has been measured on a PC with an Intel Core i7-6500U CPU @ 2.50GHz and on a Raspberry Pi 3:

### MNIST CNN

```python
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='sigmoid'))
```

| Library            | PC elapsed time (μs) | RPi3 elapsed time (μs) |
| ------------------ | -------------------: | ---------------------: |
| Keras              |                 1470 |                  23363 |
| arquolo's Kerasify |                 3502 |                  64238 |
| frugally-deep      |                 1402 |                  29298 |
| pocket-tensor      |                 1049 |                  27329 |

### IMDB LSTM

```python
model = Sequential()
model.add(Embedding(20000, 128))
model.add(LSTM(128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(128, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
```

| Library            | PC elapsed time (μs) | RPi3 elapsed time (μs) |
| ------------------ | -------------------: | ---------------------: |
| Keras              |                10160 |                  89344 |
| arquolo's Kerasify |                 5378 |                  79060 |
| frugally-deep      |        Not supported |          Not supported |
| pocket-tensor      |                 3314 |                  67115 |
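
If you want to time predictions in your own application, a small helper built on `std::chrono` is usually enough. The sketch below is not the code used by the bundled benchmark; `meanElapsedMicroseconds` is a hypothetical name, and averaging over several runs is an assumption made for this example.

```cpp
#include <chrono>
#include <cstdint>

// Runs `work` `runs` times and returns the mean elapsed time per run in
// microseconds. Returns 0 when `runs` is not positive.
// Hypothetical helper, not part of the pocket-tensor API.
template<typename Work>
std::int64_t meanElapsedMicroseconds(Work&& work, int runs)
{
    using Clock = std::chrono::steady_clock;

    if (runs <= 0)
    {
        return 0;
    }

    const auto start = Clock::now();

    for (int i = 0; i < runs; ++i)
    {
        work();
    }

    const auto elapsed = Clock::now() - start;
    const auto micros = std::chrono::duration_cast<std::chrono::microseconds>(elapsed);
    return static_cast<std::int64_t>(micros.count()) / runs;
}
```

Pass a lambda that wraps your `model->predict(...)` call as `work`. `std::chrono::steady_clock` is used because it is monotonic, so the measurement is not affected by system clock adjustments.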

## Android

pocket-tensor supports Android apps (armeabi-v7a ABI only).

To add pocket-tensor to an Android project with C++ support, you must:

1) Enable ARM NEON instructions in the build.gradle project file (https://developer.android.com/ndk/guides/cmake):

```
android {
    ...
    defaultConfig {
        ...
        externalNativeBuild {
            cmake {
                arguments "-DANDROID_ARM_NEON=TRUE"
            }
        }
    }
}
```

2) Disable all ABIs except armeabi-v7a in the build.gradle project file (https://developer.android.com/studio/build/configure-apk-splits):

```
android {
    ...
    splits {
        abi {
            enable true
            reset()
            include "armeabi-v7a"
        }
    }
}
```

3) Include pocket-tensor on the CMakeLists.txt file of your native library:

```
add_subdirectory(/path/to/pocket-tensor pocket-tensor)
target_link_libraries(native-lib pocket-tensor)
```