FastAPI vs. Express.js vs. Flask vs. Nest.js Benchmark

01.01.2022

I wanted to verify FastAPI’s claim of performance on par with Node.js, so I conducted a benchmark test using wrk, an HTTP benchmarking tool.
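wrk’s defaults are 2 threads, 10 connections, and a 10-second run, which is what every result below uses. Spelled out explicitly, each run is equivalent to:

$ wrk -t2 -c10 -d10s http://localhost:8000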

I also wanted each test to hit an endpoint that queries a PostgreSQL database, to simulate a more “realistic” scenario. In each case, I created an endpoint that returns one hundred rows of data from the database.
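For illustration, here is a minimal sketch of what the psycopg2 version of that endpoint looks like. The connection string and the notes table are placeholders, not the exact code from the repo:

import psycopg2
import psycopg2.extras
from fastapi import FastAPI

app = FastAPI()

# Placeholder DSN; the real one lives in the benchmark repo
conn = psycopg2.connect("postgresql://postgres:postgres@localhost/benchmark")

@app.get("/")
def read_rows():
    # A plain "def" route runs in FastAPI's threadpool, since psycopg2 is blocking
    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
        cur.execute("SELECT * FROM notes LIMIT 100")
        return cur.fetchall()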

I tested the following combinations:

  • FastAPI + psycopg2 + uvicorn
  • FastAPI + SQLModel + uvicorn
  • Flask + psycopg2 + flask run
  • Flask + psycopg2 + gunicorn (1 worker)
  • Express.js + pg
  • Nest.js + Prisma
  • Flask + psycopg2 + gunicorn (4 workers)
  • FastAPI + psycopg2 + gunicorn (4 workers)
  • FastAPI + SQLModel + gunicorn (4 workers)

The code for this test can be found here: https://github.com/travisluong/python-vs-nodejs-benchmark.

Here are the results:

FastAPI + psycopg2 + uvicorn

$ uvicorn fast_psycopg:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    32.35ms    3.63ms  50.66ms   90.42%
    Req/Sec   154.76     13.60   202.00     78.00%
  3110 requests in 10.10s, 23.79MB read
Requests/sec:    308.01
Transfer/sec:      2.36MB

FastAPI + SQLModel + uvicorn

$ uvicorn fast_sqlmodel:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    54.50ms    9.41ms  99.36ms   78.56%
    Req/Sec    91.89     16.03   130.00     75.38%
  1842 requests in 10.10s, 11.28MB read
Requests/sec:    182.45
Transfer/sec:      1.12MB

Flask + psycopg2 + flask run

$ FLASK_APP=flask_psycopg flask run
$ wrk http://localhost:5000
Running 10s test @ http://localhost:5000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.06ms    4.48ms  42.07ms   73.73%
    Req/Sec   354.51     19.96   404.00     70.50%
  7070 requests in 10.01s, 37.77MB read
  Non-2xx or 3xx responses: 2501
Requests/sec:    705.95
Transfer/sec:      3.77MB

Flask + psycopg2 + gunicorn (1 worker)

$ gunicorn -w 1 --bind 0.0.0.0:5000 flask_psycopg:app
$ wrk http://localhost:5000
Running 10s test @ http://localhost:5000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    18.50ms  743.07us  22.31ms   84.33%
    Req/Sec   269.51      7.33   287.00     55.00%
  5386 requests in 10.04s, 46.47MB read
Requests/sec:    536.65
Transfer/sec:      4.63MB

Express.js + pg

$ node express_pg.js
$ wrk http://localhost:3000
Running 10s test @ http://localhost:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.19ms    1.04ms  20.60ms   89.87%
    Req/Sec     0.97k    89.72     1.08k    66.34%
  19521 requests in 10.10s, 151.39MB read
Requests/sec:   1931.99
Transfer/sec:     14.98MB

Nest.js + Prisma

$ npm start
$ wrk http://localhost:3000/feed
Running 10s test @ http://localhost:3000/feed
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.49ms    2.37ms  37.77ms   84.94%
    Req/Sec   594.97     91.32   720.00     65.00%
  11860 requests in 10.02s, 91.98MB read
Requests/sec:   1184.11
Transfer/sec:      9.18MB

Flask + psycopg2 + gunicorn (4 workers)

$ gunicorn -w 4 --bind 0.0.0.0:5000 flask_psycopg:app
$ wrk http://localhost:5000
Running 10s test @ http://localhost:5000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.66ms    1.71ms  26.50ms   90.09%
    Req/Sec   743.20     60.59     0.89k    65.00%
  14814 requests in 10.02s, 127.81MB read
Requests/sec:   1478.02
Transfer/sec:     12.75MB

FastAPI + psycopg2 + gunicorn (4 workers)

$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_psycopg:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.86ms    7.75ms  39.84ms   78.87%
    Req/Sec   499.02    291.50     0.91k    55.50%
  9962 requests in 10.07s, 76.19MB read
Requests/sec:    989.50
Transfer/sec:      7.57MB

FastAPI + SQLModel + gunicorn (4 workers)

$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_sqlmodel:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.72ms   10.82ms  57.08ms   58.85%
    Req/Sec   286.39    112.97   515.00     56.00%
  5723 requests in 10.05s, 35.04MB read
Requests/sec:    569.46
Transfer/sec:      3.49MB

Conclusion

It looks like the minimalist Express.js + pg combo wins this benchmarking round, followed by Flask with 4 gunicorn workers and Nest.js + Prisma.

Flask with the “flask run” development server returned a large number of non-2xx or 3xx responses (2,501 out of 7,070), as expected of a development server, so its requests-per-second figure isn’t directly comparable to the others.

FastAPI + psycopg2 + uvicorn, on the other hand, lagged behind Express.js + pg by more than 6x (308 vs. 1,932 requests per second).

Nest.js + Prisma also handled about twice the throughput of FastAPI + SQLModel + gunicorn (1,184 vs. 569 requests per second).

Interestingly, Flask + psycopg2 + gunicorn beats out FastAPI + psycopg2 + gunicorn by about 1.5x (1,478 vs. 990 requests per second).

Is it safe to say that FastAPI is not on par with the Node.js frameworks in terms of performance? Or have I conducted the tests incorrectly? Perhaps I’m not leveraging FastAPI’s async functionality in the right way?

At the end of the day, it probably doesn’t matter too much which framework you choose. Just use whatever language you’re most productive in since developer time is usually more expensive than computing power.

Update 1/3/22

Thanks to Dmitry for pointing out that I should use the asyncpg library instead of psycopg2.
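For reference, a minimal sketch of an asyncpg-backed endpoint, with a connection pool created at startup. The DSN and table name are placeholders, not the repo’s actual code:

import asyncpg
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def startup():
    # Create one shared pool instead of connecting per request (placeholder DSN)
    app.state.pool = await asyncpg.create_pool("postgresql://postgres:postgres@localhost/benchmark")

@app.get("/")
async def read_rows():
    async with app.state.pool.acquire() as conn:
        rows = await conn.fetch("SELECT * FROM notes LIMIT 100")
    # asyncpg returns Record objects; convert them to dicts for JSON serialization
    return [dict(r) for r in rows]

Because asyncpg is a native async driver, the async def route can await the query without blocking the event loop, which is where a blocking driver like psycopg2 loses ground.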

I also included a benchmark for the encode/databases library, which is advertised on the FastAPI website under the Async SQL section.

It appears that FastAPI is still behind Node.js in performance despite adding the async database drivers. The JSON serialization is a possible bottleneck. If anyone knows how to optimize this, please let me know!

Here are the updated results with asyncpg and databases.

FastAPI + asyncpg + uvicorn

$ uvicorn fast_asyncpg:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    31.74ms    3.49ms  79.23ms   97.83%
    Req/Sec   158.01     13.26   181.00     85.00%
  3172 requests in 10.09s, 24.26MB read
Requests/sec:    314.52
Transfer/sec:      2.41MB

FastAPI + asyncpg + gunicorn (4 workers)

$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_asyncpg:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.48ms    4.41ms  27.52ms   59.84%
    Req/Sec   478.83     84.78   666.00     60.50%
  9552 requests in 10.02s, 73.06MB read
Requests/sec:    952.99
Transfer/sec:      7.29MB

FastAPI + databases + uvicorn

$ uvicorn fast_databases:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    37.31ms    4.39ms  95.69ms   94.49%
    Req/Sec   134.30     12.86   151.00     77.50%
  2697 requests in 10.08s, 20.63MB read
Requests/sec:    267.69
Transfer/sec:      2.05MB

Update 1/4/22

I have confirmed that the JSON serialization was the bottleneck. The default serializer is much slower than ujson and orjson.
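In FastAPI, swapping the serializer is just a matter of the response class. Here is a minimal sketch of how the /orjson and /ujson endpoints can be declared, with the database fetch stubbed out as a placeholder:

from fastapi import FastAPI
from fastapi.responses import ORJSONResponse, UJSONResponse

app = FastAPI()

async def fetch_rows():
    # Stand-in for the asyncpg query in the real benchmark code
    return [{"id": i, "body": "hello"} for i in range(100)]

@app.get("/orjson", response_class=ORJSONResponse)
async def read_orjson():
    # Serialized with orjson instead of the stdlib json module
    return await fetch_rows()

@app.get("/ujson", response_class=UJSONResponse)
async def read_ujson():
    return await fetch_rows()

Both response classes require their respective packages to be installed (pip install orjson ujson).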

Here are the new results!

FastAPI + psycopg2 + uvicorn + orjson

$ uvicorn fast_psycopg:app
$ wrk http://localhost:8000/orjson
Running 10s test @ http://localhost:8000/orjson
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.06ms    1.77ms  30.71ms   87.39%
    Req/Sec   415.52     23.51   464.00     72.50%
  8317 requests in 10.05s, 63.61MB read
Requests/sec:    827.30
Transfer/sec:      6.33MB

FastAPI + asyncpg + uvicorn + orjson

$ uvicorn fast_asyncpg:app
$ wrk http://localhost:8000/orjson
Running 10s test @ http://localhost:8000/orjson
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.34ms    1.15ms  19.18ms   94.82%
    Req/Sec     0.95k    44.71     1.01k    78.50%
  18870 requests in 10.01s, 144.33MB read
Requests/sec:   1885.25
Transfer/sec:     14.42MB

FastAPI + asyncpg + uvicorn + ujson

$ uvicorn fast_asyncpg:app
$ wrk http://localhost:8000/ujson
Running 10s test @ http://localhost:8000/ujson
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.85ms    0.97ms  18.79ms   88.35%
    Req/Sec     0.86k    33.84     0.92k    82.00%
  17134 requests in 10.01s, 131.05MB read
Requests/sec:   1711.69
Transfer/sec:     13.09MB

FastAPI + asyncpg + gunicorn (4 workers) + orjson

$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_asyncpg:app
$ wrk http://localhost:8000/orjson
Running 10s test @ http://localhost:8000/orjson
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.64ms    3.04ms  47.94ms   98.38%
    Req/Sec     2.11k   733.24     3.31k    54.50%
  41995 requests in 10.01s, 321.20MB read
Requests/sec:   4193.72
Transfer/sec:     32.08MB

FastAPI + asyncpg + gunicorn (4 workers) + ujson

$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_asyncpg:app
$ wrk http://localhost:8000/ujson
Running 10s test @ http://localhost:8000/ujson
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.27ms  794.94us  21.49ms   69.94%
    Req/Sec     2.21k   316.77     2.79k    63.00%
  44028 requests in 10.00s, 336.75MB read
Requests/sec:   4401.04
Transfer/sec:     33.66MB

Final Conclusion

The winner is FastAPI + asyncpg + 4 gunicorn workers + ujson.

FastAPI is definitely fast, on par with Node.js, and lives up to the hype! Well, according to these benchmarks.

Just make sure you’re using the right libraries with it!

I realized there is another flaw in the benchmark. Node.js has a cluster mode, which I was unaware of. For new benchmarks and a complete ranking, check out part 2 of this benchmarking article:

https://medium.com/@travisluong/fastapi-vs-fastify-vs-spring-boot-vs-gin-benchmark-b672a5c39d6c

If you’re interested in learning more about FastAPI and other amazing tools, check out my Full Stack Tutorial:

https://medium.com/@travisluong/full-stack-next-js-fastapi-postgresql-tutorial-86f0af0747b7

Thanks for reading.