
Speed of inserting to BigQuery – should this be batched in background?

I have a simple URL shortener (PHP) where I'm tracking clicks. Every click is saved to the database (currently MySQL) as a separate row, because I also need info about the device, country, etc.

It works great, but as the table grows, queries become slower and slower, so I want to migrate to BigQuery (I know they differ in many ways, but I think it's what I need?).

I know that SELECTs can take seconds in BigQuery. But I can't find any information on how fast inserts are. Can I use it just like a traditional database (request comes in -> save click to DB -> redirect to URL -> exit), or is the insert also slow, so I'd need to build something (or use an existing solution/service) to delay inserts or insert multiple rows at a time using BigQuery's batch/streaming features (whose purpose I have a hard time understanding)? Maybe: request comes in -> put the visit in a Redis cache -> redirect -> exit, and then save it to the DB as a separate job?


Answer

You can check the quota policy for streaming inserts. By default, BigQuery streaming insertion allows 1 GB per second per project if insertId is not populated. With insertId, the default limit is 500,000 rows per second per project in the US and EU multi-regions. The BigQuery documentation suggests that streaming inserts are a good choice if you have an app that collects a large amount of data in real time.
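As a sketch of what a streaming insert with insertId could look like, using the google-cloud-bigquery Python client (the table ID and row fields here are hypothetical placeholders, and the dedupe-key scheme is just one reasonable choice, not a prescribed one):

```python
import hashlib
import json

def make_insert_id(click: dict) -> str:
    """Deterministic dedupe key: re-sending the same click with the same
    insertId lets BigQuery best-effort deduplicate the row."""
    raw = json.dumps(click, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def stream_click(table_id: str, click: dict) -> list:
    # Requires `pip install google-cloud-bigquery` and GCP credentials;
    # imported lazily so make_insert_id stays usable without the library.
    from google.cloud import bigquery
    client = bigquery.Client()
    # insert_rows_json returns a list of per-row errors (empty on success).
    return client.insert_rows_json(table_id, [click],
                                   row_ids=[make_insert_id(click)])
```

In the shortener's flow this would be called once per click, right before the redirect.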

The speed of BigQuery inserts depends on many dynamic factors, and there is no published benchmark; you would need to run a test of your own to find out the speed in your environment. In general, BigQuery streaming inserts are meant for small, real-time data updates, and they are light and fast. Batch (load) jobs, on the other hand, accept file uploads and are meant for larger, heavier updates: you save all the new rows to a file and then load the file into BigQuery.
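A minimal sketch of that file-based path, again with the Python client (the table ID and row shape are assumptions; newline-delimited JSON is one of the file formats BigQuery load jobs accept):

```python
import io
import json

def to_ndjson(rows: list) -> bytes:
    """Serialize rows as newline-delimited JSON for a BigQuery load job."""
    return ("\n".join(json.dumps(r, sort_keys=True) for r in rows) + "\n").encode("utf-8")

def load_clicks(table_id: str, rows: list):
    # Requires `pip install google-cloud-bigquery` and GCP credentials;
    # imported lazily so to_ndjson stays usable without the library.
    from google.cloud import bigquery
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    )
    job = client.load_table_from_file(io.BytesIO(to_ndjson(rows)),
                                      table_id, job_config=job_config)
    return job.result()  # blocks until the load job finishes
```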

With respect to your second question, there is a tradeoff between speed and volume. You can go with either option, or a hybrid, depending on your requirements; it is best to run a test to find out which works best in your environment. If you insert inside the app flow, updates are real-time, but quota limits may cap how many updates you can do. If you insert in the background, updates are less real-time, but you can take advantage of batch uploads and insert far more rows into BigQuery.
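The hybrid option can be sketched as a buffer that collects clicks and flushes them in one batch once either a row-count or an age threshold is reached (the thresholds here are made-up examples; in production the buffer would live in Redis rather than process memory, and `flush_fn` would be the BigQuery stream or load call, injected here so the logic is testable):

```python
import time

class ClickBuffer:
    """Collect clicks and flush them as one batch when either
    max_rows are buffered or the oldest row is max_age_s old."""

    def __init__(self, flush_fn, max_rows=500, max_age_s=30.0):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.rows = []
        self.first_ts = None

    def add(self, click: dict):
        if self.first_ts is None:
            self.first_ts = time.monotonic()
        self.rows.append(click)
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.first_ts >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)   # e.g. a BigQuery load job
            self.rows = []
            self.first_ts = None
```

With `max_rows=3`, seven clicks would go out as batches of 3, 3, and (after a final flush) 1.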

User contributions licensed under: CC BY-SA