⭐ Hightlights

  • Non transaccional database. It is an analitycal database.

  • Distributed database (support multiple machines with a lot of amount of data).

  • Column oriented database.

Use cases

  • Real-time analytics.
  • Logs, events, and traces.

ANSI SQL vs. ClickHouse SQL

  • Syntax is nearly identical (ClickHouse provides addicional data types and many special functions).
  • Optimice queries for ClickHouse (only SELECT the columns you need and take advantage of the columnar storage).

Row vs Column Oriented Databases

🔗 Row vs Column Oriented Databases

Understanding columnar storage

Row oriented database

Column oriented database

Read from parquet file

ClickHouse can read from parquet files.

SELECT count()
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/pypi/2023/pypi_0_7_34.snappy.parquet');

🔗 ClickHouse documentation

🔗 Parquet files documentation

MergeTree

Insert operations create table parts which are merged by a background process with other table parts. This system allows ClickHouse to efficiently handle large amounts insertions of data.

MergeTree

Query for inspecting active parts of a table:

select * from system.parts
where active=1
and table = 'uk_price_paid';

🔗 MergeTree Documentation

alt text

SharedMergeTree engine is only available on the cloud. To find it on the docs you should go to 'Cloud' section. SharedMergeTree docs.

Data Types

System table system.data_type_families provides an overview of all available data types.

🔗 Data Types documentation

LowCardinality(T)

Changes the data type of a column to a dictionary of values. It works with String, FixedString, Date, DateTime, and numbers excepting Decimal.

The efficiency of using LowCardinality data type depends on data diversity. If a dictionary contains less than 10,000 distinct values

Primary Keys

Primary Keys explanation

Primary Keys candidates

Choose the columns that are most likely to be used in queries for a primary key. The first column in the primary key is the most important, so the order of the columns is important. Put the column with the lowest cardinality first (don't worry about low cardinality and date. Date is a great thing to put in a primary key).

Choose message as a primary key it's a bad idea. Arbitrary strings are terrible inside a primary key and you probably never filter by a long string:

Primary key bad idea

Additional primary keys

There's no concept of a key, like a foreign key. It's not a relational database.

You're not going to do a bunch of joins. You have to flatten the data when you add it to ClickHouse.

You can do joins in ClickHouse, but that's not the typical thing to do every day.

🔗 Course. ClickHouse Developer On-demand: Module 9. Joining data

Interting data

Insert data

Move from self-managed to ClickHouse Cloud

Use remoteSecure function.

alt text


📺 Video


🔗 Links