Search between a Gas Station


Showing posts with the label aws-glue

Glue crawler exclude patterns

I have an S3 bucket that I'm trying to crawl and catalog. The layout is something like this, where the SQL files are DDL queries (CREATE TABLE statements) that match the schema of the different data files (data1, data2, etc.):

s3://my-bucket/somedata/20180101/data1/stuff.txt.gz
s3://my-bucket/somedata/20180101/data2/stuff.txt.gz
s3://my-bucket/somedata/20180101/data1.sql
s3://my-bucket/somedata/20180101/data2.sql
s3://my-bucket/somedata/20180102/data1/stuff.txt.gz
s3://my-bucket/somedata/20180102/data2/stuff.txt.gz
...

I just want to catalog data1, so I am trying to use exclude patterns in the Glue crawler, namely *.sql and data2/*.

Unfortunately, the crawler is still classifying everything under the root path s3://my-bucket/somedata/. I can live with having data2 cataloged; I'm most concerned (and annoyed) by the SQL files.

Anyone have experi...
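One way to reason about why these patterns have no effect is to check them locally. The sketch below is only an approximation, not Glue's actual matcher. It assumes the semantics described in the AWS Glue crawler documentation as I understand them: exclude patterns are matched against the object path *relative to the include path*, a single `*` does not cross `/` folder boundaries, and `**` does. Brace (`{a,b}`) and bracket (`[...]`) expressions are deliberately not handled. The helper names `glue_glob_to_regex` and `is_excluded` and the alternative patterns `**.sql` and `*/data2/**` are hypothetical illustrations that have not been verified against a live crawler.

```python
import re
from typing import Iterable


def glue_glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex, ASSUMING Glue-like semantics:
    '**' may cross '/' boundaries; '*' and '?' stay within one path segment.
    Brace and bracket expressions are not supported in this sketch."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")          # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")       # any characters within one segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")        # exactly one non-'/' character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_excluded(key: str, include_path: str, patterns: Iterable[str]) -> bool:
    """Return True if the full s3:// key matches any exclude pattern.
    ASSUMPTION: patterns apply to the key relative to include_path."""
    if not key.startswith(include_path):
        raise ValueError(f"{key!r} is not under {include_path!r}")
    relative = key[len(include_path):]
    return any(glue_glob_to_regex(p).fullmatch(relative) for p in patterns)


INCLUDE = "s3://my-bucket/somedata/"
KEYS = [
    "s3://my-bucket/somedata/20180101/data1/stuff.txt.gz",
    "s3://my-bucket/somedata/20180101/data2/stuff.txt.gz",
    "s3://my-bucket/somedata/20180101/data1.sql",
    "s3://my-bucket/somedata/20180101/data2.sql",
]

if __name__ == "__main__":
    # Original patterns versus hypothetical alternatives that span the date folder.
    for patterns in (["*.sql", "data2/*"], ["**.sql", "*/data2/**"]):
        kept = [k for k in KEYS if not is_excluded(k, INCLUDE, patterns)]
        print(patterns, "->", kept)
```

Under these assumptions, `*.sql` and `data2/*` match nothing, because every relative key begins with a date folder such as `20180101/`. Patterns that account for that leading folder would leave only the data1 files. If the semantics hold, the patterns would go in the `Exclusions` list of the crawler's S3 target (for example via boto3's `update_crawler`, `Targets={"S3Targets": [{"Path": ..., "Exclusions": [...]}]}`).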