Data Engineer - £350PD - Remote
Required Technical Skills

Data Pipeline and ETL
- Design, build, and maintain robust ETL/ELT pipelines for structured and unstructured data
- Hands-on experience with AWS Glue and AWS Step Functions
- Implementation of data validation, data quality frameworks, and reconciliation checks
- Strong error handling, monitoring, and retry strategies in production pipelines
- Experience with incremental data processing patterns (CDC, watermarking, upserts)

AWS Data Services
- Amazon S3: data lake architectures, partitioning strategies, lifecycle policies
- DynamoDB: data modeling, secondary indexes, streams, and performance optimization
- Amazon Redshift: foundational querying, integrations, and performance considerations
- AWS Lambda for scalable data processing and orchestration
- Amazon EventBridge for event-driven and decoupled data pipelines

Vector Databases and Embeddings
- Strong understanding of vector database concepts, indexing strategies, and performance trade-offs
- Design and implementation of embedding generation pipelines
- Optimization techniques for semantic search and retrieval accuracy
- Effective chunking strategies for document ingestion and processing
- Experience with CockroachDB deployment and management is beneficial

Document Processing
- Experience with PDF parsing libraries such as PyPDF2, pdfplumber, and AWS Textract
- Integration of OCR solutions (AWS Textract, Tesseract) for scanned documents
- Extraction of document structure (headings, tables, sections)
- Metadata extraction, normal