
Extracting a single field from a very long JSON file

A trivial task, but it may still save somebody some time, so I am gladly sharing it.
A friend of mine had a huge JSON file and needed to extract all the unique values of a field called "title". The file was too big to open in Notepad or Excel.

With these commands, I was able to obtain a clean, sorted list of all the unique values.

grep -o -E '"title":"[^"]+",' tmp.json | sort |uniq > output.txt

sed -i 's/"title":"//g' output.txt

sed -i 's/",//g' output.txt
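As a rough sketch, the same three steps can also be chained into a single pipeline that never edits the output file in place. The file names sample.json and titles.txt are made up for the illustration, and the tiny sample stands in for the real tmp.json. It assumes compact JSON (no space after the colon) and titles without escaped quotes; unlike the pattern above, it does not require a comma after the value, so a "title" that is the last field of an object is also matched.

```shell
# Hypothetical sample standing in for the real tmp.json
printf '%s\n' '[{"title":"Beta","id":1},{"title":"Alpha","id":2},{"title":"Beta","id":3}]' > sample.json

# 1. grep -o prints each "title":"..." match on its own line
# 2. sed strips the "title":" prefix and the closing quote
# 3. sort -u sorts the values and drops duplicates in one step
grep -o -E '"title":"[^"]+"' sample.json \
  | sed -E 's/^"title":"(.*)"$/\1/' \
  | sort -u > titles.txt

cat titles.txt
```

On the sample this leaves two lines in titles.txt, Alpha and Beta.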


Unknown said…
You can do the same with just one command:
sed -n "s/^.*\"title\":\"\([^\"]*\)\",.*/\1/p" tmp.json | sort | uniq > output.txt
Francesco De Collibus said…
yes, true as well
